Scorer skipTo() expectations?
Hi, I have a custom Query class that provides a long list of Lucene docIds (not for filtering purposes), which is one clause in a standard BooleanQuery (which also contains TermQuery instances). I have a custom Scorer that goes along with the custom Query class. What (if any) document ordering requirements does the Scorer class have for its skipTo(int docId) method? In particular, currently I'm sorting/returning the docIds in ascending order from my custom Query class. That can be expensive for large docId lists; is sorting necessary? It looks like skipTo() might expect the documents it gets to be in ascending order to behave correctly as part of a BooleanQuery, but I can't tell for sure from the doc. If my custom Scorer does not return its documents in ascending order (e.g. 10, 80, 40, 60, 50), will whatever uses skipTo() potentially lose hits? If not, is there any performance concern with having the docIds unordered?
RegexQuery on multiple fields?
Hi, I've recently tried the RegexQuery with Lucene, which works fine with the following code snippet: Hits hits; String q = someregex; Term t = new Term(content, q); Query query = new RegexQuery(t); hits = searcher.search(query); However, I wonder whether it is possible to use a QueryParser together with RegexQuery so that the field to be searched is determined dynamically. I wasn't able to find a solution in the API. Does anybody know of one, or is this not possible? Thanks in advance! Oliver -- http://merobase.com - find source code, components and web services - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
TermPositionVector.indexesOf()
Hello, I'm using the following method to obtain the position of some terms in a document: int[] indexOfTerms = TermPositionVector.indexesOf(String[] terms, int start, int len); Should I parse the strings contained in terms before I call indexesOf()? Thank you in advance, Patricio
Re: Scorer skipTo() expectations?
Dan, In Scorers, each time skipTo() or next() returns true after the first time, the value returned by doc() must be greater than the previous one. If a Scorer does not return documents in increasing docId order, documents will be lost, which means that not all matching documents will be found by the search. For disjunctions (OR), one needs to merge the documents of two Scorers using next() to iterate over the documents. The merging is normally done on the fly in DisjunctionSumScorer, using a specialized priority queue on the doc() values. No sorting of complete document lists is done at search time; that is done at indexing time. And since TermScorer uses the index directly, it will always return documents in order. The only exception to document ordering is BooleanScorer.next(), which is used by BooleanQuery for some cases of top level disjunctions, and then only when documents are allowed to be scored out of order. The reason for that is performance: BooleanScorer uses a faster data structure than a priority queue, but it does not implement skipTo(). Regards, Paul Elschot (In reply to Dan Rich's message of Thursday 04 October 2007 09:12.)
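To make the ordering requirement concrete, here is a minimal sketch of a skipTo()-driven intersection, roughly the way a required (AND) clause would be combined with another scorer. It does not use Lucene at all: ArrayScorer, intersect and SkipToOrderDemo are hypothetical names invented for this illustration, and the leapfrog loop is only an assumption about the general shape of such a merge, not a copy of Lucene's code. The second call uses the unordered list from the question (10, 80, 40, 60, 50).

```java
import java.util.ArrayList;
import java.util.List;

class SkipToOrderDemo {

    /** Hypothetical stand-in for a Scorer backed by a docId array; not a Lucene class. */
    static final class ArrayScorer {
        private final int[] docs;
        private int pos = -1;

        ArrayScorer(int[] docs) { this.docs = docs; }

        /** Advances to the next entry; returns false when the list is exhausted. */
        boolean next() { return ++pos < docs.length; }

        /** Advances at least once, then forward until the current docId is >= target. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (docs[pos] < target);
            return true;
        }

        int doc() { return docs[pos]; }
    }

    /** Leapfrog intersection of two docId lists, driven by skipTo(). */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        ArrayScorer sa = new ArrayScorer(a);
        ArrayScorer sb = new ArrayScorer(b);
        if (!sa.next() || !sb.next()) return hits;
        while (true) {
            if (sa.doc() == sb.doc()) {
                hits.add(sa.doc());
                if (!sa.next() || !sb.next()) return hits;
            } else if (sa.doc() < sb.doc()) {
                // The scorer that is behind skips forward; it can never go back.
                if (!sa.skipTo(sb.doc())) return hits;
            } else {
                if (!sb.skipTo(sa.doc())) return hits;
            }
        }
    }

    public static void main(String[] args) {
        int[] termDocs = {10, 40, 50, 60, 80};
        // Custom list in ascending order: all five documents match.
        System.out.println(intersect(new int[] {10, 40, 50, 60, 80}, termDocs)); // [10, 40, 50, 60, 80]
        // Same documents, unordered: 40, 50 and 60 are skipped over and lost.
        System.out.println(intersect(new int[] {10, 80, 40, 60, 50}, termDocs)); // [10, 80]
    }
}
```

In the unordered run, the other scorer is asked to skip to 80 as soon as the custom list jumps there, so it passes 40, 50 and 60 before the custom list ever reports them. Because both sides only move forward, those hits cannot be recovered.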
Problems with searching a field
Hi, I am new to Lucene and am currently having some problems searching an index. We build the index like this: doc.add(new Field("itno", item.getMMITNO(), Field.Store.YES, Field.Index.TOKENIZED)); This runs OK, and the index looks like this: [stored/uncompressed,indexed,tokenizeditno:0002 , But when we try searching this field we get no hits (search is 0002, ItemIndexing.getAnalyzer() == SimpleAnalyzer): try { Hits hits = indexSearcher.search(new QueryParser("itno", ItemIndexing.getAnalyzer()).parse(search)); //Returns 0 log.info("Size " + hits.length()); List result = getResult(hits); indexSearcher.close(); return result; } catch (Exception e) { What are we doing wrong? Any help would be appreciated.
Re: Problems with searching a field
It's hard to say, but two things will help you track this down. 1) Get a copy of Luke to examine your index (which you may have already). 2) Query.toString is your friend. It'll show you exactly what the parsed query looks like. It may be obvious what the problem is when you see that output, but if not, you can try pasting the parsed query into the search tab of Luke and glean more info. Where did you get this data: itno:0002 ,? It's kind of interesting that there are spaces AFTER the 2. What analyzer did you use when you indexed it, and can you guarantee that it's the same analyzer that you used to parse the query? And one aside: opening and closing a searcher for each request is very wasteful. Is closing your searcher just an artifact of cutting/pasting? If not, you haven't opened the searcher in the snippet either... Best, Erick (In reply to Mikal skåren's message of 10/4/07.)
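One thing worth checking when you follow up on the analyzer question: SimpleAnalyzer is documented as splitting text at non-letters and lowercasing it, so a purely numeric value such as 0002 may produce no tokens at all. The sketch below is a simplified stand-in for that behaviour, not Lucene's implementation. LetterTokenDemo and letterTokens are hypothetical names, and whether this is really the cause here is an assumption that the Query.toString output would confirm or rule out.

```java
import java.util.ArrayList;
import java.util.List;

class LetterTokenDemo {

    /**
     * Simplified stand-in for letter-based tokenization: keeps maximal runs of
     * letters, lowercased, and drops every other character. Not Lucene's code.
     */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("0002"));      // prints []
        System.out.println(letterTokens("Item 0002")); // prints [item]
    }
}
```

If the parsed query does turn out to be empty, one option to try is indexing the field with Field.Index.UN_TOKENIZED and searching it with a TermQuery, or using an analyzer that keeps digits (for example WhitespaceAnalyzer or KeywordAnalyzer) for both indexing and query parsing.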
Help with Lucene Indexer crash recovery
Hi, We are using Lucene 2.3. The problem we are facing is that, quite often, if our application is stopped (killed or crashed) while the Indexer is doing its job, the Indexer fails to run the next time we bring up the application, with the following exception: 2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text indexer failed to index java.io.FileNotFoundException: /opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(Unknown Source) at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506) at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445) at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131) at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610) The search also doesn't work after this. It looks like the index was left in some weird state (it might be corrupted). I was wondering whether there is a tool or a way to repair the index if we are not able to open it at run time? Thanks, -vivek