I found the problem. I have a custom "query optimizer" that replaces certain TermQuery clauses within a BooleanQuery with a custom Query, and that query has its own Weight/Scorer which retrieves matching documents from an in-memory cache (not backed by Lucene). It looks like my custom HitCollectors are now wrapped in a HitCollectorWrapper, which assumes Collect() will be called separately for each segment - so it adds the segment's start offset to the doc IDs that come from my custom query implementation.

I looked at the new Collector class and it seems to work the same way (it expects SetNextReader() to be called for each segment with some doc-ID offset). How can I make my custom query work with the new API, so that my query effectively uses a single in-RAM "segment" while the other clauses in the same BooleanQuery still search across multiple Lucene segments? I realize that may not be clear, and I will try to provide more detail soon.
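In case it helps explain what I mean, here is a minimal sketch of the direction I'm considering (names like CachedDocsScorer and sortedGlobalDocs are just illustrative, not actual Lucene.Net API). Since the 2.9 IndexSearcher scores one sub-reader at a time and the Collector adds each segment's docBase back onto the IDs it receives, a scorer backed by an index-wide cache would have to emit doc IDs relative to the current segment:

using Lucene.Net.Index;
using Lucene.Net.Search;

// Sketch (hypothetical names): a scorer over an in-memory, index-wide
// cache of matching doc IDs. Because IndexSearcher in 2.9 scores one
// segment at a time and the collector adds each segment's docBase back,
// this scorer emits IDs *relative* to the current segment.
public class CachedDocsScorer : Scorer
{
    private readonly int[] sortedGlobalDocs; // cache hits, ascending order
    private readonly int docBase;            // segment's offset into the global doc-ID space
    private readonly int docEnd;             // docBase + subReader.MaxDoc()
    private int pos = -1;
    private int current = -1;

    public CachedDocsScorer(Similarity similarity, int[] sortedGlobalDocs,
                            int docBase, int docEnd)
        : base(similarity)
    {
        this.sortedGlobalDocs = sortedGlobalDocs;
        this.docBase = docBase;
        this.docEnd = docEnd;
    }

    public override int DocID()
    {
        return current;
    }

    public override int NextDoc()
    {
        while (++pos < sortedGlobalDocs.Length)
        {
            int global = sortedGlobalDocs[pos];
            if (global >= docEnd) break;      // past this segment's range
            if (global >= docBase)            // inside this segment
            {
                current = global - docBase;   // make the ID segment-local
                return current;
            }
        }
        return current = NO_MORE_DOCS;
    }

    public override int Advance(int target)
    {
        // Simple linear scan; NO_MORE_DOCS == int.MaxValue terminates the loop.
        int doc;
        do { doc = NextDoc(); } while (doc < target);
        return doc;
    }

    public override float Score()
    {
        return 1.0f; // constant score for cache hits
    }
}

The segment's docBase isn't handed to Weight.Scorer() directly, but as far as I can tell it can be recovered by walking the top-level reader's GetSequentialSubReaders() and summing MaxDoc() until you reach the sub-reader that was passed in.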
Thanks,
Bob

On Jun 9, 2011, at 1:48 PM, Digy wrote:

> Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect
> the problem.
>
> DIGY
>
> -----Original Message-----
> From: Robert Stewart [mailto:[email protected]]
> Sent: Thursday, June 09, 2011 8:40 PM
> To: <[email protected]>
> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>
> I tried converting the index using IndexWriter as follows:
>
> Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9",
>     new Lucene.Net.Analysis.KeywordAnalyzer());
>
> writer.SetMaxBufferedDocs(2);
> writer.SetMaxMergeDocs(1000000);
> writer.SetMergeFactor(2);
>
> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
>     new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
>
> writer.Commit();
>
> That seems to work (I get what looks like a valid index directory at least).
>
> But when I run some tests using IndexSearcher, I still get the same problem
> (I get documents in Collect() which are larger than IndexReader.MaxDoc()).
> Any idea what the problem could be?
>
> BTW, this is a problem because I look up some fields (date ranges, etc.) in
> some custom collectors which filter out documents, and that code assumes I
> don't get any documents larger than maxDoc.
>
> Thanks,
> Bob
>
> On Jun 9, 2011, at 12:37 PM, Digy wrote:
>
>> One more point: some write operations using Lucene.Net 2.9.2 (add, delete,
>> optimize, etc.) automatically upgrade your index to 2.9.2.
>> But if your index is somehow corrupted (e.g., due to some bug in 1.9), this
>> may result in data loss.
>>
>> DIGY
>>
>> -----Original Message-----
>> From: Robert Stewart [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 7:06 PM
>> To: [email protected]
>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>
>> I have a Lucene index created with Lucene.Net 1.9. It is a multi-segment
>> index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I
>> get IndexOutOfRange exceptions in my collectors. It is giving me document
>> IDs that are larger than maxDoc.
>>
>> My index contains 377831 documents, and IndexReader.MaxDoc() is returning
>> 377831, but I get documents from Collect() with larger values (for instance
>> 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If
>> not, is there some way I can convert it? (In production we have many indexes
>> containing about 200 million docs, so I'd rather convert existing indexes
>> than rebuild them.)
>>
>> Thanks,
>> Bob
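PS: For the archive, here is the conversion-and-optimize step Digy suggested, condensed into a self-contained snippet. The path is a placeholder for our test index, and this assumes the index was already converted with AddIndexesNoOptimize as above:

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Store;

class UpgradeIndex
{
    static void Main()
    {
        // Placeholder path; adjust to the converted index location.
        Lucene.Net.Store.Directory dir =
            new SimpleFSDirectory(new DirectoryInfo(@"C:\indexes\test_2.9"));

        // Opening an existing index with a 2.9.2 IndexWriter and optimizing
        // rewrites every segment in the 2.9 format, which should surface
        // any latent corruption carried over from 1.9.
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
        writer.Optimize();   // merge everything down to one 2.9-format segment
        writer.Close();
    }
}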
