Well, IndexSearcher also sorts its readers biggest to smallest (by .numDocs()) for better performance (so that the queues fill up as much as possible before hitting reader transitions).
I think it's the exception, not the rule, for when a custom comparator would require the full array of sub-readers up front (vs, "on the fly" which it already gets with setNextReader), so I think we should not pass it in during construction. This really was a rote carryover from the old API which passed in the top IndexReader when calling newComparator. Mike On Mon, Mar 30, 2009 at 4:39 AM, Uwe Schindler <u...@thetaphi.de> wrote: > Why not call IndexSearcher.getIndexReader().getSequentialSubReaders() (see > http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/IndexReader.html). > Its public and documented as this: > > > > public IndexReader[] getSequentialSubReaders() > > > > Expert: returns the sequential sub readers that this reader is logically > composed of. For example, IndexSearcher uses this API to drive searching by > one sub reader at a time. If this reader is not composed of sequential child > readers, it should return null. If this method returns an empty array, that > means this reader is a null reader (for example a MultiReader that has no > sub readers). > > NOTE: for a MultiSegmentReader, which is obtained by open(java.lang.String) > when the index has more than one segment, you should not use the sub-readers > returned by this method to make any changes (setNorm, deleteDocument, etc.). > Doing so will likely lead to index corruption. Use the parent reader > instead. > > > > You only have the problem to replicate the code that gathers the subreaders > of the subreaders itself recursively. > > > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > ________________________________ > > From: Shai Erera [mailto:ser...@gmail.com] > Sent: Monday, March 30, 2009 10:20 AM > To: java-dev@lucene.apache.org > Subject: Re: Bug in TopFieldCollector? > > > > Already did ! > > Another question - I think we somehow broke TopFieldCollector ... > Previously, in TopFieldDocCollector, it accepted an IndexReader as a > parameter, and now it requires IndexReader[], which is called subReaders. > Calling the 'fast' search methods with Sort has no problem obtaining that > IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o > calling the appropriate Searcher methods? > > For example, since all the Searcher methods pass in fillFields = true, I > wanted to use the method Searcher.search(Query, TopFieldCollector) in the > test case I wrote, which BTW looks like this: > > public void testSortWithoutFillFields() throws Exception { > > // There was previously a bug in TopFieldCollector when fillFields was > set > // to false - the same doc and score was set in ScoreDoc[] array. This > test > // asserts that if fillFields is false, the documents are set properly. > It > // does not use Searcher's default search methods (with Sort) since all > set > // fillFields to true. > Sort sort = new Sort(); > int nDocs=10; > > TopDocsCollector tdc = new TopFieldCollector(sort, nDocs, > new IndexReader[] { ((IndexSearcher) full).getIndexReader() }, > false); > > full.search(new MatchAllDocsQuery(), tdc); > > ScoreDoc[] sd = tdc.topDocs().scoreDocs; > for (int i = 1; i < sd.length; i++) { > assertTrue(sd[i].doc != sd[i - 1].doc); > } > } > > You'll notice that creating a TopFieldCollector now is much more complicated > and *ugly*. As a user of IndexSearcher, I can only call getIndexReader() > which returns a single IndexReader. I don't have access to gatherSubReaders > and sortSubReaders. I don't see why I should have access to them. So it > forces me to create a dummy array with a single IndexReader. > > There are two ways I see to solve it: > 1. Introduce a getIndexReaders() method on IndexSearcher, which will return > an array of (sorted?) IndexReader. > 2. Introduce a new constructor in TopFieldCollector which accepts a single > IndexReader and make the other one package-private (for use by IndexSearcher > only). That constructor can internally create a dummy array of readers, but > at least it's private to the constructor and not exposed to the rest of the > world. > > Otherwise, I think it ruins TopFieldCollector and will make it a lot less > intuitive to use. At least, people who'd want to move from > TopFieldDocCollector to TopFieldCollector, will find it very inconvenient > and strange. > > What do you think? I can do that (2) as part of 1575. If (1) is better, then > I think a different issue should be opened, because we might want to return > such an array as sorted or something, which makes it less trivial. > > Shai > > On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless > <luc...@mikemccandless.com> wrote: > > Looks like quite a bug, Shai! Thanks. It came in with LUCENE-1483. > I would say add test case & fix it under 1575. > > Mike > > On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <ser...@gmail.com> wrote: >> Hi >> >> As I prepared the patch for 1575, I noticed a strange implementation in >> TopFieldCollector's topDocs(): >> >> ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()]; >> if (fillFields) { >> for (int i = queue.size() - 1; i >= 0; i--) { >> scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry) >> queue.pop()); >> } >> } else { >> Entry entry = (FieldValueHitQueue.Entry) queue.pop(); >> for (int i = queue.size() - 1; i >= 0; i--) { >> scoreDocs[i] = new FieldDoc(entry.docID, >> entry.score); >> } >> } >> >> return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(), >> maxScore); >> >> >> Notice that if fillFields is true, then documents are popped from the >> queue. >> However if it's false, then the first document is popped out of the queue >> and used to populate the entire ScoreDoc[]? I believe that's wrong, right? >> Otherwise, the returned TopFieldDocs.scoreDocs array will include the same >> document and score? >> >> I noticed there's no test case for that ... TopFieldCollector's >> constructor >> is called only from IndexSearcher.search(Weight, Filter, int, Sort, >> boolean >> /* fillFields */) which is called from IndexSearcher.search(Weight, >> Filter, >> int, sort) with fillFields always set to true. So perhaps that's why this >> hasn't showed up? >> >> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation >> looks like this: >> >> FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq; >> ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()]; >> for (int i = fshq.size()-1; i >= 0; i--) // put docs in array >> scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop()); >> >> return new TopFieldDocs(totalHits, scoreDocs, >> fshq.getFields(), fshq.getMaxScore()); >> >> It assumes fillFields is always true and always pops elements out of the >> queue. >> >> If this is a bug, I can fix it as part of 1575, as I'm touching that class >> anyway. I can also add a test case ... The fix is very simple BTW, just >> move >> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside >> the >> for loop. >> >> Shai >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org