You are right, I forget the sorting. And I also think, the most important thing would be to remove the need for the ctor in the custom sort.
----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, March 30, 2009 10:44 AM > To: java-dev@lucene.apache.org > Subject: Re: Bug in TopFieldCollector? > > Well, IndexSearcher also sorts its readers biggest to smallest (by > .numDocs()) for better performance (so that the queues fill up as much > as possible before hitting reader transitions). > > I think it's the exception, not the rule, for when a custom comparator > would require the full array of sub-readers up front (vs, "on the fly" > which it already gets with setNextReader), so I think we should not > pass it in during construction. > > This really was a rote carryover from the old API which passed in the > top IndexReader when calling newComparator. > > Mike > > On Mon, Mar 30, 2009 at 4:39 AM, Uwe Schindler <u...@thetaphi.de> wrote: > > Why not call IndexSearcher.getIndexReader().getSequentialSubReaders() > (see > > http://hudson.zones.apache.org/hudson/job/Lucene- > trunk/javadoc/all/org/apache/lucene/index/IndexReader.html). > > Its public and documented as this: > > > > > > > > public IndexReader[] getSequentialSubReaders() > > > > > > > > Expert: returns the sequential sub readers that this reader is logically > > composed of. For example, IndexSearcher uses this API to drive searching > by > > one sub reader at a time. If this reader is not composed of sequential > child > > readers, it should return null. If this method returns an empty array, > that > > means this reader is a null reader (for example a MultiReader that has > no > > sub readers). > > > > NOTE: for a MultiSegmentReader, which is obtained by > open(java.lang.String) > > when the index has more than one segment, you should not use the sub- > readers > > returned by this method to make any changes (setNorm, deleteDocument, > etc.). > > Doing so will likely lead to index corruption. Use the parent reader > > instead. > > > > > > > > You only have the problem to replicate the code that gathers the > subreaders > > of the subreaders itself recursively. > > > > > > > > Uwe > > > > ----- > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > ________________________________ > > > > From: Shai Erera [mailto:ser...@gmail.com] > > Sent: Monday, March 30, 2009 10:20 AM > > To: java-dev@lucene.apache.org > > Subject: Re: Bug in TopFieldCollector? > > > > > > > > Already did ! > > > > Another question - I think we somehow broke TopFieldCollector ... > > Previously, in TopFieldDocCollector, it accepted an IndexReader as a > > parameter, and now it requires IndexReader[], which is called > subReaders. > > Calling the 'fast' search methods with Sort has no problem obtaining > that > > IndexReader[] (and sort it), but how would someone use TopFieldCollector > w/o > > calling the appropriate Searcher methods? > > > > For example, since all the Searcher methods pass in fillFields = true, I > > wanted to use the method Searcher.search(Query, TopFieldCollector) in > the > > test case I wrote, which BTW looks like this: > > > > public void testSortWithoutFillFields() throws Exception { > > > > // There was previously a bug in TopFieldCollector when fillFields > was > > set > > // to false - the same doc and score was set in ScoreDoc[] array. > This > > test > > // asserts that if fillFields is false, the documents are set > properly. > > It > > // does not use Searcher's default search methods (with Sort) since > all > > set > > // fillFields to true. > > Sort sort = new Sort(); > > int nDocs=10; > > > > TopDocsCollector tdc = new TopFieldCollector(sort, nDocs, > > new IndexReader[] { ((IndexSearcher) full).getIndexReader() }, > > false); > > > > full.search(new MatchAllDocsQuery(), tdc); > > > > ScoreDoc[] sd = tdc.topDocs().scoreDocs; > > for (int i = 1; i < sd.length; i++) { > > assertTrue(sd[i].doc != sd[i - 1].doc); > > } > > } > > > > You'll notice that creating a TopFieldCollector now is much more > complicated > > and *ugly*. As a user of IndexSearcher, I can only call getIndexReader() > > which returns a single IndexReader. I don't have access to > gatherSubReaders > > and sortSubReaders. I don't see why I should have access to them. So it > > forces me to create a dummy array with a single IndexReader. > > > > There are two ways I see to solve it: > > 1. Introduce a getIndexReaders() method on IndexSearcher, which will > return > > an array of (sorted?) IndexReader. > > 2. Introduce a new constructor in TopFieldCollector which accepts a > single > > IndexReader and make the other one package-private (for use by > IndexSearcher > > only). That constructor can internally create a dummy array of readers, > but > > at least it's private to the constructor and not exposed to the rest of > the > > world. > > > > Otherwise, I think it ruins TopFieldCollector and will make it a lot > less > > intuitive to use. At least, people who'd want to move from > > TopFieldDocCollector to TopFieldCollector, will find it very > inconvenient > > and strange. > > > > What do you think? I can do that (2) as part of 1575. If (1) is better, > then > > I think a different issue should be opened, because we might want to > return > > such an array as sorted or something, which makes it less trivial. > > > > Shai > > > > On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless > > <luc...@mikemccandless.com> wrote: > > > > Looks like quite a bug, Shai! Thanks. It came in with LUCENE-1483. > > I would say add test case & fix it under 1575. > > > > Mike > > > > On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <ser...@gmail.com> wrote: > >> Hi > >> > >> As I prepared the patch for 1575, I noticed a strange implementation in > >> TopFieldCollector's topDocs(): > >> > >> ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()]; > >> if (fillFields) { > >> for (int i = queue.size() - 1; i >= 0; i--) { > >> scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry) > >> queue.pop()); > >> } > >> } else { > >> Entry entry = (FieldValueHitQueue.Entry) queue.pop(); > >> for (int i = queue.size() - 1; i >= 0; i--) { > >> scoreDocs[i] = new FieldDoc(entry.docID, > >> entry.score); > >> } > >> } > >> > >> return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(), > >> maxScore); > >> > >> > >> Notice that if fillFields is true, then documents are popped from the > >> queue. > >> However if it's false, then the first document is popped out of the > queue > >> and used to populate the entire ScoreDoc[]? I believe that's wrong, > right? > >> Otherwise, the returned TopFieldDocs.scoreDocs array will include the > same > >> document and score? > >> > >> I noticed there's no test case for that ... TopFieldCollector's > >> constructor > >> is called only from IndexSearcher.search(Weight, Filter, int, Sort, > >> boolean > >> /* fillFields */) which is called from IndexSearcher.search(Weight, > >> Filter, > >> int, sort) with fillFields always set to true. So perhaps that's why > this > >> hasn't showed up? > >> > >> BTW, the now deprecated TopFieldDocCollector's topDocs() implementation > >> looks like this: > >> > >> FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq; > >> ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()]; > >> for (int i = fshq.size()-1; i >= 0; i--) // put docs in array > >> scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop()); > >> > >> return new TopFieldDocs(totalHits, scoreDocs, > >> fshq.getFields(), fshq.getMaxScore()); > >> > >> It assumes fillFields is always true and always pops elements out of > the > >> queue. > >> > >> If this is a bug, I can fix it as part of 1575, as I'm touching that > class > >> anyway. I can also add a test case ... The fix is very simple BTW, just > >> move > >> the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside > >> the > >> for loop. > >> > >> Shai > >> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org