[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935753#action_12935753 ]
Michael McCandless commented on LUCENE-2348:
--------------------------------------------

Actually, I think Filter is the natural fit for this functionality. You should be able to compute it once, cache it, pass it along with your Query during searching, etc.

Doing this during collection is of course possible, but not ideal, since you waste CPU on the query finding a hit only to then filter it out. (In fact, Filter used to be applied this way!) Plus, you must have the dedup values RAM-resident. Especially with optimizations like LUCENE-1536 on the horizon, doing this during collection will be even slower.

That said, yes, it's trickier to implement with the cutover to per-segment search, since it needs the full reader up front in order to decide how docs in each segment will be filtered. But I don't consider this a show stopper: it'd be simple to change DuplicateFilter to receive the top-level IndexReader up front, then pre-compute and cache the bit set for all segments.

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch, LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.
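To make the suggested fix concrete, here is a minimal, Lucene-free sketch in Java. It does not use the real Lucene API: the class TopLevelDedupSketch, its methods, and the modeling of each segment as a list of dedup-key values (one per doc) are all hypothetical, and a keep-first-occurrence policy is assumed. The keep set is computed once over all segments, which is the role the top-level IndexReader would play, and each per-segment call returns only that segment's slice, re-based to local doc IDs via the segment's docBase. naivePerSegment models the behavior described in the issue, where duplicates that span segments are missed.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Lucene API): compute a "keep first occurrence"
 * bit set once over the whole index, then hand each segment its slice.
 */
public class TopLevelDedupSketch {
    private final BitSet keep = new BitSet(); // global doc IDs to keep
    private final int[] docBases;             // first global doc ID of each segment
    private final int[] maxDocs;              // number of docs in each segment

    /** Each inner list holds the dedup-field value of every doc in one segment. */
    public TopLevelDedupSketch(List<List<String>> segments) {
        docBases = new int[segments.size()];
        maxDocs = new int[segments.size()];
        Set<String> seen = new HashSet<>();
        int base = 0;
        for (int s = 0; s < segments.size(); s++) {
            List<String> keys = segments.get(s);
            docBases[s] = base;
            maxDocs[s] = keys.size();
            for (int local = 0; local < keys.size(); local++) {
                // Set.add returns false if an earlier doc, in this or any
                // earlier segment, already had this key.
                if (seen.add(keys.get(local))) {
                    keep.set(base + local);
                }
            }
            base += keys.size();
        }
    }

    /** Slice of the cached global set, re-indexed to the segment's local doc IDs. */
    public BitSet docIdSetForSegment(int segment) {
        int base = docBases[segment];
        // BitSet.get(from, to) copies bits [from, to) into a new set starting at 0.
        return keep.get(base, base + maxDocs[segment]);
    }

    /** The buggy pattern from the issue: dedup only within a single segment. */
    public static BitSet naivePerSegment(List<String> keys) {
        BitSet bits = new BitSet(keys.size());
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < keys.size(); i++) {
            if (seen.add(keys.get(i))) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

With segments [a, b, a] and [b, c], the cached set keeps local docs 0 and 1 in segment 0 and only local doc 1 (c) in segment 1, whereas naivePerSegment on segment 1 keeps both docs because it never sees the b in segment 0. Caching the global set means the per-segment calls are just cheap slices of work done once.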