[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935835#action_12935835 ]
Robert Muir commented on LUCENE-2348:
-------------------------------------

{quote}
I think Robert's comments possibly stem from the misconception that the duplicate filter somehow works like field collapsing. I wrote a test just to illustrate how it actually behaves, just to make sure I wasn't confused myself (since he seemed to think I was...)
{quote}

No, I understand exactly how this filter works, which is why my patch, which uses SlowMultiReaderWrapper to force the index to appear as if it were a single segment, fixes the issue.

{quote}
Actually I think Filter is the natural fit for this functionality.
...
But I don't consider this a show stopper - it'd be simple to change DuplicateFilter to receive the top IR up front, and pre-compute and cache the bit set for all segments.
{quote}

So now you contradict yourself. The only way is, like you said, to do it in the ctor; in other words, it's forcefully cached. This is *unnatural*!

We should deprecate this functionality. If someone wants to make a "DuplicateBitSetBuilder" that is a factory for creating a BitSet, to me that is more natural and obvious as to what is going on. If it's not doing the work in getDocIdSet(), it shouldn't extend Filter!

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch, LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.
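
To make the per-segment problem from the issue summary concrete, here is a small stand-alone simulation. It is only a sketch and uses no Lucene classes: segments are modelled as plain arrays of key values, and the names DuplicateSketch, keepPerSegment and keepTopLevel are hypothetical, not taken from any patch on this issue. It assumes a "keep the first occurrence of each key" policy.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DuplicateSketch {

    // Deduplicate each segment in isolation, as happens when the work is
    // done once per segment with only that segment's local view.
    static BitSet keepPerSegment(String[][] segments) {
        BitSet kept = new BitSet();
        int docBase = 0; // global doc ID of the segment's first document
        for (String[] segment : segments) {
            Set<String> seen = new HashSet<>(); // forgotten after each segment
            for (int i = 0; i < segment.length; i++) {
                if (seen.add(segment[i])) {
                    kept.set(docBase + i);
                }
            }
            docBase += segment.length;
        }
        return kept;
    }

    // Deduplicate across the whole index, as if it were a single segment.
    static BitSet keepTopLevel(String[][] segments) {
        BitSet kept = new BitSet();
        Set<String> seen = new HashSet<>(); // shared by all segments
        int docBase = 0;
        for (String[] segment : segments) {
            for (int i = 0; i < segment.length; i++) {
                if (seen.add(segment[i])) {
                    kept.set(docBase + i);
                }
            }
            docBase += segment.length;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Key "a" appears in both segments (global doc IDs 0 and 2).
        String[][] segments = { {"a", "b"}, {"a", "c"} };
        System.out.println(keepPerSegment(segments)); // {0, 1, 2, 3}
        System.out.println(keepTopLevel(segments));   // {0, 1, 3}
    }
}
```

The per-segment variant lets the duplicate at doc 2 through because nothing it knows about segment 0 survives into segment 1. Correct deduplication needs state that spans all segments, which is why the choice discussed above is between a single-segment view of the index and a bit set pre-computed up front.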