[jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

Karthick Sankarachary (JIRA) Mon, 21 Jun 2010 18:57:19 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881010#action_12881010 ]


Karthick Sankarachary commented on LUCENE-2348:
-----------------------------------------------

Hi, All,

Having run into this very issue in our platform, I decided to take a stab at 
addressing it by defining what is essentially a stateful type of filter (for 
details, please see LUCENE-2506). In my view, a stateful filter gives filters 
such as the DuplicateFilter an easy and intuitive way to work seamlessly 
across the (potentially many) segments of the index. 

In a nutshell, I tweaked the DuplicateFilter so that it accepts a given term 
if and only if that term does not already exist in its "memory". For details, 
please see the DedupingTermsEnum#accept method in the revised DuplicateFilter 
class attached here.
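
To illustrate the idea, here is a minimal sketch in plain Java. It is not the 
code from the patch and does not use the Lucene API: the class name 
StatefulDedupSketch, the method filterSegment, and the representation of a 
segment as an array of keys indexed by local doc ID are all assumptions made 
for the example. It only shows how a "memory" shared across per-segment calls 
lets a later segment reject a key that an earlier segment already accepted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a stateful filter; not the patch's DedupingTermsEnum. */
class StatefulDedupSketch {
    // The filter's "memory": keys accepted in this or any earlier segment.
    private final Set<String> seen = new HashSet<>();

    /** Accepts a key only the first time it is offered, across all calls. */
    boolean accept(String key) {
        return seen.add(key); // Set.add returns false if the key was already present
    }

    /**
     * Mimics one per-segment call. The array index plays the role of the
     * segment-local doc ID; the result lists the local doc IDs that survive.
     */
    List<Integer> filterSegment(String[] keysByLocalDocId) {
        List<Integer> kept = new ArrayList<>();
        for (int docId = 0; docId < keysByLocalDocId.length; docId++) {
            if (accept(keysByLocalDocId[docId])) {
                kept.add(docId);
            }
        }
        return kept;
    }
}
```

With one instance reused for every segment, a first segment holding the keys 
{"a", "b", "a"} keeps local docs 0 and 1, and a second segment holding 
{"b", "c"} keeps only local doc 1, because "b" is already in the memory. Note 
that this sketch only keeps the first occurrence of each key.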

Note that I took the liberty of incorporating the edge case shown above into 
the DuplicateFilter's test case, which is also included in the attached patch. 

Regards,
Karthick Sankarachary

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
> readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without 
> taking into account that getDocIdSet() will be called once per segment and 
> only with each segment's local reader.


