[ 
https://issues.apache.org/jira/browse/LUCENE-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935973#action_12935973
 ] 

Michael McCandless commented on LUCENE-2506:
--------------------------------------------

bq. That assumption's gonna break very soon. Very very soon, when IndexWriter 
learns how to merge non-sequential segments.

Even if we break this assumption in the out-of-the-box config, we will
still have to provide a way to get it back, e.g. in this case a merge
policy that only selects contiguous segments (as LogMergePolicy does
today).


> A Stateful Filter That Works Across Index Segments
> --------------------------------------------------
>
>                 Key: LUCENE-2506
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2506
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 3.0.2
>            Reporter: Karthick Sankarachary
>         Attachments: LUCENE-2506.patch
>
>
> By design, Lucene's Filter abstraction is applied once for every segment in 
> the index during searching. In particular, the reader passed to its 
> #getDocIdSet method does not represent the whole underlying index: if the 
> index has more than one segment, the given reader represents only a single 
> segment. As a result, a filter cannot permit or prohibit documents in the 
> search results based on terms that reside in segments preceding the current 
> one.
>
> To address this limitation, we introduce a StatefulFilter, which builds on 
> the Filter class and is capable of remembering terms across the segments of 
> the whole underlying index. The need for stateful filters stems from the fact 
> that some, although not most, filters care about the terms they came across 
> in prior segments. The StatefulFilter keeps track of terms from prior 
> segments in a cache that is maintained in a StatefulTermsEnum instance on a 
> per-thread basis.
>
> Additionally, to address the case where a filter might want to accept the 
> last matching term, we keep track of the TermsEnum#docFreq of the terms in 
> the segments filtered thus far. By comparing the sum of these per-segment 
> TermsEnum#docFreq values with the docFreq reported by the top-level reader, 
> we can tell whether the current segment is the last segment in which the 
> current term appears. For this to work correctly, the user must explicitly 
> set the top-level reader on the StatefulFilter. Knowing the top-level reader 
> also helps the StatefulFilter clean up after itself once the search has 
> concluded.
>
> Note that we leave it up to each concrete subclass of the stateful filter to 
> decide what to remember in its state and what not to. In other words, it can 
> choose to remember as much or as little from prior segments as it deems 
> necessary. In keeping with the TermsEnum API, which the StatefulTermsEnum 
> class extends, the filter must decide which terms to accept or reject based 
> on the overall state of the search.
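
The docFreq bookkeeping described in the issue can be illustrated with a small, Lucene-free sketch. This is not the code from LUCENE-2506.patch: the class name SegmentTermTracker and its methods are hypothetical, terms are plain strings, and the per-segment and top-level docFreq values are passed in as integers rather than read from a TermsEnum or IndexReader. It only shows the idea of summing per-segment counts, comparing the sum with the top-level count, and keeping the state per thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Lucene or of the attached patch.
 * Remembers, per term, how many documents were seen in the segments
 * visited so far during one search.
 */
final class SegmentTermTracker {
    // One tracker per thread, mirroring the per-thread cache in the description.
    private static final ThreadLocal<SegmentTermTracker> PER_THREAD =
            ThreadLocal.withInitial(SegmentTermTracker::new);

    // term text -> sum of per-segment docFreq over the segments visited so far
    private final Map<String, Integer> seenDocFreq = new HashMap<>();

    static SegmentTermTracker forCurrentThread() {
        return PER_THREAD.get();
    }

    /**
     * Adds the term's docFreq in the current segment to the running sum and
     * reports whether the sum has reached the top-level docFreq, i.e. whether
     * the current segment is the last one in which the term appears.
     */
    boolean recordAndCheckLast(String term, int segmentDocFreq, int topLevelDocFreq) {
        int total = seenDocFreq.merge(term, segmentDocFreq, Integer::sum);
        return total >= topLevelDocFreq;
    }

    /** Clears all remembered terms once the search has concluded. */
    void reset() {
        seenDocFreq.clear();
    }

    public static void main(String[] args) {
        SegmentTermTracker tracker = SegmentTermTracker.forCurrentThread();
        // Assume "lucene" occurs in 5 documents index-wide, spread over three segments.
        System.out.println(tracker.recordAndCheckLast("lucene", 2, 5)); // false
        System.out.println(tracker.recordAndCheckLast("lucene", 2, 5)); // false
        System.out.println(tracker.recordAndCheckLast("lucene", 1, 5)); // true
        tracker.reset();
    }
}
```

The comparison relies on the top-level docFreq being the sum of the per-segment values, and on each segment being visited exactly once per search; the reset call stands in for the cleanup that the description ties to the top-level reader.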

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


