[ https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-6121:
---------------------------------
    Attachment: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch

The attached patch makes CachingTokenFilter.reset() propagate reset() to its 
input, but only when the stream hasn't been cached yet.  This is good, intuitive 
behavior, it's simple, and it actually isn't much of a change for existing 
users; many might not even notice.  If you still call reset() on the input (but 
not on CachingTokenFilter) before calling incrementToken(), it will all work out 
fine.  You shouldn't, if you follow the spirit of our API, but you can.  In 
fact, there's one query builder (the 'flexible' AnalyzerQueryNodeProcessor) 
where I didn't move the reset() call from the input to the CachingTokenFilter, 
and it still works.
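A minimal sketch of the reset() semantics described above, written in plain Java so it runs without Lucene. `SimpleStream`, `ListSource`, `CachingFilter` and `CachingFilterDemo` are hypothetical stand-ins, not Lucene's `TokenStream`/`CachingTokenFilter` API; here `next()` returning null plays the role of `incrementToken()` returning false. The filter forwards reset() only while its cache is still empty and afterwards just rewinds the cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a token stream (NOT Lucene's TokenStream API):
// reset() prepares the stream, next() returns the next token or null at the end.
interface SimpleStream {
    void reset();
    String next();
}

// A source that refuses to be read before reset(), loosely mimicking a Tokenizer.
class ListSource implements SimpleStream {
    private final List<String> tokens;
    private int pos = -1;   // -1 means reset() has not been called yet
    int resetCalls = 0;     // counts how often reset() reached the source

    ListSource(List<String> tokens) { this.tokens = tokens; }

    @Override public void reset() { pos = 0; resetCalls++; }

    @Override public String next() {
        if (pos < 0) throw new IllegalStateException("reset() was not called");
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }
}

// Models the behavior described in the comment: reset() is forwarded to the input
// only until the cache has been filled; after that it just rewinds the cache.
class CachingFilter implements SimpleStream {
    private final SimpleStream input;
    private List<String> cache;        // null until the first next()
    private Iterator<String> iterator;

    CachingFilter(SimpleStream input) { this.input = input; }

    @Override public void reset() {
        if (cache == null) {
            input.reset();                 // not cached yet: propagate
        } else {
            iterator = cache.iterator();   // cached: replay from the start
        }
    }

    @Override public String next() {
        if (cache == null) {               // first read: drain the input into the cache
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) {
                cache.add(t);
            }
            iterator = cache.iterator();
        }
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class CachingFilterDemo {
    public static void main(String[] args) {
        ListSource source = new ListSource(Arrays.asList("quick", "brown", "fox"));
        CachingFilter filter = new CachingFilter(source);

        filter.reset();   // reset the wrapping filter, not its input
        List<String> first = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) first.add(t);

        filter.reset();   // replays the cache; the source is not reset again
        List<String> second = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) second.add(t);

        // prints "[quick, brown, fox] [quick, brown, fox] resetCalls=1"
        System.out.println(first + " " + second + " resetCalls=" + source.resetCalls);
    }
}
```

In this model the old workaround (calling `source.reset()` directly and then reading from the filter without resetting it) also keeps working, because the first `next()` simply drains whatever the already-reset input yields.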

In the patch you may notice I moved the reset() call to just before 
incrementToken(); I find that flow clearest.

I did have to make a change to the default highlighter.  
WeightedSpanTermExtractor handed off its stream to MemoryIndex and, when done, 
called reset() as the last thing it did.  That's bad behavior IMO, but it 
turned out to have been necessary because Highlighter called reset() _before_ 
passing the tokenStream to QueryScorer/WSTE.  I fixed this so that WSTE doesn't 
call reset() (it doesn't call incrementToken() itself, after all), and moved 
Highlighter's invocation of reset() to the last possible moment, right before 
its incrementToken() loop.  I think this is best practice in general: always 
call reset() as close to incrementToken() as you can.
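A second sketch, under the same caveats (plain Java, hypothetical names, not Lucene's API), illustrates the rule of calling reset() as close to the read loop as possible: each consumer resets the shared stream immediately before its own loop and never resets it afterwards on another consumer's behalf. `ReplayStream` is an assumed list-backed stand-in for a CachingTokenFilter whose cache is already filled; `indexTerms` and `highlight` are hypothetical stand-ins for the MemoryIndex and Highlighter steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical replayable stream (NOT Lucene's API): reset() rewinds to the start,
// next() returns the next token or null. Reading before reset() is an error.
class ReplayStream {
    private final List<String> tokens;
    private int pos = -1;   // -1 until reset() is called

    ReplayStream(List<String> tokens) { this.tokens = tokens; }

    void reset() { pos = 0; }

    String next() {
        if (pos < 0) throw new IllegalStateException("reset() was not called");
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }
}

class Consumers {
    // Stand-in for the indexing step: resets right before its own loop
    // and does NOT reset again when it is done.
    static int indexTerms(ReplayStream s) {
        s.reset();
        int n = 0;
        while (s.next() != null) n++;
        return n;
    }

    // Stand-in for the highlighting loop: it, too, resets as late as possible.
    static List<String> highlight(ReplayStream s, String term) {
        s.reset();
        List<String> out = new ArrayList<>();
        for (String t = s.next(); t != null; t = s.next()) {
            out.add(t.equals(term) ? "<b>" + t + "</b>" : t);
        }
        return out;
    }

    public static void main(String[] args) {
        ReplayStream s = new ReplayStream(Arrays.asList("the", "quick", "fox"));
        System.out.println(indexTerms(s));          // 3
        System.out.println(highlight(s, "quick"));  // [the, <b>quick</b>, fox]
    }
}
```

Because neither consumer relies on the other leaving the stream in a reset state, the two calls can run in either order.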

In CHANGES.txt I'll say this:
bq. CachingTokenFilter now propagates reset() to its input if incrementToken() 
hasn't been called yet. You should now generally call reset() on this token 
filter instead of doing it a priori on its input (which previously didn't work).

> Fix CachingTokenFilter to propagate reset() the first time
> ----------------------------------------------------------
>
>                 Key: LUCENE-6121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
>
> CachingTokenFilter should have been propagating reset(), _but only the first 
> time_, so that you could use CachingTokenFilter in a more normal way: wrap it, 
> call reset(), then call incrementToken() in a loop, etc., instead of knowing 
> that you need to reset() what it wraps but not this token filter itself. That's 
> weird. It's abnormal for a TokenFilter to never propagate reset(), so every 
> user of CachingTokenFilter to date has worked around this by calling reset() 
> on the underlying input instead of on the final wrapping token filter 
> (CachingTokenFilter in this case).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
