[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList

David Smiley (JIRA) Wed, 17 Dec 2014 14:16:31 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250666#comment-14250666
 ]


David Smiley commented on LUCENE-6033:
--------------------------------------

Hi Rob.
I think it's easier to make the case that CachingTokenFilter should have been 
propagating reset() from it's fillCache() all along, and thus you would then 
use CachingTokenFilter in a more normal way -- wrap it and call reset() then 
increment in a loop, etc., instead of knowing you need to reset() on what it 
wraps but not this token filter itself.  That's weird.  It's ab-normal for a 
TokenFilter to never propagate reset, so every user of CachingTokenFilter to 
date has worked around this by calling reset() on the underlying input 
_instead_ of the final wrapping token filter (CachingTokenFilter in this case). 
 To be clear, CachingTokenFilter._reset()_ didn't and still doesn't with this 
patch propagate reset(), it happens the one time it consumes the stream 
indirectly via incrementToken().

The exact case that brought me here is as follows:  DefaultSolrHighlighter has 
a block of code activated when you pass "hl.usePhraseHighlighter" (around line 
501).  This block of code calls getPhraseHighlighter passing in a token stream 
that may never actually be used by that method.  This is an extension point for 
subclassing, our shipped code doesn't use it at all.  Prior to me doing 
SOLR-6680, we'd always then pass the CachingTokenFilter further on into the 
Highlighter.  But unless getPhraseHighlighter actually uses the token stream, 
doing this is a waste (needless caching of every token -- pretty bulky).  So 
with isCached() I can now see if it was used, and if not then toss the 
CachingTokenFilter aside.  The problem is that isCached() isn't enough here; I 
overlooked it in SOLR-6680 (no test for this extension point).  I was hoping to 
simply declare that if you want to use this token stream, you need to call 
reset() on it first.  But CachingTokenFilter doesn't propagate the reset()!  So 
it won't get reset.  I _could_ add a reset on the underlying stream before 
calling getPhraseHighlighter but doing so would likely result in reset() being 
called twice in a row when the caching isn't needed; Highlighter calls reset(). 
 Test assertions trip when this happens, although I think in practice it's fine.

> Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6033.patch, 
> LUCENE-6033_boolean_resetInput_option.patch
>
>
> CachingTokenFilter could use a simple boolean isCached() method implemented 
> as-such:
> {code:java}
>   /** If the underlying token stream was consumed and cached */
>   public boolean isCached() {
>     return cache != null;
>   }
> {code}
> It's useful for the highlighting code to remove its wrapping of 
> CachingTokenFilter if after handing-off to parts of its framework it turns 
> out that it wasn't used.
> Furthermore, use an ArrayList, not a LinkedList.  ArrayList is leaner when 
> the token count is high, and this class doesn't manipulate the list in a way 
> that might favor LL.
> A separate patch will come that actually uses this method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList

Reply via email to