[ 
https://issues.apache.org/jira/browse/SOLR-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194858#comment-14194858
 ] 

David Smiley commented on SOLR-6680:
------------------------------------

I should point out that the benefit of LUCENE-6033 won't be realized for a 
multi-valued field because of the way the offset adjusting works 
(TermOffsetsTokenStream).  I'm not concerned with optimizing for this case but 
should someone else want to take this further then consider this approach:  
Don't wrap the TokenStream from the TermVectors.  Instead, grab all the values 
of this field and wrap them in a CharSequence implementation that reads from 
each value in sequence.  But Highlighter expects a String for the value; it 
could be modified to deal with a CharSequence instead.
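
A minimal sketch of such a CharSequence, for anyone picking this up. It is not 
taken from the patch. The class name MultiValueCharSequence and the gap 
parameter are hypothetical, and it assumes that the offsets of successive 
values are separated by a fixed number of filler characters (the analyzer's 
offset gap), which is rendered here as spaces.

```java
import java.util.List;

/**
 * Hypothetical sketch: presents several values of one field as a single
 * CharSequence without concatenating them up front.
 */
final class MultiValueCharSequence implements CharSequence {
    private final String[] values;
    private final int[] starts; // starts[i] = offset of values[i] in the combined sequence
    private final int length;

    /** gap = number of filler chars assumed between values (an assumption, see above). */
    MultiValueCharSequence(List<String> fieldValues, int gap) {
        if (gap < 0) {
            throw new IllegalArgumentException("gap must be >= 0");
        }
        this.values = fieldValues.toArray(new String[0]);
        this.starts = new int[values.length];
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = pos;
            pos += values[i].length();
            if (i < values.length - 1) {
                pos += gap; // no gap after the last value
            }
        }
        this.length = pos;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Binary search for the last value whose start offset is <= index.
        int lo = 0;
        int hi = values.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        int local = index - starts[lo];
        // Positions past the end of a value fall in the gap; render them as spaces.
        return local < values[lo].length() ? values[lo].charAt(local) : ' ';
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }
}
```

With values "foo bar" and "baz" and a gap of 1, the sequence has length 11 and 
subSequence(8, 11) yields "baz". Highlighter would still need to accept a 
CharSequence rather than a String before something like this could be used.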

> DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
> -------------------------------------------------------------
>
>                 Key: SOLR-6680
>                 URL: https://issues.apache.org/jira/browse/SOLR-6680
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.0
>
>         Attachments: SOLR-6680.patch
>
>
> The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to 
> wrap the token stream in a CachingTokenFilter when 
> hl.usePhraseHighlighter=true.  This wastes memory, and it interferes with 
> other optimizations -- LUCENE-6034.  Furthermore, the internal 
> TermOffsetsTokenStream (used when TermVectors are used with this) wasn't 
> properly delegating reset().


