[jira] [Reopened] (SOLR-6680) DefaultSolrHighlighter can sometimes avoid CachingTokenFilter

David Smiley (JIRA) Fri, 19 Dec 2014 12:50:22 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley reopened SOLR-6680:
--------------------------------

Re-opening for incremental improvement:

The next patch further reduces token caching in DefaultSolrHighlighter, this 
time in "TermOffsetsTokenStream", which is used for multi-valued fields with 
term vectors to provide an offset-based view/window into the token stream.  I 
found the name unclear, so I renamed it to OffsetWindowTokenFilter and added a 
comment clarifying that it's used for multi-valued term vectors.  I found the 
variable names unclear too, so I renamed them as well.  It used to call 
captureState() and restoreState() for every token; now it does so only for the 
first token leading into the next value.  It also used to use a cloned 
AttributeSource, but I found no point to that, and it interferes with 
TokenStreamFromTermVector's ability to detect whether payloads are desired.
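
To illustrate the windowing idea described above, here is a minimal sketch in 
plain Java. It is a simplified model, not the code in the patch: the names 
OffsetWindowSketch, Token, and Windower are hypothetical, no Lucene classes are 
used, and it assumes the field values are concatenated with no offset gap 
between them. It shows one shared token iterator being exposed one value at a 
time, with only the first token that runs past the current value being held 
back for the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified model of an offset window over one shared token stream (not Solr code). */
final class OffsetWindowSketch {

    /** Hypothetical stand-in for a token with character offsets. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "@" + start + "-" + end;
        }
    }

    /** Exposes the shared token iterator one field value (window) at a time. */
    static final class Windower {
        private final Iterator<Token> input;
        private Token pending;   // at most one held-back token
        private int windowStart;
        private int windowEnd;
        int bufferedCount;       // number of tokens that ever had to be held back

        Windower(Iterator<Token> input) {
            this.input = input;
        }

        /** Slides the window forward to cover the next value of the given length. */
        void nextValue(int valueLength) {
            windowStart = windowEnd;
            windowEnd = windowStart + valueLength;
        }

        /** Returns the next token of the current value with rebased offsets, or null when the value is done. */
        Token next() {
            Token t = pending;
            boolean fromInput = false;
            if (t == null) {
                if (!input.hasNext()) {
                    return null;
                }
                t = input.next();
                fromInput = true;
            }
            pending = null;
            if (t.end > windowEnd) {
                // This token leads into a later value: hold back only this one.
                pending = t;
                if (fromInput) {
                    bufferedCount++;
                }
                return null;
            }
            // Tokens inside the window pass straight through; nothing is cached.
            return new Token(t.term, t.start - windowStart, t.end - windowStart);
        }
    }

    /** Splits the tokens of "quick fox" + "lazy dog" (lengths 9 and 8) into per-value lists. */
    static List<List<String>> demo() {
        List<Token> tokens = Arrays.asList(
                new Token("quick", 0, 5), new Token("fox", 6, 9),
                new Token("lazy", 9, 13), new Token("dog", 14, 17));
        Windower windower = new Windower(tokens.iterator());
        List<List<String>> perValue = new ArrayList<>();
        for (int length : new int[] {9, 8}) {
            windower.nextValue(length);
            List<String> value = new ArrayList<>();
            for (Token t = windower.next(); t != null; t = windower.next()) {
                value.add(t.toString());
            }
            perValue.add(value);
        }
        return perValue;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model, of the four tokens only "lazy" is ever held back, because it is 
the first token that runs past the end of the first value; the others are 
returned directly with offsets rebased to their own value.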

> DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
> -------------------------------------------------------------
>
>                 Key: SOLR-6680
>                 URL: https://issues.apache.org/jira/browse/SOLR-6680
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.0, Trunk
>
>         Attachments: SOLR-6680.patch, SOLR-6680.patch
>
>
> The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to 
> wrap the token stream in a CachingTokenFilter when 
> hl.usePhraseHighlighter=true.  This wastes memory, and it interferes with 
> other optimizations -- LUCENE-6034.  Furthermore, the internal 
> TermOffsetsTokenStream (used when TermVectors are used with this) wasn't 
> properly delegating reset().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

