[ https://issues.apache.org/jira/browse/SOLR-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley reopened SOLR-6680:
--------------------------------
Re-opening for incremental improvement:
The next patch further reduces token caching in DefaultSolrHighlighter, this
time in "TermOffsetsTokenStream", which is used for multi-valued fields with
term vectors to provide an offset-based view/window into the token stream. I
found the name unclear, so I renamed it to OffsetWindowTokenFilter and added a
comment to clarify that it's used for multi-valued term vectors. I found the
variable names unclear too, so I renamed them as well. It used to call
captureState and restoreState for every token; now it does so only for the
first token leading into the next value. It used to use a cloned
AttributeSource, but I found there to be no point to that; it also interferes
with TokenStreamFromTermVector's ability to detect whether payloads are
desired.
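
To illustrate the idea of holding back only the one token that crosses into the
next value, here is a rough, Lucene-free sketch. It is not the code in the
patch: Tok, OffsetWindowSketch, nextWindow and the captures counter are
hypothetical names, the token source is a plain Iterator rather than a
TokenStream, and rebasing offsets to the start of each window is an assumption
on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token with offsets; not a Lucene class. */
final class Tok {
    final String term;
    final int start;
    final int end;

    Tok(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

/** Sketch: an offset window over one shared token source for a multi-valued field. */
final class OffsetWindowSketch {
    private final Iterator<Tok> source;
    private Tok pending;   // the single token that turned out to belong to a later value
    int captures = 0;      // how many times a token had to be held back

    OffsetWindowSketch(Iterator<Tok> source) {
        this.source = source;
    }

    /**
     * Returns the tokens whose offsets lie within [windowStart, windowEnd].
     * Offsets are rebased to the window start (an assumption of this sketch).
     */
    List<Tok> nextWindow(int windowStart, int windowEnd) {
        List<Tok> out = new ArrayList<>();
        while (true) {
            Tok t;
            if (pending != null) {          // replay the held-back token first
                t = pending;
                pending = null;
            } else if (source.hasNext()) {
                t = source.next();
            } else {
                break;                      // source exhausted
            }
            if (t.end > windowEnd) {        // first token of a later value:
                pending = t;                // hold back just this one token
                captures++;
                break;
            }
            if (t.start >= windowStart) {
                out.add(new Tok(t.term, t.start - windowStart, t.end - windowStart));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two values, "quick fox" at offsets 0..9 and "lazy dog" at 10..18.
        List<Tok> all = Arrays.asList(
            new Tok("quick", 0, 5), new Tok("fox", 6, 9),
            new Tok("lazy", 10, 14), new Tok("dog", 15, 18));
        OffsetWindowSketch w = new OffsetWindowSketch(all.iterator());
        for (int[] win : new int[][] {{0, 9}, {10, 18}}) {
            for (Tok t : w.nextWindow(win[0], win[1])) {
                System.out.println(t.term + " [" + t.start + "," + t.end + "]");
            }
        }
        System.out.println("tokens held back: " + w.captures);
    }
}
```

In this sketch, tokens that fall inside the current window pass straight
through, and only the token that overshoots the window is kept for the next
call, which is the analogue of limiting captureState/restoreState to the
boundary token.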
> DefaultSolrHighlighter can sometimes avoid CachingTokenFilter
> -------------------------------------------------------------
>
> Key: SOLR-6680
> URL: https://issues.apache.org/jira/browse/SOLR-6680
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6680.patch, SOLR-6680.patch
>
>
> The DefaultSolrHighlighter (the most accurate one) is a bit over-eager to
> wrap the token stream in a CachingTokenFilter when
> hl.usePhraseHighlighter=true. This wastes memory, and it interferes with
> other optimizations -- LUCENE-6034. Furthermore, the internal
> TermOffsetsTokenStream (used when TermVectors are used with this) wasn't
> properly delegating reset().