[
https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616233#comment-15616233
]
ASF GitHub Bot commented on LUCENE-7438:
----------------------------------------
Github user Timothy055 closed the pull request at:
https://github.com/apache/lucene-solr/pull/79
> UnifiedHighlighter
> ------------------
>
> Key: LUCENE-7438
> URL: https://issues.apache.org/jira/browse/LUCENE-7438
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Affects Versions: 6.2
> Reporter: Timothy M. Rodriguez
> Assignee: David Smiley
> Fix For: 6.3
>
> Attachments: LUCENE-7438.patch, LUCENE_7438_UH_benchmark.patch,
> LUCENE_7438_UH_benchmark.patch, LUCENE_7438_UH_small_changes.patch
>
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is
> able to highlight using offsets in either postings, term vectors, or from
> analysis (a TokenStream). Lucene’s existing highlighters are mostly
> demarcated along offset source lines, whereas here it is unified -- hence
> this proposed name. In this highlighter, the offset source strategy is
> separated from the core highlighting functionalty. The UnifiedHighlighter
> further improves on the PostingsHighlighter’s design by supporting accurate
> phrase highlighting using an approach similar to the standard highlighter’s
> WeightedSpanTermExtractor. The next major improvement is a hybrid offset
> source strategythat utilizes postings and “light” term vectors (i.e. just the
> terms) for highlighting multi-term queries (wildcards) without resorting to
> analysis. Phrase highlighting and wildcard highlighting can both be disabled
> if you’d rather highlight a little faster albeit not as accurately reflecting
> the query.
> We’ve benchmarked an earlier version of this highlighter comparing it to the
> other highlighters and the results were exciting! It’s tempting to share
> those results but it’s definitely due for another benchmark, so we’ll work on
> that. Performance was the main motivator for creating the UnifiedHighlighter,
> as the standard Highlighter (the only one meeting Bloomberg Law’s accuracy
> requirements) wasn’t fast enough, even with term vectors along with several
> improvements we contributed back, and even after we forked it to highlight in
> multiple threads.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]