[ https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526474#comment-15526474 ]
David Smiley commented on LUCENE-7438:
--------------------------------------

BTW I reviewed the W.E.H. and posted a comparison: https://github.com/wikimedia/search-highlighter/issues/19

> UnifiedHighlighter
> ------------------
>
>                 Key: LUCENE-7438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7438
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 6.2
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>         Attachments: LUCENE_7438_UH_benchmark.patch
>
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is
> able to highlight using offsets in either postings, term vectors, or from
> analysis (a TokenStream). Lucene’s existing highlighters are mostly
> demarcated along offset source lines, whereas here it is unified -- hence
> this proposed name. In this highlighter, the offset source strategy is
> separated from the core highlighting functionality. The UnifiedHighlighter
> further improves on the PostingsHighlighter’s design by supporting accurate
> phrase highlighting using an approach similar to the standard highlighter’s
> WeightedSpanTermExtractor. The next major improvement is a hybrid offset
> source strategy that utilizes postings and “light” term vectors (i.e. just the
> terms) for highlighting multi-term queries (wildcards) without resorting to
> analysis. Phrase highlighting and wildcard highlighting can both be disabled
> if you’d rather highlight a little faster albeit not as accurately reflecting
> the query.
> We’ve benchmarked an earlier version of this highlighter comparing it to the
> other highlighters and the results were exciting! It’s tempting to share
> those results but it’s definitely due for another benchmark, so we’ll work on
> that.
> Performance was the main motivator for creating the UnifiedHighlighter,
> as the standard Highlighter (the only one meeting Bloomberg Law’s accuracy
> requirements) wasn’t fast enough, even with term vectors along with several
> improvements we contributed back, and even after we forked it to highlight in
> multiple threads.
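
To illustrate the point in the description about separating the offset source strategy from the core highlighting functionality, here is a minimal, self-contained sketch. It is not the code from the patch and does not use any Lucene API: every name in it (HighlightSketch, OffsetSource, AnalysisOffsetSource, StoredOffsetSource, Offset) is hypothetical, the "analysis" is a trivial letter/digit tokenizer, and the "stored" source merely stands in for offsets recorded at index time (as postings or term vectors would provide).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in Lucene.
final class HighlightSketch {

    /** A [start, end) character span of one matching term. */
    static final class Offset {
        final int start;
        final int end;
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    /** Strategy: where the match offsets come from. */
    interface OffsetSource {
        /** Returns non-overlapping offsets sorted by start position. */
        List<Offset> offsets(String text, Set<String> queryTerms);
    }

    /** Stand-in for "analysis": re-tokenizes the text at highlight time. */
    static final class AnalysisOffsetSource implements OffsetSource {
        @Override
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> result = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Skip separators, then consume one run of letters/digits.
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                if (i > start) {
                    String token = text.substring(start, i).toLowerCase(Locale.ROOT);
                    if (queryTerms.contains(token)) result.add(new Offset(start, i));
                }
            }
            return result;
        }
    }

    /** Stand-in for postings or term vectors: offsets were recorded earlier. */
    static final class StoredOffsetSource implements OffsetSource {
        private final Map<String, List<Offset>> offsetsByTerm;
        StoredOffsetSource(Map<String, List<Offset>> offsetsByTerm) {
            this.offsetsByTerm = offsetsByTerm;
        }
        @Override
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> result = new ArrayList<>();
            for (String term : queryTerms) {
                result.addAll(offsetsByTerm.getOrDefault(term, List.of()));
            }
            result.sort(Comparator.comparingInt(o -> o.start));
            return result;
        }
    }

    /** Core highlighting: does not care how the offsets were obtained. */
    static String highlight(String text, Set<String> queryTerms, OffsetSource source) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Offset o : source.offsets(text, queryTerms)) {
            out.append(text, last, o.start)
               .append("<b>").append(text, o.start, o.end).append("</b>");
            last = o.end;
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        Set<String> query = Set.of("quick", "fox");
        System.out.println(highlight(text, query, new AnalysisOffsetSource()));
        OffsetSource stored = new StoredOffsetSource(Map.of(
                "quick", List.of(new Offset(4, 9)),
                "fox", List.of(new Offset(16, 19))));
        System.out.println(highlight(text, query, stored));
    }
}
```

Both calls go through the same highlight method; only the source of offsets differs, which is the separation the description refers to. Phrase and wildcard handling are deliberately left out of this sketch.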