[ https://issues.apache.org/jira/browse/LUCENE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321637#comment-16321637 ]
David Smiley commented on LUCENE-8121:
--------------------------------------

I benchmarked it using benchmark/conf/highlighters-postings.alg with file.query.maker.file=conf/query-phrases.txt and highlighter=UH_PV (offsets in postings with term vectors), and there is only a slight difference that may be in the noise. It seemed the same or slightly faster, with slightly less memory use. That's on a Wikipedia data set.

CHANGES.txt:

Improvement:
{noformat}
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
  position sensitive (e.g. part of a phrase) by having an accurate freq.
  (David Smiley)
{noformat}

Bug Fixes:
{noformat}
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
  SpanNearQueries at positions where it should not have. It's fixed in this
  highlighter by switching to the SpanCollector API. The original Highlighter still
  has this problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal
  parts of the UH were refactored. (David Smiley, Steve Davids)
{noformat}

> UnifiedHighlighter can highlight terms within SpanNear clauses at unmatched positions
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8121
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 7.3
>
>         Attachments: LUCENE-2287_UH_SpanCollector.patch,
>                      LUCENE-2287_UH_SpanCollector.patch
>
>
> The UnifiedHighlighter (and original Highlighter) highlight phrases by
> converting to a SpanQuery and using the Spans start and end positions to
> assume that every occurrence of the underlying terms between those positions
> is to be highlighted. But this is inaccurate; see LUCENE-5455 for a good
> example, and also LUCENE-2287.
> The solution is to use the SpanCollector API, which was introduced after the
> phrase matching aspects of those highlighters were developed.
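
To make the problem in the description concrete, here is a minimal toy sketch in plain Java. It does not use Lucene at all: `SpanHighlightSketch`, `Match`, `term`, `near`, `highlightByRange` and `highlightByLeaves` are hypothetical names invented for this illustration, and the ordered "near" matching is a simplification of what SpanNearQuery does. It only contrasts the two strategies: highlighting every query term that falls between a match's start and end positions, versus highlighting only the term positions that actually took part in the match (roughly the information a per-leaf collector gives you).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; none of these types are Lucene classes. */
public class SpanHighlightSketch {

    /** A match covering positions [start, end), plus the exact term positions that produced it. */
    public static final class Match {
        final int start;
        final int end;
        final List<Integer> leaves;

        Match(int start, int end, List<Integer> leaves) {
            this.start = start;
            this.end = end;
            this.leaves = leaves;
        }
    }

    /** One single-position match per occurrence of the term. */
    public static List<Match> term(List<String> tokens, String t) {
        List<Match> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(t)) {
                out.add(new Match(i, i + 1, Arrays.asList(i)));
            }
        }
        return out;
    }

    /** Ordered "near": right starts at or after left's end, with at most slop positions between. */
    public static List<Match> near(List<Match> left, List<Match> right, int slop) {
        List<Match> out = new ArrayList<>();
        for (Match l : left) {
            for (Match r : right) {
                int gap = r.start - l.end;
                if (gap >= 0 && gap <= slop) {
                    List<Integer> leaves = new ArrayList<>(l.leaves);
                    leaves.addAll(r.leaves);
                    out.add(new Match(l.start, r.end, leaves));
                }
            }
        }
        return out;
    }

    /** Range-based strategy: any query term between start and end is highlighted. */
    public static Set<Integer> highlightByRange(List<String> tokens, Set<String> queryTerms,
                                                List<Match> matches) {
        Set<Integer> out = new TreeSet<>();
        for (Match m : matches) {
            for (int p = m.start; p < m.end; p++) {
                if (queryTerms.contains(tokens.get(p))) {
                    out.add(p);
                }
            }
        }
        return out;
    }

    /** Leaf-based strategy: only positions that contributed to a match are highlighted. */
    public static Set<Integer> highlightByLeaves(List<Match> matches) {
        Set<Integer> out = new TreeSet<>();
        for (Match m : matches) {
            out.addAll(m.leaves);
        }
        return out;
    }

    public static void main(String[] args) {
        // Positions:                          0    1    2    3    4
        List<String> tokens = Arrays.asList("a", "b", "x", "a", "c");
        // Nested query in the toy model: near(near(a, b, slop=0), c, slop=2)
        List<Match> inner = near(term(tokens, "a"), term(tokens, "b"), 0);
        List<Match> outer = near(inner, term(tokens, "c"), 2);
        Set<String> terms = new TreeSet<>(Arrays.asList("a", "b", "c"));
        System.out.println("by range:  " + highlightByRange(tokens, terms, outer)); // [0, 1, 3, 4]
        System.out.println("by leaves: " + highlightByLeaves(outer));               // [0, 1, 4]
    }
}
```

In this toy document the second "a" (position 3) lies inside the outer match but is not followed by "b", so it is not part of any match. The range-based strategy still highlights it; the leaf-based strategy does not.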