[
https://issues.apache.org/jira/browse/LUCENE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321637#comment-16321637
]
David Smiley commented on LUCENE-8121:
--------------------------------------
I benchmarked it using benchmark/conf/highlighters-postings.alg with
file.query.maker.file=conf/query-phrases.txt and highlighter=UH_PV (offsets in
postings with term vectors), and there is only a slight difference that may be
in the noise. It seemed the same or slightly faster, and used slightly less
memory. That's on a Wikipedia data set.
CHANGES.txt:
Improvement:
{noformat}
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
  position sensitive (e.g. part of a phrase) by having an accurate freq. (David Smiley)
{noformat}
Bug Fixes:
{noformat}
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
  SpanNearQueries at positions where it should not have. It's fixed in this highlighter
  by switching to the SpanCollector API. The original Highlighter still has this
  problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal parts of
  the UH were refactored. (David Smiley, Steve Davids)
{noformat}
> UnifiedHighlighter can highlight terms within SpanNear clauses at unmatched
> positions
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-8121
> URL: https://issues.apache.org/jira/browse/LUCENE-8121
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 7.3
>
> Attachments: LUCENE-2287_UH_SpanCollector.patch,
> LUCENE-2287_UH_SpanCollector.patch
>
>
> The UnifiedHighlighter (and original Highlighter) highlight phrases by
> converting to a SpanQuery and using the Spans start and end positions,
> assuming that every occurrence of the underlying terms between those
> positions is to be highlighted. But this is inaccurate; see LUCENE-5455 for a good
> example, and also LUCENE-2287. The solution is to use the SpanCollector API
> which was introduced after the phrase matching aspects of those highlighters
> were developed.
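
To make the difference concrete, here is a rough toy sketch of the two
strategies described above. It does not use any Lucene classes; every name in
it is hypothetical, and the query shape (an ordered near of "a b" followed by
"c" within some slop) is hard-coded purely for illustration. The
leafPositions list stands in for what a SpanCollector would be told about via
collectLeaf; it is not how the UnifiedHighlighter is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: no Lucene classes are used, and all names are hypothetical.
// Hard-coded query shape:
//   spanNear([spanNear([a, b], slop=0, inOrder), c], slop=outerSlop, inOrder)
class SpanHighlightSketch {

    static final Set<String> QUERY_TERMS = new HashSet<>(Arrays.asList("a", "b", "c"));

    /** One match: the span [start, end) plus the term positions that actually formed it. */
    static final class Match {
        final int start;
        final int end;
        final List<Integer> leafPositions;

        Match(int start, int end, List<Integer> leafPositions) {
            this.start = start;
            this.end = end;
            this.leafPositions = leafPositions;
        }
    }

    /** Finds matches of the hard-coded nested query in a tokenized text. */
    static List<Match> findMatches(String[] tokens, int outerSlop) {
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            // Inner clause: "a" immediately followed by "b".
            if (!tokens[i].equals("a") || !tokens[i + 1].equals("b")) {
                continue;
            }
            int innerEnd = i + 2;
            // Outer clause: "c" at most outerSlop positions after the inner span ends.
            for (int p = innerEnd; p < tokens.length && p - innerEnd <= outerSlop; p++) {
                if (tokens[p].equals("c")) {
                    matches.add(new Match(i, p + 1, Arrays.asList(i, i + 1, p)));
                }
            }
        }
        return matches;
    }

    /** Range-based strategy: highlight every query term occurring inside the span. */
    static List<Integer> highlightBySpanRange(String[] tokens, Match m) {
        List<Integer> positions = new ArrayList<>();
        for (int pos = m.start; pos < m.end; pos++) {
            if (QUERY_TERMS.contains(tokens[pos])) {
                positions.add(pos);
            }
        }
        return positions;
    }

    /** Collector-style strategy: highlight only the positions that formed the match. */
    static List<Integer> highlightByCollectedLeaves(Match m) {
        return new ArrayList<>(m.leafPositions);
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b", "x", "a", "c"};
        Match m = findMatches(tokens, 2).get(0);
        System.out.println(highlightBySpanRange(tokens, m));   // prints [0, 1, 3, 4]
        System.out.println(highlightByCollectedLeaves(m));     // prints [0, 1, 4]
    }
}
```

In the text "a b x a c", the second "a" (position 3) falls inside the matching
span but is not followed by "b", so it took no part in the match. The
range-based strategy highlights it anyway; the collector-style strategy does
not.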
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)