[ 
https://issues.apache.org/jira/browse/LUCENE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321637#comment-16321637
 ] 

David Smiley commented on LUCENE-8121:
--------------------------------------

I benchmarked it using benchmark/conf/highlighters-postings.alg with 
file.query.maker.file=conf/query-phrases.txt and highlighter=UH_PV (offsets in 
postings with term vectors).  There is only a slight difference, which may be 
within the noise: it seemed the same or slightly faster, and used slightly less 
memory.  That's on a Wikipedia data set.

CHANGES.txt:
Improvement:
{noformat}
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
  position sensitive (e.g. part of a phrase) by having an accurate freq. (David Smiley)
{noformat}
Bug Fixes:
{noformat}
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
  SpanNearQueries at positions where it should not have.  It's fixed in this highlighter
  by switching to the SpanCollector API.  The original Highlighter still has this
  problem (LUCENE-2287, LUCENE-5455, LUCENE-6796).  Some public but internal parts of
  the UH were refactored. (David Smiley, Steve Davids)
{noformat}


> UnifiedHighlighter can highlight terms within SpanNear clauses at unmatched 
> positions
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8121
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 7.3
>
>         Attachments: LUCENE-2287_UH_SpanCollector.patch, 
> LUCENE-2287_UH_SpanCollector.patch
>
>
> The UnifiedHighlighter (and original Highlighter) highlight phrases by 
> converting them to a SpanQuery and using the Spans' start and end positions, 
> assuming that every occurrence of the underlying terms between those positions 
> is to be highlighted.  But this is inaccurate; see LUCENE-5455 for a good 
> example, and also LUCENE-2287.  The solution is to use the SpanCollector API, 
> which was introduced after the phrase matching aspects of those highlighters 
> were developed.
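
To make the difference described above concrete, here is a toy sketch in plain 
Java.  It does not use Lucene at all: the class SpanHighlightSketch, its method 
names, and the brute-force matcher are hypothetical and only model the idea.  
It assumes an ordered "near" query over single terms, where slop is the match 
width minus the number of terms.  One strategy highlights every query term 
between a match's start and end; the other keeps only the positions that 
actually took part in a match, which is roughly what collecting leaf positions 
buys you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only; none of these names are Lucene APIs.
public class SpanHighlightSketch {

    /** Enumerates every in-order match of terms whose total gap is at most slop. */
    public static List<int[]> orderedNearMatches(String[] tokens, String[] terms, int slop) {
        List<int[]> matches = new ArrayList<>();
        search(tokens, terms, slop, 0, 0, new int[terms.length], matches);
        return matches;
    }

    private static void search(String[] tokens, String[] terms, int slop,
                               int termIdx, int from, int[] chosen, List<int[]> out) {
        if (termIdx == terms.length) {
            // Width of the match in positions; the slop is whatever the terms do not cover.
            int width = chosen[chosen.length - 1] - chosen[0] + 1;
            if (width - terms.length <= slop) {
                out.add(chosen.clone());
            }
            return;
        }
        for (int pos = from; pos < tokens.length; pos++) {
            if (tokens[pos].equals(terms[termIdx])) {
                chosen[termIdx] = pos;
                search(tokens, terms, slop, termIdx + 1, pos + 1, chosen, out);
            }
        }
    }

    /** Range-based: highlight every query term found between match start and end. */
    public static TreeSet<Integer> highlightByRange(String[] tokens, String[] terms, int slop) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (int[] m : orderedNearMatches(tokens, terms, slop)) {
            for (int pos = m[0]; pos <= m[m.length - 1]; pos++) {
                for (String t : terms) {
                    if (tokens[pos].equals(t)) {
                        hits.add(pos);
                    }
                }
            }
        }
        return hits;
    }

    /** Collector-style: highlight only positions that took part in some match. */
    public static TreeSet<Integer> highlightByCollectedLeaves(String[] tokens, String[] terms, int slop) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (int[] m : orderedNearMatches(tokens, terms, slop)) {
            for (int pos : m) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b", "a", "c"};
        String[] terms = {"a", "b", "c"};
        System.out.println(highlightByRange(tokens, terms, 1));           // [0, 1, 2, 3]
        System.out.println(highlightByCollectedLeaves(tokens, terms, 1)); // [0, 1, 3]
    }
}
```

In this example the only match is a(0) b(1) c(3).  The second "a" at position 2 
lies inside the matched range but is not part of any match, so the range-based 
strategy highlights it and the collector-style strategy does not.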


