[ 
https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347059#comment-16347059
 ] 

Alan Woodward commented on LUCENE-8145:
---------------------------------------

This patch renames `FieldOffsetStrategy#getOffsetsEnums()` to 
`FieldOffsetStrategy#getOffsetsEnum()`, and changes the return type from 
`List<OffsetsEnum>` to a single `OffsetsEnum`.

FieldHighlighter is simplified a bit, particularly in terms of handling 
OffsetsEnum as a closeable resource.  Scoring is delegated to the Passage 
itself, which now keeps track of the within-passage frequencies of its 
highlighted terms and phrases.  A new MultiOffsetsEnum class deals with 
combining multiple OffsetsEnums using a priority queue.  Because all offsets 
are iterated in order, Passage no longer needs to worry about sorting its 
internal hits.
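
To illustrate the priority-queue merge described above, here is a minimal 
sketch. It is not the code from the patch: `MergedOffsets`, `Hit` and `Cursor` 
are hypothetical stand-ins rather than Lucene classes, and each input stream 
is assumed to be sorted by start offset already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these types exist in Lucene.
final class MergedOffsets {

  /** Stand-in for a single highlight hit: a term plus its character offsets. */
  static final class Hit {
    final String term;
    final int start;
    final int end;

    Hit(String term, int start, int end) {
      this.term = term;
      this.start = start;
      this.end = end;
    }
  }

  /** Wraps one already-sorted stream and remembers its current head. */
  private static final class Cursor {
    final Iterator<Hit> hits;
    Hit current;

    Cursor(Iterator<Hit> hits) {
      this.hits = hits;
    }

    /** Moves to the next hit; returns false once the stream is exhausted. */
    boolean advance() {
      if (!hits.hasNext()) {
        return false;
      }
      current = hits.next();
      return true;
    }
  }

  /** Merges per-term streams, each sorted by start offset, into one ordered list. */
  static List<Hit> merge(List<List<Hit>> streams) {
    // The queue orders cursors by the offsets of their current hit.
    PriorityQueue<Cursor> queue = new PriorityQueue<>(
        Comparator.comparingInt((Cursor c) -> c.current.start)
            .thenComparingInt(c -> c.current.end));
    for (List<Hit> stream : streams) {
      Cursor cursor = new Cursor(stream.iterator());
      if (cursor.advance()) {
        queue.add(cursor);
      }
    }
    List<Hit> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      // Take the earliest hit, then re-queue its stream if it has more.
      Cursor top = queue.poll();
      merged.add(top.current);
      if (top.advance()) {
        queue.add(top);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Hit> lucene = Arrays.asList(new Hit("lucene", 0, 6), new Hit("lucene", 20, 26));
    List<Hit> search = Arrays.asList(new Hit("search", 7, 13));
    for (Hit h : merge(Arrays.asList(lucene, search))) {
      System.out.println(h.term + " [" + h.start + "," + h.end + ")");
    }
  }
}
```

Because the merged hits come out in offset order, a consumer in this sketch 
can append them as they arrive and never has to sort afterwards, which is the 
property the paragraph above relies on.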

The APIs for FieldOffsetStrategy, Passage and OffsetsEnum have all changed 
slightly, but they're all pretty expert so I think this could be targeted at 
7.3?

cc [~dsmiley] [~jimczi]

> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8145
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8145
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting: 
> finding highlight offsets, breaking content up into snippets, and passage 
> scoring.  It would be nice to split this up so that consumers can use them 
> separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return 
> a single unified OffsetsEnum, rather than a collection of them.  This will 
> make it easier to expose the OffsetsEnum of a document directly from the 
> highlighter, bypassing snippet extraction and scoring.



