[ https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347602#comment-16347602 ]

Alan Woodward commented on LUCENE-8145:
---------------------------------------
Thanks for the review, David! I'll put up another patch shortly with your
suggestions.

Re Automata: I agree that we can replace CompositeOffsetsPostingsEnum, but I
wonder if we need to bother with the frequency summing? It would make more
sense, I think, to preserve the freqs of the individual term matches, so that a
rarer term is more relevant than a more frequent one. We don't do this with
wildcard queries in general because of performance, but that's not an issue
here.
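
To illustrate the difference between summing frequencies and keeping them per term, here is a minimal sketch. Everything in it is an assumption made for illustration: the class name FreqSketch, the two helper methods, and especially the weight function, which is a made-up decreasing function of frequency and not the highlighter's actual scoring formula.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FreqSketch {

    // Hypothetical weight: shrinks as the in-document frequency grows,
    // so a rare term counts for more than a common one.
    static double weight(int freq) {
        return Math.log(1.0 + 1.0 / freq);
    }

    // Summed variant: every term matched by the automaton is weighted
    // using the total frequency of all matched terms.
    static Map<String, Double> summedWeights(Map<String, Integer> termFreqs) {
        int total = 0;
        for (int f : termFreqs.values()) {
            total += f;
        }
        Map<String, Double> out = new LinkedHashMap<>();
        for (String term : termFreqs.keySet()) {
            out.put(term, weight(total));
        }
        return out;
    }

    // Per-term variant: each matched term keeps its own frequency.
    static Map<String, Double> perTermWeights(Map<String, Integer> termFreqs) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            out.put(e.getKey(), weight(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two terms that a wildcard such as foo* might match in one document.
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("foobar", 1);
        freqs.put("food", 9);
        System.out.println("summed:   " + summedWeights(freqs));
        System.out.println("per-term: " + perTermWeights(freqs));
    }
}
```

With summing, both terms end up with the same weight; with per-term freqs, the rarer "foobar" outweighs "food".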

Passage is heavier now, but the objects are re-used, and only n-fragments + 1
are built for each highlighted doc, so I'm not too concerned.

> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
> Key: LUCENE-8145
> URL: https://issues.apache.org/jira/browse/LUCENE-8145
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Minor
> Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting:
> finding highlight offsets, breaking content up into snippets, and passage
> scoring. It would be nice to split this up so that consumers can use them
> separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return
> a single unified OffsetsEnum, rather than a collection of them. This will
> make it easier to expose the OffsetsEnum of a document directly from the
> highlighter, bypassing snippet extraction and scoring.
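
The idea of returning one unified enumeration instead of a collection can be sketched as a merge over per-term enumerations, ordered by start offset. This is only an illustration under stated assumptions: the Offsets interface, Match, ListOffsets and Merged below are hypothetical stand-ins and are not the Lucene OffsetsEnum or FieldOffsetStrategy API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedOffsetsSketch {

    /** One occurrence of a term, with its character offsets. */
    static final class Match {
        final String term;
        final int start;
        final int end;

        Match(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for a per-term offsets enumeration. */
    interface Offsets {
        boolean next();   // advance; false when exhausted
        Match current();  // valid only after next() returned true
    }

    /** Offsets backed by a list already sorted by start offset. */
    static final class ListOffsets implements Offsets {
        private final Iterator<Match> it;
        private Match current;

        ListOffsets(List<Match> matches) {
            this.it = matches.iterator();
        }

        public boolean next() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        public Match current() {
            return current;
        }
    }

    /** Presents several enumerations as one, ordered by start offset. */
    static final class Merged implements Offsets {
        private final PriorityQueue<Offsets> queue = new PriorityQueue<>(
            Comparator.comparingInt((Offsets o) -> o.current().start));
        private Offsets top; // enumeration whose match is currently exposed

        Merged(List<Offsets> subs) {
            for (Offsets o : subs) {
                if (o.next()) {
                    queue.add(o); // only queue enumerations that have a match
                }
            }
        }

        public boolean next() {
            // Advance the previously exposed enumeration and re-queue it.
            if (top != null && top.next()) {
                queue.add(top);
            }
            top = queue.poll();
            return top != null;
        }

        public Match current() {
            return top.current();
        }
    }

    /** Merges two small per-term enumerations and returns the start offsets in order. */
    static List<Integer> demo() {
        Offsets foo = new ListOffsets(Arrays.asList(
            new Match("foo", 0, 3), new Match("foo", 20, 23)));
        Offsets bar = new ListOffsets(Arrays.asList(
            new Match("bar", 8, 11)));
        Offsets merged = new Merged(Arrays.asList(foo, bar));
        List<Integer> starts = new ArrayList<>();
        while (merged.next()) {
            starts.add(merged.current().start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A consumer that only wants offsets can then iterate the single merged enumeration without going through snippet extraction or scoring.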