[
https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347823#comment-16347823
]
Timothy M. Rodriguez commented on LUCENE-8145:
----------------------------------------------
Thanks for the CC [~dsmiley].
[~romseygeek] really nice change! It definitely simplifies things quite a bit,
and conceptually one meta OffsetEnum over the field makes more sense than the
previous list.
I'm in favor of keeping the summed frequency on MTQ, or at least preserving a
mechanism to keep it on. The extra occurrences are not spurious in every
case. For example, consider "expert" systems where users are accustomed to
using wildcards for stemming-like expressions, e.g. purchas* to get variants
of the word purchase. In those cases, the extra frequency counts would
hopefully select a better passage.
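To make the purchas* case concrete, here is a minimal sketch of what I mean by
summed frequency. It is not the highlighter's code: the class and method names
are hypothetical, and it assumes the passage has already been split into
lowercased tokens.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class WildcardFrequencySketch {

    /** Counts every token in the passage that matches the prefix wildcard. */
    static int summedFrequency(List<String> passageTokens, String prefix) {
        int freq = 0;
        for (String token : passageTokens) {
            if (token.startsWith(prefix)) {
                freq++; // each variant occurrence adds to the same total
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList(
            "the", "purchase", "was", "purchased", "after", "purchasing", "approval");
        List<String> second = Arrays.asList(
            "a", "single", "purchase", "order", "was", "filed", "today");

        // The first passage has three variants, the second only one.
        System.out.println(summedFrequency(first, "purchas"));
        System.out.println(summedFrequency(second, "purchas"));
    }
}
```

With summed frequency the first passage outranks the second on this query,
which is what I would expect an "expert" user to want.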
I'm not so sure about setScore taking a scorer and the content length in
order to set the score, though. That feels awkward to me. If we were to keep
it this way, I'd argue a Passage should receive the PassageScorer and content
length at construction instead of via the setScore method. If we did that, I
think we could build the score incrementally instead of tracking terms and
frequencies for a later score calculation? Another choice is to move much of
the scoring behavior out, perhaps into a new class that tracks the terms and
score of a passage, analogous to Weight?
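A rough sketch of the constructor-injection idea, to show the shape I have in
mind. Everything here is hypothetical (ScorerSketch and PassageSketch are
stand-ins, not the real PassageScorer or Passage), and the score is simply
linear in the number of matches. A scorer with a saturating tf would still
need per-term counts, so treat this as an illustration of the API shape only.

```java
// Hypothetical stand-in for a passage scorer; not Lucene's PassageScorer.
interface ScorerSketch {
    float termWeight(int contentLength, int totalTermFreq);
    float norm(int passageStart);
}

// Hypothetical stand-in for Passage: scorer and content length are fixed at
// construction, so no setScore(scorer, contentLength) call is needed later.
final class PassageSketch {
    private final ScorerSketch scorer;
    private final int contentLength;
    private final int startOffset;
    private float weightSum; // running total, updated on every match

    PassageSketch(ScorerSketch scorer, int contentLength, int startOffset) {
        this.scorer = scorer;
        this.contentLength = contentLength;
        this.startOffset = startOffset;
    }

    /** Adds one match; totalTermFreq is the term's frequency in the content. */
    void addMatch(int totalTermFreq) {
        weightSum += scorer.termWeight(contentLength, totalTermFreq);
    }

    /** Score so far: accumulated weights scaled by a position-based norm. */
    float getScore() {
        return weightSum * scorer.norm(startOffset);
    }
}

final class PassageSketchDemo {
    public static void main(String[] args) {
        // Toy scorer chosen only to keep the arithmetic easy to follow.
        ScorerSketch toy = new ScorerSketch() {
            public float termWeight(int contentLength, int totalTermFreq) {
                return contentLength / (float) totalTermFreq;
            }
            public float norm(int passageStart) {
                return 1f / (1 + passageStart);
            }
        };
        PassageSketch passage = new PassageSketch(toy, 100, 0);
        passage.addMatch(4);
        passage.addMatch(5);
        System.out.println(passage.getScore());
    }
}
```

The point is only that the caller never hands the scorer to the passage after
the fact; whether the accumulation lives in the passage or in a separate
Weight-like class is a separate question.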
> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
> Key: LUCENE-8145
> URL: https://issues.apache.org/jira/browse/LUCENE-8145
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Minor
> Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting:
> finding highlight offsets, breaking content up into snippets, and passage
> scoring. It would be nice to split this up so that consumers can use them
> separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return
> a single unified OffsetsEnum, rather than a collection of them. This will
> make it easier to expose the OffsetsEnum of a document directly from the
> highlighter, bypassing snippet extraction and scoring.