[
https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347692#comment-16347692
]
David Smiley commented on LUCENE-8145:
--------------------------------------
{quote}but I wonder if we need to bother with the frequency summing?
{quote}
It's debatable. Consider an aggressive MTQ like {{st*}} that hypothetically
matches a lot of terms that each occur only once. Passages containing those
terms will be scored higher than a passage in which a plain term query matched
twice.
It would be cool if we could further affect the passage score by a term's
string-distance to the automaton string. For example, {{st*}} would have its
score dampened quite a bit when it matches "strangelyLongWord" but only
slightly when it matches "stir". Artificially increasing the frequency would
be one way to do that, albeit less flexible than some other hook. If we had
something like this, it'd probably matter less how accurate the frequency is,
since I think people would want to dampen the score for any MTQ.
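A rough sketch of the length-based dampening idea, purely for illustration.
The class name PrefixDampening, the method weight, and the tuning knob alpha
are all hypothetical and not part of Lucene; the formula is just one simple
way to turn "how much longer is the matched term than the prefix" into a
multiplier.

```java
/** Hypothetical helper; nothing here exists in Lucene. */
final class PrefixDampening {
    private PrefixDampening() {}

    /**
     * Returns a multiplier in (0, 1]: 1.0 when the matched term equals the
     * prefix, shrinking as the term grows longer than the prefix.
     * alpha (assumed, tunable) controls how quickly the weight falls off.
     */
    static double weight(String prefix, String matchedTerm, double alpha) {
        if (!matchedTerm.startsWith(prefix)) {
            throw new IllegalArgumentException("term does not match prefix");
        }
        int extraChars = matchedTerm.length() - prefix.length();
        return 1.0 / (1.0 + alpha * extraChars);
    }
}
```

With alpha = 0.1, "stir" keeps most of its weight for the prefix "st", while
"strangelyLongWord" is dampened much more strongly.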
Hmmm. What if Passage.setScore remains a simple setter, but we add
PassageScorer.computeScore(Passage, int contentLength)? Granted, we'd need to
expose more of the data you added to Passage, but it sure adds some
flexibility!
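To make the shape of that hook concrete, here is a minimal sketch using
stand-in types. SketchPassage and SketchPassageScorer are hypothetical and
deliberately not Lucene's Passage and PassageScorer; the exposed per-match
frequencies and the scoring formula are placeholder assumptions, chosen only
to show the setter staying dumb while the scorer owns the computation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a passage; not Lucene's Passage class. */
final class SketchPassage {
    final int startOffset;
    // Assumed exposed data: frequency of each matched term in the passage.
    final List<Integer> matchFreqs = new ArrayList<>();
    private float score;

    SketchPassage(int startOffset) {
        this.startOffset = startOffset;
    }

    void addMatch(int freqInPassage) {
        matchFreqs.add(freqInPassage);
    }

    // Stays a simple setter: no scoring logic lives in the passage.
    void setScore(float score) {
        this.score = score;
    }

    float getScore() {
        return score;
    }
}

/** Hypothetical scorer; subclasses could override computeScore. */
class SketchPassageScorer {
    float computeScore(SketchPassage passage, int contentLength) {
        double sum = 0;
        for (int freq : passage.matchFreqs) {
            sum += Math.log1p(freq); // placeholder saturation of frequency
        }
        // Placeholder position norm: earlier passages score a bit higher.
        double positionNorm =
            1.0 - (double) passage.startOffset / Math.max(1, contentLength);
        return (float) (sum * positionNorm);
    }
}
```

A caller would then do passage.setScore(scorer.computeScore(passage,
contentLength)), and anyone wanting MTQ dampening or a different norm would
override computeScore rather than touch the passage.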
CC [~Timothy055]
> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
> Key: LUCENE-8145
> URL: https://issues.apache.org/jira/browse/LUCENE-8145
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Minor
> Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting:
> finding highlight offsets, breaking content up into snippets, and passage
> scoring. It would be nice to split this up so that consumers can use them
> separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return
> a single unified OffsetsEnum, rather than a collection of them. This will
> make it easier to expose the OffsetsEnum of a document directly from the
> highlighter, bypassing snippet extraction and scoring.
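As an illustration of what "a single unified enum rather than a collection"
amounts to, the sketch below merges several per-term sorted offset lists into
one sorted stream with a priority queue. It is not the patch and does not use
Lucene's OffsetsEnum; MergedOffsets, Cursor, and merge are hypothetical names,
and plain start offsets stand in for the richer per-match data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of sorted offsets; not Lucene code. */
final class MergedOffsets {
    private MergedOffsets() {}

    /** One per-term cursor: its current offset plus the remaining ones. */
    private static final class Cursor {
        int current;
        final Iterator<Integer> rest;

        Cursor(Iterator<Integer> it) {
            this.rest = it;
            this.current = it.next();
        }
    }

    /** Each inner list must already be sorted by start offset. */
    static List<Integer> merge(List<List<Integer>> perTermStartOffsets) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current, b.current));
        for (List<Integer> offsets : perTermStartOffsets) {
            Iterator<Integer> it = offsets.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();   // smallest current offset
            merged.add(top.current);
            if (top.rest.hasNext()) {    // advance and re-queue the cursor
                top.current = top.rest.next();
                queue.add(top);
            }
        }
        return merged;
    }
}
```

A consumer then walks one ordered sequence of offsets instead of juggling a
list of per-term enums.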