[jira] [Commented] (LUCENE-8145) UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>

Timothy M. Rodriguez (JIRA) Wed, 31 Jan 2018 16:40:26 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347823#comment-16347823
 ]


Timothy M. Rodriguez commented on LUCENE-8145:
----------------------------------------------

Thanks for the CC [~dsmiley].

[~romseygeek] really nice change!  It definitely simplifies things quite a bit, 
and conceptually a single meta OffsetEnum over the field makes more sense than 
the previous list.

I'm in favor of keeping the summed frequency on MTQ (MultiTermQuery), or at 
least preserving a mechanism to turn it on.  The extra occurrences aren't 
spurious in every case.  For example, consider "expert" systems where users are 
accustomed to using wildcards for stemming-like expressions, e.g. purchas* to 
match variants of the word purchase.  In those cases, the extra frequency 
counts would hopefully select a better passage.
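
To make the purchas* case concrete, here is a rough sketch of what "summed 
frequency" means for a trailing wildcard.  The class and method names are 
made up for illustration and are not part of the UnifiedHighlighter; the map 
of per-term frequencies stands in for whatever the highlighter has actually 
collected for a passage.

```java
import java.util.Map;

// Hypothetical illustration only; not the UnifiedHighlighter API.
final class WildcardFreqSketch {

    /**
     * Sums the in-passage frequencies of every term matched by a
     * trailing-'*' wildcard such as "purchas*".
     */
    static int summedFreq(String wildcard, Map<String, Integer> termFreqs) {
        if (!wildcard.endsWith("*")) {
            throw new IllegalArgumentException("sketch only handles trailing '*'");
        }
        // Drop the trailing '*' to get the literal prefix.
        String prefix = wildcard.substring(0, wildcard.length() - 1);
        int sum = 0;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sum += e.getValue();
            }
        }
        return sum;
    }
}
```

With the summed count, a passage containing purchase, purchased and 
purchasing is credited for all three variants rather than for only one of 
them.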



I'm not so sure about setScore taking a scorer and content length in order to 
set the score, though.  That feels awkward to me.  If we were to keep it this 
way, I'd argue a Passage should receive the PassageScorer and content length at 
construction instead of via the setScore method.  If we did that, I think we 
could incrementally build the score instead of tracking terms and frequencies 
for a later score calculation?  Another option is to move much of the scoring 
behavior out of Passage, perhaps into a new class that tracks the terms and 
score in a passage, analogous to Weight?
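
A minimal sketch of the construction-time idea, to show the shape I have in 
mind.  Everything here is hypothetical: SketchScorer and SketchPassage are 
stand-ins, not the existing PassageScorer and Passage classes, and the sketch 
assumes each match contributes additively to the score.  A scorer with a 
saturating tf would still need per-term counts.

```java
// Hypothetical names throughout; these are not the existing Lucene classes.
interface SketchScorer {
    /** Score contribution of one occurrence of a term, given the content length. */
    double occurrenceScore(String term, int contentLength);
}

final class SketchPassage {
    private final SketchScorer scorer;
    private final int contentLength;
    private double score; // built up as matches arrive

    // The scorer and content length are supplied once, at construction,
    // so no later setScore(scorer, contentLength) call is needed.
    SketchPassage(SketchScorer scorer, int contentLength) {
        this.scorer = scorer;
        this.contentLength = contentLength;
    }

    /** Records one matching term occurrence and updates the running score. */
    void addMatch(String term) {
        score += scorer.occurrenceScore(term, contentLength);
    }

    double getScore() {
        return score;
    }
}
```

The passage then always carries an up-to-date score, and callers never hand 
it a scorer after the fact.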

 

 

> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8145
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8145
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting: 
> finding highlight offsets, breaking content up into snippets, and passage 
> scoring.  It would be nice to split this up so that consumers can use them 
> separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return 
> a single unified OffsetsEnum, rather than a collection of them.  This will 
> make it easier to expose the OffsetsEnum of a document directly from the 
> highlighter, bypassing snippet extraction and scoring.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

