[ 
https://issues.apache.org/jira/browse/LUCENE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709524#comment-15709524
 ] 

David Smiley commented on LUCENE-7575:
--------------------------------------

Thanks for this contribution Jim! You're the first one to improve the UH 
outside of those who first worked on it.

Overall patch looks pretty good.

UnifiedHighlighter:
* What do you think of adding this to the HighlightFlags instead?  It's 
intended to be a single point to capture various boolean options.  As an aside, 
I'm kinda wondering if the default* and DEFAULT* boolean fields shouldn't exist 
and instead simply have a highlightFlags enumSet field.
* I think the results of filterExtractedTerms might now contain duplicated 
terms (BytesRefs)?  (see my note later about testing the same term and varying 
the field). We could simply collect those bytes into a HashSet, then extract to 
an array and then sort.

PhraseHelper:
* You applied SingleFieldFilterLeafReader at the top of getTermToSpans but I 
think this should be done by the caller so it happens just once, not per 
SpanQuery.
* FieldRewritingTermHashSet is so close to the other other one... hmm.... what 
if we had just one, remove "static" from class (thus has access to fieldName & 
requireFieldMatch), and then implement add() appropriately.

Tests:
* you used the same test input string for both the "field" and 
"field_require_field_match" fields. To make this more clear; can you vary them 
if even a little?
* in no test queries do I see the same term BytesRef across more than one 
field.  For example, maybe add a test incorporating something like {{field:test 
OR field_require_field_match:test}} -- granted the results might not be 
interesting but lets hope it doesn't puke.  Do for phrase as well.

I agree this requireFieldMatch=false should not be the default. It'll add some 
overhead -- especially for phrase and other position sensitive queries since we 
aren't de-duplicating them.  Besides, it's more accurate as-is.

As an aside... it'd be interesting if instead of a simple boolean toggle, if it 
were a {{Predicate<String>}} fieldMatchPredicate so that only some fields could 
be collected in the query but not all.  Just an idea.

> UnifiedHighlighter: add requireFieldMatch=false support
> -------------------------------------------------------
>
>                 Key: LUCENE-7575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7575
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-7575.patch
>
>
> The UnifiedHighlighter (like the PostingsHighlighter) only supports 
> highlighting queries for the same fields that are being highlighted.  The 
> original Highlighter and FVH support loosening this, AKA 
> requireFieldMatch=false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to