David Smiley created LUCENE-8848:
------------------------------------

             Summary: UnifiedHighlighter should highlight all Query types that implement Weight.matches
                 Key: LUCENE-8848
                 URL: https://issues.apache.org/jira/browse/LUCENE-8848
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/highlighter
            Reporter: David Smiley


The UnifiedHighlighter internally extracts terms and automata from the query.  
Usually this works perfectly, but a Query might be of a type the UH doesn't 
know: a leaf query that in effect behaves like a MultiTermQuery, yet either 
isn't a subclass of it, or is one but the UH doesn't know how to extract an 
automaton from it.  The UH is oblivious to this and probably won't highlight 
this query.  If re-analysis of the text is necessary, the UH will pre-filter 
all terms to only those it _thinks_ are pertinent, which can drop terms that 
only the unrecognized query would match.  Or if offsets are in the postings 
then the UH could perform very poorly by unleashing this query on the index 
for each highlighted document without recognizing that re-analysis is a more 
appropriate path.

I think to solve this, the UnifiedHighlighter.getFieldHighlighter needs to 
inspect the query (using a QueryVisitor) to see if it can find a leaf query 
that is not one it knows how to pull automata from, and is otherwise not in a 
special list (like MatchAllDocsQuery).  If we find one, we avoid choosing 
OffsetSource.POSTINGS or OffsetSource.NONE_NEEDED since we might in effect have 
an MTQ-like query.  If a MemoryIndex is needed then we don't pre-filter the 
terms since we can't assume we know precisely which terms are pertinent.
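
As a rough sketch of that decision (not the actual patch), here is a small 
model that uses no Lucene classes.  LeafKind, Plan and plan() are hypothetical 
names; in the real code the leaves would be classified by a QueryVisitor.  
Falling back to ANALYSIS is an assumption on my part: the text above only says 
POSTINGS and NONE_NEEDED should be avoided.

```java
import java.util.Arrays;
import java.util.List;

// Dependency-free model of the proposed check; all names here are hypothetical.
final class OffsetSourceSketch {

    // How a visitor might classify each leaf query it encounters.
    enum LeafKind { TERM, KNOWN_AUTOMATON, IGNORABLE, UNRECOGNIZED }

    // Mirrors the constant names of UnifiedHighlighter.OffsetSource.
    enum OffsetSource { POSTINGS, TERM_VECTORS, ANALYSIS, POSTINGS_WITH_TERM_VECTORS, NONE_NEEDED }

    // Outcome: which offset source to use, and whether terms may be pre-filtered.
    static final class Plan {
        final OffsetSource offsetSource;
        final boolean preFilterTerms;

        Plan(OffsetSource offsetSource, boolean preFilterTerms) {
            this.offsetSource = offsetSource;
            this.preFilterTerms = preFilterTerms;
        }
    }

    static Plan plan(OffsetSource preferred, List<LeafKind> leaves) {
        if (!leaves.contains(LeafKind.UNRECOGNIZED)) {
            // Every leaf is understood, so the usual choice stands.
            return new Plan(preferred, true);
        }
        OffsetSource chosen = preferred;
        if (preferred == OffsetSource.POSTINGS || preferred == OffsetSource.NONE_NEEDED) {
            // Assumption: re-analysis is the fallback for an MTQ-like unknown leaf.
            chosen = OffsetSource.ANALYSIS;
        }
        // We can't know which terms are pertinent, so never pre-filter here.
        return new Plan(chosen, false);
    }

    public static void main(String[] args) {
        Plan p = plan(OffsetSource.POSTINGS,
                Arrays.asList(LeafKind.TERM, LeafKind.UNRECOGNIZED));
        System.out.println(p.offsetSource + " preFilterTerms=" + p.preFilterTerms);
    }
}
```

MatchAllDocsQuery and friends would map to IGNORABLE here, so they don't 
trigger the fallback on their own.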

We needn't bother extracting terms & automata in this case either; it's wasted 
effort which can involve building a CharacterRunAutomaton (see 
MultiTermHighlighting.binaryToCharRunAutomaton).  Speaking of which, it'd be 
nice to avoid that in other cases as well, like for WEIGHT_MATCHES when we 
aren't using MemoryIndex (thus no term pre-filtering).


