[ https://issues.apache.org/jira/browse/LUCENE-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709615#comment-15709615 ]
David Smiley commented on LUCENE-7578:
--------------------------------------

_disclaimer: I'm merely filing this issue at this time; no time to do it._

Perhaps this belongs in a separate issue, or it could be done here as well if that would be less work overall: Instead of PhraseHelper filtering a provided PostingsEnum, I think it should produce one OffsetsEnum per top-level SpanQuery.  A redesigned, half-rewritten PhraseHelper that uses the SpanCollector API could do this in the same amount of code, whereas trying to change the current design to do this would add a lot of complexity, I think.  The outcome would improve passage relevancy for position-sensitive clauses, I think.

It could be further tweaked such that _some_ SpanQueries (namely those converted from PhraseQuery) yield one virtual position (with the earliest startOffset and the last endOffset) instead of exposing each word position separately.  That would eliminate intra-phrase highlight delimiters, and it would probably indirectly improve passage relevancy too.  The reported freq() would be the smallest freq of the provided terms.

Also, the move to this design would eliminate the position span caching going on in PhraseHelper.

> UnifiedHighlighter: Convert PhraseHelper to use SpanCollector API
> -----------------------------------------------------------------
>
>                 Key: LUCENE-7578
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7578
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>
> The PhraseHelper of the UnifiedHighlighter currently collects position-spans
> per SpanQuery (and it knows which terms are in which SpanQuery), and then it
> filters PostingsEnum based on that.  It's similar to how the original
> Highlighter's WeightedSpanTermExtractor (WSTE) works.  The main problem with
> this approach is that it can be inaccurate for some nested span queries --
> LUCENE-2287, LUCENE-5455 (has the clearest example), LUCENE-6796.  Non-nested
> SpanQueries (e.g. those converted from a PhraseQuery or MultiPhraseQuery) are
> _not_ a problem.
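
To make the "one virtual position" idea from the comment concrete, here is a rough, standalone sketch. It is not Lucene code and does not use the real OffsetsEnum, PostingsEnum, or SpanCollector APIs; the class and method names (`PhraseVirtualPosition`, `TermOccurrence`, `VirtualPosition`, `merge`, `phraseFreq`) are hypothetical and exist only for illustration. It assumes the word offsets of a single phrase match have already been collected, then collapses them into one span (earliest startOffset, last endOffset) and reports the smallest term freq as the phrase freq.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseVirtualPosition {

    /** One matched word of a phrase with its character offsets (hypothetical type). */
    public static final class TermOccurrence {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public TermOccurrence(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** A single merged position covering a whole phrase match (hypothetical type). */
    public static final class VirtualPosition {
        public final int startOffset;
        public final int endOffset;

        public VirtualPosition(int startOffset, int endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Collapses the words of one phrase match: earliest startOffset, last endOffset. */
    public static VirtualPosition merge(List<TermOccurrence> phraseMatch) {
        if (phraseMatch.isEmpty()) {
            throw new IllegalArgumentException("phrase match must contain at least one term");
        }
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (TermOccurrence occ : phraseMatch) {
            start = Math.min(start, occ.startOffset);
            end = Math.max(end, occ.endOffset);
        }
        return new VirtualPosition(start, end);
    }

    /** The freq reported for the phrase: the smallest freq among its terms. */
    public static int phraseFreq(int... termFreqs) {
        if (termFreqs.length == 0) {
            throw new IllegalArgumentException("at least one term freq is required");
        }
        int min = termFreqs[0];
        for (int f : termFreqs) {
            min = Math.min(min, f);
        }
        return min;
    }

    public static void main(String[] args) {
        // Offsets of "quick brown fox" inside the text "the quick brown fox".
        List<TermOccurrence> match = Arrays.asList(
                new TermOccurrence("quick", 4, 9),
                new TermOccurrence("brown", 10, 15),
                new TermOccurrence("fox", 16, 19));

        VirtualPosition vp = merge(match);
        System.out.println(vp.startOffset + "-" + vp.endOffset); // prints "4-19"
        System.out.println(phraseFreq(3, 1, 2));                 // prints "1"
    }
}
```

With a merged span like this, a highlighter would wrap the whole phrase in a single pair of delimiters rather than one pair per word, which is the effect the comment describes as eliminating intra-phrase highlight delimiters.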