[ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474278#comment-16474278
 ] 

David Smiley commented on LUCENE-8306:
--------------------------------------

{quote}Could we address this need by calling extract terms on the weight, and 
filtering the positions/offsets of these terms to only keep those that 
intersect with the returned matches?
{quote}
Nice idea, but it would be inaccurate, and I think we should aim for accurate 
results with this new API.

For example, if the query is "Game of Thrones" near "Show", then extracting 
terms is going to find "of" and other words. But "of" ought to be a match only 
when it's in the phrase "Game of Thrones", not in other places that happen to 
occur in the larger span near "Show". Our highlighters have gotten this wrong 
for a long time; only recently was the UnifiedHighlighter improved to resolve 
it by using the SpanCollector API, in LUCENE-8121 (for 7.3, yay).
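
To make the difference concrete, here is a rough, Lucene-free sketch in plain 
Java. It is only an illustration, not the patch or any existing Lucene API: the 
class name, both method names, and the sample token stream are made up, and the 
"near" match is assumed to have already been found as the span covering 
positions 0 to 6. The first method filters extracted terms by the match span, 
as proposed in the quote; the second keeps only positions that belong to a leaf 
match (the exact phrase, or the single near term).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Lucene.
final class PhraseAwareHighlight {

    // Approach from the quote: take every extracted query term and keep
    // each occurrence whose position falls inside the match span.
    static List<Integer> byExtractedTerms(List<String> tokens, Set<String> terms,
                                          int start, int end) {
        List<Integer> hits = new ArrayList<>();
        for (int i = start; i <= end; i++) {
            if (terms.contains(tokens.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Accurate approach: keep a position only if it is the near term itself
    // or part of an exact occurrence of the phrase inside the span.
    static List<Integer> byLeafMatches(List<String> tokens, List<String> phrase,
                                       String near, int start, int end) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (int i = start; i <= end; i++) {
            if (tokens.get(i).equals(near)) {
                hits.add(i);
            }
            int last = i + phrase.size() - 1;
            if (last <= end && tokens.subList(i, last + 1).equals(phrase)) {
                for (int j = i; j <= last; j++) {
                    hits.add(j);
                }
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        // Made-up document; the whole thing is one match span [0, 6].
        List<String> tokens =
            List.of("show", "of", "the", "year", "game", "of", "thrones");
        Set<String> terms = Set.of("game", "of", "thrones", "show");
        List<String> phrase = List.of("game", "of", "thrones");

        // Includes position 1, the stray "of" in "show of the year".
        System.out.println(byExtractedTerms(tokens, terms, 0, 6));
        // Only the near term and the phrase positions.
        System.out.println(byLeafMatches(tokens, phrase, "show", 0, 6));
    }
}
```

With this input the first call reports positions 0, 1, 4, 5, 6 and the second 
reports 0, 4, 5, 6. Position 1 is the "of" that sits inside the larger span 
without being part of the phrase.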

> Allow iteration over the term positions of a Match
> --------------------------------------------------
>
>                 Key: LUCENE-8306
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8306
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8306.patch, LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just 
> returns information about the span of the whole match.  It would be useful to 
> also expose information about the matching terms within the phrase.  The same 
> would apply to Spans and Interval queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
