[ https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474278#comment-16474278 ]
David Smiley commented on LUCENE-8306:
--------------------------------------
{quote}Could we address this need by calling extract terms on the weight, and
filtering the positions/offsets of these terms to only keep those that
intersect with the returned matches?
{quote}
Nice idea, but it would be inaccurate, and I think we should aim for accurate
results with this new API.
For example, if the query is "Game of Thrones" near "Show", then extracting
terms will yield "of" along with the other words. But "of" ought to be a match
only when it's part of the phrase "Game of Thrones", not at other places where
it happens to occur within the larger span near "Show". Our highlighters have
gotten this wrong for a long time; only recently was the UnifiedHighlighter
improved to resolve it by using the SpanCollector API – LUCENE-8121 (for 7.3,
yay).
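To make the over-matching concrete, here is a toy sketch in plain Java. It does
not use any Lucene API: the class, the method names, and the sample text are all
made up for illustration, and the "accurate" method hard-codes this one query
shape (a phrase near a single term) rather than showing how a real
implementation would work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; not part of Lucene.
public class NaiveTermFilterDemo {

    // Naive approach: keep every position inside the match span [start, end]
    // whose token is one of the terms extracted from the query.
    static List<Integer> naivePositions(String[] tokens, Set<String> queryTerms,
                                        int start, int end) {
        List<Integer> hits = new ArrayList<>();
        for (int pos = start; pos <= end; pos++) {
            if (queryTerms.contains(tokens[pos])) {
                hits.add(pos);
            }
        }
        return hits;
    }

    // Accurate for this toy query only: keep positions that belong to a full
    // occurrence of the phrase, plus positions of the standalone term.
    static List<Integer> accuratePositions(String[] tokens, String[] phrase,
                                           String other, int start, int end) {
        Set<Integer> hits = new TreeSet<>();
        for (int pos = start; pos <= end; pos++) {
            if (pos + phrase.length - 1 <= end && phraseAt(tokens, phrase, pos)) {
                for (int i = 0; i < phrase.length; i++) {
                    hits.add(pos + i);
                }
            } else if (tokens[pos].equals(other)) {
                hits.add(pos);
            }
        }
        return new ArrayList<>(hits);
    }

    // True if the phrase occurs at consecutive positions starting at pos.
    static boolean phraseAt(String[] tokens, String[] phrase, int pos) {
        for (int i = 0; i < phrase.length; i++) {
            if (!tokens[pos + i].equals(phrase[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // positions: the=0 show=1 of=2 the=3 year=4 is=5 game=6 of=7 thrones=8
        String[] tokens = "the show of the year is game of thrones".split(" ");
        String[] phrase = {"game", "of", "thrones"};
        Set<String> terms = new HashSet<>(Arrays.asList("game", "of", "thrones", "show"));

        // Assume the overall match spans "show" (1) through "thrones" (8).
        System.out.println(naivePositions(tokens, terms, 1, 8));
        // prints [1, 2, 6, 7, 8]  ("of" at position 2 is a false hit)
        System.out.println(accuratePositions(tokens, phrase, "show", 1, 8));
        // prints [1, 6, 7, 8]
    }
}
```

The stray "of" at position 2 lies inside the span and is one of the extracted
terms, so filtering by span alone cannot exclude it.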
> Allow iteration over the term positions of a Match
> --------------------------------------------------
>
> Key: LUCENE-8306
> URL: https://issues.apache.org/jira/browse/LUCENE-8306
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8306.patch, LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just
> returns information about the span of the whole match. It would be useful to
> also expose information about the matching terms within the phrase. The same
> would apply to Spans and Interval queries.