[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

Alan Woodward (JIRA) Thu, 29 Mar 2018 04:07:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418770#comment-16418770
 ]


Alan Woodward commented on LUCENE-8229:
---------------------------------------

Having slept on it, I've come round to [~dsmiley]'s suggestion of returning 
matches from all fields.  I've pushed some changes which add an intermediate 
Matches object, which holds iterators for all fields with matches.  So the 
method signature on Weight now looks like this:
{code:java}
public abstract Matches getMatches(LeafReaderContext ctx, int doc){code}

You can then get a MatchesIterator for a given field by calling 
{code}Matches.getFieldMatches(String field){code}, or get the set of all fields 
containing matches by calling {code}Matches.getMatchFields(){code}.  This has 
the nice side-effect of making BooleanWeight.matches() much more efficient.
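
To make the shape of this concrete, here is a minimal, self-contained sketch of how a caller might walk such an object. It does not use the Lucene classes: MatchesSketch, its nested Matches and MatchesIterator types, the add() helper and the describe() method are all hypothetical stand-ins. Only the method names getFieldMatches and getMatchFields come from the description above; the iterator methods and the choice to return null for a field with no matches are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-ins; NOT the real Lucene types.
final class MatchesSketch {

    /** Assumed shape of a per-field iterator over match positions. */
    interface MatchesIterator {
        boolean next();        // move to the next match, false when exhausted
        int startPosition();   // first term position of the current match
        int endPosition();     // last term position of the current match
    }

    /** Stand-in for the intermediate object holding iterators for all fields. */
    static final class Matches {
        // field name -> list of {start, end} pairs, sorted by field name
        private final Map<String, List<int[]>> byField = new TreeMap<>();

        /** Illustration-only helper used to populate the sketch. */
        void add(String field, int start, int end) {
            byField.computeIfAbsent(field, f -> new ArrayList<>())
                   .add(new int[] {start, end});
        }

        /** The set of all fields containing matches. */
        Set<String> getMatchFields() {
            return byField.keySet();
        }

        /** Iterator for one field, or null if that field has no matches (an assumption). */
        MatchesIterator getFieldMatches(String field) {
            final List<int[]> hits = byField.get(field);
            if (hits == null) {
                return null;
            }
            return new MatchesIterator() {
                private int i = -1;
                public boolean next() { return ++i < hits.size(); }
                public int startPosition() { return hits.get(i)[0]; }
                public int endPosition() { return hits.get(i)[1]; }
            };
        }
    }

    /** Walks every field with matches and renders each match as "field:start-end". */
    static List<String> describe(Matches matches) {
        List<String> out = new ArrayList<>();
        for (String field : matches.getMatchFields()) {
            MatchesIterator it = matches.getFieldMatches(field);
            while (it.next()) {
                out.add(field + ":" + it.startPosition() + "-" + it.endPosition());
            }
        }
        return out;
    }
}
```

A highlighter-style caller would first ask for the match fields and then pull one iterator per field, as describe() does here.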

Re AutomatonQuery, we have a lot more leeway here because it's only working on 
a single document at a time.  The way I've done things so far is to pull 
postings for all the matching terms, but only create a MatchesIterator if the 
postings can be advanced to the document we're interested in.  Otherwise, the 
PostingsEnum gets re-used.  This should have similar performance 
characteristics to the creation of a scorer over a single segment.
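
As a rough illustration of that loop, the sketch below models postings as sorted arrays of doc ids and keeps a term only if its cursor can be advanced exactly onto the target document. A cursor that misses is handed to the next term instead of being reallocated. Everything here (TermPostingsSketch, Postings, reset(), termsMatchingDoc()) is a hypothetical stand-in and not the actual patch or the Lucene PostingsEnum API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; NOT the real Lucene types.
final class TermPostingsSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal reusable cursor over a sorted array of doc ids. */
    static final class Postings {
        private int[] docs = new int[0];
        private int idx = -1;

        /** Points the cursor at another term's doc ids so the object can be reused. */
        void reset(int[] sortedDocs) {
            this.docs = sortedDocs;
            this.idx = -1;
        }

        /** Moves to the first doc id >= target, or NO_MORE_DOCS if there is none. */
        int advance(int target) {
            do {
                idx++;
            } while (idx < docs.length && docs[idx] < target);
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** Returns the terms whose postings contain the given document. */
    static List<String> termsMatchingDoc(Map<String, int[]> postingsByTerm, int doc) {
        List<String> matched = new ArrayList<>();
        Postings reuse = null;
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            // Reuse the previous cursor if the last term did not match.
            Postings p = (reuse != null) ? reuse : new Postings();
            p.reset(e.getValue());
            if (p.advance(doc) == doc) {
                // In the real code this is where a MatchesIterator would wrap the
                // postings, so the cursor is kept and not handed to the next term.
                matched.add(e.getKey());
                reuse = null;
            } else {
                reuse = p;
            }
        }
        return matched;
    }
}
```

The point of the sketch is only that the per-term cost is one advance() call, with extra objects created just for the terms that actually hit the document.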

> Add a method to Weight to retrieve matches for a single document
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8229
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8229
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly 
> frequent feature request, and would also make highlighters much easier to 
> implement.  There have been a few attempts at doing this, including adding 
> positions to Scorers, or re-writing queries as Spans, but these all either 
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over 
> matches in a particular document and field.  It should be used in a similar 
> manner to explain() - ie, just for TopDocs, not as part of the scoring loop, 
> which relieves some of the pressure on performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

