[jira] [Commented] (LUCENE-8941) Build wildcard matches more lazily

Alan Woodward (JIRA) Mon, 05 Aug 2019 01:27:29 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899890#comment-16899890
 ]


Alan Woodward commented on LUCENE-8941:
---------------------------------------

> Can you add an assert in the additional test that checks that the number of 
> segments in the reader is 1?

getOnlyLeafReader() already covers this, so I think we're OK.

> It'd be clearer if TermsEnumDisjunctionMatchesIterator had a line of 
> documentation somewhere that more explicitly points out that it is merely 
> lazily wrapping the MatchesIterator because it may not be needed. Also maybe 
> pulling out the initialization code to a method named as such, like init() 
> would overall be clearer.

+1
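
For illustration only, here is a minimal sketch of that lazy-wrapping pattern 
in plain Java. LazyIterator, its init() method and the initCount field are 
hypothetical names chosen for this example; this is not the code in the patch 
and it does not use Lucene's MatchesIterator API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily initialised delegating iterator. */
final class LazyIterator<T> implements Iterator<T> {
  private final Supplier<Iterator<T>> source; // assumed to be expensive to call
  private Iterator<T> delegate;               // stays null until first use
  int initCount = 0;                          // exposed only so the demo can observe laziness

  LazyIterator(Supplier<Iterator<T>> source) {
    this.source = source;
  }

  /** Builds the wrapped iterator the first time it is needed, and only then. */
  private void init() {
    if (delegate == null) {
      delegate = source.get();
      initCount++;
    }
  }

  @Override
  public boolean hasNext() {
    init();
    return delegate.hasNext();
  }

  @Override
  public T next() {
    init();
    return delegate.next();
  }
}

class LazyIteratorDemo {
  public static void main(String[] args) {
    LazyIterator<String> it = new LazyIterator<>(() -> List.of("w1", "w3").iterator());
    System.out.println("built before use: " + it.initCount);
    it.next();
    System.out.println("built after use: " + it.initCount);
  }
}
```

Every delegating method calls init() first, so a caller that never iterates 
never pays for building the wrapped iterator.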

> It's a shame FilterMatchesIterator cannot be used here, given all but one 
> method simply delegates. We can't because the input is declared to be final. 
> Do you think it's worth loosening that so that we can use it?

This is a one-off, so I don't think it's worth loosening the requirements in 
FilterMatchesIterator yet.  Maybe if we end up making more lazy iterators?

> I confess I don't see how the test here validates the laziness. I anticipated 
> you were going to create a boolean AND query including the MTQ and some other 
> simple term query that sometimes doesn't match.

I'll add a comment to the test. The point is that documents 0 to 3 match 
several terms. Previously, creating the Matches object for each of these 
documents required multiple seeks to load all of the matching sub-iterators. 
With the laziness, we seek only to the first term and return immediately, 
deferring the other seeks until a MatchesIterator is pulled. Docs 4 and 5 
still require multiple seeks, because they either match only on a later term 
(w5 as opposed to w1) or don't match at all, in which case we need to check 
every possible term to confirm that there's no match.
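
To make the seek counts concrete, here is a toy model in plain Java. It is 
only a sketch under assumptions: the postings map, the term set (w1, w3, w5) 
and the choice that doc 4 matches only w5 while doc 5 matches nothing are 
made up for illustration, and SeekCountModel is not part of the test or of 
Lucene. One "seek" here means examining one candidate term for a document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SeekCountModel {
  // Hypothetical postings: term -> documents containing it, kept in term order.
  static final Map<String, Set<Integer>> POSTINGS = new LinkedHashMap<>();
  static {
    POSTINGS.put("w1", Set.of(0, 1, 2, 3));
    POSTINGS.put("w3", Set.of(0, 1, 2, 3));
    POSTINGS.put("w5", Set.of(0, 1, 2, 3, 4));
  }

  /** Eager model: every candidate term is examined up front for each document. */
  static int eagerSeeks(int doc) {
    return POSTINGS.size();
  }

  /** Lazy model: stop at the first term that matches the document. */
  static int lazySeeks(int doc) {
    int seeks = 0;
    for (Set<Integer> docs : POSTINGS.values()) {
      seeks++;
      if (docs.contains(doc)) {
        return seeks; // first match found, remaining terms are deferred
      }
    }
    return seeks; // no match: every term had to be checked
  }

  public static void main(String[] args) {
    for (int doc = 0; doc <= 5; doc++) {
      System.out.println("doc " + doc + ": eager=" + eagerSeeks(doc) + " lazy=" + lazySeeks(doc));
    }
  }
}
```

In this model docs 0 to 3 need one seek in the lazy case, while docs 4 and 5 
still need all three, which is the pattern the test is meant to show.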


> Build wildcard matches more lazily
> ----------------------------------
>
>                 Key: LUCENE-8941
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8941
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8941.patch
>
>
> When retrieving a Matches object from a multi-term query, such as an 
> AutomatonQuery or TermInSetQuery, we currently find all matching term 
> iterators up-front, to return a disjunction over all of them.  This can be 
> inefficient if we're only interested in finding out if anything matched, and 
> are iterating over a different field to retrieve offsets.
> We can improve this by returning immediately when the first matching term is 
> found, and only collecting other matching terms when we start iterating.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
