[
https://issues.apache.org/jira/browse/LUCENE-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899890#comment-16899890
]
Alan Woodward commented on LUCENE-8941:
---------------------------------------
> Can you add an assert in the additional test that checks that the number of
> segments in the reader is 1?
```getOnlyLeafReader()``` already covers this, so I think we're OK.
> It'd be clearer if TermsEnumDisjunctionMatchesIterator had a line of
> documentation somewhere that more explicitly points out that it is merely
> lazily wrapping the MatchesIterator because it may not be needed. Also maybe
> pulling out the initialization code to a method named as such, like init()
> would overall be clearer.
+1
> It's a shame FilterMatchesIterator cannot be used here, given all but one
> method simply delegates. We can't because the input is declared to be final.
> Do you think it's worth loosening that so that we can use it?
This is a one-off, so I don't think it's worth loosening the requirements in
FilterMatchesIterator yet. Maybe if we end up making more lazy iterators?
> I confess I don't see how the test here validates the laziness. I anticipated
> you were going to create a boolean AND query including the MTQ and some other
> simple term query that sometimes doesn't match.
I'll add a comment to the test - the point is that documents 0 to 3 match
several terms, and previously we would do multiple seeks when creating the
Matches object for each document to load all of the matching sub-iterators; now
with the laziness, we only seek to the first term and then short-cut return
immediately, delaying the other seeks until a MatchesIterator is pulled. Docs
4 and 5 still require multiple seeks, because they either match on a later term
(w5 as opposed to w1), or don't match at all, in which case we need to check
every possible term to ensure that there's no match.
> Build wildcard matches more lazily
> ----------------------------------
>
> Key: LUCENE-8941
> URL: https://issues.apache.org/jira/browse/LUCENE-8941
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8941.patch
>
>
> When retrieving a Matches object from a multi-term query, such as an
> AutomatonQuery or TermInSetQuery, we currently find all matching term
> iterators up-front, to return a disjunction over all of them. This can be
> inefficient if we're only interested in finding out if anything matched, and
> are iterating over a different field to retrieve offsets.
> We can improve this by returning immediately when the first matching term is
> found, and only collecting other matching terms when we start iterating.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]