[ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797937#comment-15797937
 ] 

Michael McCandless commented on LUCENE-7055:
--------------------------------------------

I really like this idea and the latest patch: I think it will be an immense 
query-time optimization for some cases, e.g. a restrictive {{TermQuery}} 
against a massive {{PointRangeQuery}} where doc values are also indexed for 
that range field.

I like how this solution lets us "phase in" queries over time (default impl 
for lazyScorer).

For the {{BKDReader}} impls and {{PointValues}} APIs, can we rename 
{{estimateCost}} to {{estimatePointCount}} or just {{estimateCount}}?  "Cost" 
is a bit vague here, whereas what we are computing is fairly tightly 
defined.  I think {{cost}} is a good name for the {{LazyScorer}} method.

Maybe rename {{LazyScorer}} to {{ScorerSource}}?  {{LazyScorer}} makes me feel 
like the laziness applies during actual iteration of the hits...

I like the switch to a {{Map<Occur,Collection>}} for boolean scorer's {{subs}} 
tracking.

{{FakeLazyScorer}} in {{TestLazyBoolean2Scorer}} seems to fail to initialize 
its {{this.randomAccess}} in its 2nd ctor so the assert is never invoked?
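
As an illustration of the usual fix for this kind of missed assignment, the shorter constructor can delegate to the full one so that every field is always set. This is only a sketch with a stand-in class: {{FakeSource}}, its {{cost}} field and {{matchesExpectation}} are hypothetical names, not the code from the patch.

```java
// Hypothetical stand-in for the test helper; not the code from the patch.
final class FakeSource {
    private final long cost;            // hypothetical field
    private final Boolean randomAccess; // expected hint; null means "nothing to check"

    FakeSource(long cost) {
        this(cost, null); // delegate so no constructor can skip a field
    }

    FakeSource(long cost, Boolean randomAccess) {
        this.cost = cost;
        this.randomAccess = randomAccess;
    }

    // True when the hint passed by the caller matches what the test expected,
    // or when the test did not register any expectation.
    boolean matchesExpectation(boolean actualHint) {
        return randomAccess == null || randomAccess == actualHint;
    }

    long cost() {
        return cost;
    }
}
```

Declaring the fields {{final}} also makes the compiler reject any constructor that forgets to assign one of them.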

If I pass {{randomAccess = false}} to {{LazyScorer.get}} am I not allowed to 
invoke {{advance}} on the returned {{Scorer}}?  Maybe the javadocs can call 
this argument "hint about expected usage"?  It's too bad this is not somehow 
more strongly typed, like you get back a {{Bits}} (plus some way to score if 
it's needed) if you asked for random access, but I don't see how to do that.  
Long ago (can't find the issue now) we had an issue exploring something along 
these lines. But, let's keep the approach now in your patch: progress not 
perfection!
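
To make the "hint about expected usage" reading concrete, here is a rough sketch, assuming the flag only selects a strategy and {{advance}} stays legal either way. {{SimpleIterator}} and {{RangeSource}} are simplified, hypothetical stand-ins and not Lucene types.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical, simplified stand-ins; none of these types exist in Lucene.
interface SimpleIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the first matching doc id >= target, or NO_MORE_DOCS. */
    int advance(int target);
}

final class RangeSource {
    private final int maxDoc;
    private final IntPredicate matches; // per-document check, e.g. backed by doc values

    RangeSource(int maxDoc, IntPredicate matches) {
        this.maxDoc = maxDoc;
        this.matches = matches;
    }

    /** randomAccess is only a hint: both returned iterators support advance(). */
    SimpleIterator get(boolean randomAccess) {
        if (randomAccess) {
            // Cheap to create: documents are checked lazily as the caller advances.
            return target -> {
                for (int doc = Math.max(target, 0); doc < maxDoc; doc++) {
                    if (matches.test(doc)) {
                        return doc;
                    }
                }
                return SimpleIterator.NO_MORE_DOCS;
            };
        }
        // Expensive to create: every document is visited up front.
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches.test(doc)) {
                bits.set(doc);
            }
        }
        return target -> {
            int next = bits.nextSetBit(Math.max(target, 0));
            return next < 0 ? SimpleIterator.NO_MORE_DOCS : next;
        };
    }
}
```

Under that assumption, a caller that guesses wrong only pays a performance penalty; it does not get a wrong answer.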

We don't need to implement it now, but I'm curious how we'll implement the cost 
method for multi term queries?  It seems like merely computing the cost 
(enumerating all terms & summing their {{docFreq}}) would be a big part of 
the overall cost of executing such queries.  I guess we would also need a 
doc-values based query here too, e.g. one that checks the automaton on a binary 
doc values field or something?
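
A toy sketch of why this is a concern: estimating the cost of a prefix query by summing per-term document frequencies has to visit every matching term. The sorted map below stands in for a terms dictionary; {{PrefixCost}} and {{estimate}} are hypothetical names, and real code would walk the terms dictionary rather than a {{TreeMap}}.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: the map is a stand-in for a sorted terms dictionary.
final class PrefixCost {
    /** Sums doc frequencies of all terms starting with prefix, visiting each matching term once. */
    static long estimate(NavigableMap<String, Integer> docFreqByTerm, String prefix) {
        long cost = 0;
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (Map.Entry<String, Integer> e : docFreqByTerm.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last matching term
            }
            cost += e.getValue();
        }
        return cost;
    }
}
```

The loop runs once per matching term, so for a prefix that expands to a large number of terms the estimate alone is a full pass over that part of the dictionary.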

Maybe change {{if (cost < 0) { }} to {{if (cost == -1) {}} in 
{{LazyBoolean2Scorer}} (more explicit)?


> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch, LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries that return fine 
> iterators, but whose iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)


