[ https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797937#comment-15797937 ]
Michael McCandless commented on LUCENE-7055:
--------------------------------------------

I really like this idea and the latest patch: I think it will be an immense query-time optimization for some cases, e.g. a restrictive {{TermQuery}} against a massive {{PointRangeQuery}} where doc values are also indexed for that range field.

I like how this solution lets us "phase in" queries over time (default impl for lazyScorer).

For the {{BKDReader}} impls and {{PointValues}} APIs, can we rename {{estimateCost}} to {{estimatePointCount}} or just {{estimateCount}}? "Cost" is a bit vague here, yet what we are computing is fairly tightly defined. I think {{cost}} is a good name for the {{LazyScorer}} method.

Maybe rename {{LazyScorer}} to {{ScorerSource}}? {{LazyScorer}} makes me feel like the laziness applies during actual iteration of the hits...

I like the switch to a {{Map<Occur,Collection>}} for boolean scorer's {{subs}} tracking.

{{FakeLazyScorer}} in {{TestLazyBoolean2Scorer}} seems to fail to initialize its {{this.randomAccess}} in its 2nd ctor, so the assert is never invoked?

If I pass {{randomAccess = false}} to {{LazyScorer.get}}, am I not allowed to invoke {{advance}} on the returned {{Scorer}}? Maybe the javadocs can call this argument a "hint about expected usage"?

It's too bad this is not somehow more strongly typed, like you get back a {{Bits}} (plus some way to score if it's needed) if you asked for random access, but I don't see how to do that. Long ago (can't find the issue now) we had an issue exploring something along these lines. But let's keep the approach now in your patch: progress not perfection!

We don't need to implement it now, but I'm curious how we'll implement the cost method for multi-term queries? It seems like merely computing the cost (enumerating all terms & summing their {{sumDocFreq}}) would be a big part of the overall cost of executing such queries. I guess we would also need a doc-values based query here too, e.g. one that checks the automaton on a binary doc values field or something?

Maybe change {{if (cost < 0) {}} to {{if (cost == -1) {}} in {{LazyBoolean2Scorer}} (more explicit)?

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch, LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly
> operations on a per-document basis, like phrase queries or doc values
> queries. But we have another class of costly queries, that return fine
> iterators, but these iterators are very expensive to build. This is typically
> the case for queries that leverage DocIdSetBuilder, like TermsQuery,
> multi-term queries or the new point queries. Intersecting such queries with a
> selective query is very inefficient since these queries build a doc id set of
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these
> queries?
> One idea that comes to mind is that most of these queries could also run on
> doc values, so maybe we could come up with something that would help decide
> how to run a query based on other parts of the query? (Just thinking out
> loud, other ideas are very welcome)
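
To make the {{cost}} / {{get(randomAccess)}} discussion above a bit more concrete, here is a rough, self-contained sketch of how a conjunction might pick its lead clause by cost and pass the random-access hint to the remaining clauses. Everything in it is an assumption for illustration: {{ScorerSource}} is only the name proposed in the comment, and {{Mode}}, {{clause}} and {{plan}} are hypothetical helpers, not code from the patch or from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; none of these types exist in Lucene under these names. */
public class ScorerSourceSketch {

    /** How a clause ends up being executed (hypothetical). */
    enum Mode { ITERATOR, RANDOM_ACCESS }

    /** A clause that can report its cost before building anything expensive. */
    interface ScorerSource {
        /** Estimated number of matching documents; assumed cheap to compute. */
        long cost();

        /** Builds the execution strategy; randomAccess is only a hint about expected usage. */
        Mode get(boolean randomAccess);
    }

    /** Toy clause: honors the hint only when it has a doc-values style fallback. */
    static ScorerSource clause(final long cost, final boolean hasDocValues) {
        return new ScorerSource() {
            @Override public long cost() { return cost; }
            @Override public Mode get(boolean randomAccess) {
                return (randomAccess && hasDocValues) ? Mode.RANDOM_ACCESS : Mode.ITERATOR;
            }
        };
    }

    /** Sorts clauses by cost, lets the cheapest lead, and hints random access to the rest. */
    static List<Mode> plan(List<ScorerSource> clauses) {
        List<ScorerSource> sorted = new ArrayList<>(clauses);
        sorted.sort(Comparator.comparingLong(ScorerSource::cost));
        List<Mode> modes = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            // Index 0 is the lead and is iterated; the others are only verified per candidate.
            modes.add(sorted.get(i).get(i > 0));
        }
        return modes;
    }

    public static void main(String[] args) {
        // A massive range clause with doc values next to a restrictive term clause.
        List<Mode> modes = plan(Arrays.asList(clause(50_000_000L, true), clause(120L, false)));
        System.out.println(modes); // prints [ITERATOR, RANDOM_ACCESS]
    }
}
```

In this toy version the restrictive clause leads and the massive one is only checked per candidate document, which is the case the comment expects to benefit most. A real implementation would presumably compare costs before choosing random access rather than always hinting it to non-lead clauses.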