[jira] [Updated] (LUCENE-7055) Better execution path for costly queries

Adrien Grand (JIRA) Wed, 04 Jan 2017 07:56:23 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand updated LUCENE-7055:
---------------------------------
    Attachment: LUCENE-7055.patch

Thanks for the great feedback! I made the following changes:
 - renamed {{estimateCost}} to {{estimatePointCount}}. I kept {{Point}} in the 
name to make it clear it was about points rather than docs.
 - renamed {{LazyScorer}} to {{ScorerSupplier}} for consistency with JDK 
naming, hopefully that works for you?
 - fixed {{FakeScorerSupplier}} and added a test to test the tester
 - made the cost caching in {{Boolean2ScorerSupplier}} more explicit
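
To illustrate the naming and the explicit cost caching, here is a rough, self-contained sketch. It is not the code from the patch: {{CostedSupplier}}, {{Conjunction}}, {{leaf}} and the {{costComputations}} counter are hypothetical names made up for this example, strings stand in for scorers, and taking the minimum of the clause costs is an assumption about how a conjunction would be costed.

```java
import java.util.List;
import java.util.function.Supplier;

public final class ScorerSupplierSketch {
    private ScorerSupplierSketch() {}

    /** Hypothetical interface: cost() is cheap, get() does the expensive construction. */
    public interface CostedSupplier<T> extends Supplier<T> {
        long cost();
    }

    /** Hypothetical leaf with a fixed, already-known cost. */
    public static CostedSupplier<String> leaf(String name, long cost) {
        return new CostedSupplier<String>() {
            @Override
            public long cost() {
                return cost;
            }

            @Override
            public String get() {
                return name;
            }
        };
    }

    /** Conjunction-style wrapper that caches its cost in an explicit field. */
    public static final class Conjunction implements CostedSupplier<String> {
        private final List<CostedSupplier<String>> clauses;
        private long cost = -1;           // -1 means "not computed yet"
        private int costComputations = 0; // only here to make the caching observable

        public Conjunction(List<CostedSupplier<String>> clauses) {
            if (clauses.isEmpty()) {
                throw new IllegalArgumentException("need at least one clause");
            }
            this.clauses = clauses;
        }

        @Override
        public long cost() {
            if (cost == -1) {
                costComputations++;
                long min = Long.MAX_VALUE;
                // Assumption: a conjunction cannot match more docs than its cheapest clause.
                for (CostedSupplier<String> clause : clauses) {
                    min = Math.min(min, clause.cost());
                }
                cost = min;
            }
            return cost;
        }

        @Override
        public String get() {
            // The expensive part would happen here; the sketch only concatenates names.
            StringBuilder sb = new StringBuilder("AND(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(clauses.get(i).get());
            }
            return sb.append(')').toString();
        }

        public int costComputations() {
            return costComputations;
        }
    }
}
```

The point of the split is that a caller can ask every clause for {{cost()}} first and only call {{get()}} once it knows which clause should lead the iteration.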

bq. We don't need to implement it now, but I'm curious how we'll implement the 
cost method for multi term queries? It seems like merely computing the cost 
(enumerating all terms & summing their sumDocFreq) would be a big part of the 
overall cost of executing such queries. I guess we would also need a doc-values 
based query here too, e.g. one that checks the automaton on a binary doc values 
field or something?

Right, I did not think too much about these ones. When figuring out the number 
of matching terms is cheap (TermsQuery, TermRangeQuery, PrefixQuery), we could 
return e.g. {{num_matching_terms * sum_doc_freq / size}}. For more complex 
automata, however, this looks more complicated.
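
As a rough sketch of that estimate (not part of the patch): the class and method names below are hypothetical, {{size}} is read as the number of unique terms in the field, and both the fallback when {{size}} is unknown and the cap at {{sum_doc_freq}} are assumptions added for the example.

```java
public final class MultiTermCostSketch {
    private MultiTermCostSketch() {}

    /**
     * Estimates the number of postings a multi-term query would visit as
     * numMatchingTerms * (sumDocFreq / size), i.e. the number of matching terms
     * times the average document frequency of a term in the field.
     *
     * @param numMatchingTerms number of terms the query matches
     * @param sumDocFreq       sum of the document frequencies of all terms in the field
     * @param size             number of unique terms in the field, or a value <= 0 if unknown
     */
    public static long estimateCost(long numMatchingTerms, long sumDocFreq, long size) {
        if (numMatchingTerms < 0 || sumDocFreq < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (size <= 0) {
            // Assumption: without a term count, fall back to the upper bound.
            return sumDocFreq;
        }
        double avgDocFreq = (double) sumDocFreq / size;
        long estimate = (long) Math.ceil(numMatchingTerms * avgDocFreq);
        // The query can never visit more postings than the field contains.
        return Math.min(estimate, sumDocFreq);
    }
}
```

For instance, with 10 matching terms in a field of 200 unique terms and a {{sum_doc_freq}} of 1000, the average document frequency is 5 and the estimate is 50.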

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch, LUCENE-7055.patch, LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries, that return fine 
> iterators, but these iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)



