[ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7055:
---------------------------------
    Attachment: LUCENE-7055.patch

Here is an updated patch. It adds docs, tests and improves the way BooleanQuery 
implements this new API so that cost estimations would be propagated lazily 
through deep query trees (similarly to the fact that two-phase iteration still 
executes slow bits in deep leaves after fast bits that are in the top-level 
BooleanQuery).

Only PointRangeQuery and BooleanQuery implement this API right now, since they 
are the two important ones to get started, but I think we might be able to 
generalize to other slow queries, like TermsQuery for instance, by leveraging 
the sumDocFreq index statistic. But this belongs to another issue.

Reviews are warmly welcome.

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch, LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries, that return fine 
> iterators, but these iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to