[ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7055:
---------------------------------
    Attachment: LUCENE-7055.patch

I have been looking at slow queries recently, which were slow for that exact 
reason. They were running a point range query that covered most of the index, 
intersected with selective queries, which is typically the case when doc values 
would perform better than points.

So I started exploring how we could improve this and got the following:
 - PointValues get a new method that computes an estimate of the cost of a 
visitor. For the Lucene70 codec it basically counts the numbers of leaf blocks 
that intersect the visitor, and multiplies that number by the number of points 
on leaf blocks.
 - A new API on Weight allows to get an estimate of the cost of a Scorer before 
building it. The underlying idea is that in the case of a conjunction that 
contains a range, the range should use points if it has the least cost (ie. it 
will lead the iteration) and doc values otherwise since the scorer will only be 
used to validate that the current document matched.

I attached a patch if someone is interested to look into how that works. I 
tried to make it as little invasive as possible: the new API on Weight is 
optional, and we do not need to implement giant queries that know both how to 
use points and doc values, instead there is a wrapper query called 
IndexOrDocValuesQuery that wraps both a point/index query and a doc values 
query and it figures out which one to use based on costs. It is neither 
complete not commitable, just a proof of concept to trigger some discussion.

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries, that return fine 
> iterators, but these iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to