[jira] [Commented] (LUCENE-7897) RangeQuery optimization in IndexOrDocValuesQuery

Adrien Grand (JIRA) Mon, 07 Aug 2017 01:41:42 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116270#comment-16116270 ]


Adrien Grand commented on LUCENE-7897:
--------------------------------------

Thanks for checking! The implementation of the optimization changed because 
previously we only knew whether random or sequential access was required. So 
we tried to use random access for the most costly scorers, since they would be 
unlikely to drive iteration for the MinShouldMatchScorer. The priority queue 
was used to select those most costly scorers. Now that we know the lead cost, 
we can just use random access for clauses whose cost is more than 8x the lead 
cost and sequential access otherwise. We will still be more likely to use 
random access on the most costly clauses than on the least costly ones, but in 
a safer way.
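
To make the rule above concrete, here is a minimal sketch in plain Java. It is 
not the code from the patch: the class name AccessModeSketch, the Access enum, 
and the choose method are hypothetical names used only for illustration, and 
the comparison against "8x the lead cost" is my reading of the comment rather 
than a copy of the Lucene source.

```java
// Illustrative sketch only; names and exact comparison are assumptions,
// not the actual Lucene implementation.
public class AccessModeSketch {

    enum Access { SEQUENTIAL, RANDOM }

    // Pick random access only when the clause is estimated to be more than
    // about 8x as costly as the clause that leads iteration.
    static Access choose(long clauseCost, long leadCost) {
        // Divide instead of multiplying so a very large leadCost cannot overflow.
        return (clauseCost / 8) > leadCost ? Access.RANDOM : Access.SEQUENTIAL;
    }

    public static void main(String[] args) {
        long leadCost = 1_000L;
        // 5x the lead cost: cheap enough to iterate sequentially.
        System.out.println(choose(5_000L, leadCost));     // SEQUENTIAL
        // 1000x the lead cost: verify candidates via random access instead.
        System.out.println(choose(1_000_000L, leadCost)); // RANDOM
    }
}
```

Unlike the earlier priority-queue approach, a check like this needs no 
knowledge of the other clauses: each clause decides on its own from its cost 
and the lead cost.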

> RangeQuery optimization in IndexOrDocValuesQuery 
> -------------------------------------------------
>
>                 Key: LUCENE-7897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7897
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: trunk, 7.0
>            Reporter: Murali Krishna P
>         Attachments: LUCENE-7897.patch
>
>
> For range queries, Lucene uses either points or doc values based on cost 
> estimation 
> (https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
> The scorer is chosen based on the minCost here: 
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16
> However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem 
> to carry the same weight. Essentially, cost depends on the docFreq in the 
> term dictionary, the number of points visited, and the number of doc values. 
> In a situation where the docFreq is not very restrictive, this means a lot 
> of doc-values lookups, and using points would have been better.
> The following query, with 1M matches, takes 60ms with doc values but only 
> 27ms with points. If I change the query to "message:*", which matches all 
> docs, it chooses points (since the cost is the same), but with message:xyz 
> it chooses doc values even though the doc frequency is 1 million, which 
> results in many doc-values fetches. Would it make sense to make the cost of 
> the doc-values query higher, or to use points if the docFreq of the term 
> query is too high (find an optimal threshold where points cost < doc-values 
> cost)?
> {noformat}
> {
>   "query": {
>     "bool": {
>       "must": [
>         {
>           "query_string": {
>             "query": "message:xyz"
>           }
>         },
>         {
>           "range": {
>             "@timestamp": {
>               "gte": 1498652400000,
>               "lte": 1498905000000,
>               "format": "epoch_millis"
>             }
>           }
>         }
>       ]
>     }
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
