[ https://issues.apache.org/jira/browse/LUCENE-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121396#comment-16121396 ]
ASF subversion and git services commented on LUCENE-7897:
---------------------------------------------------------

Commit 9c83d025e401bb0d454e9de9b40572e47d5da317 in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c83d02 ]

LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more than 8x greater than the cost of the lead iterator in order to use doc values.

> RangeQuery optimization in IndexOrDocValuesQuery
> -------------------------------------------------
>
>                 Key: LUCENE-7897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7897
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: trunk, 7.0
>            Reporter: Murali Krishna P
>         Attachments: LUCENE-7897.patch
>
>
> For range queries, Lucene uses either points or doc values based on cost
> estimation
> (https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
> The scorer is chosen based on the minCost here:
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16
> However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem
> to carry the same weight. Essentially, the cost depends on the doc frequency
> in the terms dictionary, the number of points visited, and the number of doc
> values. When the doc frequency is not very restrictive, this means a lot of
> doc values lookups, and using points would have been better.
> The following query, with 1M matches, takes 60ms with doc values but only
> 27ms with points. If I change the query to "message:*", which matches all
> docs, it chooses points (since the cost is the same), but with message:xyz it
> chooses doc values even though the doc frequency is 1 million, which results
> in many doc values fetches.
> Would it make sense to change the cost of the doc values query to be higher,
> or to use points if the doc frequency is too high for the term query (find an
> optimum threshold where points cost < doc values cost)?
> {noformat}
> {
>   "query": {
>     "bool": {
>       "must": [
>         {
>           "query_string": {
>             "query": "message:xyz"
>           }
>         },
>         {
>           "range": {
>             "@timestamp": {
>               "gte": 1498652400000,
>               "lte": 1498905000000,
>               "format": "epoch_millis"
>             }
>           }
>         }
>       ]
>     }
>   }
> }
> {noformat}
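
For readers following the thread, the rule stated in the commit message can be sketched in plain Java. This is only an illustration of the comparison as the message words it, not the code from the commit: the class name, method name, and constant below are hypothetical, the sample costs are made up, and costs are assumed to be non-negative.

```java
// Illustrative sketch only. CostHeuristicSketch, useDocValues and
// DOC_VALUES_PENALTY are hypothetical names, not part of Lucene's API.
final class CostHeuristicSketch {

    // Factor taken from the commit message ("more than 8x greater").
    static final long DOC_VALUES_PENALTY = 8L;

    /**
     * Returns true when doc values would be picked under the rule in the
     * commit message: the range cost must be more than 8x the cost of the
     * lead iterator. Otherwise points (the index structure) are used.
     * Assumes both costs are non-negative.
     */
    static boolean useDocValues(long rangeCost, long leadCost) {
        if (leadCost > Long.MAX_VALUE / DOC_VALUES_PENALTY) {
            // 8 * leadCost would overflow a long, so rangeCost cannot exceed it.
            return false;
        }
        return rangeCost > leadCost * DOC_VALUES_PENALTY;
    }

    public static void main(String[] args) {
        // Made-up costs: a term matching 1M docs leads the conjunction.
        System.out.println(useDocValues(10_000_000L, 1_000_000L)); // true: 10M > 8M
        System.out.println(useDocValues(1_000_000L, 1_000_000L));  // false: points are used
    }
}
```

Under this rule, a term query with a doc frequency of 1 million, as in the report above, would only lead to doc values being used when the range itself is estimated to cost more than 8 million.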