[ https://issues.apache.org/jira/browse/LUCENE-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827630#comment-15827630 ]
Adrien Grand commented on LUCENE-7641:
--------------------------------------
I guess I wanted to stay on the safe side, since point count estimates tend to
be too high. Maybe we should make the formula more accurate instead of checking
the inverse cost. For instance, we could count {{maxPointsInLeafNode/2}} when
the relation is {{CELL_CROSSES_QUERY}} on a leaf cell. Maybe we could also make
{{BKDReader}} record the maximum number of points that have actually been put
in a leaf, rather than the configuration parameter that was passed to
{{BKDWriter}}, since the latter can be up to 2x the actual number of points in
leaf nodes in the N-dims case?
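To make the two suggestions above concrete, here is a minimal sketch in plain Java. It is not Lucene code: the nested {{Relation}} enum only mirrors the constant names of {{PointValues.Relation}}, and {{LeafCountEstimate}}, {{estimateLeaf}}, {{estimate}} and the leaf sizes 1024 and 600 are hypothetical names and numbers chosen for illustration. It counts a full leaf when the cell is inside the query, half a leaf when it crosses, and shows how a smaller observed leaf size lowers the estimate.

```java
import java.util.Arrays;
import java.util.List;

final class LeafCountEstimate {
    // Hypothetical stand-in that mirrors the constant names of Lucene's PointValues.Relation.
    enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

    // Estimated number of matching points for a single leaf cell.
    static long estimateLeaf(Relation relation, int maxPointsInLeaf) {
        switch (relation) {
            case CELL_INSIDE_QUERY:
                return maxPointsInLeaf;      // every point in the leaf may match
            case CELL_CROSSES_QUERY:
                return maxPointsInLeaf / 2;  // assume about half of the leaf matches
            default:
                return 0;                    // CELL_OUTSIDE_QUERY: nothing matches
        }
    }

    // Sum of the per-leaf estimates over all visited leaf cells.
    static long estimate(List<Relation> leaves, int maxPointsInLeaf) {
        long total = 0;
        for (Relation relation : leaves) {
            total += estimateLeaf(relation, maxPointsInLeaf);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Relation> leaves = Arrays.asList(
            Relation.CELL_INSIDE_QUERY,
            Relation.CELL_INSIDE_QUERY,
            Relation.CELL_CROSSES_QUERY,
            Relation.CELL_OUTSIDE_QUERY);
        int configuredMax = 1024; // hypothetical value passed to the writer
        int observedMax = 600;    // hypothetical largest leaf actually written
        System.out.println(estimate(leaves, configuredMax));
        System.out.println(estimate(leaves, observedMax));
    }
}
```

With the same four cells, the estimate based on the observed leaf size is noticeably lower than the one based on the configured maximum, which is the overestimation the comment describes.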
> Speed up point ranges that match most documents
> -----------------------------------------------
>
> Key: LUCENE-7641
> URL: https://issues.apache.org/jira/browse/LUCENE-7641
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7461.patch
>
>
> If a point range matches most documents and every document has exactly one
> value, then we could make things faster by computing the set of documents
> that do NOT match the range instead.
> This was not feasible until recently, since there was no way to figure out
> whether a range query matches most documents, but we can now use the new
> {{PointValues.estimatePointCount}} API to do that: we could just check
> whether the cost of the inverse visitor is lower than the cost of the regular
> range visitor.
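The idea in the description can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the attached patch: a sorted array of values with a parallel array of doc ids stands in for the points index, exact counts from binary search stand in for the estimates that {{PointValues.estimatePointCount}} would return, and {{InverseRangeSketch}}, {{bound}} and {{matchRange}} are hypothetical names. It assumes every document has exactly one value, so the doc ids form a permutation of 0..n-1.

```java
import java.util.BitSet;

final class InverseRangeSketch {
    // First index whose value is >= key, or > key when strict is true.
    static int bound(int[] sorted, int key, boolean strict) {
        int low = 0, high = sorted.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            boolean goRight = strict ? sorted[mid] <= key : sorted[mid] < key;
            if (goRight) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    // Docs whose value lies in [lo, hi]. docs[i] is the doc id that holds sortedValues[i].
    static BitSet matchRange(int[] sortedValues, int[] docs, int lo, int hi) {
        int n = sortedValues.length;
        int from = bound(sortedValues, lo, false);
        int to = Math.max(from, bound(sortedValues, hi, true)); // empty range when lo > hi
        long cost = to - from;        // points the regular visitor would collect
        long inverseCost = n - cost;  // points the inverse visitor would collect
        BitSet result = new BitSet(n);
        if (inverseCost < cost) {
            // Range matches most docs: start from all docs and clear the non-matching ones.
            result.set(0, n);
            for (int i = 0; i < from; i++) {
                result.clear(docs[i]);
            }
            for (int i = to; i < n; i++) {
                result.clear(docs[i]);
            }
        } else {
            // Range matches few docs: collect the matching ones directly.
            for (int i = from; i < to; i++) {
                result.set(docs[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] sortedValues = {1, 3, 5, 7, 9};
        int[] docs = {4, 2, 0, 3, 1};
        System.out.println(matchRange(sortedValues, docs, 3, 9)); // takes the inverse path
        System.out.println(matchRange(sortedValues, docs, 1, 3)); // takes the regular path
    }
}
```

Both branches produce the same set of documents; only the amount of work differs, which is why the comparison of the two costs is enough to pick one.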