[jira] [Updated] (OAK-7300) Lucene Index: per-column selectivity to improve cost estimation

Davide Giannella (JIRA) Tue, 09 Apr 2019 03:38:33 -0700


     [ 
https://issues.apache.org/jira/browse/OAK-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Davide Giannella updated OAK-7300:
----------------------------------
    Fix Version/s:     (was: 1.12.0)

> Lucene Index: per-column selectivity to improve cost estimation
> ---------------------------------------------------------------
>
>                 Key: OAK-7300
>                 URL: https://issues.apache.org/jira/browse/OAK-7300
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>             Fix For: 1.14.0
>
>
> In OAK-6735 we improved cost estimation for Lucene indexes. However, the 
> following case still does not work as expected: a very common property is 
> indexed (many nodes have that property), and each value of that property is 
> more or less unique. In this case, the estimated cost is currently the total 
> number of documents that contain that property. For the condition 
> "property is not null" this is correct, but for the common case "property 
> = x" the estimated cost is far too high.
> A known workaround is to set the "costPerEntry" for the given index to a low 
> value, for example 0.2. However this isn't a good solution, as it affects all 
> properties and queries.
> It would be good to be able to set the selectivity per property, for example 
> by specifying the number of distinct values, or (better yet) the average 
> number of entries for a given key (1 for unique values; 2 meaning that for 
> each distinct value there are two documents on average).
> That value can be set manually (cost override), or it can be set 
> automatically, e.g. when building the index, or updated from time to time 
> during the index update, using a cardinality estimation algorithm. That 
> doesn't have to be accurate; we could use a rough approximation such as 
> HyperBitBit.
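
The difference between the current estimate and the proposed one can be
illustrated with a minimal sketch. This is not Oak code: the class name,
method names, and parameters below are hypothetical, and the sketch assumes
the cost is simply the expected number of matching entries multiplied by
"costPerEntry".

```java
// Hypothetical illustration only; none of these names exist in Oak.
final class SelectivityCostSketch {

    private SelectivityCostSketch() {
    }

    // Behaviour described in the issue: every document that contains the
    // property is counted, regardless of the condition.
    static double costWithoutSelectivity(long docsWithProperty, double costPerEntry) {
        return docsWithProperty * costPerEntry;
    }

    // Derives "average entries per key" from a distinct-value count
    // (assumption: the count may come from a manual setting or an estimator).
    static double avgEntriesPerKey(long docsWithProperty, long distinctValues) {
        if (distinctValues <= 0) {
            // No information available: fall back to the pessimistic value.
            return docsWithProperty;
        }
        return (double) docsWithProperty / distinctValues;
    }

    // Proposed behaviour for "property = x": expect roughly avgEntriesPerKey
    // matches, at least one entry and never more than the documents that
    // contain the property.
    static double costForEquality(long docsWithProperty, double avgEntriesPerKey,
            double costPerEntry) {
        double expected = Math.min(docsWithProperty, Math.max(1.0, avgEntriesPerKey));
        return expected * costPerEntry;
    }
}
```

Under these assumptions, an index with 1,000,000 documents containing the
property and 500,000 distinct values yields a cost of 1,000,000 for any
condition today, while the equality case would drop to 2 with per-property
selectivity. The "property is not null" case would keep using the document
count.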



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
