[ https://issues.apache.org/jira/browse/OAK-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davide Giannella updated OAK-7300:
----------------------------------
    Fix Version/s:     (was: 1.12.0)

> Lucene Index: per-column selectivity to improve cost estimation
> ---------------------------------------------------------------
>
>                 Key: OAK-7300
>                 URL: https://issues.apache.org/jira/browse/OAK-7300
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>             Fix For: 1.14.0
>
>
> In OAK-6735 we improved cost estimation for Lucene indexes. However, the
> following case is still not working as expected: a very common property is
> indexed (many nodes have that property), and each value of that property is
> more or less unique. In this case, the cost estimate is currently the total
> number of documents that contain that property. For the condition
> "property is not null" this is correct, but for the common case
> "property = x" the estimated cost is far too high.
> A known workaround is to set the "costPerEntry" for the given index to a low
> value, for example 0.2. However, this isn't a good solution, as it affects all
> properties and queries.
> It would be good to be able to set the selectivity per property, for example
> by specifying the number of distinct values, or (better yet) the average
> number of entries for a given key (1 for unique values, 2 meaning that for
> each distinct value there are two documents on average).
> That value can be set manually (cost override), and it can be set
> automatically, e.g. when building the index, or updated from time to time
> during the index update, using a cardinality estimation algorithm. That
> doesn't have to be accurate; we could use a rough approximation such as
> HyperBitBit.
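
The proposal above can be illustrated with a minimal sketch. It is not Oak code: the function names, the condition labels "not_null" and "equals", and the exact distinct count (standing in for an approximate cardinality estimator such as HyperBitBit) are all assumptions made for illustration only.

```python
def average_entries_per_key(values):
    """Average number of documents per distinct value of one property.

    Hypothetical helper: uses an exact set for the distinct count, where a
    real index would more likely use an approximate cardinality estimator.
    """
    total = 0
    distinct = set()
    for value in values:
        total += 1
        distinct.add(value)
    if total == 0:
        return 0.0
    return total / len(distinct)


def estimate_entry_count(docs_with_property, condition, avg_per_key=None):
    """Estimate how many entries a condition on one property returns.

    docs_with_property: number of indexed documents containing the property.
    condition: "not_null" or "equals" (labels assumed for this sketch).
    avg_per_key: per-property selectivity hint, or None if not configured.
    """
    if condition == "not_null":
        # Every document that has the property matches.
        return docs_with_property
    if condition == "equals":
        if avg_per_key is None:
            # Behaviour described in the issue: falls back to the total.
            return docs_with_property
        # With the hint, an equality match is expected to return about
        # avg_per_key entries, never more than the total.
        return min(docs_with_property, avg_per_key)
    raise ValueError("unsupported condition: " + condition)


if __name__ == "__main__":
    # A property present on 1,000,000 documents, each value unique.
    print(estimate_entry_count(1_000_000, "equals"))
    print(estimate_entry_count(1_000_000, "equals", avg_per_key=1.0))
```

Unlike lowering "costPerEntry" for the whole index, the hint in this sketch only changes the estimate for equality conditions on the one property it is configured for.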