[
https://issues.apache.org/jira/browse/OAK-12221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Mueller updated OAK-12221:
---------------------------------
Description:
Currently, the cost estimation of the Lucene and Elasticsearch indexes only
tracks the minimum across all properties. The number of indexed properties is
not taken into account.
What should happen is: the more indexed conditions are added to a query, the
lower the expected cost should be. For example, a query with "color = 'red' and
id = 1" should have a lower expected cost than a query with just "id = 1" or
just "color = 'red'".
In this issue, I want to add a feature toggle, and if enabled:
* The cost estimation should be such that the more conditions are indexed, the
lower the cost (as I had expected it already was).
* Support for "most-common values" (MCV) should be added. This is because we
have some properties where some values have a huge number of entries, and other
values are basically unique. (Later we can also add histograms etc., but this
seems less urgent.)
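As a rough illustration of the two points above, here is a minimal sketch in
Python. It is not the Oak implementation: the names PropertyStats, selectivity
and estimate_entries are hypothetical, and it assumes that conditions are
independent equality conditions, so their selectivities can be multiplied.
Values that are not in the MCV list are assumed to be spread evenly over the
remaining distinct values.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyStats:
    """Hypothetical per-property statistics (not an Oak class)."""
    count: int                               # entries that have this property
    distinct: int                            # estimated number of distinct values
    mcv: dict = field(default_factory=dict)  # most-common value -> entry count


def selectivity(stats, value, total):
    """Estimated fraction of all entries matching 'property = value'."""
    if total <= 0:
        return 0.0
    if value in stats.mcv:
        # The value is tracked explicitly, so use its known count.
        matching = stats.mcv[value]
    else:
        # Assumption: remaining entries are spread evenly over remaining values.
        rest_count = max(0, stats.count - sum(stats.mcv.values()))
        rest_distinct = max(1, stats.distinct - len(stats.mcv))
        matching = rest_count / rest_distinct
    return min(1.0, matching / total)


def estimate_entries(conditions, stats, total):
    """Expected entry count for a conjunction of equality conditions.

    Multiplies the selectivities (independence assumption), so every
    additional indexed condition can only lower the estimate.
    """
    estimate = float(total)
    for prop, value in conditions.items():
        estimate *= selectivity(stats[prop], value, total)
    return estimate


TOTAL = 1_000_000
STATS = {
    # 'red' is a most-common value; the other colors are much rarer.
    "color": PropertyStats(count=TOTAL, distinct=10, mcv={"red": 600_000}),
    # 'id' is basically unique.
    "id": PropertyStats(count=TOTAL, distinct=TOTAL),
}

both = estimate_entries({"color": "red", "id": 1}, STATS, TOTAL)
id_only = estimate_entries({"id": 1}, STATS, TOTAL)
red_only = estimate_entries({"color": "red"}, STATS, TOTAL)
blue_only = estimate_entries({"color": "blue"}, STATS, TOTAL)
```

With these made-up statistics, the estimate for both conditions is lower than
for either condition alone, and "color = 'red'" is estimated much higher than
"color = 'blue'" because of the MCV entry. A real cost function would likely
also need a lower bound and per-index overhead, which this sketch leaves out.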
was:
Currently, the cost estimation of the Lucene and Elasticsearch indexes only
tracks the minimum across all properties. The number of indexed properties is
not taken into account.
What should happen is: the more indexed conditions are added to a query, the
lower the expected cost should be. Eg. a query with "color = 'red' and id = 1"
should have a lower expected cost than a query with just "id = 1" or just
"color = 'red'".
> Cost estimation should be lower if more conditions are indexed
> --------------------------------------------------------------
>
> Key: OAK-12221
> URL: https://issues.apache.org/jira/browse/OAK-12221
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major