[ https://issues.apache.org/jira/browse/CASSANDRA-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-10436:
----------------------------------------
    Description: 
If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always be chosen as the primary expression during query execution. If the statement also contains other expressions that can be satisfied by a built-in index, we currently have no way to apply the custom expression as a filter. What's more, the method of selecting which index to use is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that a custom expression, if present, is always chosen.

Suppose we have a custom index implementation which provides prefix matching on text fields.

{code}
CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;
{code}

In the above example the expected result would contain no rows, which would be the case if {{v2_idx}} is selected as the primary (i.e. most selective) index during query execution. However, if {{v1_idx}} is chosen instead, the results of its lookup will have no further filter applied, so an incorrect result will be returned.

Note: this has always been something of an issue for custom indexes, as the expressions they support may not be natively filterable by C*. For example, with the full-text search syntax used by Stratio & DSE Search, if the custom index isn't selected, filtering will erroneously remove all rows, because the value of the dummy column does not match the Lucene/Solr search expression literal. It's probably a fairly minor concern, as in most cases a query using a custom index will not include other expressions (usually because custom indexes are per-row indexes and so can support multi-field expression syntax). Also, an index implementation can return a very low estimated result count to try to ensure that it is selected; custom expressions simply provide an opportunity to improve the situation.

  was:
If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always be chosen as the primary expression during query execution. If the statement also contains other expressions that can be satisfied by a built-in index, we currently have no way to apply the custom expression as a filter. What's more, the method of selecting which index to use is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that a custom expression, if present, is always chosen.

Suppose we have a custom index implementation which provides prefix matching on text fields.

{code}
CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*');
{code}

In the above example the expected result would contain no rows, which would be the case if {{v2_idx}} is selected as the primary (i.e. most selective) index during query execution. However, if {{v1_idx}} is chosen instead, the results of its lookup will have no further filter applied, so an incorrect result will be returned.

Note: this has always been something of an issue for custom indexes, as the expressions they support may not be natively filterable by C*. For example, with the full-text search syntax used by Stratio & DSE Search, if the custom index isn't selected, filtering will erroneously remove all rows, because the value of the dummy column does not match the Lucene/Solr search expression literal.
It's probably a fairly minor concern, as in most cases a query using a custom index will not include other expressions (usually because custom indexes are per-row indexes and so can support multi-field expression syntax). Also, an index implementation can return a very low estimated result count to try to ensure that it is selected; custom expressions simply provide an opportunity to improve the situation.

> Index selection should be weighted in favour of custom expressions
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-10436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10436
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0.0 rc2

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
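The failure mode described in the issue can be modelled in a few lines. What follows is a minimal Java sketch, not Cassandra's actual code: the `Row` and `Expression` types, the `selectBySelectivity`, `selectWeighted` and `execute` helpers, and the estimated row counts are all hypothetical, with the estimates chosen only so that `v1_idx` looks more selective than `v2_idx`. The sketch contrasts picking the index with the lowest estimated result count against always preferring a custom expression, and it assumes the custom prefix expression cannot be re-applied as a filter once another index has been chosen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the scenario in CASSANDRA-10436.
// None of these types are Cassandra classes; they only illustrate the idea.
class IndexSelectionSketch {

    record Row(int k, int v1, String v2) {}

    // A WHERE-clause expression served by an index. "filterable" says whether
    // the engine could evaluate it on its own as a post-lookup filter.
    record Expression(String indexName, boolean custom, long estimatedRows,
                      boolean filterable, Predicate<Row> matches) {}

    // Naive strategy: the index with the lowest estimated result count wins.
    static Expression selectBySelectivity(List<Expression> exprs) {
        return exprs.stream()
                .min(Comparator.comparingLong(Expression::estimatedRows))
                .orElseThrow();
    }

    // Weighted strategy: a custom expression, if present, is always primary;
    // otherwise fall back to the selectivity-based choice.
    static Expression selectWeighted(List<Expression> exprs) {
        return exprs.stream()
                .filter(Expression::custom)
                .findFirst()
                .orElseGet(() -> selectBySelectivity(exprs));
    }

    // Use the primary expression to find candidate rows (standing in for the
    // index lookup), then apply the remaining expressions as filters.
    // Non-filterable expressions are silently skipped, which is what
    // produces the incorrect result when the wrong index is primary.
    static List<Row> execute(List<Row> table, List<Expression> exprs, Expression primary) {
        List<Row> result = new ArrayList<>();
        for (Row row : table) {
            if (!primary.matches().test(row)) {
                continue;
            }
            boolean keep = true;
            for (Expression e : exprs) {
                // Reference comparison: skip the primary expression itself.
                if (e != primary && e.filterable() && !e.matches().test(row)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                result.add(row);
            }
        }
        return result;
    }

    // The two rows inserted in the example.
    static List<Row> table() {
        return List.of(new Row(0, 0, "abc"), new Row(1, 1, "def"));
    }

    static List<Expression> expressions() {
        // v1=0, served by the built-in index v1_idx.
        Expression v1Eq0 = new Expression("v1_idx", false, 1, true, r -> r.v1() == 0);
        // expr(v2_idx, 'd*'), a prefix match served by the custom index v2_idx.
        // The estimate of 10 is invented so that v1_idx looks more selective.
        Expression v2Prefix = new Expression("v2_idx", true, 10, false,
                r -> r.v2().startsWith("d"));
        return List.of(v1Eq0, v2Prefix);
    }

    public static void main(String[] args) {
        List<Expression> exprs = expressions();
        Expression naive = selectBySelectivity(exprs);
        Expression weighted = selectWeighted(exprs);
        System.out.println(naive.indexName() + " -> " + execute(table(), exprs, naive));
        System.out.println(weighted.indexName() + " -> " + execute(table(), exprs, weighted));
    }
}
```

In this model, the naive choice (`v1_idx`) returns the row with k=0, because the `'d*'` expression is skipped rather than applied as a filter; the weighted choice (`v2_idx`) finds only row 1 and then drops it on `v1=0`, giving the expected empty result. The workaround mentioned in the description corresponds to giving the custom expression an `estimatedRows` lower than any built-in index's, so that even `selectBySelectivity` would pick it.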