[ 
https://issues.apache.org/jira/browse/CASSANDRA-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-10436:
----------------------------------------
    Description: 
If a SELECT contains a custom index expression (CASSANDRA-10217), that should 
always be chosen as the primary expression during query execution. Should the 
statement contain other expressions which can be satisfied by a built-in index, 
we don't currently have the ability to apply the custom expression as a filter. 
What's more, the method of selecting which index to use is fairly primitive 
(and cannot be overridden until CASSANDRA-10214), so we should ensure that a 
custom expression, if present, is always chosen. 

Suppose we have a custom index implementation which provides prefix matching on 
text fields.
{code}
CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';

INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');

SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;
{code}

In the above example the expected result would contain no rows, which would be 
the case if {{v2_idx}} is selected as the primary (i.e. most selective) index 
during query execution. However, if {{v1_idx}} is chosen instead, the results 
of its lookup will have no further filter applied and so an incorrect result 
(the row with k=0) will be returned.
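
The difference between the two selection outcomes can be modelled outside of C* 
with a small sketch. This is only an illustration under stated assumptions, not 
the actual selection code: {{Expression}}, {{choose_primary}}, {{execute}} and 
the {{estimated_rows}} values are hypothetical names and numbers invented for 
the example. It assumes the primary expression drives the lookup, and that 
remaining expressions are applied as filters only when they are not custom.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]

# The two rows inserted in the example above.
ROWS: List[Row] = [
    {"k": 0, "v1": 0, "v2": "abc"},
    {"k": 1, "v1": 1, "v2": "def"},
]


@dataclass
class Expression:
    """Hypothetical stand-in for a single WHERE-clause restriction."""
    index_name: str
    matches: Callable[[Row], bool]
    is_custom: bool       # True for expr(...) restrictions
    estimated_rows: int   # made-up selectivity estimate


def choose_primary(expressions: List[Expression], prefer_custom: bool) -> Expression:
    # Proposed behaviour: a custom expression, if present, always wins.
    if prefer_custom:
        for e in expressions:
            if e.is_custom:
                return e
    # Otherwise pick the lowest estimate (ties go to the first expression).
    return min(expressions, key=lambda e: e.estimated_rows)


def execute(expressions: List[Expression], prefer_custom: bool) -> List[Row]:
    primary = choose_primary(expressions, prefer_custom)
    rows = [r for r in ROWS if primary.matches(r)]
    # Assumption: only non-custom expressions can be applied as filters.
    for e in expressions:
        if e is not primary and not e.is_custom:
            rows = [r for r in rows if e.matches(r)]
    return rows


# WHERE v1=0 AND expr(v2_idx, 'd*')
v1_eq = Expression("v1_idx", lambda r: r["v1"] == 0, False, 1)
v2_prefix = Expression("v2_idx", lambda r: str(r["v2"]).startswith("d"), True, 1)
query = [v1_eq, v2_prefix]

print(execute(query, prefer_custom=False))  # v1_idx is primary: row k=0 leaks through
print(execute(query, prefer_custom=True))   # v2_idx is primary: no rows
```

With {{prefer_custom=False}} the model picks {{v1_idx}}, cannot filter on the 
custom expression, and returns the k=0 row. With {{prefer_custom=True}} the 
custom expression drives the lookup and {{v1=0}} is applied as an ordinary 
filter, giving the expected empty result.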


Note: this has always been something of an issue for custom indexes as the 
expressions they support may not be natively filterable by C*. For example, 
with the full text search syntax used by Stratio & DSE Search, if the custom 
index isn't selected the filtering will erroneously remove all rows as the 
value of the dummy column does not match the Lucene/Solr search expression 
literal. It's probably a fairly minor concern as in most cases a query using a 
custom index will not include other expressions (usually because custom indexes 
are per-row indexes, and so can support multi-field expression syntax). Also, 
an index implementation can return a very low estimated result count to try 
and ensure it is selected; custom expressions just provide an opportunity to 
improve the situation.


  was:
If a SELECT contains a custom index expression (CASSANDRA-10217), that should 
always be chosen as the primary expression during query execution. Should the 
statement contain other expressions which can be satisfied by a built-in index, 
we don't currently have the ability to apply the custom expression as a filter. 
What's more, the method of selecting which index to use is fairly primitive 
(and cannot be overridden until CASSANDRA-10214), so we should ensure that a 
custom expression, if present, is always chosen. 

Suppose we have a custom index implementation which provides prefix matching on 
text fields.
{code}
CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
CREATE INDEX v1_idx ON ks.t(v1);
CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';

INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');

SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*');
{code}

In the above example the expected result would contain no rows, which would be 
the case if {{v2_idx}} is selected as the primary (i.e. most selective) index 
during query execution. However, if {{v1_idx}} is chosen instead, the results 
of its lookup will have no further filter applied and so an incorrect result 
will be returned.  


Note: this has always been something of an issue for custom indexes as the 
expressions they support may not be natively filterable by C*. For example, 
with the full text search syntax used by Stratio & DSE Search, if the custom 
index isn't selected the filtering will erroneously remove all rows as the 
value of the dummy column does not match the Lucene/Solr search expression 
literal. It's probably a fairly minor concern as in most cases a query using a 
custom index will not include other expressions (usually because custom indexes 
are per-row indexes, and so can support multi-field expression syntax). Also, 
an index implementation can return a very low estimated result count to try 
and ensure it is selected; custom expressions just provide an opportunity to 
improve the situation.



> Index selection should be weighted in favour of custom expressions
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-10436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10436
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0.0 rc2
>
>
> If a SELECT contains a custom index expression (CASSANDRA-10217), that should 
> always be chosen as the primary expression during query execution. Should the 
> statement contain other expressions which can be satisfied by a built-in 
> index, we don't currently have the ability to apply the custom expression as 
> a filter. What's more, the method of selecting which index to use is fairly 
> primitive (and cannot be overridden until CASSANDRA-10214), so we should 
> ensure that a custom expression, if present, is always chosen. 
> Suppose we have a custom index implementation which provides prefix matching 
> on text fields.
> {code}
> CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
> CREATE INDEX v1_idx ON ks.t(v1);
> CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
> INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
> INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
> SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;
> {code}
> In the above example the expected result would contain no rows, which would 
> be the case if {{v2_idx}} is selected as the primary (i.e. most selective) 
> index during query execution. However, if {{v1_idx}} is chosen instead, the 
> results of its lookup will have no further filter applied and so an incorrect 
> result (the row with k=0) will be returned.
> Note: this has always been something of an issue for custom indexes as the 
> expressions they support may not be natively filterable by C*. For example, 
> with the full text search syntax used by Stratio & DSE Search, if the custom 
> index isn't selected the filtering will erroneously remove all rows as the 
> value of the dummy column does not match the Lucene/Solr search expression 
> literal. It's probably a fairly minor concern as in most cases a query using 
> a custom index will not include other expressions (usually because custom 
> indexes are per-row indexes, and so can support multi-field expression 
> syntax). Also, an index implementation can return a very low estimated 
> result count to try and ensure it is selected; custom expressions just 
> provide an opportunity to improve the situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
