[ 
https://issues.apache.org/jira/browse/CASSANDRA-18782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-18782:
------------------------------------------
          Fix Version/s: 5.0
                         5.1
                             (was: 5.x)
                             (was: 5.0.x)
          Since Version: 5.0-alpha1
    Source Control Link: 
https://github.com/apache/cassandra/commit/b265b4658e007b6943d543a11c609b7ba5fd979f
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Forbid analyzed SAI indexes on primary key columns
> --------------------------------------------------
>
>                 Key: CASSANDRA-18782
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18782
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 5.0, 5.1
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Queries using SAI indexes don't find any results when the index is on a 
> primary key column, the index uses analysis, and the queried value differs 
> from the exact value of the column. For example:
> {code:java}
> CREATE TABLE t(k int, c text, PRIMARY KEY (k, c));
> CREATE INDEX ON t(c) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
> INSERT INTO t(k, c) VALUES (1, 'A');
> SELECT * FROM t WHERE c = 'a'; -- no results found!!!
> {code}
> This happens because the {{ClusteringIndexFilter}} for the query doesn't take 
> analysis into account. Thus, when that filter is applied by 
> [{{QueryController#doesNotSelect(PrimaryKey)}}|https://github.com/apache/cassandra/blob/655a2455ac29395b0a303e6ad7fc4d458b18932d/src/java/org/apache/cassandra/index/sai/plan/QueryController.java#L194-L210]
>  it rejects the results that have been correctly found by the index.
> An initial approach to solve this problem could be making 
> {{ClusteringIndexFilter}} aware of the index analysis options. However, this 
> would be problematic for paging. The first page of the query contains a 
> restriction on the clustering that requires analysis, but the queries for 
> subsequent pages will contain the last seen clustering, and we don't want 
> analysis in that case.
> Another approach would be not adding a {{ClusteringIndexFilter}} to the query 
> when it restricts a primary key column that has an analyzed index. However, 
> this approach would create a weird situation where adding an index might make 
> {{ALLOW FILTERING}} necessary in queries that wouldn't need it without the 
> index. This is the opposite of the natural way of things, where more indexes 
> mean less {{ALLOW FILTERING}} (AF) needed. For example:
> {code:java}
> CREATE TABLE t(k int, c1 text, c2 int, PRIMARY KEY (k, c1, c2));
> CREATE INDEX idx ON t(c1) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
> SELECT * FROM t WHERE k = 0 AND c1 = 'a' AND c2 = 0 ALLOW FILTERING;
> {code}
> The query would need AF because it has been translated into an index query 
> without a clustering filter, and {{c2}} is not indexed.
> I think there is an ambiguity in the query: it's not clear whether it should 
> use the secondary index and apply analysis, or be a primary index query 
> without analysis. Although we can default to one interpretation or the 
> other, both can serve different use cases. We will probably need 
> some new CQL syntax to allow users to specify whether they want to use the 
> secondary index or not.
> We can work on those CQL improvements during the second phase of SAI. In the 
> meantime, I think we should simply forbid the creation of analyzed indexes on 
> primary key columns.
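
The mismatch described in the report can be illustrated with a minimal Java sketch. This is not Cassandra code: the class and method names are hypothetical, and lowercasing is only an assumed stand-in for the analysis configured by 'case_sensitive' : false. It shows an analyzed index lookup accepting a row that an exact-match clustering filter then rejects.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AnalyzedIndexSketch {
    // Hypothetical stand-in for a case-insensitive analyzer.
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Index lookup: compares the analyzed forms of both values.
    static boolean indexMatches(String stored, String queried) {
        return analyze(stored).equals(analyze(queried));
    }

    // Clustering filter: exact comparison, unaware of analysis.
    static boolean clusteringFilterSelects(String stored, String queried) {
        return stored.equals(queried);
    }

    // A row is returned only if the index finds it AND the filter keeps it.
    static List<String> query(List<String> clusterings, String queried) {
        return clusterings.stream()
                .filter(c -> indexMatches(c, queried))
                .filter(c -> clusteringFilterSelects(c, queried))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = List.of("A");
        System.out.println(query(rows, "a")); // prints []
        System.out.println(query(rows, "A")); // prints [A]
    }
}
```

In this sketch the index step finds 'A' for the query value 'a', but the second, exact filter drops it, which mirrors the empty result in the first CQL example.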



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
