[ https://issues.apache.org/jira/browse/CASSANDRA-18782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-18782:
----------------------------------------
    Reviewers: Caleb Rackliffe

> Forbid analyzed SAI indexes on primary key columns
> --------------------------------------------------
>
>                 Key: CASSANDRA-18782
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18782
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Queries using SAI indexes don't find any results when the index is on a
> primary key column, the index uses analysis, and the queried value differs
> from the exact value of the column. For example:
> {code:java}
> CREATE TABLE t(k int, c text, PRIMARY KEY (k, c));
> CREATE INDEX ON t(c) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
> INSERT INTO t(k, c) VALUES (1, 'A');
> SELECT * FROM t WHERE c = 'a'; -- no results found!!!
> {code}
> This happens because the {{ClusteringIndexFilter}} for the query doesn't take
> analysis into account. Thus, when that filter is applied by
> [{{QueryController#doesNotSelect(PrimaryKey)}}|https://github.com/apache/cassandra/blob/655a2455ac29395b0a303e6ad7fc4d458b18932d/src/java/org/apache/cassandra/index/sai/plan/QueryController.java#L194-L210]
> it rejects the results that have been correctly found by the index.
>
> An initial approach to solve this problem could be making
> {{ClusteringIndexFilter}} aware of the index analysis options. However, this
> would be problematic for paging. The first page of the query contains a
> restriction on the clustering that requires analysis. But subsequent queries
> will contain the last seen clustering, and we don’t want analysis in that
> case.
>
> Another approach would be not adding a {{ClusteringIndexFilter}} to the query
> restrictions when it contains this type of restriction on columns. However,
> this approach would create a weird situation where adding an index might make
> {{ALLOW FILTERING}} necessary in queries that wouldn’t need it without the
> index. This is the opposite of the natural way of things, where more indexes
> mean less need for {{ALLOW FILTERING}} (AF). For example:
> {code:java}
> CREATE TABLE t(k int, c1 text, c2 int, PRIMARY KEY (k, c1, c2));
> CREATE INDEX idx ON t(c1) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
> SELECT * FROM t WHERE k = 0 AND c1 = 'a' AND c2 = 0 ALLOW FILTERING;
> {code}
> The query would need AF because it has been translated into an index query
> without a clustering filter, and {{c2}} is not indexed.
>
> I think there is an ambiguity in the query: it's not clear whether it should
> use the secondary index filter and apply analysis, or be a primary index
> query and not apply analysis. Although we can default to one interpretation
> or the other, both can serve different use cases. We will probably need
> some new CQL syntax to allow users to specify whether they want to use the
> secondary index or not.
>
> We can work on those CQL improvements during the second phase of SAI. In the
> meantime, I think we should simply forbid the creation of analyzed indexes on
> primary key columns.
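
The mismatch described in the issue can be reproduced with a small toy model. This is only an illustrative sketch and not Cassandra's code: the class {{AnalyzedIndexMismatch}} and its methods {{analyze}}, {{buildIndex}} and {{query}} are hypothetical names, the "analyzer" is assumed to be plain lowercasing (standing in for {{'case_sensitive' : false}}), and the exact-match check is assumed to play the role that the issue attributes to {{ClusteringIndexFilter}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: none of these names exist in Cassandra.
class AnalyzedIndexMismatch {

    // Assumption: a case-insensitive analyzer is modelled as lowercasing.
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Toy index: analyzed term -> stored clustering values that produced it.
    static Map<String, List<String>> buildIndex(List<String> clusterings) {
        Map<String, List<String>> index = new HashMap<>();
        for (String clustering : clusterings) {
            index.computeIfAbsent(analyze(clustering), k -> new ArrayList<>())
                 .add(clustering);
        }
        return index;
    }

    // Looks up the analyzed term, then optionally applies an exact,
    // analysis-unaware comparison against the stored clustering value.
    static List<String> query(Map<String, List<String>> index,
                              String queried,
                              boolean applyExactClusteringFilter) {
        List<String> results = new ArrayList<>();
        for (String stored : index.getOrDefault(analyze(queried), List.of())) {
            if (!applyExactClusteringFilter || stored.equals(queried)) {
                results.add(stored);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = buildIndex(List.of("A"));
        // The analyzed lookup alone finds the row stored as 'A'.
        System.out.println(query(index, "a", false)); // prints [A]
        // The exact clustering check then discards it.
        System.out.println(query(index, "a", true));  // prints []
    }
}
```

In this model the index lookup succeeds, and the row is lost only in the exact comparison applied afterwards, which matches the behaviour reported above for {{SELECT * FROM t WHERE c = 'a'}}.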