[ https://issues.apache.org/jira/browse/CASSANDRA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174954#comment-14174954 ]
Sylvain Lebresne commented on CASSANDRA-8131: --------------------------------------------- I don't really remember which issue fixed that, but it appears to have been fixed since 2.1.0 given what [~mshuler] reports (you are wrong in assuming that {{ SELECT * FROM by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c';}} should return 3 rows. It's a {{AND}}, you're asking for the row whose {{datakeys}} contains both 'a' and 'c', and no row match that). That said, there is a small validation bug in that we shouldn't be allowing those queries (the ones with 2 {{CONTAINS}}) without {{ALLOW FILTERING}} since we do use server side filtering to handle those. [~blerer] can you have a look at why that is? bq. requires an "allow filtering" Any 2ndary index query that that has more than one restriction will require {{ALLOW FILTERING}} because server side we only ever query a 2ndary index with one expression, and we filter out results if there is more expressions. And that is exactly the definition of when {{ALLOW FILTERING}} is required. > Short-circuited query results from collection index query > --------------------------------------------------------- > > Key: CASSANDRA-8131 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8131 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Debian Wheezy, Oracle JDK, Cassandra 2.1 > Reporter: Catalin Alexandru Zamfir > Assignee: Benjamin Lerer > Labels: collections, cql3, cqlsh, query, queryparser, triaged > Fix For: 2.1.0 > > > After watching Jonathan's 2014 summit video, I wanted to give collection > indexes a try as they seem to be a fit for a "search by key/values" usage > pattern we have in our setup. Doing some test queries that I expect users > would do against the table, a short-circuit behavior came up: > Here's the whole transcript: > {noformat} > CREATE TABLE by_sets (id int PRIMARY KEY, datakeys set<text>, datavars > set<text>); > CREATE INDEX by_sets_datakeys ON by_sets (datakeys); > CREATE INDEX by_sets_datavars ON by_sets (datavars); > INSERT INTO by_sets (id, datakeys, datavars) VALUES (1, {'a'}, {'b'}); > INSERT INTO by_sets (id, datakeys, datavars) VALUES (2, {'c'}, {'d'}); > INSERT INTO by_sets (id, datakeys, datavars) VALUES (3, {'e'}, {'f'}); > INSERT INTO by_sets (id, datakeys, datavars) VALUES (4, {'a'}, {'z'}); > SELECT * FROM by_sets; > id | datakeys | datavars > ----+----------+---------- > 1 | {'a'} | {'b'} > 2 | {'c'} | {'d'} > 4 | {'a'} | {'z'} > 3 | {'e'} | {'f'} > {noformat} > We then tried this query which short-circuited: > {noformat} > SELECT * FROM by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c'; > id | datakeys | datavars > ----+----------+---------- > 1 | {'a'} | {'b'} > 4 | {'a'} | {'z'} > (2 rows) > {noformat} > Instead of receveing 3 rows, which match the datakeys CONTAINS 'a' AND > datakeys CONTAINS 'c' we only got the first. > Doing the same, but with CONTAINS 'c' first, ignores the second AND. > {noformat} > SELECT * FROM by_sets WHERE datakeys CONTAINS 'c' AND datakeys CONTAINS 'a' ; > id | datakeys | datavars > ----+----------+---------- > 2 | {'c'} | {'d'} > (1 rows) > {noformat} > Also, on a side-note, I have two indexes on both datakeys and datavars. But > when trying to run a query such as: > {noformat} > select * from by_sets WHERE datakeys CONTAINS 'a' AND datavars CONTAINS 'z'; > code=2200 [Invalid query] message="Cannot execute this query as it might > involve data filtering and thus may have unpredictable performance. > If you want to execute this query despite the performance unpredictability, > use ALLOW FILTERING" > {noformat} > The second column, after AND (even if I inverse the order) requires an "allow > filtering" clause yet the column is indexed an an in-memory "join" of the > primary keys of these sets on the coordinator could build up the result. > Could anyone explain the short-circuit behavior? > And the requirement for "allow-filtering" on a secondly indexed column? > If they're not bugs but intended they should be documented better, at least > their limitations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)