[jira] [Commented] (CASSANDRA-8131) Short-circuited query results from collection index query

Catalin Alexandru Zamfir (JIRA) Fri, 17 Oct 2014 05:57:57 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175030#comment-14175030
 ]


Catalin Alexandru Zamfir commented on CASSANDRA-8131:
-----------------------------------------------------

True. I was threw off by the "CONTAINS" which I interpreted as a search, 
basically linking to "searches" (CONTAINS) in the same result set (at least in 
my mind). Cassandra does not support OR to reach the goal we're trying to 
achieve.

Instead, I've tried this, which I guess it's already fixed (should have 
returned one row):
{noformat}
insert into by_sets (id, datakeys, datavars) values (5, {'a', 'c'}, {'q'});
select * from by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c' ;

 id | datakeys   | datavars
----+------------+----------
  5 | {'a', 'c'} |    {'q'}
  1 |      {'a'} |    {'b'}
  4 |      {'a'} |    {'z'}

(3 rows)
{noformat}

> Short-circuited query results from collection index query
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8131
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8131
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian Wheezy, Oracle JDK, Cassandra 2.1
>            Reporter: Catalin Alexandru Zamfir
>            Assignee: Benjamin Lerer
>              Labels: collections, cql3, cqlsh, query, queryparser, triaged
>             Fix For: 2.1.0
>
>
> After watching Jonathan's 2014 summit video, I wanted to give collection 
> indexes a try as they seem to be a fit for a "search by key/values" usage 
> pattern we have in our setup. Doing some test queries that I expect users 
> would do against the table, a short-circuit behavior came up:
> Here's the whole transcript:
> {noformat}
> CREATE TABLE by_sets (id int PRIMARY KEY, datakeys set<text>, datavars 
> set<text>);
> CREATE INDEX by_sets_datakeys ON by_sets (datakeys);
> CREATE INDEX by_sets_datavars ON by_sets (datavars);
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (1, {'a'}, {'b'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (2, {'c'}, {'d'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (3, {'e'}, {'f'});
> INSERT INTO by_sets (id, datakeys, datavars) VALUES (4, {'a'}, {'z'});
> SELECT * FROM by_sets;
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   2 |    {'c'} |    {'d'}
>   4 |    {'a'} |    {'z'}
>   3 |    {'e'} |    {'f'}
> {noformat}
> We then tried this query which short-circuited:
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'a' AND datakeys CONTAINS 'c';
>  id | datakeys | datavars
> ----+----------+----------
>   1 |    {'a'} |    {'b'}
>   4 |    {'a'} |    {'z'}
> (2 rows)
> {noformat}
> Instead of receveing 3 rows, which match the datakeys CONTAINS 'a' AND 
> datakeys CONTAINS 'c' we only got the first.
> Doing the same, but with CONTAINS 'c' first, ignores the second AND.
> {noformat}
> SELECT * FROM by_sets WHERE datakeys CONTAINS 'c' AND datakeys CONTAINS 'a' ;
>  id | datakeys | datavars
> ----+----------+----------
>   2 |    {'c'} |    {'d'}
> (1 rows)
> {noformat}
> Also, on a side-note, I have two indexes on both datakeys and datavars. But 
> when trying to run a query such as:
> {noformat}
> select * from by_sets WHERE datakeys CONTAINS 'a' AND datavars CONTAINS 'z';
> code=2200 [Invalid query] message="Cannot execute this query as it might 
> involve data filtering and thus may have unpredictable performance. 
> If you want to execute this query despite the performance unpredictability, 
> use ALLOW FILTERING"
> {noformat}
> The second column, after AND (even if I inverse the order) requires an "allow 
> filtering" clause yet the column is indexed an an in-memory "join" of the 
> primary keys of these sets on the coordinator could build up the result.
> Could anyone explain the short-circuit behavior?
> And the requirement for "allow-filtering" on a secondly indexed column?
> If they're not bugs but intended they should be documented better, at least 
> their limitations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8131) Short-circuited query results from collection index query

Reply via email to