[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering

Sylvain Lebresne (JIRA) Mon, 18 Nov 2013 07:34:01 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825401#comment-13825401 ]


Sylvain Lebresne commented on CASSANDRA-6348:
---------------------------------------------

Hmm, I can't really reproduce this on the cassandra-1.2 branch:
{noformat}
Connected to test at 127.0.0.1:9160.
[cqlsh 3.1.8 | Cassandra 1.2.11-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.1]
Use HELP for help.
cqlsh> create KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use ks;
cqlsh:ks>   create table test ( key1 int, key2 int , col1 int, col2 int, primary key (key1, key2));
cqlsh:ks>   create index col1 on test(col1);
cqlsh:ks>   create index col2 on test(col2);
cqlsh:ks> select * from test where col1=100 and col2 =1;
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
{noformat}
I.e. ALLOW FILTERING is required.
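For illustration only, here is a rough Python model of the rule behind that error. It assumes a simplified view in which a single secondary index drives the scan and every other restriction on a non-primary-key column has to be checked by filtering the rows that index returns. The function `needs_allow_filtering` and its parameters are hypothetical; this is not Cassandra's actual validation code, which covers more cases.

```python
# Hypothetical, simplified model of when the cqlsh session above asks for
# ALLOW FILTERING. Not Cassandra's real validation logic.


def needs_allow_filtering(restricted_columns, indexed_columns, primary_key):
    """Return True if, in this model, the query must opt into filtering."""
    # This model ignores restrictions on primary-key columns entirely.
    non_key = [c for c in restricted_columns if c not in primary_key]
    if not non_key:
        return False
    if not any(c in indexed_columns for c in non_key):
        # Not covered by this model: no index is available to drive the scan.
        raise ValueError("no secondary index can drive this query")
    # Only one index drives the scan; every other non-key restriction is
    # evaluated row by row on whatever that index returns.
    return len(non_key) > 1


pk = {"key1", "key2"}
indexes = {"col1", "col2"}
# select * from test where col1=100 and col2 =1;  -> rejected as above
print(needs_allow_filtering(["col1", "col2"], indexes, pk))  # True
# select * from test where col1=100;               -> accepted as is
print(needs_allow_filtering(["col1"], indexes, pk))          # False
```

Appending ALLOW FILTERING (`select * from test where col1=100 and col2=1 ALLOW FILTERING;`) is how the user acknowledges that the second restriction will be checked by filtering.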

bq. We can either disable those kind of queries or WARN the user that data filtering might lead to timeout exception or OOM.

Just to make sure we agree, that's *exactly* what requiring ALLOW FILTERING is
about: warning the user that C* does not execute the query smartly and that
performance will suck. In particular, you should *never* use ALLOW FILTERING in
production unless you know very well what you are doing.

bq. We should be able to auto page through 2i CF (for native protocol), so if the auto-paging ends in the middle of a index scanning

This is not really what native protocol paging is about. If you ask for pages
of 1000 results, native protocol paging will return pages of 1000 results
until you're done paging. The problem here is that it takes a long time to
find any result at all, because the way we handle the query is dumb. But I'll
note that we do page the index scan internally (which is why you can get a
timeout but, in theory, not an OOM).
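To make the timeout-versus-OOM point concrete, here is a minimal Python sketch under a toy model in which the index row is a list of partition keys and the base CF is a dict. It reads the index in fixed-size internal pages, fetches each base row and applies the remaining predicate. Memory stays bounded by one page plus at most `limit` matches, but when matches are rare the scan keeps running until the deadline. `filtered_index_scan`, `ReadTimeout` and the injected `clock` are hypothetical names, not Cassandra internals.

```python
import itertools
import time


class ReadTimeout(Exception):
    """Hypothetical stand-in for the TimeoutException the client receives."""


def filtered_index_scan(index_row, base_cf, predicate, limit, page_size,
                        timeout_s, clock=time.monotonic):
    """Scan `index_row` page by page, keeping base rows that pass `predicate`.

    Memory is bounded by one page of index entries plus at most `limit`
    results, but elapsed time is not: if matches are rare, the scan keeps
    going until it finds `limit` rows, exhausts the index row, or times out.
    """
    deadline = clock() + timeout_s
    results = []
    for start in range(0, len(index_row), page_size):
        page = index_row[start:start + page_size]  # one internal page of keys
        for key in page:
            row = base_cf[key]  # read the base-CF row the index entry points to
            if predicate(row):
                results.append(row)
                if len(results) == limit:
                    return results
        if clock() > deadline:
            raise ReadTimeout(
                f"scanned {start + len(page)} index entries, found {len(results)}")
    return results


def wants_col2(row):
    return row["col2"] == 1


# 10,000 partitions have col1 = 100, but only the last one has col2 = 1.
index_row = list(range(10_000))
base_cf = {k: {"key1": k, "col1": 100, "col2": 1 if k == 9_999 else 0}
           for k in index_row}

# A fake clock that advances one "second" per call keeps the demo deterministic.
timeout_message = None
ticks = itertools.count()
try:
    filtered_index_scan(index_row, base_cf, wants_col2, limit=1,
                        page_size=100, timeout_s=5, clock=lambda: next(ticks))
except ReadTimeout as exc:
    timeout_message = str(exc)
print(timeout_message)  # scanned 600 index entries, found 0

ticks = itertools.count()
found = filtered_index_scan(index_row, base_cf, wants_col2, limit=1,
                            page_size=100, timeout_s=1_000,
                            clock=lambda: next(ticks))
print(found)  # [{'key1': 9999, 'col1': 100, 'col2': 1}]
```

Smaller internal pages lower peak memory but do nothing for the time it takes to reach the first match, which is why paging alone cannot fix this kind of query.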

Note that I'm not saying we shouldn't improve the way we handle such queries, 
but that's a whole separate issue (CASSANDRA-6048).
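As a rough illustration of why merging indexes helps when there are only EQUAL clauses, the sketch below intersects the key sets returned by several `col = value` index lookups, starting with the smallest. It assumes each lookup fits in a Python set and is not taken from the CASSANDRA-6048 work; any non-EQUAL restriction would still have to be filtered afterwards. `merge_eq_indexes` is a hypothetical name.

```python
def merge_eq_indexes(index_lookups):
    """Intersect the partition keys returned by several EQ index lookups.

    index_lookups: list of sets of keys, one per `col = value` restriction.
    Starting from the smallest set keeps the work proportional to the most
    selective index rather than to the largest one.
    """
    if not index_lookups:
        raise ValueError("need at least one index lookup")
    ordered = sorted(index_lookups, key=len)
    result = set(ordered[0])
    for keys in ordered[1:]:
        result &= keys
        if not result:
            break  # nothing can match any more, stop early
    return result


col1_index = set(range(10_000))  # col1 = 100 matches 10,000 partitions
col2_index = {9_999, 20_000}     # col2 = 1 matches only two
print(merge_eq_indexes([col1_index, col2_index]))  # {9999}
```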


> TimeoutException throws if Cql query allows data filtering and index is too 
> big and it can't find the data in base CF after filtering 
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6348
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> If the index row is too big and filtering can't find the matching CQL row in
> the base CF, it keeps scanning the index row and retrieving base CF rows until
> the index row is scanned completely, which may take too long, so the Thrift
> server returns a TimeoutException. This is one of the reasons why we shouldn't
> index a column if the index is too big.
> Merging multiple indexes can resolve the case where there are only EQUAL
> clauses (CASSANDRA-6048 addresses it).
> If the query has non-EQUAL clauses, we still need to do data filtering, which
> might lead to a timeout exception.
> We can either disable those kind of queries or WARN the user that data 
> filtering might lead to timeout exception or OOM.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
