[ 
https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825496#comment-13825496
 ] 

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

I forgot to put "ALLOW FILTERING" in the clauses. The issue came up during 
Hadoop performance testing on indexed columns (the test case indexes columns 
in a way that produces a very large index). The Hadoop CQL query uses "ALLOW 
FILTERING", and the user can provide user-defined WHERE clauses that may filter 
data on multiple columns. But a Hadoop user may not fully understand how data 
filtering works under the hood.

Apart from Hadoop queries, it is common for users to query on multiple indexes. 
We should explain in more detail when "ALLOW FILTERING" results in bad 
performance, and which cases lead to a timeout exception, in the following 
error message. 

{code}
Cannot execute this query as it might involve data filtering and thus may have 
unpredictable performance. If you want to execute this query despite the 
performance unpredictability, use ALLOW FILTERING
{code}

In most cases, "ALLOW FILTERING" improves performance. We can't assume that 
users fully understand how "ALLOW FILTERING" works under the hood. I even spent 
quite some time on CASSANDRA-6048 to understand more about data filtering.



> TimeoutException throws if Cql query allows data filtering and index is too 
> big and it can't find the data in base CF after filtering 
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6348
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> If the index row is too big and filtering can't find a matching CQL row in 
> the base CF, it keeps scanning the index row and retrieving from the base CF 
> until the index row is scanned completely, which may take too long, and the 
> thrift server returns a TimeoutException. This is one of the reasons why we 
> shouldn't index a column if the index is too big.
> Merging multiple indexes can resolve the case where there are only EQUAL 
> clauses (CASSANDRA-6048 addresses it).
> If the query has non-EQUAL clauses, we still need to do data filtering, which 
> might lead to a timeout exception.
> We can either disable those kinds of queries or WARN the user that data 
> filtering might lead to a timeout exception or OOM.
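
The scan described above can be illustrated with a small simulation. This is only a sketch in Python, not Cassandra's actual read path: the function name `scan_index_with_filtering`, the dictionary standing in for the base CF, and the sample columns `status` and `age` are all hypothetical. It shows that when no base row passes the filter, every index entry is visited before the scan can stop, while a filter that matches often stops early once the limit is reached.

```python
def scan_index_with_filtering(index_row, base_cf, predicate, limit):
    """Walk the index entries, fetch each base row, keep rows passing the filter.

    Returns the matching rows and the number of base-CF reads performed.
    """
    matches = []
    reads = 0
    for key in index_row:
        row = base_cf[key]          # one base-CF lookup per index entry
        reads += 1
        if predicate(row):
            matches.append(row)
            if len(matches) >= limit:
                break               # the scan only stops early once the limit is met
    return matches, reads


# Hypothetical data: 100,000 rows share the indexed value status = 'active'.
base_cf = {k: {"status": "active", "age": k % 50} for k in range(100_000)}
index_row = list(base_cf)           # index entries for status = 'active'

# No row has age > 90, so the whole index row is scanned.
no_match, no_match_reads = scan_index_with_filtering(
    index_row, base_cf, lambda r: r["age"] > 90, limit=10)

# A filter that matches regularly reaches the limit after a few hundred reads.
some_match, some_match_reads = scan_index_with_filtering(
    index_row, base_cf, lambda r: r["age"] == 0, limit=10)

print(len(no_match), no_match_reads)
print(len(some_match), some_match_reads)
```

In this sketch the cost of the first query grows with the size of the index row rather than with the number of results, which is the situation in which a server-side timeout would be expected.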



--
This message was sent by Atlassian JIRA
(v6.1#6144)
