[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

Alex Liu (JIRA) Fri, 11 Jul 2014 09:38:28 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058967#comment-14058967 ]


Alex Liu commented on CASSANDRA-7059:
-------------------------------------

I got the following error from pig-test on the 2.1 branch for a counter CF:

{code}
create column family CC with
    key_validation_class = UTF8Type and
    default_validation_class = CounterColumnType and
    comparator = UTF8Type;
{code}

The CQL query is:
{code}
SELECT * FROM "CC" WHERE token("key") = token(?) AND "column1" > ? LIMIT 1000 ALLOW FILTERING
{code}

{code}
    [junit] java.lang.RuntimeException
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:366)
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
    [junit]     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    [junit]     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.getProgress(CqlPagingRecordReader.java:195)
    [junit]     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
    [junit]     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
    [junit]     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
    [junit]     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit]     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit]     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit]     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit]     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
    [junit] Caused by: InvalidRequestException(why:The query requests a restriction of rows with a strict bound (column1 > ?) over a range of partitions. This is not supported by the underlying storage engine for COMPACT tables if a LIMIT is provided. Please either make the condition non strict (column1 >= ?) or remove the user LIMIT)
    [junit]     at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52282)
    [junit]     at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52259)
    [junit]     at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:52198)
    [junit]     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    [junit]     at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1797)
    [junit]     at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1783)
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
    [junit]     at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
    [junit]     ... 13 more
{code}
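
The error message itself points at a possible workaround on the reader side: make the bound non-strict and drop the boundary row on the client. Below is a minimal Python sketch of that idea, not the CqlPagingRecordReader code. `build_page_query` and `drop_boundary` are hypothetical helpers, rows are modelled as plain dicts, and the sketch assumes at most one returned row has a clustering value equal to the bound (which is why it asks for one extra row).

```python
def build_page_query(table, key, column, limit, strict=True):
    """Build the paging query; with strict=False, use >= and fetch one extra row."""
    op = ">" if strict else ">="
    fetch = limit if strict else limit + 1
    return ('SELECT * FROM "%s" WHERE token("%s") = token(?) AND "%s" %s ? '
            'LIMIT %d ALLOW FILTERING' % (table, key, column, op, fetch))


def drop_boundary(rows, column, bound, limit):
    """Client-side trim: remove rows equal to the bound, then apply the limit."""
    kept = [row for row in rows if row[column] != bound]
    return kept[:limit]
```

With these helpers, `build_page_query("CC", "key", "column1", 1000, strict=False)` yields a query the server should accept according to the message above, and `drop_boundary` restores the strict semantics afterwards.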

> Range query with strict bound on clustering column can return less results 
> than required for compact tables
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7059
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.9
>
>         Attachments: 7059.txt
>
>
> What's wrong:
> {noformat}
> CREATE TABLE test (
>     k int,
>     v int,
>     PRIMARY KEY (k, v)
> ) WITH COMPACT STORAGE;
> INSERT INTO test(k, v) VALUES (0, 0);
> INSERT INTO test(k, v) VALUES (0, 1);
> INSERT INTO test(k, v) VALUES (1, 0);
> INSERT INTO test(k, v) VALUES (1, 1);
> INSERT INTO test(k, v) VALUES (2, 0);
> INSERT INTO test(k, v) VALUES (2, 1);
> SELECT * FROM test WHERE v > 0 LIMIT 3 ALLOW FILTERING;
>  k | v
> ---+---
>  1 | 1
>  0 | 1
> {noformat}
> That last query should return 3 results.
> The problem lies in how we deal with 'strict greater than' ({{>}}) for 
> "wide" compact storage tables. Namely, for those tables, we internally only 
> support inclusive bounds (for CQL3 tables this is not a problem as we deal 
> with this using the 'end-of-component' of the CompositeType encoding). So we 
> "compensate" by asking for one more result than the user asked for, and we 
> trim afterwards if that was unnecessary. This works fine for per-partition 
> queries, but doesn't for "range" queries since we would potentially have to 
> ask for {{X}} more results, where {{X}} is the number of partitions fetched, 
> but we don't know {{X}} beforehand.
> I'll note that:
> * this has always been there
> * this only (potentially) affects compact tables
> * this only affects range queries that have a strict bound on the clustering 
> column (this means only {{ALLOW FILTERING}} queries in particular).
> * this only matters if a {{LIMIT}} is set on the query.
> As for fixes, it's not entirely trivial. The "right" fix would probably be to 
> start supporting non-inclusive bounds internally, but that's far from a small 
> fix and is "at best" a 2.1 fix (since we'll have to make a messaging protocol 
> change to ship some additional info for SliceQueryFilter). Also, this might 
> be a lot of work for something that only affects some {{ALLOW FILTERING}} 
> queries on compact tables.
> Another (somewhat simpler) solution might be to detect when we have this kind 
> of query and use a pager with no limit. We would then query a first page 
> using the user limit (plus some smudge factor to avoid being inefficient too 
> often) and would continue paging unless either we've exhausted all results or 
> we can prove that post-processing we do have enough results to satisfy the 
> user limit.  This does mean in some cases we might do 2 or more internal 
> queries, but in practice we can probably make that case very rare, and since 
> the query is an {{ALLOW FILTERING}} one, the user is somewhat warned that the 
> query may not be terribly efficient.
> Lastly, we could always start by disallowing the kind of query that is 
> potentially problematic (until we have a proper fix), knowing that users can 
> work around that by either using non-strict bounds or removing the {{LIMIT}}, 
> whichever makes the most sense in their case. In 1.2 in particular, we don't 
> have the query pagers, so the previous solution I described would be a bit of 
> a mess to implement.
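
To make the quoted description concrete, here is a toy Python model of the behaviour it describes. It is only a sketch under stated assumptions, not Cassandra code: `TABLE`, `range_slice`, `naive_strict_query` and `paged_strict_query` are hypothetical names, the "storage engine" supports only an inclusive bound with a global row limit, and the partition order 1, 0, 2 is assumed so that it matches the order of the output shown in the description.

```python
# Toy model of a "wide" compact table: (partition key, sorted clustering values).
# The partition order is an assumption standing in for token order.
TABLE = [(1, [0, 1]), (0, [0, 1]), (2, [0, 1])]


def range_slice(bound, limit, start=0):
    """Stand-in for the storage engine: inclusive bound only (v >= bound).

    Skips `start` matching rows, then returns at most `limit` rows.
    """
    rows = [(k, v) for k, vs in TABLE for v in vs if v >= bound]
    return rows[start:start + limit]


def naive_strict_query(bound, user_limit):
    """Ask for one extra row overall, then trim rows equal to the bound."""
    fetched = range_slice(bound, user_limit + 1)
    return [(k, v) for k, v in fetched if v > bound][:user_limit]


def paged_strict_query(bound, user_limit, page_size=None):
    """Keep paging until the trimmed result reaches the limit or data runs out."""
    page_size = page_size or user_limit + 1
    result, start = [], 0
    while len(result) < user_limit:
        page = range_slice(bound, page_size, start)
        if not page:
            break  # all results exhausted
        result.extend((k, v) for k, v in page if v > bound)
        start += len(page)
    return result[:user_limit]
```

In this model, `naive_strict_query(0, 3)` fetches four rows, two of which sit on the bound and are trimmed, so only two rows come back, as in the description. `paged_strict_query(0, 3)` needs a second internal page to reach the third row, which is the cost the pager-based proposal accepts.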



--
This message was sent by Atlassian JIRA
(v6.2#6252)
