[ https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058967#comment-14058967 ]
Alex Liu commented on CASSANDRA-7059:
-------------------------------------

I got the following error from pig-test on the 2.1 branch for a counter CF:
{code}
create column family CC with
    key_validation_class = UTF8Type and
    default_validation_class = CounterColumnType
    and comparator = UTF8Type;
{code}

The CQL query is
{code}
SELECT * FROM "CC" WHERE token("key") = token(?) AND "column1" > ? LIMIT 1000 ALLOW FILTERING
{code}

{code}
    [junit] java.lang.RuntimeException
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:366)
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
    [junit] 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    [junit] 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.getProgress(CqlPagingRecordReader.java:195)
    [junit] 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
    [junit] 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
    [junit] 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
    [junit] 	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit] 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit] 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit] 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit] 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
    [junit] Caused by: InvalidRequestException(why:The query requests a restriction of rows with a strict bound (column1 > ?) over a range of partitions. This is not supported by the underlying storage engine for COMPACT tables if a LIMIT is provided. Please either make the condition non strict (column1 >= ?) or remove the user LIMIT)
    [junit] 	at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52282)
    [junit] 	at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52259)
    [junit] 	at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:52198)
    [junit] 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    [junit] 	at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1797)
    [junit] 	at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1783)
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
    [junit] 	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
    [junit] 	... 13 more
{code}

> Range query with strict bound on clustering column can return less results than required for compact tables
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7059
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.9
>
>         Attachments: 7059.txt
>
>
> What's wrong:
> {noformat}
> CREATE TABLE test (
>     k int,
>     v int,
>     PRIMARY KEY (k, v)
> ) WITH COMPACT STORAGE;
> INSERT INTO test(k, v) VALUES (0, 0);
> INSERT INTO test(k, v) VALUES (0, 1);
> INSERT INTO test(k, v) VALUES (1, 0);
> INSERT INTO test(k, v) VALUES (1, 1);
> INSERT INTO test(k, v) VALUES (2, 0);
> INSERT INTO test(k, v) VALUES (2, 1);
> SELECT * FROM test WHERE v > 0 LIMIT 3 ALLOW FILTERING;
>  k | v
> ---+---
>  1 | 1
>  0 | 1
> {noformat}
> That last query should return 3 results.
> The problem lies in how we deal with 'strict greater than' ({{>}}) for
> "wide" compact storage tables. Namely, for those tables, we internally only
> support inclusive bounds (for CQL3 tables this is not a problem as we deal
> with this using the 'end-of-component' of the CompositeType encoding). So we
> "compensate" by asking for one more result than the user requested, and we
> trim afterwards if that was unnecessary. This works fine for per-partition
> queries, but doesn't for "range" queries, since we would potentially have to
> ask for {{X}} more results, where {{X}} is the number of partitions fetched,
> but we don't know {{X}} beforehand.
> I'll note that:
> * this has always been there
> * this only (potentially) affects compact tables
> * this only affects range queries that have a strict bound on the clustering
> column (this means only {{ALLOW FILTERING}} queries in particular).
> * this only matters if a {{LIMIT}} is set on the query.
> As for fixes, it's not entirely trivial.
> The "right" fix would probably be to start supporting non-inclusive bounds
> internally, but that's far from a small fix and is "at best" a 2.1 fix (since
> we'll have to make a messaging protocol change to ship some additional info
> for SliceQueryFilter). Also, this might be a lot of work for something that
> only affects some {{ALLOW FILTERING}} queries on compact tables.
> Another (somewhat simpler) solution might be to detect when we have this kind
> of query and use a pager with no limit. We would then query a first page
> using the user limit (plus some smudge factor to avoid being inefficient too
> often) and would continue paging until either we've exhausted all results or
> we can prove that, after post-processing, we do have enough results to
> satisfy the user limit. This does mean that in some cases we might do 2 or
> more internal queries, but in practice we can probably make that case very
> rare, and since the query is an {{ALLOW FILTERING}} one, the user is somewhat
> warned that the query may not be terribly efficient.
> Lastly, we could always start by disallowing the kind of query that is
> potentially problematic (until we have a proper fix), knowing that users can
> work around that by either using non-strict bounds or removing the {{LIMIT}},
> whichever makes the most sense in their case. In 1.2 in particular, we don't
> have the query pagers, so the previous solution I described would be a bit of
> a mess to implement.
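
For readers who want to see why the "one extra row, then trim" compensation falls short on range queries, here is a minimal Python sketch of the behaviour described in the issue. It is a toy model, not Cassandra code: `ROWS`, `inclusive_scan`, `naive_strict_query`, `paged_strict_query` and the `smudge` default are all hypothetical names and values chosen for illustration, and the row order simply mirrors the partition order visible in the issue's query output.

```python
# Rows as (k, v), in the partition order suggested by the issue's output
# (k=1, then k=0, then k=2). This ordering is an assumption for the sketch.
ROWS = [(1, 0), (1, 1), (0, 0), (0, 1), (2, 0), (2, 1)]


def inclusive_scan(bound, start, count):
    """Stand-in for the storage engine: only inclusive bounds (v >= bound)."""
    matching = [row for row in ROWS if row[1] >= bound]
    return matching[start:start + count]


def naive_strict_query(bound, limit):
    """Fetch limit + 1 rows with v >= bound, then trim down to v > bound."""
    fetched = inclusive_scan(bound, 0, limit + 1)
    return [row for row in fetched if row[1] > bound][:limit]


def paged_strict_query(bound, limit, smudge=2):
    """Page with no overall limit until enough rows survive the trim."""
    results = []
    start = 0
    page_size = limit + smudge  # hypothetical smudge factor
    while len(results) < limit:
        page = inclusive_scan(bound, start, page_size)
        if not page:
            break  # all results exhausted
        results.extend(row for row in page if row[1] > bound)
        start += len(page)
    return results[:limit]


print(naive_strict_query(0, 3))  # two rows only, as in the issue
print(paged_strict_query(0, 3))  # three rows, satisfying the LIMIT
```

With `LIMIT 3`, the naive version fetches four rows, but two of them sit exactly on the bound (one per partition touched), so only two survive the trim. The paged version keeps fetching until three rows remain after trimming or the data runs out, at the cost of a second internal query in this example.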