[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-07-11 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058967#comment-14058967
 ] 

Alex Liu commented on CASSANDRA-7059:
-

I got the following error for pig-test on 2.1 branch for a counter CF

{code}
create column family CC with
   key_validation_class = UTF8Type and
   default_validation_class = CounterColumnType
   and comparator = UTF8Type;
{code}

The CQL query is:
{code}
SELECT * FROM CC WHERE token(key) = token(?) AND column1 > ? LIMIT 1000 ALLOW FILTERING
{code}

{code}
[junit] java.lang.RuntimeException
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:366)
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
[junit] at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
[junit] at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.getProgress(CqlPagingRecordReader.java:195)
[junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
[junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
[junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
[junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
[junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
[junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
[junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
[junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[junit] Caused by: InvalidRequestException(why:The query requests a restriction of rows with a strict bound (column1 > ?) over a range of partitions. This is not supported by the underlying storage engine for COMPACT tables if a LIMIT is provided. Please either make the condition non strict (column1 >= ?) or remove the user LIMIT)
[junit] at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52282)
[junit] at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:52259)
[junit] at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:52198)
[junit] at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
[junit] at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1797)
[junit] at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1783)
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
[junit] at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
[junit] ... 13 more
{code}

 Range query with strict bound on clustering column can return less results 
 than required for compact tables
 ---

 Key: CASSANDRA-7059
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7059
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.0.9

 Attachments: 7059.txt


 What's wrong:
 {noformat}
 CREATE TABLE test (
 k int,
 v int,
 PRIMARY KEY (k, v)
 ) WITH COMPACT STORAGE;
 INSERT INTO test(k, v) VALUES (0, 0);
 INSERT INTO test(k, v) VALUES (0, 1);
 INSERT INTO test(k, v) VALUES (1, 0);
 INSERT INTO test(k, v) VALUES (1, 1);
 INSERT INTO test(k, v) VALUES (2, 0);
 INSERT INTO test(k, v) VALUES (2, 1);
 SELECT * FROM test WHERE v > 0 LIMIT 3 ALLOW FILTERING;
  k | v
 ---+---
  1 | 1
  0 | 1
 {noformat}
 That last query should return 3 results.
 The problem lies in how we deal with 'strict greater than' ({{>}}) for 
 wide compact storage tables. Namely, for those tables, we internally only 
 support inclusive bounds (for CQL3 tables this is not a problem as we deal 
 with this using the 'end-of-component' of the CompositeType encoding). So we 
 compensate by asking for one more result than the user requested, and we trim 
 afterwards if that was 

[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059600#comment-14059600
 ] 

Edward Capriolo commented on CASSANDRA-7059:


{code}
 WHERE token(key) = token(?)  
{code}
I do not understand how any of these work under RandomPartitioner (RP), because 
the relationship from keys to tokens is many-to-one. If 1000 keys map to the 
same token, how do we page them? I think only paging by key is logically 
possible and always correct.


 Range query with strict bound on clustering column can return less results 
 than required for compact tables
 ---

 Key: CASSANDRA-7059
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7059
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.0.9

 Attachments: 7059.txt


 What's wrong:
 {noformat}
 CREATE TABLE test (
 k int,
 v int,
 PRIMARY KEY (k, v)
 ) WITH COMPACT STORAGE;
 INSERT INTO test(k, v) VALUES (0, 0);
 INSERT INTO test(k, v) VALUES (0, 1);
 INSERT INTO test(k, v) VALUES (1, 0);
 INSERT INTO test(k, v) VALUES (1, 1);
 INSERT INTO test(k, v) VALUES (2, 0);
 INSERT INTO test(k, v) VALUES (2, 1);
 SELECT * FROM test WHERE v > 0 LIMIT 3 ALLOW FILTERING;
  k | v
 ---+---
  1 | 1
  0 | 1
 {noformat}
 That last query should return 3 results.
 The problem lies in how we deal with 'strict greater than' ({{>}}) for 
 wide compact storage tables. Namely, for those tables, we internally only 
 support inclusive bounds (for CQL3 tables this is not a problem as we deal 
 with this using the 'end-of-component' of the CompositeType encoding). So we 
 compensate by asking for one more result than the user requested, and we trim 
 afterwards if that was unnecessary. This works fine for per-partition 
 queries, but doesn't for range queries, since we potentially would have to ask 
 for {{X}} more results, where {{X}} is the number of partitions fetched, but we 
 don't know {{X}} beforehand.
 I'll note that:
 * this has always been there
 * this only (potentially) affects compact tables
 * this only affects range queries that have a strict bound on the clustering 
 column (this means only {{ALLOW FILTERING}} queries in particular).
 * this only matters if a {{LIMIT}} is set on the query.
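
 The shortfall can be reproduced with a toy model. The following is only a
 sketch and not Cassandra's code: the function names are invented for
 illustration, and the partition order (1, 0, 2) is assumed from the order of
 the rows in the output shown above.

```python
# Toy model of the behaviour described above; not Cassandra's actual code.
# Partitions are listed in the order assumed from the ticket's output (1, 0, 2).
PARTITIONS = [(1, [0, 1]), (0, [0, 1]), (2, [0, 1])]


def range_query_inclusive(lower, limit):
    """Scan partitions in order, returning up to `limit` rows with v >= lower."""
    rows = []
    for k, values in PARTITIONS:
        for v in values:
            if v >= lower:
                rows.append((k, v))
                if len(rows) == limit:
                    return rows
    return rows


def strict_query_with_compensation(lower, user_limit):
    """Emulate `v > lower LIMIT user_limit` using only an inclusive bound."""
    fetched = range_query_inclusive(lower, user_limit + 1)  # ask for one extra row
    trimmed = [(k, v) for k, v in fetched if v != lower]    # drop rows equal to the bound
    return trimmed[:user_limit]


result = strict_query_with_compensation(0, 3)
print(result)  # [(1, 1), (0, 1)]: only 2 rows, although 3 rows satisfy v > 0
```

 One extra row covers a single partition, but here two partitions each
 contribute a row equal to the bound, so one row too few survives the trim.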
 As for fixes, it's not entirely trivial. The right fix would probably be to 
 start supporting non-inclusive bounds internally, but that's far from a small 
 fix and is at best a 2.1 fix (since we'll have to make a messaging protocol 
 change to ship some additional info for SliceQueryFilter). Also, this might 
 be a lot of work for something that only affects some {{ALLOW FILTERING}} 
 queries on compact tables.
 Another (somewhat simpler) solution might be to detect when we have this kind 
 of query and use a pager with no limit. We would then query a first page 
 using the user limit (plus some smudge factor to avoid being inefficient too 
 often) and would continue paging until either we've exhausted all results or 
 we can prove that, after post-processing, we have enough results to satisfy 
 the user limit. This does mean in some cases we might do 2 or more internal 
 queries, but in practice we can probably make that case very rare, and since 
 the query is an {{ALLOW FILTERING}} one, the user is somewhat warned that the 
 query may not be terribly efficient.
 Lastly, we could always start by disallowing the kind of query that is 
 potentially problematic (until we have a proper fix), knowing that users can 
 work around that by either using non-strict bounds or removing the {{LIMIT}}, 
 whichever makes the most sense in their case. In 1.2 in particular, we don't 
 have the query pagers, so the previous solution I described would be a bit of 
 a mess to implement.
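
 The check for that last option could look roughly like the sketch below. The
 function and its boolean parameters are hypothetical and only model the four
 conditions listed earlier; this is not the validation code from the patch.

```python
# Hypothetical validation sketch; the function and parameter names are assumptions.
def validate_select(is_compact, is_range_query, has_strict_clustering_bound, has_limit):
    """Reject only the combination that can return too few rows."""
    if is_compact and is_range_query and has_strict_clustering_bound and has_limit:
        raise ValueError(
            "strict bound over a range of partitions with LIMIT is not "
            "supported for COMPACT tables; use a non-strict bound or remove the LIMIT"
        )


validate_select(True, True, True, False)   # no LIMIT: accepted
validate_select(True, True, False, True)   # non-strict bound: accepted
try:
    validate_select(True, True, True, True)  # all four conditions: rejected
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```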



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-07-09 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056632#comment-14056632
 ] 

T Jake Luciani commented on CASSANDRA-7059:
---

Apparently this change breaks the CqlPagingRecordReader. [~alexliu68] or 
[~slebresne], can you add a unit/integration test for this? The BatchTest could 
serve as a prototype since it starts an embedded C*.





[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-06-25 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044176#comment-14044176
 ] 

Aleksey Yeschenko commented on CASSANDRA-7059:
--

LGTM, +1



[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-06-23 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041478#comment-14041478
 ] 

Aleksey Yeschenko commented on CASSANDRA-7059:
--

Can you please rebase? Thanks.



[jira] [Commented] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables

2014-04-28 Thread Christian Spriegel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983089#comment-13983089
 ] 

Christian Spriegel commented on CASSANDRA-7059:
---

Is it possible that ALLOW FILTERING is generally not allowed for compact 
storage tables (due to this ticket)?
