[ 
https://issues.apache.org/jira/browse/CASSANDRA-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858462#comment-13858462
 ] 

Dumitru Pascu commented on CASSANDRA-6151:
------------------------------------------

To conclude this ticket and/or CASSANDRA-6311: is CqlStorage going to allow 
retrieval of a single partition from a column family?

In my case, I am keeping daily account snapshots in a column family like the 
following:
CREATE TABLE snapshots (
  snapshot_date timestamp,
  account_id text,
  account_type text,
  ...other columns here...
  PRIMARY KEY (snapshot_date, account_id)
)

Each day contains ~10 million records, so filtering down to a single 
partition at the Pig level is not really an option in several use cases.
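
To make the request concrete, here is a minimal Python sketch of how the 
location string for such a single-partition load could be built. It only 
reproduces the percent-encoding seen in the LOAD statement quoted below; the 
helper name cql_storage_url, the keyspace "ks" for my table, and the date 
literal are assumptions for illustration, and whether CqlStorage accepts the 
resulting where_clause is exactly the open question of this ticket.

```python
from urllib.parse import quote


def cql_storage_url(keyspace, table, where_clause):
    """Build a cql:// location string with a percent-encoded where_clause.

    Hypothetical helper: it only formats the string, it does not talk to
    Cassandra or Pig.
    """
    # safe="" forces '=', spaces and single quotes to be percent-encoded,
    # which matches the encoding used in the ticket's LOAD example.
    encoded = quote(where_clause, safe="")
    return "cql://{}/{}?where_clause={}".format(keyspace, table, encoded)


# Assumed keyspace "ks" and an illustrative date literal.
url = cql_storage_url("ks", "snapshots", "snapshot_date='2013-12-29'")
print(url)
```

The resulting string would be passed to LOAD ... USING CqlStorage() in the 
same way as in the quoted example.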

> CqlPagingRecordReader Used when Partition Key Is Explicitly Stated
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-6151
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6151
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>            Priority: Minor
>         Attachments: 6151-1.2-branch.txt, 6151-v2-1.2-branch.txt, 
> 6151-v3-1.2-branch.txt, 6151-v4-1.2.10-branch.txt
>
>
> From 
> http://stackoverflow.com/questions/19189649/composite-key-in-cassandra-with-pig/19211546#19211546
> The user was attempting to load a single partition using a where clause in a 
> pig load statement. 
> CQL Table
> {code}
> CREATE table data (
>   occurday  text,
>   seqnumber int,
>   occurtimems bigint,
>   unique bigint,
>   fields map<text, text>,
>   primary key ((occurday, seqnumber), occurtimems, unique)
> )
> {code}
> Pig Load statement Query
> {code}
> data = LOAD 
> 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27'
>  USING CqlStorage();    
> {code}
> This results in an exception when processed by the CqlPagingRecordReader, 
> which attempts to page this query even though it contains at most one 
> partition key. This leads to an invalid CQL statement. 
> CqlPagingRecordReader Query
> {code}
> SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
> token("occurday","seqnumber") <= ? AND occurday='A Great Day' 
> AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> Exception
> {code}
>  InvalidRequestException(why:occurday cannot be restricted by more than one 
> relation if it includes an Equal)
> {code}
> I'm not sure it is worth the special case, but a modification to not use the 
> paging record reader when the entire partition key is specified would solve 
> this issue. 
> h3. Solution
>  If the where clause has EQUAL clauses for all the partitioning keys, we use 
> the query 
> {code}
>   SELECT * FROM "data" 
>   WHERE occurday='A Great Day' 
>        AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> instead of 
> {code}
>   SELECT * FROM "data" 
>   WHERE token("occurday","seqnumber") > ? 
>    AND token("occurday","seqnumber") <= ? 
>    AND occurday='A Great Day' 
>    AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> The baseline implementation retrieves all data from all rows around the 
> ring. This new feature retrieves all data from a single wide row, which is 
> one level lower than the baseline. It helps in the use case where the user 
> is only interested in a specific wide row, so the job does not spend its 
> whole run retrieving all the rows around the ring.
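
The rule described in the quoted Solution section can be sketched in Python 
as follows. This is an illustration only, not the code from the attached 
patches: the function build_select and its parameters are hypothetical, 
restrictions are passed as ready-made CQL literals, and the case where only 
part of the partition key is restricted is not handled specially.

```python
def build_select(table, partition_keys, equals, limit=1000):
    """Return a CQL string, skipping the token range when possible.

    equals maps a column name to a CQL literal that is already quoted,
    for example {"occurday": "'A Great Day'", "seqnumber": "1"}.
    """
    where = ["{}={}".format(col, lit) for col, lit in equals.items()]
    full_key_given = all(key in equals for key in partition_keys)
    if not full_key_given:
        # Baseline behaviour: page around the ring with a token range.
        cols = ",".join('"{}"'.format(key) for key in partition_keys)
        token = "token({})".format(cols)
        where = [token + " > ?", token + " <= ?"] + where
    return 'SELECT * FROM "{}" WHERE {} LIMIT {} ALLOW FILTERING'.format(
        table, " AND ".join(where), limit
    )


keys = ["occurday", "seqnumber"]
single = build_select("data", keys, {"occurday": "'A Great Day'", "seqnumber": "1"})
paged = build_select("data", keys, {})
print(single)
print(paged)
```

With both partition key columns restricted by equality, the sketch emits the 
first query shown in the Solution section; with no restrictions it falls back 
to the token-range form.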



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
