[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
--------------------------------------

    Attachment: CASSANDRA-7016-V3.txt

The patch fixes the previously mentioned issues and also corrects the fact that
the previous solution did not work properly with a partition key composed of
multiple columns.
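
For illustration, a rough sketch of the multi-column partition key case the fix
is meant to cover; the keyspace, table, and column names below are hypothetical
and not taken from the patch, and the token bounds are bind markers as a Hadoop
split would supply them:

    CREATE TABLE ks.events (
        day text,
        bucket int,
        id timeuuid,
        payload text,
        PRIMARY KEY ((day, bucket), id)
    );

    -- token() is applied to the full composite partition key, while the
    -- individual key columns keep their own restrictions.
    SELECT id, payload
      FROM ks.events
     WHERE token(day, bucket) >= ?
       AND token(day, bucket) < ?
       AND day = '2014-04-28'
       AND bucket IN (0, 1)
     ALLOW FILTERING;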


> can't map/reduce over subset of rows with cql
> ---------------------------------------------
>
>                 Key: CASSANDRA-7016
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core, Hadoop
>            Reporter: Jonathan Halliday
>            Assignee: Benjamin Lerer
>            Priority: Minor
>              Labels: cql
>             Fix For: 2.1.1
>
>         Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
> CASSANDRA-7016.txt
>
>
> select ... where token(k) < x and token(k) >= y and k in (a,b) allow 
> filtering;
> This fails on 2.0.6: can't restrict k by more than one relation.
> In the context of map/reduce (hence the token range) I want to map over only 
> a subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql 
> is substantially cheaper than pulling all rows to the client and then 
> discarding most of them.
> Currently this is possible only if the hadoop integration code is altered to 
> apply the AND on the client side and use cql that contains only the resulting 
> filtered 'in' set.  The problem is not hadoop specific though, so IMO it 
> should really be solved in cql rather than in the hadoop integration code.
> Most restrictions on cql syntax seem to exist to prevent unduly expensive 
> queries. This one seems to be doing the opposite.
> Edit: on further thought and with reference to the code in 
> SelectStatement$RawStatement, it seems to me that token(k) and k should be 
> considered distinct entities for the purposes of processing restrictions. 
> That is, no restriction on the token should conflict with a restriction on 
> the raw key. That way any monolithic query in terms of k can be decomposed 
> into parallel chunks over the token range for the purposes of map/reduce 
> processing simply by appending an 'and token(k)...' clause to the 
> existing 'where k ...'.
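
As a sketch of the decomposition described in the report (the table name,
column names, and 'in' values are illustrative only, and the token bounds are
bind markers filled in per split): a single logical query over a fixed 'in'
set is broken into one query per token range, each handled by a separate
mapper.

    -- original monolithic query
    SELECT k, v FROM ks.tbl WHERE k IN ('a', 'b');

    -- per-split chunk: the same 'in' restriction plus the split's token range
    SELECT k, v FROM ks.tbl
     WHERE k IN ('a', 'b')
       AND token(k) >= ?
       AND token(k) < ?
     ALLOW FILTERING;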



--
This message was sent by Atlassian JIRA
(v6.2#6252)
