[jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement

Brian Hess (JIRA) Wed, 20 Jan 2016 11:04:58 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109181#comment-15109181
 ]


 Brian Hess edited comment on CASSANDRA-10707 at 1/20/16 7:04 PM:
------------------------------------------------------------------

Correct, what [~iamaleksey] said.  In fact, pushing the aggregate computation 
down to the replicas is troublesome at RF>1.
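
To make the RF>1 concern concrete, here is a minimal sketch of one reading of it. The comment itself does not spell out the reason, so treat this as an assumption: replicas can hold different versions of the same rows, rows can be reconciled by write timestamp, but replica-side aggregates cannot. The replica contents, timestamps, and helper names below are all hypothetical.

```python
# Two hypothetical replicas of the same partition at RF=2.
# Each entry maps row id -> (value, write_timestamp).
# replica_b missed the latest write to row 1 (assumption for illustration).
replica_a = {1: (10, 200), 2: (7, 100)}
replica_b = {1: (4, 100), 2: (7, 100)}


def reconcile(*replicas):
    """Row-level reconciliation: keep the value with the newest timestamp."""
    merged = {}
    for replica in replicas:
        for row_id, (value, ts) in replica.items():
            if row_id not in merged or ts > merged[row_id][1]:
                merged[row_id] = (value, ts)
    return merged


def local_sum(rows):
    """Sum the values of a row-id -> (value, timestamp) mapping."""
    return sum(value for value, _ in rows.values())


# Coordinator-side aggregation: reconcile rows first, then aggregate.
coordinator_sum = local_sum(reconcile(replica_a, replica_b))

# Replica-side aggregation: each replica returns only its own total.
pushed_down = [local_sum(replica_a), local_sum(replica_b)]
```

In this sketch the coordinator-side result is 17, while the replicas report 17 and 11. The two totals carry no per-row timestamps, so the coordinator cannot tell which one is current, and adding them would count every row twice.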

Quick follow-up: will this ticket also cover
SELECT clusterCol, Max(x) FROM myData GROUP BY clusterCol;

That is, grouping on a clustering column but not on the partition key?

Second question: consider a table with schema myData(partitionKey INT, 
clusteringCol1 INT, clusteringCol2 INT, x INT, PRIMARY KEY ((partitionKey), 
clusteringCol1, clusteringCol2)).  Will the following query be supported?
SELECT partitionKey, clusteringCol2, Sum(x) FROM myData GROUP BY partitionKey, clusteringCol2;

The reason I ask is that the following is not supported:
SELECT partitionKey, clusteringCol2, x FROM myData WHERE partitionKey=5 ORDER BY clusteringCol2;
You cannot order by clusteringCol2 alone, only by clusteringCol1.  So the 
assumption that the data will already be sorted on the grouping columns when 
it arrives at the coordinator might not hold in all cases.
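
The sortedness concern can be illustrated with a small sketch. It assumes (this is not stated in the ticket) that GROUP BY would be computed by streaming over rows and closing a group whenever the grouping key changes, which is only correct when equal keys are adjacent. The sample rows and the helper streaming_group_sum are hypothetical.

```python
from itertools import groupby

# Hypothetical rows of one partition (partitionKey = 5), in clustering
# order, i.e. sorted by (clusteringCol1, clusteringCol2).
rows = [
    # (partitionKey, clusteringCol1, clusteringCol2, x)
    (5, 1, 10, 1),
    (5, 1, 20, 2),
    (5, 2, 10, 3),
    (5, 2, 20, 4),
]


def streaming_group_sum(rows, key):
    """Sum x over runs of consecutive rows that share the same key.

    Only correct when rows with equal keys are adjacent, i.e. when the
    input is already sorted on the grouping columns.
    """
    return [(k, sum(r[3] for r in run)) for k, run in groupby(rows, key=key)]


# Grouping on a prefix of the clustering order: one run per group.
by_prefix = streaming_group_sum(rows, key=lambda r: (r[0], r[1]))

# Grouping on (partitionKey, clusteringCol2): equal keys are not adjacent,
# so the same group is emitted more than once.
by_col2 = streaming_group_sum(rows, key=lambda r: (r[0], r[2]))
```

Here by_prefix contains two groups, one per clusteringCol1 value, whereas by_col2 contains four entries with the keys (5, 10) and (5, 20) each appearing twice. A streaming implementation would therefore need an extra sort or a hash-based merge to handle GROUP BY partitionKey, clusteringCol2.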


> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support 
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the 
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
