[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107583#comment-15107583 ]
Brian Hess commented on CASSANDRA-10707:
-----------------------------------------

I think that supporting grouping by clustering column (or perhaps even a regular column) with a partition key predicate is a good idea.

I think that supporting grouping by partition key (either in part, or in toto) is a bad idea. In that query, all the data in the cluster would stream to the coordinator, which would then be responsible for doing a *lot* of processing.

In other distributed systems that do GROUP BY queries, the groups end up being split up among the nodes in the system and each node is responsible for rolling up the data for the groups it was assigned. This is a common way to get all the nodes in the system to help with a pretty significant computation, with the results then streamed out (potentially via a single node in the system) to the client.

However, in this approach, all the data is streaming to a single node and that node is doing all the work, for all the groups. This feels like it would either take a ton of work to orchestrate the computation (which would start to mimic other systems - e.g., Spark) or would do a lot of work and risk being very inefficient and slow.

I am also concerned about what this would do in the face of QueryTimeoutException - would we really be able to do a GROUP BY partitionKey aggregate under the QTE limit?

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
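
To make the contrast in the comment concrete, here is a minimal Python sketch of the two strategies for a `max(value) ... GROUP BY partitionKey` style query. It is a toy model, not Cassandra code: the `NODES` layout, the function names, and the idea of counting "shipped" items are all assumptions made for illustration, and each key is assumed to live on exactly one node.

```python
# Hypothetical layout: each "node" holds (partition_key, value) rows.
# Names and data are illustrative only, not Cassandra internals.
NODES = {
    "node1": [("a", 3), ("a", 9), ("a", 1)],
    "node2": [("b", 7), ("b", 2)],
    "node3": [("c", 5), ("c", 8), ("c", 4)],
}


def coordinator_only_max(nodes):
    """Stream every row to a single place and do all grouping there."""
    rows_shipped = 0
    result = {}
    for rows in nodes.values():
        for key, value in rows:
            rows_shipped += 1  # every raw row crosses the network
            result[key] = max(value, result.get(key, value))
    return result, rows_shipped


def partial_max(rows):
    """Per-node rollup: one (key, max) entry per group seen locally."""
    local = {}
    for key, value in rows:
        local[key] = max(value, local.get(key, value))
    return local


def distributed_max(nodes):
    """Each node rolls up its own rows; only partial results are merged."""
    entries_shipped = 0
    result = {}
    for rows in nodes.values():
        for key, value in partial_max(rows).items():
            entries_shipped += 1  # only one entry per local group is sent
            result[key] = max(value, result.get(key, value))
    return result, entries_shipped


if __name__ == "__main__":
    print(coordinator_only_max(NODES))
    print(distributed_max(NODES))
```

Both functions return the same per-key maxima. The difference is in how much is sent to the merging node: every raw row in the first case, one entry per locally held group in the second. The sketch ignores replication, paging, and timeouts, which are where the orchestration cost mentioned in the comment would come from.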