[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324146#comment-14324146 ]
Cristian O commented on CASSANDRA-4914:
---------------------------------------

A couple of thoughts:

- Doing aggregations on the coordinator is clearly not feasible in the real world beyond some toy use cases. I don't know the internals, but it should be doable to push the aggregation function to the partitions without requiring the data interface to understand CQL. Note that *all* agg functions are eminently parallelizable, including AVG, which obviously can be computed from SUM/COUNT on the same elements. As someone pointed out before, these are all REDUCE-type functions (or monoids, if you like).

- Dealing with consistency is tricky, but then Cassandra is by design eventually consistent, so why not have eventually consistent aggregations? Just pick a partition and aggregate on that. With large datasets, an average differing at the sixth decimal won't really matter. Or, if you want to be really fancy, compute on every partition (or a quorum of them) and return results with a tolerance factor.

Maybe it's useful to target this feature at use cases that need fast, simple aggregates on large amounts of data, for example charts on time series. For more complex analytics, Spark on top of Cass is actually an excellent solution already if it's set up correctly in terms of colocation. This would help use cases where Spark is too much of an overhead.

> Aggregation functions in CQL
> ----------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Benjamin Lerer
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (wide row of column
> values of int, double, float, etc.),
> with some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |        joe |       doe |   10.1
>    130 |      2 |        joe |       doe |    100
>    130 |      1 |        joe |       doe |  1e+03
>
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>
>  sum(salary) | empid
> -------------+--------
>       1110.1 |   130

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
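
To make the SUM/COUNT point in the comment above concrete, here is a minimal sketch of a mergeable partial aggregate. It is plain Python, not Cassandra code: `Partial`, `aggregate_locally` and `combine` are hypothetical names invented for illustration, and the split of the `emp` salaries across two nodes is an assumption. The idea is that each node reduces its own rows to a small partial result, and the coordinator only merges those partials, deriving AVG at the very end.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate (hypothetical name, not a Cassandra type)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def merge(self, other: "Partial") -> "Partial":
        # Associative and commutative (up to floating-point rounding), with
        # Partial() as the identity, so partials can be merged in any order.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def average(self) -> float:
        # AVG is never merged directly; it is derived from SUM and COUNT.
        return self.total / self.count if self.count else float("nan")


def aggregate_locally(values: Iterable[float]) -> Partial:
    """What each data node would compute over its own rows (assumed role)."""
    return reduce(lambda acc, v: acc.merge(Partial(1, v, v, v)), values, Partial())


def combine(partials: Iterable[Partial]) -> Partial:
    """What the coordinator would do: merge a handful of small partials."""
    return reduce(Partial.merge, partials, Partial())


# Salaries from the emp example, split across two hypothetical nodes.
node_a = aggregate_locally([10.1, 100.0])
node_b = aggregate_locally([1000.0])
result = combine([node_a, node_b])
print(result.count, result.total, result.average)
```

The coordinator in this sketch handles one small record per node instead of every row, which is the reason for pushing the aggregation down. Averaging the per-node averages directly would be wrong whenever nodes hold different row counts, hence carrying SUM and COUNT separately.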