[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326112#comment-14326112 ] Robert Stupp commented on CASSANDRA-4914: - I've opened CASSANDRA-8862 + CASSANDRA-8827. Discussion about improvements sbhould go there. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326012#comment-14326012 ] Benedict commented on CASSANDRA-4914: - I'm with Cristian here, as I suggested at the NGCC last year. If we want efficient aggregations, they should absolutely be performed at the replicas. I realise we're not aiming for that first time around, but IMO it should be the long term goal. Shipping all of your data over the wire is a pretty significant cost and bottleneck, making the current implementation more of a convenience than an analytic tool. It's possible to perform conflict resolution a few ways. Probably the best is to first let the user specify if they care (CL=ONE is not exactly an uncommon usecase, last I heard we reckon 30% of deployments use this. esp. for analytics queries slight staleness may not be important), and if they do perform a repair-aware read from each neighbour to ensure the replica is up-to-date. Or calculate the result optimistically, along with a checksum and perform the repair if either don't match. Or select your strategy based on if the data has been updated recently (say, last few minutes), and if it has be pessimistic, and otherwise be optimistic. This is largely what [~tjake]'s Repair Aware Consistency Levels (CASSANDRA-7168) is about. Generally, analytics queries are intended to be run over large, _majority_ static datasets, so the computation should be optimised for this IMO. There is of course the complication of supporting deterministic aggregations over multiple partitions, which would probably have to fallback to coordinator level aggregation for operations that cannot be trivially composed exactly (e.g. median), but most aggregations can be composed from partial computations trivially. The provision of a sampled approach seems like another excellent idea to me, but an orthogonal one. The calculation should probably still be offloaded to each node, then combined probabilistically. This would also support efficient multi-partition queries for all aggregations. I'm not saying any of these are trivial undertakings, but they should be what we're aiming for AFAICT. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324355#comment-14324355 ] Cristian O commented on CASSANDRA-4914: --- To be clear I'm not talking about distributed processing. I'm talking about online analitical queries, particularly time series. Incidentally, I don't know how many people realize this but Cass has the same distributed storage architecture that Vertica has. It's also possible to map a columnar schema on top of sstable. Of course native support for columnar storage would be immensely better. See CASSANDRA-7447 There's a lot of opportunity in this space, just need a bit of vision. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324408#comment-14324408 ] Tyler Hobbs commented on CASSANDRA-4914: bq. I don't know the internals but it should be doable to push the aggregation function to the partitions without requiring the data interface to understand CQL. The problem with pushing aggregate calculation down to the replicas is that there's no conflict resolution. So the aggregation can be computed over stale or deleted data. That may be acceptable if you're reading at consistency level ONE, but then we're dealing with a limited, special case. bq. Note that all agg functions are eminently parallelizible I don't believe this is true. Off the top of my head, computing the median of a dataset is not really parallelizable (without some sort of internode communication). bq. dealing with consistency is tricky but then Cassandra is by design eventually consistent so why not have eventually consistent aggregations. Just pick a partition and aggregate on that. With large datasets an average differing at the sixth decimal won't really matter. That may be acceptable for aggregates like average, but other aggregates may require precision. With all of that said, I wouldn't necessarily be opposed to supporting selecting a sampling of data from a table (and allowing an aggregate to be run over that), but I suggest opening a new ticket for that discussion. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324266#comment-14324266 ] Anton Slutsky commented on CASSANDRA-4914: -- Robert Stupp, you are absolutely right! Somehow, I forgot we are hashing! This is cool. I'll look into the code more carefully, but if hash buckets are narrow enough, maybe true random sampling can be assumed. If that's the case, then this is not that difficult to do. Also, please dont add distributed calculation facilities to Cassandra! :) It will turn it into a poor-man's hadoop and either you guys will have to support a real distributed processing system or everyone will be pissed off that Map/Reduce on Cassandra doesnt work like it does on Hadoop :-) Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323798#comment-14323798 ] Robert Stupp commented on CASSANDRA-4914: - // flame off You could use Spark instead of Hadoop ;) You're right - computation of aggregates is done by the coordinator that has to pull all rows and do computation on it. That's (unfortunately) what we can do now. If aggregates are applied to some partitions or on an even bigger data set, performance is directly proportional to the number of involved partitions (sounds better than _getting slower_). I have been thinking about a method to let the other nodes (owners of other partitions) take part in aggregate calculation. But that implies that the other nodes _know_ about the aggregate - i.e. basically the actual CQL. Means: the approach *could* be a two-stage aggregate, where the first stage runs on the partitions and a second (final) stage runs on the partial results from the first stage. But the current storage protocol does not allow us to do that - it just allows to grab _raw data_. Such an approach might also improve edge cases that require ALLOW FILTERING, which basically do the same (pipe all data to the coordinator and filter in the coordinator). Your approach looks interesting (although I'm not a statistics guru). Although I'm not sure what's meant by _first record_ or _smart sampling_ since there's nothing like ordering by partition key. Don't get me wrong - I'm interested in that. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323933#comment-14323933 ] Ahmet AKYOL commented on CASSANDRA-4914: As [~jbellis] said it's a different yet interesting case. IMHO, some probabilistic data structures can be implemented like bloom filter. A library like [stream-lib|https://github.com/addthis/stream-lib] can be used (by the way, they say that, [~jbellis]'s blog post about bloom filters has inspired them ). Hyperloglog can also be useful,just look [Redis's implementation|http://redis.io/commands#hyperloglog]. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324146#comment-14324146 ] Cristian O commented on CASSANDRA-4914: --- A couple of thoughts: - doing aggregations on the coordinator is clearly not feasible in the real world beyond some toy use cases. I don't know the internals but it should be doable to push the aggregation function to the partitions without requiring the data interface to understand CQL. Note that *all* agg functions are eminently parallelizible including AVG which obviously can be computed from SUM/COUNT on the same elements. As someone pointed out before these are all REDUCE type functions (or monoids if you like) - dealing with consistency is tricky but then Cassandra is by design eventually consistent so why not have eventually consistent aggregations. Just pick a partition and aggregate on that. With large datasets an average differing at the sixth decimal won't really matter. Or if you want to be really fancy compute on every (or quorum) partitions and return results with a tolerance factor. Maybe it's useful to target this feature at use cases that need fast simple aggregates on large amounts of data like for example charts on time series. For more complex analytics Spark on top of Cass is actually an excellent solution already if it's setup correctly in terms of colocation. This would help use cases when Spark is too much of an overhead. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323676#comment-14323676 ] Jonathan Ellis commented on CASSANDRA-4914: --- That's really a different use case. It could be an interesting one, but I'm not sure the infrastructure is there yet to support it. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323716#comment-14323716 ] Anton Slutsky commented on CASSANDRA-4914: -- I think, it may not be all that complicated, at least in some cases. If we consider the avg function for example, any record in the resultset of interest has a non-zero probability of being exactly the average value, kind of by definition :-), and nothing prevents us from grabbing the very first record and looking at it from that point of view. The key here is, of course, to figure out what that non-zero probability is, but that can also be approximated with some accuracy by sampling a little bit beyond the first record. If we are smart about how we sample and if we have an idea as to how big the actual resultset is, reasonably close approximation of the average value can be achieved and the probability of it being the true average can be computed with common techniques. Along the same lines, sum can be thought of as an integral over the shape approximated by the avg, which can also be approximated with some probability of being correct. Of course, there are many problems with the above from the statistical point of view. For one, resultsets are often ordered in some way, so sampling cannot be assumed to be random, which is not good. Anyway, I dont know if this is the right use case, but I really need aggregate functions for what I'm trying to do and right now I have to fire up a hadoop cluster to get simple aggregates computed, which is a major pain and takes forever. I'll give it a shot in my own code and see if I can come up with a reasonable approach. Perhaps others will see this discussion and suggest some ideas. Thanks, Anton Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323669#comment-14323669 ] Anton Slutsky commented on CASSANDRA-4914: -- Hello all, I noticed that some of the aggregate functions discussed on this thread made it into the trunk. I'm a little concerned with the implementation. It looks like aggregates, such as sum, avg, etc. are implemented in code by basically looping through the result set pages and computing the desired aggregates in code. I'm worried that, since Cassandra is meant for large volumes of data, this is not at all a feasible implementation for real world cases. I tried using avg on a more or less sizable dataset and observed two things -- first, my select statement would time out even with bumped up read timeout setting and second, CPU that's running the average computation is quite busy. Obviously, there's only so much that can be done in terms of computing these aggregates without resorting to some sort of distributed computation framework, but I'd like to suggest a slightly different approach. I wonder if we can just rethink how we think about aggregate functions in context of large data. Perhaps, what we could do is consider a probabilistic aggregates instead of raw computable ones? That is, instead of striving to compute an aggregate on an entire resultset, maybe we can compute the aggregate with a stated probability of that aggregate being true. For example: select probabilistic_avg(my_col) from my_table; would return something like a map: {avg:101.1, prob:0.78} where avg is our probabilistic avg and prob is the probability of it being what we say it is. Of course, that wont be as good as the real thing, but it still has value in many cases, I think. And it can be implemented in a scalable way with some scratch system tables. I'm happy to give it a stab if this is of interest to anyone. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160217#comment-14160217 ] Sylvain Lebresne commented on CASSANDRA-4914: - The last patch looks almost good to go (provided it's rebased) for me but there seems to be some confusion around the {{reset}} method for {{Selection}}. You've removed the abstract one in {{Selection}} itself, but both {{SimpleSelection}} and {{SelectionWithFunctions}} still have a {{reset}} method which, as far as I can tell, is never called. And while the one in {{SimpleSelection}} does nothing, the one in {{SelectionWithFunctions}} seems to do something. I think what you'd want is add back the {{reset}} method to {{Selection}} but actually call it in {{ResultSetBuilder.newRow/build}} (as a nit, you can then move the {{current = null}} in {{SimpleSelection}} in the {{reset}} method for consistency). bq. Postgres's approach to user-defined aggregation functions is almost exactly this: http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html I agree that it's where we want to go (there was some discussion around this at the bootcamp that ended up with the same conclusion but it sound like I forgot to update JIRA with said conclusions). That said, this ticket is mostly about the internal interface for aggregation functions and as far as I can tell, nothing in the attached patch limit us from going with a reduce-style approach for UDF (it will be in fact rather simple). So let's maybe leave the discussion on UDF in CASSANDRA-8053. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158400#comment-14158400 ] Tyler Hobbs commented on CASSANDRA-4914: MySQL and SqlServer also return the same type as the argument and suggest casting to handle overflows (although casting is limited in mysql). I think casting is simple enough and returning the same type as the argument is the least surprising behavior, so my vote is for sticking with that. I agree that the behavior needs to be well-documented, though. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158569#comment-14158569 ] Tyler Hobbs commented on CASSANDRA-4914: I'm thinking a bit about making this compatible with with UDFs. The problem with this approach is that it relies on state that's not visible to the aggregation functions. An alternative that would be (more easily) compatible with UDFs is a reduce-style aggregation. The reducer function takes two inputs: the current state and the next value. You can optionally provide an initial state and a finalizer function that is called with the final state after reducing. UDTs, tuples, and collections should be sufficiently powerful to represent anything that's needed for state. In fact, Postgres's approach to user-defined aggregation functions is almost exactly this: http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html. I think we could slightly simplify their approach by inferring the data types. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158595#comment-14158595 ] Robert Stupp commented on CASSANDRA-4914: - bq. UDTs, tuples, and collections should be sufficiently powerful to represent anything that's needed for state Nice idea just to use existing (complex) types for state :) Cool - then we could simply use two UDAF function bodies. One for row processing and one for result. Think that's easy to understand for everybody :) Something like {code} CREATE AGGREGATION FUNCTION superDuperSum ( input double ) RETURNS double [STATE fooBarUDT] LANGUAGE javascript REDUCER $$...$$ [INITIALIZER $$...$$] [FINALIZER $$...$$] {code} (optionals enclosed with square brackets) Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151074#comment-14151074 ] Robert Stupp commented on CASSANDRA-4914: - bq. aggregate functions do only return their input type ... Hm - I still think that we should be able to handle integer overflows - if for example people get a negative result if summing just positive values, they'll complain about it. Postgres for example [returns a bigint for int sums|http://www.postgresql.org/docs/9.1/static/functions-aggregate.html] - Oracle doesn't have any integer data type in tables (only that NUMBER data type). (But if that input-type==output-type behavior is clearly documented and people are able to do e.g. a {{SELECT sum( (varint) myIntCol ) FROM ...}} then that's fine for me - but I'm not sold on this.) Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133128#comment-14133128 ] Robert Stupp commented on CASSANDRA-4914: - Nice work, [~blerer] :) * Regarding {{COUNT(\*)}}/{{COUNT(1)}} - can you change/extend that to also allow {{COUNT(any_column)}} or even {{COUNT(func_call)}}? The v3 patch does not handle this. * Can you add a test to check whether an aggregate works against a UDF and vice versa? E.g. some sin function: {{SELECT sin(max(a)) FROM foo}} or {{SELECT max(sin(a)) FROM foo}} * The aggregate functions do only return their input type. Means: overflows are not handled. It's not a problem for the {{BigDecimal}}/{{BigInteger}} implementations but for all the others. Maybe it's better to do all sum+avg on the {{BigDecimal}}/{{BigInteger}} types and always return them - although it's slower, the result will be correct. As a nice side effect it could be possible to reduce the number of avg/sum implementations and let them extend some few impls. * The implementations of the aggregate functions are sometimes declared using {{public static final}} and sometimes using {{public static}} - nice to have: make them look equal Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133395#comment-14133395 ] Benjamin Lerer commented on CASSANDRA-4914: --- bq. Nice work, Benjamin Lerer :-) Thanks Robert. bq. Regarding COUNT(*)/COUNT(1) - can you change/extend that to also allow COUNT(any_column) or even COUNT(func_call)? The v3 patch does not handle this. It is already there. See last line of AggregationTest.testFunctions() bq.Can you add a test to check whether an aggregate works against a UDF and vice versa? E.g. some sin function: SELECT sin(max(a)) FROM foo or SELECT max(sin(a)) FROM foo The first form is already there. See AggregationTest.testNestedFunctions(). I will add the second form. bq. The aggregate functions do only return their input type. Means: overflows are not handled. It's not a problem for the BigDecimal/BigInteger implementations but for all the others. Maybe it's better to do all sum+avg on the BigDecimal/BigInteger types and always return them - although it's slower, the result will be correct. As a nice side effect it could be possible to reduce the number of avg/sum implementations and let them extend some few impls. See first feed back from Sylvain bq.The implementations of the aggregate functions are sometimes declared using public static final and sometimes using public static - nice to have: make them look equal I will fix it. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132148#comment-14132148 ] Sylvain Lebresne commented on CASSANDRA-4914: - The general approach lgtm but a bunch of remarks: * I'm not sure about the sum and avg return type. For sum, using {{bigint}} whatever the input type is is wrong for at least {{varint}}. We could use {{varint}} as a return type instead, but I wonder if it's really that convenient. Typically, in the java driver, {{varint}} is mapped to {{BigInteger}} and I'm not sure always returning that even when summing {{int}} is what people wants in general. What I suggest is to have {{sum}} and {{avg}} method always return the same type than their argument, and that we allow users to type-cast the argument if they want a greater precision that the input type (from a quick look at their doc, that seems to be what oracle does, haven't checked other). * I'd rather get rid of the specific code for {{COUNT(*)}} in SelectStatement. It sounds to me that we could just have a specific no-arg function for that (we could still have some special casing in the parser). * In {{ScalarFunctionSelector.isAggregate}}, we shouldn't assume that the function has one argument, it can have none (this is currently breaking a UFTest unit test). Also, I believe something like: {noformat} SELECT max(a), add(max(b), c) FROM .. {noformat} is not rejected (because {{ScalarFunctionSelector}} only even check it's first argument). I'll note though that it could make sense to allow: {noformat} SELECT max(a), log(max(b), 2) FROM .. {noformat} so maybe we should (not that it's a big deal if we don't, but unless I'm wrong, it's reasonably trivial to so ...). As an aside, a comment on why {{ScalarFunctionSector}} don't just return true for {{isAggregate}} could be useful here. * Since Selector has a reset() method, why not use that to reset the {{current}} value for {{SimpleSelection}} and others? * I'd have a preference for making ScalarFunction/AggregateFunction be abstract classes and have them implement isAggregate (instead of having the NativeScalarFunction/NativeAggretationFunction). * Needs to switch a 'assertInvalid' to 'assertInvalidSyntax' in {{AggregationTest.testInvalidCalls}}. A couple of nitpicks too: * There is a typo in the error message in {{AggregateFunctionSelector.validateArguments}}. * Regarding code style, we don't put @Override when implementing an interface (AggregateFcts is full of them). * I believe {{Selection.reset()}} is unused. Maybe we can remove it for now until we actually need it. * I'd shorten {{SelectionWithFunctions.validateSelectors}} to something like {noformat} int aggregations = 0; for (Select s : selectors) if (s.isAggregate()) ++aggregations; if (aggregations != 0 aggregations != selectors.size()) throw new InvalidRequestException(the select clause must either contains only aggregations or none); {noformat} Feels like too many ifs for what it does currently. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112098#comment-14112098 ] Sylvain Lebresne commented on CASSANDRA-4914: - I have not yet reviewed this, but at very quick glance, I see nothing that handles the paging of aggregations: we don't want to load everything in memory with a single query. I'll note that we already support one hard-coded aggregation function, COUNT, and you can look at how it's handled by SelectStatement to see what I mean by paging internally. Ideally the patch for this should replace this currently hardcoded COUNT to use the general mechanism for aggregation. Also, we also want to support syntax like {{SELECT AVG(*) FROM ...}} which I don't think this patch handles (since it doesn't modify the parser and it would need to). Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112192#comment-14112192 ] Brian Hess commented on CASSANDRA-4914: [~slebresne] - what does SELECT AVG(*) FROM supposed to mean? If I have a table called MYDATA with schema (pkey INT, ccol INT, x INT, y INT, PRIMARY KEY ((pkey), ccol)), then what would SELECT AVG(*) FROM mydata WHERE pkey=1 AND ccol=1 return? I think you need to include a column to AVG, like SELECT AVG(x) FROM mydata WHERE pkey=1 AND ccol=1 or SELECT AVG(x), AVG(y) FROM mydata WHERE pkey=1 AND ccol=1. COUNT() is a special case. COUNT(*) really is the same as COUNT(1), because you are just counting the number of rows. Note, though, that COUNT(x) is not the same as COUNT(*) since COUNT() will not count things that are NULL, so COUNT(*) will count all rows but COUNT(x) will only count rows where X is not NULL. At least, according to SQL. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112210#comment-14112210 ] Sylvain Lebresne commented on CASSANDRA-4914: - bq. what does SELECT AVG(\*) FROM supposed to mean? Nothing, that was a brain fart, forget about that part. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: CASSANDRA-4914.txt The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100672#comment-14100672 ] Sylvain Lebresne commented on CASSANDRA-4914: - Updating the title to refect that for this ticket we'll stick to aggregation functions. We've already introduced some UDFs in CASSANDRA-7395 so we'll want to try to reuse as much as possible here in term of syntax, though obviously the interface for aggregation functions will be slightly different. Aggregation functions in CQL Key: CASSANDRA-4914 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914 Project: Cassandra Issue Type: New Feature Reporter: Vijay Assignee: Benjamin Lerer Labels: cql Fix For: 3.0 The requirement is to do aggregation of data in Cassandra (Wide row of column values of int, double, float etc). With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for the columns within a row). Example: SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC; empid | deptid | first_name | last_name | salary ---+++---+ 130 | 3 | joe| doe | 10.1 130 | 2 | joe| doe |100 130 | 1 | joe| doe | 1e+03 SELECT sum(salary), empid FROM emp WHERE empID IN (130); sum(salary) | empid -+ 1110.1| 130 -- This message was sent by Atlassian JIRA (v6.2#6252)