[jira] [Comment Edited] (CASSANDRA-18060) Add aggregation scalar functions on collections

Jira Tue, 22 Nov 2022 05:23:04 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637255#comment-17637255
 ]


Andres de la Peña edited comment on CASSANDRA-18060 at 11/22/22 1:22 PM:
-------------------------------------------------------------------------

Thanks for the review.

The test failure [can be reproduced on 
trunk|https://app.circleci.com/pipelines/github/adelapena/cassandra/2511/workflows/7aba8baa-0a6d-404a-b08b-c6a8078caca3/jobs/24706/tests]
 with the multiplexer. Just opened CASSANDRA-18065 for it.

I have added the suggested JavaDoc and documentation for the already existing 
{{sum}} and {{avg}} functions, and for the new {{collection_sum}} and 
{{{}collection_avg{}}}.

Regarding followup tickets, I can add one for {{avg}} returning {{NaN}} instead 
of zero, but I don't know if it's too late for that since the current behaviour 
has been around for ages.

As for overflows and truncated decimals, we'll definitively need a followup 
ticket. The {{sum}} and {{avg}} functions can walk around this by using type 
casting, for example:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v int);
SELECT sum(cast(v AS varint)) FROM t;
SELECT avg(cast(v AS float)) FROM t;
{code}
However, we don't have such type of casting for collections, so we could have a 
followup ticket for adding that feature, so we can do something like:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v list<int>);
SELECT collection_sum(cast(v AS list<varint>)) FROM t;
SELECT collection_avg(cast(v AS lsit<float>)) FROM t;
{code}
CC [~blerer]


was (Author: adelapena):
Thanks for the review.

The test failure [can be reproduced on 
trunk|https://app.circleci.com/pipelines/github/adelapena/cassandra/2511/workflows/7aba8baa-0a6d-404a-b08b-c6a8078caca3/jobs/24706/tests]
 with the multiplexer. Just opened CASSANDRA-18065 for it.

I' have added the suggested JavaDoc and documentation for the already existing 
{{sum}} and {{avg}} functions, and for the new {{collection_sum}} and 
{{{}collection_avg{}}}.

Regarding followup tickets, I can add one for {{avg}} returning {{NaN}} instead 
of zero, but I don't know if it's too late for that since the current behaviour 
has been around for ages.

As for overflows and truncated decimals, we'll definitively need a followup 
ticket. The {{sum}} and {{avg}} functions can walk around this by using type 
casting, for example:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v int);
SELECT sum(cast(v AS varint)) FROM t;
SELECT avg(cast(v AS float)) FROM t;
{code}
However, we don't have such type of casting for collections, so we could have a 
followup ticket for adding that feature, so we can do something like:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v list<int>);
SELECT collection_sum(cast(v AS list<varint>)) FROM t;
SELECT collection_avg(cast(v AS lsit<float>)) FROM t;
{code}
CC [~blerer]

> Add aggregation scalar functions on collections
> -----------------------------------------------
>
>                 Key: CASSANDRA-18060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list<int>, m map<int, int>);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> --------------------+----------------------
>           {1, 2, 3} |         [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> ----------------------------+----------------------------
>                           3 |                          3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --------------------------+--------------------------
>                         1 |                        3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --------------------------+--------------------------
>                         6 |                        2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-18060) Add aggregation scalar functions on collections

Reply via email to