[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

Jira Mon, 05 Dec 2022 02:51:11 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643262#comment-17643262
 ]


Andres de la Peña commented on CASSANDRA-18060:
-----------------------------------------------

Using {{size}} for collections to distinguish is similar to what MySQL does 
with {{{}MAX{}}}/{{{}GREATEST{}}}, looking a kind of synonym to distinguish 
between analogous functions applied to different data/element types. The 
long-term problem with that strategy is that it's easy to run out of synonyms 
on some cases, not to mention that {{size}} is a very generic name that could 
be also be used for other things like calculating the size in bytes of any 
column.

In contrast, using the {{collection_}} prefix avoids the need for looking for 
synonyms, the function selection doesn't depend on the column type, and it 
makes explicit the analogy between aggregate functions and their collection 
counterparts.

There is also CASSANDRA-18085, which was proposed by [~yifanc] and [~frankgh]. 
That would make collection functions work of not-collection elements, 
considering them as singleton collections, so for example:
 * collection_min(7) = collection_min([7]) = collection_min(\{7}) = 7
 * collection_max(7) = collection_max([7]) = collection_max(\{7}) = 7

That is particularly useful in combination with {{{}writetime{}}}/{{{}ttl{}}} 
functions, that can return either a list, a set or a single number. The ability 
to read single elements as collections would allow users to select, for 
example: 
{code}
collection_max(writetime(column))
{code}
without needing to know the type of the column. This however won't work if we 
use the same {{max}} name for both aggregate and collection functions.

> Add aggregation scalar functions on collections
> -----------------------------------------------
>
>                 Key: CASSANDRA-18060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 4.2
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list<int>, m map<int, int>);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> --------------------+----------------------
>           {1, 2, 3} |         [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> ----------------------------+----------------------------
>                           3 |                          3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --------------------------+--------------------------
>                         1 |                        3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --------------------------+--------------------------
>                         6 |                        2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

Reply via email to