[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643873#comment-17643873
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Just started a thread on the mail list: 
https://lists.apache.org/thread/nkm04oo9m60o7dcjzyrll7rzdqzzo1gy

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643262#comment-17643262
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Using {{size}} for collections to distinguish is similar to what MySQL does 
with {{{}MAX{}}}/{{{}GREATEST{}}}, looking a kind of synonym to distinguish 
between analogous functions applied to different data/element types. The 
long-term problem with that strategy is that it's easy to run out of synonyms 
on some cases, not to mention that {{size}} is a very generic name that could 
be also be used for other things like calculating the size in bytes of any 
column.

In contrast, using the {{collection_}} prefix avoids the need for looking for 
synonyms, the function selection doesn't depend on the column type, and it 
makes explicit the analogy between aggregate functions and their collection 
counterparts.

There is also CASSANDRA-18085, which was proposed by [~yifanc] and [~frankgh]. 
That would make collection functions work of not-collection elements, 
considering them as singleton collections, so for example:
 * collection_min(7) = collection_min([7]) = collection_min(\{7}) = 7
 * collection_max(7) = collection_max([7]) = collection_max(\{7}) = 7

That is particularly useful in combination with {{{}writetime{}}}/{{{}ttl{}}} 
functions, that can return either a list, a set or a single number. The ability 
to read single elements as collections would allow users to select, for 
example: 
{code}
collection_max(writetime(column))
{code}
without needing to know the type of the column. This however won't work if we 
use the same {{max}} name for both aggregate and collection functions.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642603#comment-17642603
 ] 

Benedict Elliott Smith commented on CASSANDRA-18060:


bq. we can't ignore the ambiguity in the count function.

I think {{size}} is perhaps a more natural operator name for collections anyway.

I think anyway these decisions should be escalated to a DISCUSS thread. There's 
a lot of related tickets with decisions being made that should have broader 
community input to ensure there's consensus on the direction of language 
evolution.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642592#comment-17642592
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

The ticket fixing that is mentioned on the description of this ticket, it's 
CASSANDRA-17811. You can see that what you are proposing is mentioned on the 
description of that ticket, but it was discarded during the 3-month review 
period.

Regarding the names of the functions, even if we selectively ignore backwards 
compatibility for max/min, we can't ignore the ambiguity in the {{count}} 
function.

CC [~blerer] 

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642586#comment-17642586
 ] 

Benedict Elliott Smith commented on CASSANDRA-18060:


When was this bug fixed? I’d propose that was an opportunity to deprecate this 
rather than fix it, as I don’t believe this is a meaningful operation over 
collections. That would also permit us to use the same function names here more 
intuitively, using eg size for collections.

I have had a peek at MySQL and Postgres and they seem to also have a very 
confused history here so we won’t be unique in that respect, though I don’t 
believe the MySQL Greatest function is equivalent judging by the docs.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642583#comment-17642583
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

The {{max}} function has accepted collections compared according to their 
comparator since CASSANDRA-4914, merged to 2.2, although due to a bug it was 
returning its results in serialized form. The {{count}} function can also be 
applied to any column, including collections, and it should have been there for 
even longer.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642573#comment-17642573
 ] 

Benedict Elliott Smith commented on CASSANDRA-18060:


Hold up. We permit {{max}} to operate over collections and select between the 
collections based on our internal comparator? When did we decide to do this, 
and where is this implemented? {{makeMaxFunction}} appears to operate only on 
native CQL types.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642564#comment-17642564
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Sure, let's see an example of how the previously existing aggregate functions 
and the new collection functions are different:
{code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list);
> INSERT INTO t(k, l) VALUES (1, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (2, [4, 5, 6]);
> SELECT count(l), max(l) FROM t;

 system.count(l) | system.max(l)
-+---
   2 | [4, 5, 6]

> SELECT collection_count(l), collection_max(l) FROM t;

 system.collection_count(l) | system.collection_max(l)
+--
  3 |3
  3 |6
{code}
We need different function names to determine what type of functions we want to 
run.

Note that this is similar for example to the difference between MySQL's [MAX 
function|https://www.w3schools.com/sql/func_mysql_max.asp] and 
[GREATEST|https://www.w3schools.com/sql/func_mysql_greatest.asp], although here 
we work with collections. Having used the nice name for aggregates, we need to 
figure out a more convoluted name for the row-per-row functions.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642556#comment-17642556
 ] 

Benedict Elliott Smith commented on CASSANDRA-18060:


Could you outline one of the ambiguous situations? I would expect that between 
the type of the column and whether a group-by is in use we would be able to 
unambiguously figure out what's being requested.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642551#comment-17642551
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Semantic, so we can distinguish between the max, min, sum, avg and count 
aggregate functions that aggregate rows, and the new functions that aggregate 
the collection elements row per row. Otherwise, there can be ambiguous 
situations.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-12-02 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642544#comment-17642544
 ] 

Benedict Elliott Smith commented on CASSANDRA-18060:


Did you reject using the same function names as their simple aggregation 
variants for implementation reasons, or for semantic reasons?

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638672#comment-17638672
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Committed to trunk as 
[b7c7972a51ab6be6e5f410d2b12c770f5b7ebc98|https://github.com/apache/cassandra/commit/b7c7972a51ab6be6e5f410d2b12c770f5b7ebc98].

Thanks for the reviews.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-23 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637750#comment-17637750
 ] 

Berenguer Blasi commented on CASSANDRA-18060:
-

LGTM +1

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637277#comment-17637277
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

CI after rebase:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2024]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2512/workflows/3b076050-4dc5-49b6-afbc-059188b1a99a]
 
[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2512/workflows/cfa16b44-c54f-4497-a12d-b36eed71045f]|


> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-22 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637276#comment-17637276
 ] 

Berenguer Blasi commented on CASSANDRA-18060:
-

I saw the new opened ticket for the test failure great. I guess the 0/NaN 
matter should go in the ML as it could be seen as a bug or may not :shrug: And 
for the casting of collections I new ticket would be nice, maybe with a 
different syntax, but that's food that new ticket.

Regarding this one so far and pending [~b.le...@gmail.com]'s comments +1 from 
me.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637255#comment-17637255
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Thanks for the review.

The test failure [can be reproduced on 
trunk|https://app.circleci.com/pipelines/github/adelapena/cassandra/2511/workflows/7aba8baa-0a6d-404a-b08b-c6a8078caca3/jobs/24706/tests]
 with the multiplexer. Just opened CASSANDRA-18065 for it.

I' have added the suggested JavaDoc and documentation for the already existing 
{{sum}} and {{avg}} functions, and for the new {{collection_sum}} and 
{{{}collection_avg{}}}.

Regarding followup tickets, I can add one for {{avg}} returning {{NaN}} instead 
of zero, but I don't know if it's too late for that since the current behaviour 
has been around for ages.

As for overflows and truncated decimals, we'll definitively need a followup 
ticket. The {{sum}} and {{avg}} functions can walk around this by using type 
casting, for example:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v int);
SELECT sum(cast(v AS varint)) FROM t;
SELECT avg(cast(v AS float)) FROM t;
{code}
However, we don't have such type of casting for collections, so we could have a 
followup ticket for adding that feature, so we can do something like:
{code:java}
CREATE TABLE t (k int PRIMARY KEY, v list);
SELECT collection_sum(cast(v AS list)) FROM t;
SELECT collection_avg(cast(v AS lsit)) FROM t;
{code}
CC [~blerer]

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-21 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637048#comment-17637048
 ] 

Berenguer Blasi commented on CASSANDRA-18060:
-

I have left minor comments that can be addressed on commit. The only thing that 
worries me is the j8 unit failure which seems to be a new one as there's no 
ticket for it and doesn't align to the latest trunk CI runs on jenkins. I would 
also create a ticket for the improved return types and for the failing test be 
it the case.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18060) Add aggregation scalar functions on collections

2022-11-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636066#comment-17636066
 ] 

Andres de la Peña commented on CASSANDRA-18060:
---

Here is the patch, and CI is running:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2024]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/92f054d7-9386-498f-9ba4-330181cd4782]
 
[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/8a0838e8-ffbb-424d-a572-3770f9a41632]|

Differently to [the 
prototype|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17811-trunk-collections?expand=1]
 mentioned during CASSANDRA-17811, the proposed PR uses the existing 
aggregation functions available at {{AggregateFcts}} as the underlying 
implementation of {{{}collection_min{}}}, {{{}collection_max{}}}, 
{{collection_sum}} and {{{}collection_avg{}}}. That way we avoid code 
duplication and make sure that the functions are consistent.

> Add aggregation scalar functions on collections
> ---
>
> Key: CASSANDRA-18060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL/Semantics
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by 
> CASSANDRA-17811 can be used to provide within-collection aggregation 
> functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list, m map);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> +--
>   {1, 2, 3} | [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> +
>   3 |  3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --+--
> 1 |3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --+--
> 6 |2
> {code}
> Note that this type of aggregation is different from the kind of aggregation 
> provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire 
> collections across rows. Here we only aggregate the items of a collection row 
> per row.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org