[ https://issues.apache.org/jira/browse/CASSANDRA-17811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andres de la Peña updated CASSANDRA-17811:
------------------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Definition(13162)
       Complexity: Normal
    Discovered By: Code Inspection
         Severity: Low
           Status: Open  (was: Triage Needed)

> CQL aggregation functions on collections
> ----------------------------------------
>
>                 Key: CASSANDRA-17811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> It has been found during CASSANDRA-8877 that CQL's aggregation functions
> {{max}}, {{min}} and {{count}} can be applied to collections, but the
> result is returned as a blob. For example:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  system.max(l)
> ------------------------------------------------------------
>  0x00000003000000040000000a0000000400000014000000040000001e
> {code}
> This happens on 3.0, 3.11, 4.0, 4.1 and trunk.
>
> I'm not sure whether these functions shouldn't be supported for collections
> at all, or whether they should be supported and the result is wrong.
>
> In the example above, the returned blob is the serialized value of
> {{[10, 20, 30]}}, which is the right one according to the list comparator. I
> think this happens because the matched version of the function is the one
> for {{(blob) -> blob}}. We would need a {{(list<int>) -> list<int>}} function
> instead, but this function doesn't exist.
>
> It would be quite easy to add versions of the {{max}}, {{min}} and
> {{count}} functions for every type of collection ({{list<int>}},
> {{list<text>}}, {{map<int, int>}}, {{map<int, text>}}, etc.).
> The downside of this approach is that it would increase the number of
> aggregation functions kept in memory from 82 to 2722, if my maths are right.
> This is quite an increase, mainly due to the many possible combinations of
> the {{map}} type.
> [Here|https://github.com/adelapena/cassandra/commit/e3ba3c2dc36ce58d06942078c708ffb93eb3cd84]
> is a quick, incomplete prototype of the approach.
>
> Also, I'm not sure that applying those aggregation functions to collections
> is very useful in practice. Thus, an alternative approach would be to simply
> forbid them and consider them unsupported. I don't think it would be a
> problem for backward compatibility, since no one has complained about the
> current behaviour, and we might well consider that the original intent was
> not to allow aggregation on collections. At least, there aren't any tests for
> it, and I can't find any documentation about it either.
>
> Another idea that comes to mind is that we could change the meaning of those
> functions to aggregate the values within the collection, instead of
> aggregating the rows. In that case, the behaviour would be:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  k | system.max(l)
> ---+---------------
>  1 |            30
>  0 |             3
> {code}
> Of course, we could have separate function names for that type of collection
> aggregation, like {{collectionMax}}, {{maxItem}}, or something like that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
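
For anyone who wants to check the claim that the blob returned by {{max(l)}} is the serialized form of {{[10, 20, 30]}}, here is a minimal Python sketch that decodes it by hand. It assumes the blob uses the collection layout of native protocol v3 and later (a 4-byte big-endian element count, then for each element a 4-byte length followed by the value bytes) and that every element is a 4-byte {{int}}. The helper name {{decode_int_list}} is hypothetical and is not part of Cassandra or any driver.

```python
import struct


def decode_int_list(blob_hex):
    """Decode a hex blob as a serialized list<int>.

    Assumed layout (native protocol v3+ style): int32 element count,
    then for each element an int32 length and the element bytes.
    """
    if blob_hex.startswith("0x"):
        blob_hex = blob_hex[2:]
    raw = bytes.fromhex(blob_hex)

    # First 4 bytes: number of elements, big-endian signed int.
    (count,) = struct.unpack_from(">i", raw, 0)
    offset = 4

    values = []
    for _ in range(count):
        # Each element is prefixed by its length in bytes.
        (size,) = struct.unpack_from(">i", raw, offset)
        offset += 4
        if size != 4:
            raise ValueError("expected 4-byte int element, got %d bytes" % size)
        (value,) = struct.unpack_from(">i", raw, offset)
        offset += size
        values.append(value)

    if offset != len(raw):
        raise ValueError("unexpected trailing bytes in blob")
    return values


if __name__ == "__main__":
    blob = "0x00000003000000040000000a0000000400000014000000040000001e"
    print(decode_int_list(blob))
```

Under those assumptions, the blob from the first example decodes to three elements: 10, 20 and 30. That is consistent with the row with {{k = 1}} having been selected by the list comparator and then returned through the {{(blob) -> blob}} overload.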