This will require some long introduction for context: The MAX/MIN functions aggregate rows to get the row with min/max column value according to their comparator. For collections, the comparison is on the lexicographical order of the collection elements. That's the very same comparator that is used when collections are used as clustering keys and for ORDER BY.
However, a bug in the MIN/MAX aggregate functions used to make that the results were presented in their unserialized form, although the row selection was correct. That bug was recently solved by CASSANDRA-17811. During that ticket it was also considered the option of simply disabling MIN/MAX on collection since applying those functions to collections, since they don't seem super useful. However, that option was quickly discarded and the operation was fixed so the MIN/MAX functions correctly work for every data type. As a byproduct of the internal improvements of that fix, CASSANDRA-8877 introduced a new set of functions that can perform aggregations of the elements of a collection. Those where named "map_keys", "map_values", "collection_min", "collection_max", "collection_sum", and "collection_count". Those are the names mentioned on the mail list thread about function naming conventions. Despite doing a kind of within-collection aggregation, these functions are not what we usually call aggregate functions, since they don't aggregate multiple rows together. On a different line of work, CASSANDRA-17425 added to trunk a MAXWRITETIME function to get the max timestamp of a multi-cell column. However, the new collection functions can be used in combination with the WRITETIME and TTL functions to retrieve the min/max/sum/avg timestamp or ttl of a multi-cell column. Since the new functions give a generic way of aggreagting timestamps ant TTLs of multi-cell columns, CASSANDRA-18078 proposed to remove that MAXWRITETIME function. Yifan Cai, author of the MAXWRITETIME function, agreed to remove that function in favour of the new generic collection functions. However, the MAXWRITETIME function can work on both single-cell and multi-cell columns, whereas "COLLECTION_MAX(WRITETIME(column))" would only work on multi-cell columns, That's because MAXWRITETIME of a not-multicell column doesn't return a collection, and one should simply use "WRITETIME(column)" instead. So it was proposed in CASSANDRA-18037 that collections functions applied to a not-collection value consider that value as the only element of a singleton collection. So, for example, COLLECTION_MAX(7) = COLLECTION_MAX([7]) = 7. That ticket has already been reviewed and it's mostly ready to commit. Now we can go straight to the point: Recently Benedict brought back the idea of deprecating aggregate functions applied to collections, the very same idea that was mentioned on CASSANDRA-17811 description almost four months ago. That way we could rename the new collection functions MIN/MAX/SUM/AVG, same as the classic aggregate functions. That way MIN/MAX/SUM/AVG would be an aggregate function when applied to not-collection columns, and a scalar function when applied to collection. We can't do that with COUNT because there would be an ambiguity, so the proposal for that case is renaming COLLECTION_COUNT to SIZE. Benedict, please correct me if I'm not correctly exposing the proposal. I however would prefer to keep aggregate functions working on collections, and keep the names of the new collection functions as "COLLECTION_*". Reasons are: 1 - Making aggregate functions not work on collections might be cosidered as breaking backward compatibility and require a deprecation plan. 2 - Keeping aggregate functions working on collections might not look superuseful, but they make the set of aggregate functions consistent and applicable to every column type. 3 - Using the "COLLECTION_" prefix on collection functions establishes a clear distinction between row aggregations and collection aggregations, while at the same time exposing the analogy between each pair of functions. 4 - Not using the "COLLECTION_" prefix forces us to search for workarounds such as using the column type when possible, or trying to figure out synonyms like in the case of COUNT/SIZE. Even if that works for this case, future functions can find more trouble when trying to figure out workarounds to avoid clashing with existing function names. For example, we might want to add a SIZE function that gets the size in bytes of any column, or we might want to add a MAX function that gets the maximum of a set of columns, etc. And example of the synonym-based approach that comes to mind is MySQL's MAX and GREATEST functions, where MAX is for row aggregation and GREATEST is for column aggregation. 5 - If MIN/MAX function selection is based on the column type, we can't implement Yifan's proposal of making COLLECTION_MAX(7) = COLLECTION_MAX([7]) = 7, which would be very useful for combining collection functions with time functions. What do others think? What should we do with aggregate functions on collections, collection functions and MAXWRITETIME?