I have a question about the way Calcite handles window functions. Here are my assumptions:
- The purpose of aggregation is to merge rows of the input relation. Therefore, an Aggregate <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java> layer can only exist in a relational expression if there is a corresponding GROUP BY clause in the corresponding SQL expression. If there is no explicit GROUP BY, then GROUP BY () is assumed implicitly, but logically there is a 1-to-1 correspondence between agg layers and (possibly implicit) GROUP BY clauses. - An agg function <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java> is only meant to appear in an aggregate layer; i.e. It should never appear in a projection layer as a rex function would. - Calcite generally treats a window function as an agg function with requiresOver=true. That's the purpose of the requiresOver field, which has existed since at least as early as 2015. - Window functions cannot be used with GROUP BY clauses. Invoking them never causes the rows of the input relation to merge as is characteristic of an aggregation. Therefore, they are not agg functions. It would probably make more sense for us to think of them as rex functions that should appear in a projection, even though they cannot be computed row-wise independently. Perhaps a new type of relational operator is needed besides aggregate layers and projection layers; a hypothetical "windowing layer". Which of my assumptions is wrong? Consider this BigQuery example using SUM as an agg function. This query would be invalid without the GROUP BY clause. It will return as many rows as there are unique names, and total_score is per-name. SELECT name, SUM(score) as total_score GROUP BY name FROM games Now consider this BQ example with SUM as a window function. It would be invalid to include any GROUP BY clause here. It will return as many rows as there are in the input table, and total_score is global (it will have the same value in every single row). SELECT name, SUM(score) OVER () as total_score FROM games Does anybody actually use window functions in Calcite as they're currently implemented? How can it possibly make sense to consider them as agg functions, when they can never be used in the same context as a "true" agg function (which requires grouping)? Seems to me like these are actually two completely different functions with zero overlap in terms of where they can appear in a relational expression; they just happen to share the name SUM and involve similar math. Thanks for any clarification / guidance.