I have a question about the way Calcite handles window functions.

Here are my assumptions:

   - The purpose of aggregation is to merge rows of the input relation.
   Therefore, an Aggregate
   
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java>
   layer can only exist in a relational expression if there is a
corresponding GROUP
   BY clause in the corresponding SQL expression. If there is no explicit GROUP
   BY, then GROUP BY () is assumed implicitly, but logically there is a
   1-to-1 correspondence between agg layers and (possibly implicit) GROUP BY
   clauses.
   - An agg function
   
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java>
   is only meant to appear in an aggregate layer; i.e. It should never appear
   in a projection layer as a rex function would.
   - Calcite generally treats a window function as an agg function with
   requiresOver=true. That's the purpose of the requiresOver field, which
   has existed since at least as early as 2015.
   - Window functions cannot be used with GROUP BY clauses. Invoking them
   never causes the rows of the input relation to merge as is characteristic
   of an aggregation. Therefore, they are not agg functions. It would probably
   make more sense for us to think of them as rex functions that should appear
   in a projection, even though they cannot be computed row-wise
   independently. Perhaps a new type of relational operator is needed besides
   aggregate layers and projection layers; a hypothetical "windowing layer".

Which of my assumptions is wrong?

Consider this BigQuery example using SUM as an agg function. This query
would be invalid without the GROUP BY clause. It will return as many rows
as there are unique names, and total_score is per-name.

SELECT name, SUM(score) as total_score
GROUP BY name
FROM games

Now consider this BQ example with SUM as a window function. It would be
invalid to include any GROUP BY clause here. It will return as many rows as
there are in the input table, and total_score is global (it will have the
same value in every single row).

SELECT name, SUM(score) OVER () as total_score
FROM games

Does anybody actually use window functions in Calcite as they're currently
implemented? How can it possibly make sense to consider them as agg
functions, when they can never be used in the same context as a "true" agg
function (which requires grouping)? Seems to me like these are actually two
completely different functions with zero overlap in terms of where they can
appear in a relational expression; they just happen to share the name SUM
and involve similar math.

Thanks for any clarification / guidance.

Reply via email to