I am not 100% sure I understand your question, but we do implement window functions in our compiler using the Calcite IR.
In our optimizer we use a Calcite rule which rewrites RexOver expressions into LogicalWindow operations: CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW I find that the meaning of LogicalWindow as an IR representation is quite clean. Each group in a window has a list of aggregate calls, which work just like aggregate calls in a standard group-by setting. Our code handles both kinds of aggregates in the same way. Mihai ________________________________ From: Will Noble <wno...@google.com.INVALID> Sent: Wednesday, January 10, 2024 4:13 PM To: dev@calcite.apache.org <dev@calcite.apache.org> Subject: Why are window functions considered agg function? I have a question about the way Calcite handles window functions. Here are my assumptions: - The purpose of aggregation is to merge rows of the input relation. Therefore, an Aggregate <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java> layer can only exist in a relational expression if there is a corresponding GROUP BY clause in the corresponding SQL expression. If there is no explicit GROUP BY, then GROUP BY () is assumed implicitly, but logically there is a 1-to-1 correspondence between agg layers and (possibly implicit) GROUP BY clauses. - An agg function <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java> is only meant to appear in an aggregate layer; i.e. It should never appear in a projection layer as a rex function would. - Calcite generally treats a window function as an agg function with requiresOver=true. That's the purpose of the requiresOver field, which has existed since at least as early as 2015. - Window functions cannot be used with GROUP BY clauses. Invoking them never causes the rows of the input relation to merge as is characteristic of an aggregation. Therefore, they are not agg functions. It would probably make more sense for us to think of them as rex functions that should appear in a projection, even though they cannot be computed row-wise independently. Perhaps a new type of relational operator is needed besides aggregate layers and projection layers; a hypothetical "windowing layer". Which of my assumptions is wrong? Consider this BigQuery example using SUM as an agg function. This query would be invalid without the GROUP BY clause. It will return as many rows as there are unique names, and total_score is per-name. SELECT name, SUM(score) as total_score GROUP BY name FROM games Now consider this BQ example with SUM as a window function. It would be invalid to include any GROUP BY clause here. It will return as many rows as there are in the input table, and total_score is global (it will have the same value in every single row). SELECT name, SUM(score) OVER () as total_score FROM games Does anybody actually use window functions in Calcite as they're currently implemented? How can it possibly make sense to consider them as agg functions, when they can never be used in the same context as a "true" agg function (which requires grouping)? Seems to me like these are actually two completely different functions with zero overlap in terms of where they can appear in a relational expression; they just happen to share the name SUM and involve similar math. Thanks for any clarification / guidance.