I am not 100% sure I understand your question, but we do implement window 
functions in our compiler using the Calcite IR.

In our optimizer we use a Calcite rule which rewrites RexOver expressions into 
LogicalWindow operations:
CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW

I find that the meaning of LogicalWindow as an IR representation is quite 
clean. Each group in a window has a list of aggregate calls, which work just 
like aggregate calls in a standard group-by setting. Our code handles both 
kinds of aggregates in the same way.

Mihai

________________________________
From: Will Noble <wno...@google.com.INVALID>
Sent: Wednesday, January 10, 2024 4:13 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Why are window functions considered agg function?

I have a question about the way Calcite handles window functions.

Here are my assumptions:

   - The purpose of aggregation is to merge rows of the input relation.
   Therefore, an Aggregate
   
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java>
   layer can only exist in a relational expression if there is a
corresponding GROUP
   BY clause in the corresponding SQL expression. If there is no explicit GROUP
   BY, then GROUP BY () is assumed implicitly, but logically there is a
   1-to-1 correspondence between agg layers and (possibly implicit) GROUP BY
   clauses.
   - An agg function
   
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java>
   is only meant to appear in an aggregate layer; i.e. It should never appear
   in a projection layer as a rex function would.
   - Calcite generally treats a window function as an agg function with
   requiresOver=true. That's the purpose of the requiresOver field, which
   has existed since at least as early as 2015.
   - Window functions cannot be used with GROUP BY clauses. Invoking them
   never causes the rows of the input relation to merge as is characteristic
   of an aggregation. Therefore, they are not agg functions. It would probably
   make more sense for us to think of them as rex functions that should appear
   in a projection, even though they cannot be computed row-wise
   independently. Perhaps a new type of relational operator is needed besides
   aggregate layers and projection layers; a hypothetical "windowing layer".

Which of my assumptions is wrong?

Consider this BigQuery example using SUM as an agg function. This query
would be invalid without the GROUP BY clause. It will return as many rows
as there are unique names, and total_score is per-name.

SELECT name, SUM(score) as total_score
GROUP BY name
FROM games

Now consider this BQ example with SUM as a window function. It would be
invalid to include any GROUP BY clause here. It will return as many rows as
there are in the input table, and total_score is global (it will have the
same value in every single row).

SELECT name, SUM(score) OVER () as total_score
FROM games

Does anybody actually use window functions in Calcite as they're currently
implemented? How can it possibly make sense to consider them as agg
functions, when they can never be used in the same context as a "true" agg
function (which requires grouping)? Seems to me like these are actually two
completely different functions with zero overlap in terms of where they can
appear in a relational expression; they just happen to share the name SUM
and involve similar math.

Thanks for any clarification / guidance.

Reply via email to