Ok, I perhaps should have looked a little deeper. Looks like org.apache.calcite.rel.core.Window is the "hypothetical" windowing layer I described in assumption #4.
I suppose it makes sense to call window functions "window aggregates" on a conceptual level. It's just a little confusing that both AggregateCall <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java> and Window.RexWinAggCall <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Window.java> define their functions using SqlAggFunction. It feels to me like that's conflating two different concepts, but perhaps that's not very important. Thanks On Wed, Jan 10, 2024 at 5:16 PM Mihai Budiu <mbu...@gmail.com> wrote: > I am not 100% sure I understand your question, but we do implement window > functions in our compiler using the Calcite IR. > > In our optimizer we use a Calcite rule which rewrites RexOver expressions > into LogicalWindow operations: > CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW > > I find that the meaning of LogicalWindow as an IR representation is quite > clean. Each group in a window has a list of aggregate calls, which work > just like aggregate calls in a standard group-by setting. Our code handles > both kinds of aggregates in the same way. > > Mihai > > ________________________________ > From: Will Noble <wno...@google.com.INVALID> > Sent: Wednesday, January 10, 2024 4:13 PM > To: dev@calcite.apache.org <dev@calcite.apache.org> > Subject: Why are window functions considered agg function? > > I have a question about the way Calcite handles window functions. > > Here are my assumptions: > > - The purpose of aggregation is to merge rows of the input relation. > Therefore, an Aggregate > < > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java > > > layer can only exist in a relational expression if there is a > corresponding GROUP > BY clause in the corresponding SQL expression. If there is no explicit > GROUP > BY, then GROUP BY () is assumed implicitly, but logically there is a > 1-to-1 correspondence between agg layers and (possibly implicit) GROUP > BY > clauses. > - An agg function > < > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java > > > is only meant to appear in an aggregate layer; i.e. It should never > appear > in a projection layer as a rex function would. > - Calcite generally treats a window function as an agg function with > requiresOver=true. That's the purpose of the requiresOver field, which > has existed since at least as early as 2015. > - Window functions cannot be used with GROUP BY clauses. Invoking them > never causes the rows of the input relation to merge as is > characteristic > of an aggregation. Therefore, they are not agg functions. It would > probably > make more sense for us to think of them as rex functions that should > appear > in a projection, even though they cannot be computed row-wise > independently. Perhaps a new type of relational operator is needed > besides > aggregate layers and projection layers; a hypothetical "windowing > layer". > > Which of my assumptions is wrong? > > Consider this BigQuery example using SUM as an agg function. This query > would be invalid without the GROUP BY clause. It will return as many rows > as there are unique names, and total_score is per-name. > > SELECT name, SUM(score) as total_score > GROUP BY name > FROM games > > Now consider this BQ example with SUM as a window function. It would be > invalid to include any GROUP BY clause here. It will return as many rows as > there are in the input table, and total_score is global (it will have the > same value in every single row). > > SELECT name, SUM(score) OVER () as total_score > FROM games > > Does anybody actually use window functions in Calcite as they're currently > implemented? How can it possibly make sense to consider them as agg > functions, when they can never be used in the same context as a "true" agg > function (which requires grouping)? Seems to me like these are actually two > completely different functions with zero overlap in terms of where they can > appear in a relational expression; they just happen to share the name SUM > and involve similar math. > > Thanks for any clarification / guidance. >