OK, perhaps I should have looked a little deeper. It looks like
org.apache.calcite.rel.core.Window is the "hypothetical" windowing layer I
described in assumption #4.

I suppose it makes sense to call window functions "window aggregates" on a
conceptual level. It's just a little confusing that both AggregateCall
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java>
and Window.RexWinAggCall
<https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Window.java>
define their functions using SqlAggFunction. It feels to me like that's conflating
two different concepts, but perhaps that's not very important.
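
To make the distinction concrete, here is a rough sketch in plain Java. It is
not Calcite code, and every name in it (Accumulator, groupBySum,
windowSumOverAll, Game) is hypothetical. It defines SUM once and reuses it in
two operators: a grouping operator that merges rows, like the GROUP BY query
in my original message, and a windowing operator that keeps every input row,
like the OVER () query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Calcite's API: one SUM definition shared by a
// grouping operator (merges rows) and a windowing operator (keeps rows).
public class WindowVsAggregate {

  /** A row of the hypothetical "games" table. */
  public record Game(String name, long score) {}

  /** Hypothetical accumulator; plays the role of a shared agg function definition. */
  interface Accumulator {
    void add(long value);
    long result();
  }

  /** SUM is defined once and reused by both operators below. */
  static Accumulator newSum() {
    return new Accumulator() {
      private long total = 0;
      @Override public void add(long value) { total += value; }
      @Override public long result() { return total; }
    };
  }

  /** Like SELECT name, SUM(score) FROM games GROUP BY name: one row per group. */
  public static Map<String, Long> groupBySum(List<Game> games) {
    Map<String, Accumulator> accs = new LinkedHashMap<>();
    for (Game g : games) {
      accs.computeIfAbsent(g.name(), k -> newSum()).add(g.score());
    }
    Map<String, Long> out = new LinkedHashMap<>();
    accs.forEach((name, acc) -> out.put(name, acc.result()));
    return out;
  }

  /** Like SELECT name, SUM(score) OVER () FROM games: one row per input row. */
  public static List<Map.Entry<String, Long>> windowSumOverAll(List<Game> games) {
    Accumulator acc = newSum(); // OVER (): a single window spanning all rows
    for (Game g : games) {
      acc.add(g.score());
    }
    long total = acc.result();
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    for (Game g : games) {
      out.add(Map.entry(g.name(), total)); // input row kept, total attached
    }
    return out;
  }

  public static void main(String[] args) {
    List<Game> games = List.of(
        new Game("alice", 3), new Game("bob", 5), new Game("alice", 4));
    System.out.println(groupBySum(games)); // {alice=7, bob=5}
    System.out.println(windowSumOverAll(games)); // [alice=12, bob=12, alice=12]
  }
}
```

In this sketch the accumulator is identical in both cases; only the operator
decides whether input rows are merged (3 rows in, 2 out) or preserved (3 rows
in, 3 out). That is roughly the argument for sharing one function definition
between the two kinds of calls, under the assumptions above.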

Thanks

On Wed, Jan 10, 2024 at 5:16 PM Mihai Budiu <mbu...@gmail.com> wrote:

> I am not 100% sure I understand your question, but we do implement window
> functions in our compiler using the Calcite IR.
>
> In our optimizer we use a Calcite rule which rewrites RexOver expressions
> into LogicalWindow operations:
> CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW
>
> I find that the meaning of LogicalWindow as an IR representation is quite
> clean. Each group in a window has a list of aggregate calls, which work
> just like aggregate calls in a standard group-by setting. Our code handles
> both kinds of aggregates in the same way.
>
> Mihai
>
> ________________________________
> From: Will Noble <wno...@google.com.INVALID>
> Sent: Wednesday, January 10, 2024 4:13 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Why are window functions considered agg function?
>
> I have a question about the way Calcite handles window functions.
>
> Here are my assumptions:
>
>    - The purpose of aggregation is to merge rows of the input relation.
>    Therefore, an Aggregate
>    <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java>
>    layer can only exist in a relational expression if there is a
>    corresponding GROUP BY clause in the SQL expression. If there is no
>    explicit GROUP BY, then GROUP BY () is assumed implicitly, but logically
>    there is a 1-to-1 correspondence between agg layers and (possibly
>    implicit) GROUP BY clauses.
>    - An agg function
>    <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java>
>    is only meant to appear in an aggregate layer; i.e., it should never
>    appear in a projection layer as a rex function would.
>    - Calcite generally treats a window function as an agg function with
>    requiresOver=true. That's the purpose of the requiresOver field, which
>    has existed since at least as early as 2015.
>    - Window functions cannot be used with GROUP BY clauses. Invoking them
>    never causes the rows of the input relation to merge, as is
>    characteristic of an aggregation. Therefore, they are not agg functions.
>    It would probably make more sense for us to think of them as rex
>    functions that should appear in a projection, even though they cannot be
>    computed row-wise independently. Perhaps a new type of relational
>    operator is needed besides aggregate layers and projection layers: a
>    hypothetical "windowing layer".
>
> Which of my assumptions is wrong?
>
> Consider this BigQuery example using SUM as an agg function. This query
> would be invalid without the GROUP BY clause. It will return as many rows
> as there are unique names, and total_score is per-name.
>
> SELECT name, SUM(score) AS total_score
> FROM games
> GROUP BY name
>
> Now consider this BigQuery example with SUM as a window function. It would
> be invalid to include any GROUP BY clause here. It will return as many rows
> as there are in the input table, and total_score is global (it will have
> the same value in every single row).
>
> SELECT name, SUM(score) OVER () AS total_score
> FROM games
>
> Does anybody actually use window functions in Calcite as they're currently
> implemented? How can it possibly make sense to consider them as agg
> functions, when they can never be used in the same context as a "true" agg
> function (which requires grouping)? Seems to me like these are actually two
> completely different functions with zero overlap in terms of where they can
> appear in a relational expression; they just happen to share the name SUM
> and involve similar math.
>
> Thanks for any clarification / guidance.
>
