Re: [PR] Push aggregates through UNION ALL in MSE via AggregateUnionTranspose [pinot]

via GitHub Sun, 17 May 2026 19:21:37 -0700


yashmayya commented on code in PR #18513:
URL: https://github.com/apache/pinot/pull/18513#discussion_r3256004173



##########
pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotQueryRuleSets.java:
##########
@@ -141,9 +141,21 @@ private PinotQueryRuleSets() {
       // push aggregate functions through join, disabled by default
       AggregateJoinTransposeRule.Config.EXTENDED
           
.withDescription(PlannerRuleNames.AGGREGATE_JOIN_TRANSPOSE_EXTENDED).toRule(),
-      // aggregate union rule
+      // AggregateUnionAggregate and AggregateUnionTranspose are inverses of 
each other and the defaults are chosen to
+      // optimize for distributed-OLAP cost (shuffle dominates compute):
+      //   - AggregateUnionTranspose (default-on) pushes the aggregate into 
every UNION ALL branch, so each branch
+      //     ships pre-aggregated rows across the next exchange instead of raw 
rows. With low-cardinality group keys
+      //     (the common analytics case) this trades a cheap extra per-branch 
aggregate for a much smaller shuffle.
+      //   - AggregateUnionAggregate (default-off) does the opposite: it 
collapses Agg(Union(Agg(A), B)) into
+      //     Agg(Union(A, B)), which loses the pre-aggregation on A and forces 
raw rows through the union's exchange.
+      //     We keep it available behind `usePlannerRules` for 
embedded/in-memory deployments where shuffle is free,
+      //     but enabling it alongside the default-on Transpose just undoes 
Transpose's work (see the order below).
       AggregateUnionAggregateRule.Config.DEFAULT
           
.withDescription(PlannerRuleNames.AGGREGATE_UNION_AGGREGATE).toRule(),
+      // Registered after AggregateUnionAggregate on purpose: if both are 
enabled, this phase runs last so Transpose
+      // wins and any merge AggregateUnionAggregate did first gets pushed back 
into the branches.
+      PinotAggregateUnionTransposeRule

Review Comment:
   I'd push back a bit on this one. `AggregateUnionAggregate` was itself 
flipped from default-on (Calcite's stock setting) to default-off in #16515 for 
exactly the same reason this PR is enabling Transpose by default: in 
distributed OLAP shuffle dominates, so the rule's direction (collapse 
pre-aggregates into a single top-level aggregate, increasing shuffle) is the 
wrong default for Pinot. The PR description there 
([link](https://github.com/apache/pinot/pull/16515)) makes the same 
shuffle-vs-compute argument.
   
   That precedent flipped the default outright for everyone — anyone who relied 
on the Calcite-default behavior of `AggregateUnionAggregate` had to add 
`usePlannerRules='AggregateUnionAggregate'` to keep it. We accepted that 
user-visible change because the new default was much better for the common 
Pinot workload.
   
   The change in this PR is materially smaller than that: the rule still exists 
and `usePlannerRules='AggregateUnionAggregate'` still enables it — what changes 
is the surrounding ruleset. The combination 
`usePlannerRules='AggregateUnionAggregate'` *plus* default-on Transpose is 
incoherent on its own merits: the user is asking the planner to do collapse 
work that the very next rule phase will undo. Telling that small set of users 
to write `usePlannerRules='AggregateUnionAggregate', 
skipPlannerRules='AggregateUnionTranspose'` together is, I think, the right 
behavior — it forces an explicit decision rather than silently making one of 
the two rules a no-op.
   
   A compatibility shim (auto-skip Transpose whenever AggregateUnionAggregate 
is opted in, or vice versa) would lock in coupling between two unrelated rules 
at the framework level for a vanishingly narrow use case — users opted in to 
`AggregateUnionAggregate` only when they specifically wanted the merge behavior 
(high group-key cardinality + tiny branches), which is exactly when they should 
also want Transpose off. That `SET` line is the same shape as the one #16515 
told users to add.
   
   I can call this out in the PR description / release notes so deployments 
currently using `usePlannerRules='AggregateUnionAggregate'` know to add the 
matching `skipPlannerRules`. But I'd rather not encode the linkage in 
`PinotQueryRuleSets` — happy to discuss further if you feel strongly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Push aggregates through UNION ALL in MSE via AggregateUnionTranspose [pinot]

Reply via email to