Re: [PR] Push aggregates through UNION ALL in MSE via AggregateUnionTranspose [pinot]

via GitHub Thu, 21 May 2026 09:11:25 -0700


yashmayya commented on code in PR #18513:
URL: https://github.com/apache/pinot/pull/18513#discussion_r3282648392



##########
pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotQueryRuleSets.java:
##########
@@ -141,9 +141,21 @@ private PinotQueryRuleSets() {
       // push aggregate functions through join, disabled by default
       AggregateJoinTransposeRule.Config.EXTENDED
           
.withDescription(PlannerRuleNames.AGGREGATE_JOIN_TRANSPOSE_EXTENDED).toRule(),
-      // aggregate union rule
+      // AggregateUnionAggregate and AggregateUnionTranspose are inverses of 
each other and the defaults are chosen to
+      // optimize for distributed-OLAP cost (shuffle dominates compute):
+      //   - AggregateUnionTranspose (default-on) pushes the aggregate into 
every UNION ALL branch, so each branch
+      //     ships pre-aggregated rows across the next exchange instead of raw 
rows. With low-cardinality group keys
+      //     (the common analytics case) this trades a cheap extra per-branch 
aggregate for a much smaller shuffle.
+      //   - AggregateUnionAggregate (default-off) does the opposite: it 
collapses Agg(Union(Agg(A), B)) into
+      //     Agg(Union(A, B)), which loses the pre-aggregation on A and forces 
raw rows through the union's exchange.
+      //     We keep it available behind `usePlannerRules` for 
embedded/in-memory deployments where shuffle is free,
+      //     but enabling it alongside the default-on Transpose just undoes 
Transpose's work (see the order below).
       AggregateUnionAggregateRule.Config.DEFAULT
           
.withDescription(PlannerRuleNames.AGGREGATE_UNION_AGGREGATE).toRule(),
+      // Registered after AggregateUnionAggregate on purpose: if both are 
enabled, this phase runs last so Transpose
+      // wins and any merge AggregateUnionAggregate did first gets pushed back 
into the branches.
+      PinotAggregateUnionTransposeRule

Review Comment:
   Thanks for taking another look. I want to engage with the contract framing 
more precisely, because I think "preserve the old opt-in contract" is doing 
more work than it appears.
   
   The shape of this change mirrors #16515 exactly: there, deployments that had 
relied on Calcite's default-on `AggregateUnionAggregate` had to add 
`usePlannerRules='AggregateUnionAggregate'` post-upgrade to keep their plans. 
Here, deployments currently carrying that same `usePlannerRules` line will need 
to add `skipPlannerRules='AggregateUnionTranspose'` to keep theirs. Neither 
case changes the rule's own behavior — both change the surrounding default 
ruleset. We accepted the trade in #16515 on the grounds that the new default 
was right for the dominant Pinot workload; this PR is making the same trade in 
the same direction for the same reason.
   
   I do see the asymmetry you're pointing at — users who set `usePlannerRules` 
explicitly arguably did more than users who just inherited Calcite's stock 
defaults. But the `usePlannerRules` semantic is "register this rule into the 
planner's ruleset", not "pin the plan shape it produced against any subsequent 
rule that runs in a later phase". A multi-phase HEP planner doesn't and can't 
make the latter guarantee — if it did, we couldn't add any future rule that 
touches an aggregate-over-union pattern.
   
   The compat-shim alternative (auto-skip Transpose whenever 
AggregateUnionAggregate is opted in, or vice versa) is what worries me more 
than the explicit-`SET`-line ask:
   - It encodes a runtime coupling between two notionally-independent rules 
inside `PinotQueryRuleSets`. Future maintainers won't expect it and won't know 
to extend it when adding the next aggregate-over-union rule.
   - It silently disables the cluster-wide shuffle optimization for any 
deployment that still has the historical opt-in sitting in config-as-code — 
which is exactly the audience you're trying to protect, but the protection 
lands by giving them the *old* (worse-for-most-workloads) plan without telling 
them.
   - It gets the policy wrong if anyone has a legitimate reason to want both, 
or any future combination we haven't anticipated. There's no good place to 
express that beyond what query options already allow.
   
   The audience this affects is also narrow and self-selecting: only 
deployments that *deliberately* opted into AggregateUnionAggregate because they 
believed the merged shape was better for their workload. Those are precisely 
the deployments for whom the new Transpose default is the wrong default — they 
need `skipPlannerRules='AggregateUnionTranspose'` anyway. Asking them to write 
the two-rule `SET` together isn't extra ceremony, it's the 
configuration-as-code spelling of "I want the merged shape", which is the 
decision they already made.
   
   Concretely, in line with how #16515 was rolled out, I'll add an explicit 
migration callout to the PR description and the eventual release notes: 
"Deployments currently using `usePlannerRules='AggregateUnionAggregate'` should 
add `skipPlannerRules='AggregateUnionTranspose'` to preserve the previous 
plan." If you still feel strongly about the compat shim after this, I'd 
appreciate hearing what specific production scenario you want to protect — that 
might surface something I'm missing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Push aggregates through UNION ALL in MSE via AggregateUnionTranspose [pinot]

Reply via email to