yashmayya commented on code in PR #18513:
URL: https://github.com/apache/pinot/pull/18513#discussion_r3256004173
##########
pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotQueryRuleSets.java:
##########
@@ -141,9 +141,21 @@ private PinotQueryRuleSets() {
// push aggregate functions through join, disabled by default
AggregateJoinTransposeRule.Config.EXTENDED
.withDescription(PlannerRuleNames.AGGREGATE_JOIN_TRANSPOSE_EXTENDED).toRule(),
- // aggregate union rule
+ // AggregateUnionAggregate and AggregateUnionTranspose are inverses of
each other and the defaults are chosen to
+ // optimize for distributed-OLAP cost (shuffle dominates compute):
+ // - AggregateUnionTranspose (default-on) pushes the aggregate into
every UNION ALL branch, so each branch
+ // ships pre-aggregated rows across the next exchange instead of raw
rows. With low-cardinality group keys
+ // (the common analytics case) this trades a cheap extra per-branch
aggregate for a much smaller shuffle.
+ // - AggregateUnionAggregate (default-off) does the opposite: it
collapses Agg(Union(Agg(A), B)) into
+ // Agg(Union(A, B)), which loses the pre-aggregation on A and forces
raw rows through the union's exchange.
+ // We keep it available behind `usePlannerRules` for
embedded/in-memory deployments where shuffle is free,
+ // but enabling it alongside the default-on Transpose just undoes
Transpose's work (see the order below).
AggregateUnionAggregateRule.Config.DEFAULT
.withDescription(PlannerRuleNames.AGGREGATE_UNION_AGGREGATE).toRule(),
+ // Registered after AggregateUnionAggregate on purpose: if both are
enabled, this phase runs last so Transpose
+ // wins and any merge AggregateUnionAggregate did first gets pushed back
into the branches.
+ PinotAggregateUnionTransposeRule
Review Comment:
I'd push back a bit on this one. `AggregateUnionAggregate` was itself
flipped from default-on (Calcite's stock setting) to default-off in #16515 for
exactly the same reason this PR is enabling Transpose by default: in
distributed OLAP shuffle dominates, so the rule's direction (collapse
pre-aggregates into a single top-level aggregate, increasing shuffle) is the
wrong default for Pinot. The PR description there
([link](https://github.com/apache/pinot/pull/16515)) makes the same
shuffle-vs-compute argument.
That precedent flipped the default outright for everyone — anyone who relied
on the Calcite-default behavior of `AggregateUnionAggregate` had to add
`usePlannerRules='AggregateUnionAggregate'` to keep it. We accepted that
user-visible change because the new default was much better for the common
Pinot workload.
The change in this PR is materially smaller than that: the rule still exists
and `usePlannerRules='AggregateUnionAggregate'` still enables it — what changes
is the surrounding ruleset. The combination
`usePlannerRules='AggregateUnionAggregate'` *plus* default-on Transpose is
incoherent on its own merits: the user is asking the planner to do collapse
work that the very next rule phase will undo. Telling that small set of users
to write `usePlannerRules='AggregateUnionAggregate',
skipPlannerRules='AggregateUnionTranspose'` together is, I think, the right
behavior — it forces an explicit decision rather than silently making one of
the two rules a no-op.
A compatibility shim (auto-skip Transpose whenever AggregateUnionAggregate
is opted in, or vice versa) would lock in coupling between two unrelated rules
at the framework level for a vanishingly narrow use case — users opted in to
`AggregateUnionAggregate` only when they specifically wanted the merge behavior
(high group-key cardinality + tiny branches), which is exactly when they should
also want Transpose off. That `SET` line is the same shape as the one #16515
told users to add.
I can call this out in the PR description / release notes so deployments
currently using `usePlannerRules='AggregateUnionAggregate'` know to add the
matching `skipPlannerRules`. But I'd rather not encode the linkage in
`PinotQueryRuleSets` — happy to discuss further if you feel strongly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]