yaooqinn opened a new pull request, #56009:
URL: https://github.com/apache/spark/pull/56009
### What changes were proposed in this pull request?
In `Optimizer.DecimalAggregates`, rewrite the 4 Aggregate-arm `Sum` /
`Average`
rewrites from the single-arg helper ctor (`Sum(child)` / `Average(child)`) to
`original.copy(child = UnscaledValue(...))`. The helper ctor re-reads
`EvalMode`
from `SQLConf`, which silently drops `EvalMode.TRY` from `try_sum` /
`try_avg`
after the rule's fast path fires.
The fix is a pattern-bind plus case-class `copy`: `copy(child = ...)`
preserves
any sibling fields automatically — relevant because SPARK-38432 added
`evalMode` / `evalContext` to `Sum` / `Average` *after* this rule was
written,
and the rule was missed in that update.
#### Scope
4 of 6 sites (the Aggregate arms — SUM/AVG × un-widened/widened peel). The 2
Window-arm sites are not in this PR; they need a separate vanilla repro to
confirm reachability through `ExtractWindowExpressions` hoisting, tracked as
follow-up. The helper-ctor itself (`Sum.scala:52`, `Average.scala:50`) is
retained — it is used by non-`DecimalAggregates` call sites (analyzer ctor,
test fixtures); removing it requires a separate audit.
### Why are the changes needed?
Latent bug since SPARK-38432 added `evalMode` / `evalContext` fields to
`Sum` / `Average` for the `try_*` aggregates: the optimizer rule
`DecimalAggregates` was not updated, so the rewritten `Sum` / `Average` nodes
re-read `EvalMode` from the current `SQLConf` via the helper ctor.
Vanilla pyspark 3.5.3 ground-truth (`spark.sql.optimizer.excludedRules`
toggle):
```
--- rule ON : select try_sum(cast(id as decimal(7,2))) from range(10) ---
* Sum evalMode=LEGACY :: sum(UnscaledValue(x)) ← bug
--- rule OFF : same query ---
* Sum evalMode=TRY :: try_sum(x) ← correct
```
`Sum.scala:58` `shouldTrackIsEmpty = evalContext.evalMode == EvalMode.TRY`
decides aggregate buffer schema (adds the `isEmpty` column required for
NULL-on-overflow). When `EvalMode.TRY` is dropped,
`shouldTrackIsEmpty=false`,
no `isEmpty` column, and overflow no longer returns NULL.
### Does this PR introduce _any_ user-facing change?
Yes: `try_sum` / `try_avg` on narrow-precision decimal columns now return
NULL on overflow as documented for `try_*`, instead of wrapping
(`ansi=false`) or throwing (`ansi=true`).
Fast-path trigger condition (so the affected surface is precise):
- SUM gate: `p + 10 <= MAX_LONG_DIGITS=18` -> `p <= 8`
- AVG gate: `p <= AVG_PEEL_MAX_INNER_PRECISION=7`
Decimal columns outside these precision bounds are unaffected (the rule's
`case` does not match, the original node is preserved).
### How was this patch tested?
4 new unit tests in `DecimalAggregatesSuite`, covering both arms
(un-widened + widened-cast peel) for both `try_sum` and `try_avg`, asserting
that `Sum.evalContext.evalMode` / `Average.evalMode` equals `EvalMode.TRY`
after the rule fires.
- Pre-fix: 4/4 FAIL with `Expected evalMode=TRY post-rewrite, got List(ANSI)`
- Post-fix: 35/35 PASS (4 new + 31 existing, 0 regression)
- Sibling regressions clean: `DecimalPrecisionSuite` +
`AggregateOptimizeSuite`
20/20 PASS
- `dev/lint-scala` clean
**Backport candidates**: branch-3.5, branch-4.0 (bug latent on both)
**Out-of-PR follow-ups** (explicitly tracked):
- Window-arm 2 sites (`L2591` / `L2596`) — needs separate vanilla repro
- ANSI=true plain-sum regression test (defensive)
- Drop unused helper-ctor (separate refactor)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]