yaooqinn opened a new pull request, #55957: URL: https://github.com/apache/spark/pull/55957

### What changes were proposed in this pull request?

Extend `DecimalAggregates` to peel a scale-preserving widening `Cast` off `Sum`/`Average` arguments. This recovers the long-backed fast path when the inner expression's precision still fits within the existing safety bounds.

When the aggregate argument is `Cast(inner: dec(p, s), dec(p', s))` with `p' >= p` (for example `Sum(Cast(...))`):

- The SUM arm fires under `p + 10 <= 18`. This is the same bound as the existing SUM fast-path guard, applied to the inner precision `p` instead of the declared precision `p'`.
- The AVG arm fires under `p <= 7` (`AVG_PEEL_MAX_INNER_PRECISION`). This is strictly tighter than the existing AVG arm's `p + 4 <= 15` (that is, `p <= 11`), to avoid amplifying the SPARK-37024 Double-regime precision loss.

Both arms share a `WidenedDecimalChild` extractor that refuses to unwrap `CheckOverflow`, so row-level overflow semantics are preserved.

The window arm is unchanged: `ExtractWindowExpressions` hoists the `Cast` into a preceding `Project`, so an expression-level rewrite cannot see it.

### Why are the changes needed?

The existing fast path keys off the declared precision `p'` after a widening `Cast`, not the effective precision `p` of the inner expression. User patterns like `SUM(CAST(small_dec AS larger_dec))`, which are common in SQL generated by BI tools with normalized types, fall off the fast path even though `p + 10 <= 18`. TPC-DS q18 exhibits this pattern.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `DecimalAggregatesSuite`, including invariant-guard tests that lock the SUM/AVG safety boundaries.
- ScalaCheck property-based tests in `DataFrameAggregateSuite` for numerical equivalence of the peeled and un-peeled paths.
- `TPCDSV1_4PlanStabilitySuite` and `TPCDSV1_4PlanStabilityWithStatsSuite` regenerated for q18.
- `DecimalAggregatesBenchmark` added; results committed for JDK 17/21/25 under `sql/core/benchmarks/`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
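
As a rough illustration of the SUM and AVG guards described above, the following self-contained Scala sketch models the peel decision without Catalyst. `DecType`, `Expr`, `Column`, `Cast`, `CheckOverflow`, `sumUsesLongPath` and `avgUsesFastPath` are hypothetical stand-ins, not Spark's real classes. Modelling `CheckOverflow` as "never looked through" is an assumption about what "refuses to unwrap" means. The constants 18, 15 and 7 are the bounds quoted in the PR text.

```scala
object DecimalPeelSketch {
  // Hypothetical stand-in for DecimalType(precision, scale).
  final case class DecType(precision: Int, scale: Int)

  // Hypothetical, minimal expression tree; NOT Catalyst's Expression hierarchy.
  sealed trait Expr { def dataType: DecType }
  final case class Column(name: String, dataType: DecType) extends Expr
  final case class Cast(child: Expr, dataType: DecType) extends Expr
  final case class CheckOverflow(child: Expr, dataType: DecType) extends Expr

  val MaxLongDigits = 18            // unscaled values below 10^18 always fit in a Long
  val MaxDoubleDigits = 15          // bound used by the existing AVG arm (p + 4 <= 15)
  val AvgPeelMaxInnerPrecision = 7  // AVG_PEEL_MAX_INNER_PRECISION from the PR text

  // Sketch of a WidenedDecimalChild-style extractor: matches only a bare Cast that
  // keeps the scale and does not narrow the precision, and yields the inner expression.
  object WidenedDecimalChild {
    def unapply(e: Expr): Option[Expr] = e match {
      case Cast(inner, to)
          if to.scale == inner.dataType.scale &&
            to.precision >= inner.dataType.precision =>
        Some(inner)
      case _ => None // CheckOverflow (or anything else) is never looked through
    }
  }

  // SUM: existing guard on the declared precision, then the peeled guard on the inner one.
  def sumUsesLongPath(arg: Expr): Boolean = arg match {
    case e if e.dataType.precision + 10 <= MaxLongDigits => true
    case WidenedDecimalChild(inner) => inner.dataType.precision + 10 <= MaxLongDigits
    case _ => false
  }

  // AVG: existing guard (p + 4 <= 15), then the deliberately tighter peeled guard (p <= 7).
  def avgUsesFastPath(arg: Expr): Boolean = arg match {
    case e if e.dataType.precision + 4 <= MaxDoubleDigits => true
    case WidenedDecimalChild(inner) => inner.dataType.precision <= AvgPeelMaxInnerPrecision
    case _ => false
  }
}
```

Under this model, `CAST(x AS DECIMAL(18, 2))` with `x: DECIMAL(7, 2)` qualifies for both arms. An inner `DECIMAL(8, 2)` still qualifies for SUM (`8 + 10 <= 18`) but not for the peeled AVG arm (`8 > 7`), which shows how the AVG peel bound is kept tighter than the `p <= 11` of the existing AVG arm. A scale-changing cast, or one wrapped in `CheckOverflow`, is left alone.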
