This is an automated email from the ASF dual-hosted git repository.
yaooqinn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 606b8fad2f13 [SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max
606b8fad2f13 is described below
commit 606b8fad2f135fb5a5a9d3021338ff4875cc2381
Author: Kent Yao <[email protected]>
AuthorDate: Mon May 25 14:37:47 2026 +0800
[SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max
### What changes were proposed in this pull request?
Extend `DecimalAggregates` to peel a scale-preserving widening `Cast`
around `Min`/`Max` arguments, mirroring the existing SUM/AVG widened-Cast arms
landed via SPARK-56983.
When the input is `Min(Cast(inner: dec(p, s), dec(p', s)))` (or `Max(...)`)
with `p' >= p` and no `CheckOverflow` wrapper, the rule rewrites to
`Cast(Min(inner), dec(p', s))` (and likewise for `Max`). MIN/MAX return one of
their input values under a total order, and a same-scale widening `Cast` leaves
each value and the ordering unchanged, so the rewrite is value-equivalent and
NULL-preserving (see design §D6 self-Q&A).
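
A minimal sketch of the equivalence argument, using a toy decimal model rather than Spark's `Decimal` or Catalyst classes. `Dec`, `widen`, `sqlMin`, `minOfCast`, and `castOfMin` are hypothetical names introduced only for this illustration; the sketch assumes all inputs share one scale, so comparing unscaled values is a valid ordering.

```scala
object MinCastPeelSketch {
  // Toy decimal: value = unscaled / 10^scale, declared with a precision.
  // Hypothetical model, not Spark's Decimal.
  final case class Dec(unscaled: Long, precision: Int, scale: Int)

  // Same-scale widening dec(p, s) -> dec(p', s) with p' >= p:
  // the numeric value is untouched, only the declared precision grows.
  def widen(d: Dec, pPrime: Int): Dec = {
    require(pPrime >= d.precision, "widening only")
    d.copy(precision = pPrime)
  }

  // SQL-style MIN: skips NULLs (None), yields NULL for empty or all-NULL input.
  // Comparing unscaled values is valid because every row has the same scale.
  def sqlMin(rows: Seq[Option[Dec]]): Option[Dec] =
    rows.flatten.reduceOption((a, b) => if (a.unscaled <= b.unscaled) a else b)

  // Shape before the rewrite: Min(Cast(inner)).
  def minOfCast(rows: Seq[Option[Dec]], pPrime: Int): Option[Dec] =
    sqlMin(rows.map(_.map(widen(_, pPrime))))

  // Shape after the rewrite: Cast(Min(inner)).
  def castOfMin(rows: Seq[Option[Dec]], pPrime: Int): Option[Dec] =
    sqlMin(rows).map(widen(_, pPrime))
}
```

Both shapes agree on mixed NULL and non-NULL input and on all-NULL input in this model; the `Max` case is symmetric with the comparison flipped.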
Both arms reuse the same-package `WidenedDecimalChild` extractor introduced
for SUM/AVG, which refuses to unwrap `CheckOverflow` and enforces the same
`s == s'`, `p' >= p` guard. `TreePattern.MIN` / `TreePattern.MAX` are added and
registered on `Min` / `Max`; `DecimalAggregates`'s `containsAnyPattern` pruning
is widened to `(SUM, AVERAGE, MIN, MAX)`. No new rule, no new file — three arms
cohabit one object.
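
The guard and the peel can be sketched on a toy expression tree. Everything below is a hypothetical stand-in: `WidenedChild` only imitates the guard described above (same scale, `p' >= p`, no `CheckOverflow` child) and is not the real `WidenedDecimalChild`, whose implementation is not shown in this commit.

```scala
object WidenedChildSketch {
  final case class DecType(precision: Int, scale: Int)

  // Toy expression tree, not Catalyst's Expression hierarchy.
  sealed trait Expr { def dataType: DecType }
  final case class Col(name: String, dataType: DecType) extends Expr
  final case class Cast(child: Expr, dataType: DecType) extends Expr
  final case class CheckOverflow(child: Expr, dataType: DecType) extends Expr
  final case class Min(child: Expr) extends Expr {
    def dataType: DecType = child.dataType
  }

  // Hypothetical extractor: matches Cast(inner, dec(p', s')) only when the
  // scale is preserved, the precision does not shrink, and the Cast does not
  // sit directly on a CheckOverflow.
  object WidenedChild {
    def unapply(e: Expr): Option[(Expr, Int, Int)] = e match {
      case Cast(_: CheckOverflow, _) => None
      case Cast(inner, DecType(pPrime, sPrime))
          if inner.dataType.scale == sPrime && pPrime >= inner.dataType.precision =>
        Some((inner, pPrime, sPrime))
      case _ => None
    }
  }

  // Min(Cast(inner)) becomes Cast(Min(inner)); the outer Cast keeps the
  // result type identical to the pre-rewrite Min.
  def peel(e: Expr): Expr = e match {
    case Min(WidenedChild(inner, pPrime, sPrime)) =>
      Cast(Min(inner), DecType(pPrime, sPrime))
    case other => other
  }
}
```

The negative cases in the suite (scale change, narrowing, `CheckOverflow`) correspond to inputs that `peel` returns unchanged in this model.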
### Why are the changes needed?
The SUM/AVG arms recover the long-backed fast path when BI tools generate
`SUM(CAST(small_dec AS larger_dec))`. The MIN/MAX case is the natural sibling:
same widening pattern, but currently no peel arm exists, so each aggregated row
pays a per-row `Decimal.changePrecision` call inside `Cast`
(`Cast.scala:1074-1082`) even though the outer Cast could be applied **once**
to the partition extremum instead.
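
As a rough illustration of where the saving comes from, the sketch below counts how often a toy widening step runs in each plan shape. `CountingWiden` is a hypothetical stand-in for the per-row work inside `Cast`; it does not call `Decimal.changePrecision`, and it assumes a non-empty input with no NULLs.

```scala
object CastCallCountSketch {
  // Counts invocations of the toy widening step. Under same-scale widening
  // the unscaled value is returned unchanged.
  final class CountingWiden {
    private var n = 0L
    def calls: Long = n
    def apply(unscaled: Long): Long = { n += 1; unscaled }
  }

  // Before the rewrite: widen every row, then take the minimum.
  def minOfCast(rows: Seq[Long], widen: CountingWiden): Long =
    rows.map(widen(_)).min

  // After the rewrite: take the minimum of the narrow values, widen once.
  def castOfMin(rows: Seq[Long], widen: CountingWiden): Long =
    widen(rows.min)
}
```

For N input rows the first shape performs N widening calls and the second performs one, with the same result.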
MIN/MAX are pointwise on a totally-ordered domain and immune to both the
SUM overflow boundary (SPARK-56983) and the AVG SPARK-37024 Double-regime gate,
so the equivalence is unconditional within the `WidenedDecimalChild` guard
domain (R1b Lemma 1 — design §D6 self-Q&A).
The saving is per-row `changePrecision` elimination on the aggregate input,
measured at **−0.39% to −6.02%** on Best time and growing with JDK version
(JDK 25 strongest), across the 8 MIN/MAX cases × 3 JDKs of the GHA matrix
below. The patch is essentially free: three lines of extractor reuse, no new
rule, no new file.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- `DecimalAggregatesSuite` extended with 6 SPARK-57023 oracle cases — 2
peel positives (`Min`/`Max` over widening Cast) and 4 negatives (scale-changing
Cast, narrowing Cast, `CheckOverflow`-wrapped, `MinBy`/`MaxBy`/`MaxMinByK`).
Suite: 43/43.
- Full `catalyst/test`: 9341 tests / 353 suites, 0 failed, 5 ignored, 670 s.
- `TPCDSV1_4PlanStabilitySuite` + `TPCDSV1_4PlanStabilityWithStatsSuite`:
no golden change on `apache/master`. Existence-test on `efb7beab826` recorded
**0 triggers across the 130 TPC-DS v1.4 + v2.7.0 queries** (see investigation
`0002-baseline-run-results.md`, positive-control verified).
- `DecimalAggregatesBenchmark`: extended with new MIN section (`C1-C4`) and
MAX section (`D1-D4`) mirroring the existing SUM/AVG sections. Full 8-case × 3
JDK matrix run on GitHub Actions standard `ubuntu-22.04` runners (AMD EPYC 7763
64-Core, 10M rows × 5 iters);
`DecimalAggregatesBenchmark{-,-jdk21-,-jdk25-}results.txt` regenerated and
committed. Headline cells (Best ms peel off / peel on / Δ%):
| case | p,s,p' | JDK 17 | JDK 21 | JDK 25 |
|------|--------|--------|--------|--------|
| C1 MIN | 10,2,18 | 3974 / 3920 (−1.36%) | 3301 / 3202 (−3.00%) | 1353 / 1291 (−4.58%) |
| C2 MIN | 10,2,28 | 3959 / 3880 (−2.00%) | 3294 / 3228 (−2.00%) | 1351 / 1287 (−4.74%) |
| C3 MIN | 18,2,28 | 3623 / 3609 (−0.39%) | 3557 / 3450 (−3.01%) | 1368 / 1292 (−5.56%) |
| C4 MIN | 10,2,38 | 3856 / 3835 (−0.54%) | 3240 / 3151 (−2.75%) | 1348 / 1283 (−4.82%) |
| D1 MAX | 10,2,18 | 3854 / 3785 (−1.79%) | 3241 / 3173 (−2.10%) | 1346 / 1279 (−4.98%) |
| D2 MAX | 10,2,28 | 3908 / 3808 (−2.56%) | 3267 / 3152 (−3.52%) | 1352 / 1287 (−4.81%) |
| D3 MAX | 18,2,28 | 3664 / 3620 (−1.20%) | 3507 / 3462 (−1.28%) | 1378 / 1295 (−6.02%) |
| D4 MAX | 10,2,38 | 3904 / 3792 (−2.87%) | 3233 / 3164 (−2.13%) | 1342 / 1274 (−5.07%) |
Pattern: Δ ranges from **−0.39% to −6.02%** and grows with JDK version (JDK 25
strongest in every row, JDK 17 weakest in most) across the 24 readings; no
positive-delta (regression) cell.
The saving is per-row `Decimal.changePrecision` elimination on the aggregate
input — design `0002-design-minmax-fastpath.md` §D5.1 declares this a legal
micro-only ship contract. A pre-GHA local sbt sanity (Section C MIN, JDK 17,
10M × 5) recorded −1.5% to −2.0%; the GHA EPYC numbers above supersede it.
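
The Δ% column can be reproduced from the two Best-time figures in each cell. The helper below is a small sketch assuming Δ% is `(peel on − peel off) / peel off × 100`, which matches the printed cells; `deltaPct` and `fmt` are illustrative names, and the output uses an ASCII minus sign rather than the `−` used in the table.

```scala
import java.util.Locale

object PeelDeltaSketch {
  // Relative change of Best time when the peel is enabled; negative = faster.
  def deltaPct(bestOffMs: Long, bestOnMs: Long): Double =
    (bestOnMs - bestOffMs).toDouble / bestOffMs * 100.0

  // Two-decimal formatting with a fixed locale so the decimal separator is '.'.
  def fmt(d: Double): String =
    String.format(Locale.ROOT, "%.2f%%", Double.box(d))
}
```

For example, `PeelDeltaSketch.fmt(PeelDeltaSketch.deltaPct(3974, 3920))` gives `-1.36%` for C1 on JDK 17, and the D3 JDK 25 cell (1378 / 1295) gives `-6.02%`.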
### Note on Section A/B `results.txt` refresh
The sibling SPARK-56627 change edited the Section B2/B4 cases in the benchmark `.scala` to `p'=11` but
did not regenerate `DecimalAggregatesBenchmark-results.txt`, so the committed
text was stale for the new shape. This PR regenerates the whole file under the
canonical EPYC 7763 GHA runner (replacing the prior EPYC 9V74 numbers in
Section A/B), and the same regeneration produces the JDK 21 / JDK 25 companion
files. Section A/B refresh is mechanical housekeeping triggered by the
regeneration, not part of the MIN/MA [...]
### Why no TPC-DS results?
The existence-test on `apache/master` `efb7beab826` walked optimized plans
across the full 130-query TPC-DS v1.4 + v2.7.0 corpus and recorded **0
queries** triggering the `Min(Cast(...))` / `Max(Cast(...))` widening pattern
(see investigation `0002-baseline-run-results.md`, positive-control verified).
The MIN/MAX peel is justified by semantic equivalence + micro-level pattern
coverage, not by a measured TPC-DS gain. Design `0002-design-minmax-fastpath.md` §D5.1
declares this as a legal micro- [...]
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
Closes #56078 from yaooqinn/users/kentyao/spark-decimal-minmax-cast-peel.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../sql/catalyst/expressions/aggregate/Max.scala | 3 +
.../sql/catalyst/expressions/aggregate/Min.scala | 3 +
.../spark/sql/catalyst/optimizer/Optimizer.scala | 21 ++-
.../spark/sql/catalyst/trees/TreePatterns.scala | 2 +
.../optimizer/DecimalAggregatesSuite.scala | 78 ++++++++++-
.../DecimalAggregatesBenchmark-jdk21-results.txt | 142 ++++++++++++++++-----
.../DecimalAggregatesBenchmark-jdk25-results.txt | 142 ++++++++++++++++-----
.../DecimalAggregatesBenchmark-results.txt | 142 ++++++++++++++++-----
.../benchmark/DecimalAggregatesBenchmark.scala | 65 ++++++++++
9 files changed, 493 insertions(+), 105 deletions(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
index 902f53309de4..f49297eba88b 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{MAX, TreePattern}
import org.apache.spark.sql.catalyst.trees.UnaryLike
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
@@ -43,6 +44,8 @@ case class Max(child: Expression) extends
DeclarativeAggregate with UnaryLike[Ex
override def checkInputDataTypes(): TypeCheckResult =
TypeUtils.checkForOrderingExpr(child.dataType, prettyName)
+ final override val nodePatterns: Seq[TreePattern] = Seq(MAX)
+
private lazy val max = AttributeReference("max", child.dataType)()
override lazy val aggBufferAttributes: Seq[AttributeReference] = max :: Nil
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
index 7a9588808dbd..eaef7b6bec11 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{MIN, TreePattern}
import org.apache.spark.sql.catalyst.trees.UnaryLike
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
@@ -43,6 +44,8 @@ case class Min(child: Expression) extends
DeclarativeAggregate with UnaryLike[Ex
override def checkInputDataTypes(): TypeCheckResult =
TypeUtils.checkForOrderingExpr(child.dataType, prettyName)
+ final override val nodePatterns: Seq[TreePattern] = Seq(MIN)
+
private lazy val min = AttributeReference("min", child.dataType)()
override lazy val aggBufferAttributes: Seq[AttributeReference] = min :: Nil
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 95d774c6e991..1c991729c7d4 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -2576,9 +2576,9 @@ object DecimalAggregates extends Rule[LogicalPlan] {
}
def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
- _.containsAnyPattern(SUM, AVERAGE), ruleId) {
+ _.containsAnyPattern(SUM, AVERAGE, MIN, MAX), ruleId) {
case q: LogicalPlan => q.transformExpressionsDownWithPruning(
- _.containsAnyPattern(SUM, AVERAGE), ruleId) {
+ _.containsAnyPattern(SUM, AVERAGE, MIN, MAX), ruleId) {
case we @ WindowExpression(ae @ AggregateExpression(af, _, _, _, _), _) => af match {
// Window arm: `ExtractWindowExpressions` hoists composite children
// (here the widening Cast) into a child Project, so widened-Cast
@@ -2636,6 +2636,23 @@ object DecimalAggregates extends Rule[LogicalPlan] {
Divide(newAggExpr, Literal.create(math.pow(10.0, scale), DoubleType)),
DecimalType(prec + 4, scale + 4),
Option(conf.sessionLocalTimeZone))
+ // Hoist a scale-preserving widening Cast out of Min so the Min runs on
+ // the narrower inner Decimal. Min picks an existing row's value, so a
+ // widening Cast (same scale, larger precision) is bit-identical to
+ // applying the Cast after the aggregate. The outer Cast preserves the
+ // pre-rewrite result dataType (Min.dataType == child.dataType).
+ case m @ Min(WidenedDecimalChild(inner, _, pPrime, sPrime)) =>
+ Cast(
+ ae.copy(aggregateFunction = m.copy(child = inner)),
+ DecimalType(pPrime, sPrime), Option(conf.sessionLocalTimeZone))
+
+ // Hoist a scale-preserving widening Cast out of Max (same reasoning
+ // as the Min arm above).
+ case m @ Max(WidenedDecimalChild(inner, _, pPrime, sPrime)) =>
+ Cast(
+ ae.copy(aggregateFunction = m.copy(child = inner)),
+ DecimalType(pPrime, sPrime), Option(conf.sessionLocalTimeZone))
+
case _ => ae
}
}
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
index cca9bcd673d6..4e06fcb36767 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
@@ -71,7 +71,9 @@ object TreePattern extends Enumeration {
val MAP_FROM_ARRAYS: Value = Value
val MAP_FROM_ENTRIES: Value = Value
val MAP_OBJECTS: Value = Value
+ val MAX: Value = Value
val MEASURE: Value = Value
+ val MIN: Value = Value
val MULTI_ALIAS: Value = Value
val NEW_INSTANCE: Value = Value
val NOT: Value = Value
diff --git
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
index b65ce3a0f017..0850929d3d24 100644
---
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
+++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
@@ -23,7 +23,7 @@ import
org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.expressions.aggregate.{Average, Sum}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{Average, MaxBy, MaxMinByK, MinBy, Sum}
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.RuleExecutor
@@ -613,4 +613,80 @@ class DecimalAggregatesSuite extends PlanTest with
ScalaCheckDrivenPropertyCheck
s"evalMode should be preserved as TRY after rewrite, got " +
avgs.map(_.evalMode).mkString(","))
}
+ test("SPARK-57023: MIN(CAST(dec(7,2) AS dec(12,2))) peels via widened-Cast fast path") {
+ val widened = $"d7_2".cast(DecimalType(12, 2))
+ val originalQuery = widenRel.select(min(widened).as("min_widened"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = widenRel
+ .select(
+ Cast(
+ min($"d7_2"),
+ DecimalType(12, 2),
+ Option(conf.sessionLocalTimeZone))
+ .as("min_widened"))
+ .analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
+
+ test("SPARK-57023: MAX(CAST(dec(7,2) AS dec(12,2))) peels via widened-Cast fast path") {
+ val widened = $"d7_2".cast(DecimalType(12, 2))
+ val originalQuery = widenRel.select(max(widened).as("max_widened"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = widenRel
+ .select(
+ Cast(
+ max($"d7_2"),
+ DecimalType(12, 2),
+ Option(conf.sessionLocalTimeZone))
+ .as("max_widened"))
+ .analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
+
+ test("SPARK-57023: MIN(CAST(dec(7,2) AS dec(12,4))) does NOT peel (scale change)") {
+ val rescaled = $"d7_2".cast(DecimalType(12, 4))
+ val originalQuery = widenRel.select(min(rescaled).as("min_rescaled"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = originalQuery.analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
+
+ test("SPARK-57023: MIN(CAST(dec(17,2) AS dec(10,2))) does NOT peel (narrowing)") {
+ val narrowed = $"d17_2".cast(DecimalType(10, 2))
+ val originalQuery = widenRel.select(min(narrowed).as("min_narrowed"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = originalQuery.analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
+
+ test("SPARK-57023: MIN/MAX(CheckOverflow) does NOT peel (CheckOverflow guard)") {
+ val co = CheckOverflow($"d7_2", DecimalType(7, 2), nullOnOverflow = true)
+ val widened = Cast(co, DecimalType(12, 2))
+ val originalQuery = widenRel.select(min(widened).as("min_co"), max(widened).as("max_co"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = originalQuery.analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
+
+ test("SPARK-57023: MinBy/MaxBy/MaxMinByK with widened-Cast value do NOT peel " +
+ "(rule pattern matches only Min/Max)") {
+ val widened = $"d7_2".cast(DecimalType(12, 2))
+ val ordering = $"i"
+ val minByExpr = MinBy(widened, ordering).toAggregateExpression()
+ val maxByExpr = MaxBy(widened, ordering).toAggregateExpression()
+ val maxMinByKExpr = MaxMinByK(widened, ordering, Literal(3)).toAggregateExpression()
+ val originalQuery = widenRel.select(
+ minByExpr.as("min_by_w"),
+ maxByExpr.as("max_by_w"),
+ maxMinByKExpr.as("mmbk_w"))
+ val optimized = Optimize.execute(originalQuery.analyze)
+ val correctAnswer = originalQuery.analyze
+
+ comparePlans(optimized, correctAnswer)
+ }
}
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
index 1186901b3575..4448850b83d3 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2178 2236
56 4.6 217.8 1.0X
-widened cast, peel off 2369 2381
9 4.2 236.9 0.9X
-widened cast, peel on 2105 2118
12 4.8 210.5 1.0X
+native (no cast, rule on) 2111 2193
59 4.7 211.1 1.0X
+widened cast, peel off 2364 2371
7 4.2 236.4 0.9X
+widened cast, peel on 2074 2091
20 4.8 207.4 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A2 p=7 s=2 p'=17: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2103 2115
17 4.8 210.3 1.0X
-widened cast, peel off 2366 2377
7 4.2 236.6 0.9X
-widened cast, peel on 2100 2109
11 4.8 210.0 1.0X
+native (no cast, rule on) 2088 2100
14 4.8 208.8 1.0X
+widened cast, peel off 2314 2340
31 4.3 231.4 0.9X
+widened cast, peel on 2084 2093
15 4.8 208.4 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2117 2138
29 4.7 211.7 1.0X
-widened cast, peel off 2403 2416
13 4.2 240.3 0.9X
-widened cast, peel on 2157 2164
7 4.6 215.7 1.0X
+native (no cast, rule on) 2109 2118
9 4.7 210.9 1.0X
+widened cast, peel off 2394 2405
22 4.2 239.4 0.9X
+widened cast, peel on 2125 2146
13 4.7 212.5 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2151 2157
7 4.6 215.1 1.0X
-widened cast, peel off 2420 2427
10 4.1 242.0 0.9X
-widened cast, peel on 2152 2159
9 4.6 215.2 1.0X
+native (no cast, rule on) 2109 2113
3 4.7 210.9 1.0X
+widened cast, peel off 2409 2423
21 4.2 240.9 0.9X
+widened cast, peel on 2116 2125
11 4.7 211.6 1.0X
================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
B1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2130 2136
5 4.7 213.0 1.0X
-widened cast, peel off 2358 2367
15 4.2 235.8 0.9X
-widened cast, peel on 2140 2150
7 4.7 214.0 1.0X
+native (no cast, rule on) 2087 2098
7 4.8 208.7 1.0X
+widened cast, peel off 2292 2300
10 4.4 229.2 0.9X
+widened cast, peel on 2125 2127
2 4.7 212.5 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B2 p=7 s=2 p'=12: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2147 2151
3 4.7 214.7 1.0X
-widened cast, peel off 2359 2361
2 4.2 235.9 0.9X
-widened cast, peel on 2126 2161
20 4.7 212.6 1.0X
+native (no cast, rule on) 2145 2151
5 4.7 214.5 1.0X
+widened cast, peel off 2312 2317
4 4.3 231.2 0.9X
+widened cast, peel on 2090 2096
10 4.8 209.0 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
B3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2173 2185
9 4.6 217.3 1.0X
-widened cast, peel off 2405 2413
7 4.2 240.5 0.9X
-widened cast, peel on 2167 2177
12 4.6 216.7 1.0X
+native (no cast, rule on) 2144 2147
3 4.7 214.4 1.0X
+widened cast, peel off 2395 2420
14 4.2 239.5 0.9X
+widened cast, peel on 2153 2161
13 4.6 215.3 1.0X
OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 2173 2179
7 4.6 217.3 1.0X
-widened cast, peel off 2393 2400
11 4.2 239.3 0.9X
-widened cast, peel on 2172 2178
5 4.6 217.2 1.0X
+native (no cast, rule on) 2145 2149
6 4.7 214.5 1.0X
+widened cast, peel off 2369 2381
7 4.2 236.9 0.9X
+widened cast, peel on 2152 2154
1 4.6 215.2 1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3258 3263
4 3.1 325.8 1.0X
+widened cast, peel off 3301 3318
30 3.0 330.1 1.0X
+widened cast, peel on 3202 3274
41 3.1 320.2 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3261 3265
3 3.1 326.1 1.0X
+widened cast, peel off 3294 3323
16 3.0 329.4 1.0X
+widened cast, peel on 3228 3239
6 3.1 322.8 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3440 3540
56 2.9 344.0 1.0X
+widened cast, peel off 3557 3582
15 2.8 355.7 1.0X
+widened cast, peel on 3450 3471
34 2.9 345.0 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3218 3223
3 3.1 321.8 1.0X
+widened cast, peel off 3240 3252
19 3.1 324.0 1.0X
+widened cast, peel on 3151 3192
23 3.2 315.1 1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3148 3177
16 3.2 314.8 1.0X
+widened cast, peel off 3241 3249
9 3.1 324.1 1.0X
+widened cast, peel on 3173 3173
0 3.2 317.3 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3178 3183
4 3.1 317.8 1.0X
+widened cast, peel off 3267 3271
4 3.1 326.7 1.0X
+widened cast, peel on 3152 3168
10 3.2 315.2 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3458 3531
42 2.9 345.8 1.0X
+widened cast, peel off 3507 3560
30 2.9 350.7 1.0X
+widened cast, peel on 3462 3525
36 2.9 346.2 1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3179 3183
4 3.1 317.9 1.0X
+widened cast, peel off 3233 3251
10 3.1 323.3 1.0X
+widened cast, peel on 3164 3171
11 3.2 316.4 1.0X
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
index 60109cac85ec..4e901134bd02 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1194 1230
57 8.4 119.4 1.0X
-widened cast, peel off 1421 1433
11 7.0 142.1 0.8X
-widened cast, peel on 1181 1188
5 8.5 118.1 1.0X
+native (no cast, rule on) 1200 1222
33 8.3 120.0 1.0X
+widened cast, peel off 1437 1447
8 7.0 143.7 0.8X
+widened cast, peel on 1197 1205
8 8.4 119.7 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A2 p=7 s=2 p'=17: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1174 1189
12 8.5 117.4 1.0X
-widened cast, peel off 1401 1414
8 7.1 140.1 0.8X
-widened cast, peel on 1169 1178
8 8.6 116.9 1.0X
+native (no cast, rule on) 1189 1196
8 8.4 118.9 1.0X
+widened cast, peel off 1426 1431
5 7.0 142.6 0.8X
+widened cast, peel on 1189 1195
3 8.4 118.9 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1245 1254
10 8.0 124.5 1.0X
-widened cast, peel off 1498 1503
5 6.7 149.8 0.8X
-widened cast, peel on 1222 1232
10 8.2 122.2 1.0X
+native (no cast, rule on) 1223 1224
2 8.2 122.3 1.0X
+widened cast, peel off 1497 1501
3 6.7 149.7 0.8X
+widened cast, peel on 1213 1219
4 8.2 121.3 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
A4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1234 1238
3 8.1 123.4 1.0X
-widened cast, peel off 1473 1478
7 6.8 147.3 0.8X
-widened cast, peel on 1242 1255
16 8.1 124.2 1.0X
+native (no cast, rule on) 1214 1219
5 8.2 121.4 1.0X
+widened cast, peel off 1464 1469
3 6.8 146.4 0.8X
+widened cast, peel on 1227 1233
6 8.2 122.7 1.0X
================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
B1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1178 1185
9 8.5 117.8 1.0X
-widened cast, peel off 1434 1440
8 7.0 143.4 0.8X
-widened cast, peel on 1232 1235
3 8.1 123.2 1.0X
+native (no cast, rule on) 1195 1200
5 8.4 119.5 1.0X
+widened cast, peel off 1392 1395
3 7.2 139.2 0.9X
+widened cast, peel on 1189 1195
5 8.4 118.9 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B2 p=7 s=2 p'=12: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1222 1229
7 8.2 122.2 1.0X
-widened cast, peel off 1434 1444
10 7.0 143.4 0.9X
-widened cast, peel on 1216 1223
6 8.2 121.6 1.0X
+native (no cast, rule on) 1192 1195
3 8.4 119.2 1.0X
+widened cast, peel off 1401 1406
4 7.1 140.1 0.9X
+widened cast, peel on 1191 1195
5 8.4 119.1 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
B3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1267 1274
6 7.9 126.7 1.0X
-widened cast, peel off 1505 1509
4 6.6 150.5 0.8X
-widened cast, peel on 1272 1277
7 7.9 127.2 1.0X
+native (no cast, rule on) 1213 1218
7 8.2 121.3 1.0X
+widened cast, peel off 1423 1443
40 7.0 142.3 0.9X
+widened cast, peel on 1213 1214
2 8.2 121.3 1.0X
OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 1269 1275
5 7.9 126.9 1.0X
-widened cast, peel off 1494 1501
9 6.7 149.4 0.8X
-widened cast, peel on 1268 1274
6 7.9 126.8 1.0X
+native (no cast, rule on) 1214 1218
5 8.2 121.4 1.0X
+widened cast, peel off 1422 1422
1 7.0 142.2 0.9X
+widened cast, peel on 1209 1214
3 8.3 120.9 1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1292 1298
5 7.7 129.2 1.0X
+widened cast, peel off 1353 1356
3 7.4 135.3 1.0X
+widened cast, peel on 1291 1292
3 7.7 129.1 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1297 1302
5 7.7 129.7 1.0X
+widened cast, peel off 1351 1354
3 7.4 135.1 1.0X
+widened cast, peel on 1287 1290
3 7.8 128.7 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1290 1294
4 7.8 129.0 1.0X
+widened cast, peel off 1368 1372
5 7.3 136.8 0.9X
+widened cast, peel on 1292 1294
2 7.7 129.2 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1281 1285
4 7.8 128.1 1.0X
+widened cast, peel off 1348 1351
4 7.4 134.8 1.0X
+widened cast, peel on 1283 1290
7 7.8 128.3 1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1275 1281
5 7.8 127.5 1.0X
+widened cast, peel off 1346 1349
2 7.4 134.6 0.9X
+widened cast, peel on 1279 1280
2 7.8 127.9 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1273 1275
2 7.9 127.3 1.0X
+widened cast, peel off 1352 1356
2 7.4 135.2 0.9X
+widened cast, peel on 1287 1291
4 7.8 128.7 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1284 1289
7 7.8 128.4 1.0X
+widened cast, peel off 1378 1385
7 7.3 137.8 0.9X
+widened cast, peel on 1295 1300
4 7.7 129.5 1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 1273 1276
3 7.9 127.3 1.0X
+widened cast, peel off 1342 1347
5 7.4 134.2 0.9X
+widened cast, peel on 1274 1278
3 7.9 127.4 1.0X
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt b/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
index d9c2c9662826..8e28043f8aa3 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
A1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3068 3095
35 3.3 306.8 1.0X
-widened cast, peel off 3396 3410
19 2.9 339.6 0.9X
-widened cast, peel on 3107 3115
10 3.2 310.7 1.0X
+native (no cast, rule on) 2814 2840
22 3.6 281.4 1.0X
+widened cast, peel off 3042 3052
7 3.3 304.2 0.9X
+widened cast, peel on 2740 2764
26 3.6 274.0 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
A2 p=7 s=2 p'=17: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3104 3120
23 3.2 310.4 1.0X
-widened cast, peel off 3386 3407
27 3.0 338.6 0.9X
-widened cast, peel on 3094 3106
17 3.2 309.4 1.0X
+native (no cast, rule on) 2721 2728
4 3.7 272.1 1.0X
+widened cast, peel off 3033 3061
18 3.3 303.3 0.9X
+widened cast, peel on 2792 2799
7 3.6 279.2 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
A3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3039 3053
21 3.3 303.9 1.0X
-widened cast, peel off 3336 3340
5 3.0 333.6 0.9X
-widened cast, peel on 3034 3048
14 3.3 303.4 1.0X
+native (no cast, rule on) 2843 2864
34 3.5 284.3 1.0X
+widened cast, peel off 3103 3119
19 3.2 310.3 0.9X
+widened cast, peel on 2852 2859
7 3.5 285.2 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
A4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3037 3049
16 3.3 303.7 1.0X
-widened cast, peel off 3324 3340
16 3.0 332.4 0.9X
-widened cast, peel on 3027 3031
4 3.3 302.7 1.0X
+native (no cast, rule on) 2852 2863
9 3.5 285.2 1.0X
+widened cast, peel off 3138 3143
8 3.2 313.8 0.9X
+widened cast, peel on 2814 2823
8 3.6 281.4 1.0X
================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
================================================================================================
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
B1 p=7 s=2 p'=8: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3038 3041
2 3.3 303.8 1.0X
-widened cast, peel off 3274 3283
18 3.1 327.4 0.9X
-widened cast, peel on 3056 3074
15 3.3 305.6 1.0X
+native (no cast, rule on) 2777 2787
10 3.6 277.7 1.0X
+widened cast, peel off 3019 3033
18 3.3 301.9 0.9X
+widened cast, peel on 2781 2799
33 3.6 278.1 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
-B2 p=7 s=2 p'=12: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3029 3033
3 3.3 302.9 1.0X
-widened cast, peel off 3288 3291
2 3.0 328.8 0.9X
-widened cast, peel on 3031 3036
6 3.3 303.1 1.0X
+native (no cast, rule on) 2808 2818
9 3.6 280.8 1.0X
+widened cast, peel off 3067 3121
34 3.3 306.7 0.9X
+widened cast, peel on 2776 2785
16 3.6 277.6 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
B3 p=5 s=0 p'=6: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3022 3030
5 3.3 302.2 1.0X
-widened cast, peel off 3275 3307
28 3.1 327.5 0.9X
-widened cast, peel on 3025 3028
3 3.3 302.5 1.0X
+native (no cast, rule on) 2837 2857
16 3.5 283.7 1.0X
+widened cast, peel off 3087 3100
21 3.2 308.7 0.9X
+widened cast, peel on 2834 2846
18 3.5 283.4 1.0X
OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
-B4 p=5 s=0 p'=15: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on) 3024 3039
21 3.3 302.4 1.0X
-widened cast, peel off 3279 3298
17 3.1 327.9 0.9X
-widened cast, peel on 3016 3023
6 3.3 301.6 1.0X
+native (no cast, rule on) 2850 2856
7 3.5 285.0 1.0X
+widened cast, peel off 3107 3113
6 3.2 310.7 0.9X
+widened cast, peel on 2831 2841
14 3.5 283.1 1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3880 3888
13 2.6 388.0 1.0X
+widened cast, peel off 3974 4008
20 2.5 397.4 1.0X
+widened cast, peel on 3920 3934
8 2.6 392.0 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3877 3887
11 2.6 387.7 1.0X
+widened cast, peel off 3959 3978
20 2.5 395.9 1.0X
+widened cast, peel on 3880 3951
40 2.6 388.0 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3632 3651
14 2.8 363.2 1.0X
+widened cast, peel off 3623 3637
15 2.8 362.3 1.0X
+widened cast, peel on 3609 3621
14 2.8 360.9 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3849 3855
10 2.6 384.9 1.0X
+widened cast, peel off 3856 3873
10 2.6 385.6 1.0X
+widened cast, peel on 3835 3846
8 2.6 383.5 1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3810 3832
13 2.6 381.0 1.0X
+widened cast, peel off 3854 3875
34 2.6 385.4 1.0X
+widened cast, peel on 3785 3821
36 2.6 378.5 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3830 3833
3 2.6 383.0 1.0X
+widened cast, peel off 3908 3910
3 2.6 390.8 1.0X
+widened cast, peel on 3808 3845
21 2.6 380.8 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3611 3628
19 2.8 361.1 1.0X
+widened cast, peel off 3664 3685
18 2.7 366.4 1.0X
+widened cast, peel on 3620 3628
9 2.8 362.0 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on) 3831 3844
12 2.6 383.1 1.0X
+widened cast, peel off 3904 3919
10 2.6 390.4 1.0X
+widened cast, peel on 3792 3816
24 2.6 379.2 1.0X
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
index e006787dbfa1..af76955e8f05 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
@@ -51,6 +51,11 @@ import org.apache.spark.sql.types.Decimal
* B -- Aggregate AVG widened-cast sweep (`pPrime + 4 <= MAX_DOUBLE_DIGITS`
* so the rule fires only inside the existing AVG Double-regime
* envelope; wider casts stay on the Decimal-exact path).
+ * C -- Aggregate MIN widened-cast sweep (no regime guard: the MIN arm
+ * peels for any `pPrime >= p` same-scale widening; per design, the
+ * main saving is expected where the outer type is BigInteger-backed,
+ * i.e. `pPrime > MAX_LONG_DIGITS = 18`).
+ * D -- Aggregate MAX widened-cast sweep (mirrors C).
*
* NOTE on Window arm: the optimizer does not extend widened-Cast peel to
* the Window arm (see DecimalAggregates rule comment) because the analyzer
@@ -121,6 +126,40 @@ object DecimalAggregatesBenchmark extends SqlBasedBenchmark {
("B4 p=5 s=0 p'=11", 5, 0, 11) // pPrime upper bound, zero scale
)
+ /**
+ * Aggregate MIN cases: (label, p, s, widened p').
+ *
+ * MIN/MAX widened-cast peel has NO regime guard -- it peels for any
+ * `pPrime >= p` same-scale widening (`Optimizer.scala WidenedDecimalChild`).
+ * The main saving path is `pPrime > MAX_LONG_DIGITS = 18`, where the
+ * unrewritten plan would create a BigInteger-domain outer Decimal for
+ * every row, while the rewritten plan compares the inner Long-domain
+ * values and casts only the single aggregate result.
+ *
+ * Coverage:
+ * - C1: inner Long, outer Long -- weakest saving (sibling-compatible
+ * baseline; the row-cast still goes through `changePrecision`
+ * but stays in Long).
+ * - C2: inner Long, outer BigInteger -- the main saving regime.
+ * - C3: inner at Long boundary (p=18), outer BigInteger -- isolates
+ * the outer-domain cost.
+ * - C4: inner Long, outer at MAX_PRECISION=38 -- deepest BigInteger.
+ */
+ private val MinAggCases: Seq[(String, Int, Int, Int)] = Seq(
+ ("C1 p=10 s=2 p'=18", 10, 2, 18), // inner Long, outer Long (boundary)
+ ("C2 p=10 s=2 p'=28", 10, 2, 28), // inner Long, outer BigInteger (main saving)
+ ("C3 p=18 s=2 p'=28", 18, 2, 28), // inner Long max, outer BigInteger
+ ("C4 p=10 s=2 p'=38", 10, 2, 38) // inner Long, outer MAX_PRECISION
+ )
+
+ /** Aggregate MAX cases: mirror C above. */
+ private val MaxAggCases: Seq[(String, Int, Int, Int)] = Seq(
+ ("D1 p=10 s=2 p'=18", 10, 2, 18),
+ ("D2 p=10 s=2 p'=28", 10, 2, 28),
+ ("D3 p=18 s=2 p'=28", 18, 2, 28),
+ ("D4 p=10 s=2 p'=38", 10, 2, 38)
+ )
+
/** Clamp generator to `10^(p-s) - 1` so rand() * bound fits `DECIMAL(p, s)`. */
private def unscaledBound(p: Int, s: Int): Long = {
require(p - s >= 0, s"p=$p s=$s p-s must be non-negative")
@@ -207,5 +246,31 @@ object DecimalAggregatesBenchmark extends SqlBasedBenchmark {
iters, apl)
}
}
+
+ // Section C -- Aggregate MIN widened-cast.
+ runBenchmark("DecimalAggregates MIN widened-cast peel (Aggregate)") {
+ MinAggCases.foreach { case (label, p, s, pPrime) =>
+ require(pPrime >= p, s"$label: p'=$pPrime must be >= p=$p (widening)")
+ require(pPrime <= 38, s"$label: p'=$pPrime exceeds MAX_PRECISION=38")
+ setupAggTable(spark, aN, p, s)
+ runThreeWay(label, aN,
+ nativeSql = "select min(x) from t",
+ widenedSql = s"select min(cast(x as decimal($pPrime, $s))) from t",
+ iters, apl)
+ }
+ }
+
+ // Section D -- Aggregate MAX widened-cast.
+ runBenchmark("DecimalAggregates MAX widened-cast peel (Aggregate)") {
+ MaxAggCases.foreach { case (label, p, s, pPrime) =>
+ require(pPrime >= p, s"$label: p'=$pPrime must be >= p=$p (widening)")
+ require(pPrime <= 38, s"$label: p'=$pPrime exceeds MAX_PRECISION=38")
+ setupAggTable(spark, aN, p, s)
+ runThreeWay(label, aN,
+ nativeSql = "select max(x) from t",
+ widenedSql = s"select max(cast(x as decimal($pPrime, $s))) from t",
+ iters, apl)
+ }
+ }
}
}
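
The Scaladoc on `MinAggCases` in the diff above says the rewritten plan compares inner values and casts only the single aggregate result. The sketch below illustrates why that is value-equivalent for a same-scale widening. It is not Spark code: `Dec`, `PeelSketch`, `minOfWidened` and `widenedMin` are hypothetical names, and the model assumes a decimal is just an unscaled integer plus a declared precision and scale, which is a simplification of Spark's `Decimal`.

```scala
// Hypothetical model for illustration only; this is not Spark's Decimal class.
// A fixed-scale decimal is represented by its unscaled integer value.
final case class Dec(unscaled: BigInt, precision: Int, scale: Int) {
  require(unscaled.abs < BigInt(10).pow(precision),
    s"$unscaled does not fit precision $precision")

  // Same-scale widening: the unscaled value is unchanged,
  // only the declared precision grows.
  def widen(pPrime: Int): Dec = {
    require(pPrime >= precision, s"p'=$pPrime must be >= p=$precision")
    copy(precision = pPrime)
  }
}

object PeelSketch {
  // Compare by unscaled value; valid because every operand shares one scale.
  implicit val byValue: Ordering[Dec] = Ordering.by((d: Dec) => d.unscaled)

  // Unrewritten shape: MIN(CAST(x AS DECIMAL(p', s))), one widen per row.
  def minOfWidened(rows: Seq[Dec], pPrime: Int): Dec =
    rows.map(_.widen(pPrime)).min

  // Rewritten shape: CAST(MIN(x) AS DECIMAL(p', s)), one widen per aggregate.
  def widenedMin(rows: Seq[Dec], pPrime: Int): Dec =
    rows.min.widen(pPrime)

  def main(args: Array[String]): Unit = {
    val rows = Seq(1234L, -50L, 99999L).map(u => Dec(BigInt(u), 10, 2))
    // Both shapes pick the same row and end with the same declared type.
    assert(minOfWidened(rows, 28) == widenedMin(rows, 28))
    println(widenedMin(rows, 28))
  }
}
```

Because widening leaves the unscaled value untouched, the ordering of rows is the same before and after the cast, so the two shapes agree. The same argument applies to MAX by swapping `min` for `max`. The sketch does not model NULL handling or `CheckOverflow`, both of which the commit message treats separately.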