This is an automated email from the ASF dual-hosted git repository.

yaooqinn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 606b8fad2f13 [SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max
606b8fad2f13 is described below

commit 606b8fad2f135fb5a5a9d3021338ff4875cc2381
Author: Kent Yao <[email protected]>
AuthorDate: Mon May 25 14:37:47 2026 +0800

    [SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max
    
    ### What changes were proposed in this pull request?
    
    Extend `DecimalAggregates` to peel a scale-preserving widening `Cast` 
around `Min`/`Max` arguments, mirroring the existing SUM/AVG widened-Cast arms 
landed via SPARK-56983.
    
    When the input is `Min(Cast(inner: dec(p, s), dec(p', s)))` (or `Max(...)`) with `p' >= p` and no `CheckOverflow` wrapper, the rule rewrites it to `Cast(Min(inner), dec(p', s))` (and likewise for `Max`). MIN/MAX return one of their input values under a total order, and a same-scale widening Cast preserves both the value and the ordering, so the rewrite is value-equivalent and NULL-preserving (see design §D6 self-Q&A).
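    
    The equivalence above can be sketched in a toy model. This is only an illustration under stated assumptions, not Spark's `Decimal` or the rule itself: `Dec`, `PeelSketch`, `widen`, and `minOpt` are hypothetical names, a decimal is modeled as an unscaled `Long` plus `(precision, scale)`, and NULL is modeled as `None`.
    
    ```scala
    // Toy model (assumption): a decimal is an unscaled Long plus (precision, scale).
    final case class Dec(unscaled: Long, precision: Int, scale: Int)
    
    object PeelSketch {
      // Same-scale widening keeps the unscaled value and only raises the precision.
      def widen(d: Dec, pPrime: Int): Dec = {
        require(pPrime >= d.precision, "sketch covers widening only")
        d.copy(precision = pPrime)
      }
    
      // MIN over nullable rows: NULLs are skipped, all-NULL input yields None.
      // Comparing unscaled values is valid because every row shares one scale.
      def minOpt(rows: Seq[Option[Dec]]): Option[Dec] = {
        val present = rows.flatten
        if (present.isEmpty) None else Some(present.minBy(_.unscaled))
      }
    
      // Before the rewrite: widen every row, then aggregate.
      def castThenMin(rows: Seq[Option[Dec]], pPrime: Int): Option[Dec] =
        minOpt(rows.map(_.map(widen(_, pPrime))))
    
      // After the rewrite: aggregate on the narrow type, then widen once.
      def minThenCast(rows: Seq[Option[Dec]], pPrime: Int): Option[Dec] =
        minOpt(rows).map(widen(_, pPrime))
    }
    ```
    
    In this model `castThenMin` calls `widen` once per non-NULL row while `minThenCast` calls it at most once, which is the per-row work the peel removes.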
    
    Both arms reuse the same-package `WidenedDecimalChild` extractor introduced 
for SUM/AVG, which refuses to unwrap `CheckOverflow` and enforces the same `s 
== s'`, `p' >= p` guard. `TreePatterns.MIN` / `TreePatterns.MAX` are added and 
registered on `Min` / `Max`; `DecimalAggregates`'s `containsAnyPattern` pruning 
is widened to `(SUM, AVERAGE, MIN, MAX)`. No new rule, no new file — three arms 
cohabit one object.
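    
    The guard described above can be illustrated with a simplified extractor over a toy expression tree. This is a hedged sketch, not the real `WidenedDecimalChild` (which returns four fields in the diff): `DecType`, `Expr`, `Attr`, and `WidenedChildSketch` are hypothetical stand-ins, and only the three stated conditions are modeled (no `CheckOverflow` child, same scale, `p' >= p`).
    
    ```scala
    // Hypothetical stand-ins for Catalyst types; names are assumptions.
    final case class DecType(precision: Int, scale: Int)
    
    sealed trait Expr { def dt: DecType }
    final case class Attr(name: String, dt: DecType) extends Expr
    final case class CheckOverflow(child: Expr, dt: DecType) extends Expr
    final case class Cast(child: Expr, dt: DecType) extends Expr
    
    object WidenedChildSketch {
      // Matches Cast(inner, dec(p', s')) only when the peel is safe.
      def unapply(e: Expr): Option[(Expr, Int, Int)] = e match {
        // Never unwrap a CheckOverflow child.
        case Cast(_: CheckOverflow, _) => None
        // Same scale and non-narrowing precision.
        case Cast(inner, DecType(pPrime, sPrime))
            if sPrime == inner.dt.scale && pPrime >= inner.dt.precision =>
          Some((inner, pPrime, sPrime))
        case _ => None
      }
    }
    ```
    
    Under these assumptions, `Cast(Attr("d7_2", DecType(7, 2)), DecType(12, 2))` matches, while a scale-changing target such as `DecType(12, 4)` or a narrowing target does not.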
    
    ### Why are the changes needed?
    
    The SUM/AVG arms recover the long-backed fast path when BI tools generate 
`SUM(CAST(small_dec AS larger_dec))`. The MIN/MAX case is the natural sibling: 
same widening pattern, but currently no peel arm exists, so each aggregated row 
pays a per-row `Decimal.changePrecision` call inside `Cast` 
(`Cast.scala:1074-1082`) even though the outer Cast could be applied **once** 
to the partition extremum instead.
    
    MIN/MAX are pointwise on a totally-ordered domain and immune to both the 
SUM overflow boundary (SPARK-56983) and the AVG SPARK-37024 Double-regime gate, 
so the equivalence is unconditional within the `WidenedDecimalChild` guard 
domain (R1b Lemma 1 — design §D6 self-Q&A).
    
    The saving comes from eliminating the per-row `changePrecision` call on the aggregate input. Best-time deltas range from **−0.39% to −6.02%** and trend larger on newer JDKs (JDK 25 strongest) per the GHA 3 JDK × 16 case matrix below. The patch is essentially free: three lines of extractor reuse, no new rule, no new file.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    - `DecimalAggregatesSuite` extended with 6 SPARK-57023 oracle cases — 2 
peel positives (`Min`/`Max` over widening Cast) and 4 negatives (scale-changing 
Cast, narrowing Cast, `CheckOverflow`-wrapped, `MinBy`/`MaxBy`/`MaxMinByK`). 
Suite: 43/43.
    - Full `catalyst/test`: 9341 tests / 353 suites, 0 failed, 5 ignored, 670 s.
    - `TPCDSV1_4PlanStabilitySuite` + `TPCDSV1_4PlanStabilityWithStatsSuite`: 
no golden change on `apache/master`. Existence-test on `efb7beab826` recorded 
**0 trigger across the 130 TPC-DS v1.4 + v2.7.0 queries** (see investigation 
`0002-baseline-run-results.md`, positive-control verified).
    - `DecimalAggregatesBenchmark`: extended with new MIN section (`C1-C4`) and 
MAX section (`D1-D4`) mirroring the existing SUM/AVG sections. Full 8-case × 3 
JDK matrix run on GitHub Actions standard `ubuntu-22.04` runners (AMD EPYC 7763 
64-Core, 10M rows × 5 iters); 
`DecimalAggregatesBenchmark{-,-jdk21-,-jdk25-}results.txt` regenerated and 
committed. Headline cells (Best ms peel off / peel on / Δ%):
    
      | case   | p,s,p'  | JDK 17               | JDK 21               | JDK 25               |
      |--------|---------|----------------------|----------------------|----------------------|
      | C1 MIN | 10,2,18 | 3974 / 3920 (−1.36%) | 3301 / 3202 (−3.00%) | 1353 / 1291 (−4.58%) |
      | C2 MIN | 10,2,28 | 3959 / 3880 (−2.00%) | 3294 / 3228 (−2.00%) | 1351 / 1287 (−4.74%) |
      | C3 MIN | 18,2,28 | 3623 / 3609 (−0.39%) | 3557 / 3450 (−3.01%) | 1368 / 1292 (−5.56%) |
      | C4 MIN | 10,2,38 | 3856 / 3835 (−0.54%) | 3240 / 3151 (−2.75%) | 1348 / 1283 (−4.82%) |
      | D1 MAX | 10,2,18 | 3854 / 3785 (−1.79%) | 3241 / 3173 (−2.10%) | 1346 / 1279 (−4.98%) |
      | D2 MAX | 10,2,28 | 3908 / 3808 (−2.56%) | 3267 / 3152 (−3.52%) | 1352 / 1287 (−4.81%) |
      | D3 MAX | 18,2,28 | 3664 / 3620 (−1.20%) | 3507 / 3462 (−1.28%) | 1378 / 1295 (−6.02%) |
      | D4 MAX | 10,2,38 | 3904 / 3792 (−2.87%) | 3233 / 3164 (−2.13%) | 1342 / 1274 (−5.07%) |
    
      Pattern: Best-time deltas range from **−0.39% to −6.02%** and trend larger on newer JDKs (JDK 25 strongest, JDK 17 weakest) across the 24 readings; no cell regresses (peel on is faster in every cell). The saving is per-row `Decimal.changePrecision` elimination on the aggregate input; design `0002-design-minmax-fastpath.md` §D5.1 declares this a legal micro-only ship contract. A pre-GHA local sbt sanity run (Section C MIN, JDK 17, 10M × 5) recorded −1.5% to −2.0%; the GHA EPYC numbers above supersede it.
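      
      The Δ% cells above are consistent with a relative change of the "peel on" best time against the "peel off" best time. A small sketch of that arithmetic follows; `DeltaSketch` and `deltaPct` are hypothetical names, not part of `DecimalAggregatesBenchmark`.
      
      ```scala
      object DeltaSketch {
        // Relative change of "peel on" versus "peel off" best time, in percent.
        // A negative result means peel on is faster.
        def deltaPct(peelOffMs: Double, peelOnMs: Double): Double =
          (peelOnMs - peelOffMs) / peelOffMs * 100.0
      }
      ```
      
      For example, `DeltaSketch.deltaPct(3974, 3920)` (C1 MIN on JDK 17) evaluates to roughly −1.36, matching the table cell.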
    
    ### Note on Section A/B `results.txt` refresh
    
    The sibling SPARK-56627 change edited the Section B2/B4 cases in the benchmark `.scala` to `p'=11` but 
did not regenerate `DecimalAggregatesBenchmark-results.txt`, so the committed 
text was stale for the new shape. This PR regenerates the whole file under the 
canonical EPYC 7763 GHA runner (replacing the prior EPYC 9V74 numbers in 
Section A/B), and the same regeneration produces the JDK 21 / JDK 25 companion 
files. Section A/B refresh is mechanical housekeeping triggered by the 
regeneration, not part of the MIN/MA [...]
    
    ### Why no TPC-DS results?
    
    The existence-test on `apache/master` `efb7beab826` walked optimized plans 
across the full 130-query TPC-DS v1.4 + v2.7.0 corpus and recorded **0 
queries** triggering the `Min(Cast(...))` / `Max(Cast(...))` widening pattern 
(see investigation `0002-baseline-run-results.md`, positive-control verified). 
The MIN/MAX peel is justified by semantic equivalence + micro-level pattern 
coverage, not by measured TPC-DS gains. Design `0002-design-minmax-fastpath.md` §D5.1 
declares this as a legal micro- [...]
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Opus 4.7
    
    Closes #56078 from yaooqinn/users/kentyao/spark-decimal-minmax-cast-peel.
    
    Authored-by: Kent Yao <[email protected]>
    Signed-off-by: Kent Yao <[email protected]>
---
 .../sql/catalyst/expressions/aggregate/Max.scala   |   3 +
 .../sql/catalyst/expressions/aggregate/Min.scala   |   3 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  21 ++-
 .../spark/sql/catalyst/trees/TreePatterns.scala    |   2 +
 .../optimizer/DecimalAggregatesSuite.scala         |  78 ++++++++++-
 .../DecimalAggregatesBenchmark-jdk21-results.txt   | 142 ++++++++++++++++-----
 .../DecimalAggregatesBenchmark-jdk25-results.txt   | 142 ++++++++++++++++-----
 .../DecimalAggregatesBenchmark-results.txt         | 142 ++++++++++++++++-----
 .../benchmark/DecimalAggregatesBenchmark.scala     |  65 ++++++++++
 9 files changed, 493 insertions(+), 105 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
index 902f53309de4..f49297eba88b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{MAX, TreePattern}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.spark.sql.types._
@@ -43,6 +44,8 @@ case class Max(child: Expression) extends DeclarativeAggregate with UnaryLike[Ex
   override def checkInputDataTypes(): TypeCheckResult =
     TypeUtils.checkForOrderingExpr(child.dataType, prettyName)
 
+  final override val nodePatterns: Seq[TreePattern] = Seq(MAX)
+
   private lazy val max = AttributeReference("max", child.dataType)()
 
   override lazy val aggBufferAttributes: Seq[AttributeReference] = max :: Nil
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
index 7a9588808dbd..eaef7b6bec11 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{MIN, TreePattern}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.spark.sql.types._
@@ -43,6 +44,8 @@ case class Min(child: Expression) extends DeclarativeAggregate with UnaryLike[Ex
   override def checkInputDataTypes(): TypeCheckResult =
     TypeUtils.checkForOrderingExpr(child.dataType, prettyName)
 
+  final override val nodePatterns: Seq[TreePattern] = Seq(MIN)
+
   private lazy val min = AttributeReference("min", child.dataType)()
 
   override lazy val aggBufferAttributes: Seq[AttributeReference] = min :: Nil
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 95d774c6e991..1c991729c7d4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -2576,9 +2576,9 @@ object DecimalAggregates extends Rule[LogicalPlan] {
   }
 
   def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
-    _.containsAnyPattern(SUM, AVERAGE), ruleId) {
+    _.containsAnyPattern(SUM, AVERAGE, MIN, MAX), ruleId) {
     case q: LogicalPlan => q.transformExpressionsDownWithPruning(
-      _.containsAnyPattern(SUM, AVERAGE), ruleId) {
+      _.containsAnyPattern(SUM, AVERAGE, MIN, MAX), ruleId) {
       case we @ WindowExpression(ae @ AggregateExpression(af, _, _, _, _), _) => af match {
         // Window arm: `ExtractWindowExpressions` hoists composite children
         // (here the widening Cast) into a child Project, so widened-Cast
@@ -2636,6 +2636,23 @@ object DecimalAggregates extends Rule[LogicalPlan] {
             Divide(newAggExpr, Literal.create(math.pow(10.0, scale), DoubleType)),
             DecimalType(prec + 4, scale + 4), Option(conf.sessionLocalTimeZone))
 
+        // Hoist a scale-preserving widening Cast out of Min so the Min runs on
+        // the narrower inner Decimal. Min picks an existing row's value, so a
+        // widening Cast (same scale, larger precision) is bit-identical to
+        // applying the Cast after the aggregate. The outer Cast preserves the
+        // pre-rewrite result dataType (Min.dataType == child.dataType).
+        case m @ Min(WidenedDecimalChild(inner, _, pPrime, sPrime)) =>
+          Cast(
+            ae.copy(aggregateFunction = m.copy(child = inner)),
+            DecimalType(pPrime, sPrime), Option(conf.sessionLocalTimeZone))
+
+        // Hoist a scale-preserving widening Cast out of Max (same reasoning
+        // as the Min arm above).
+        case m @ Max(WidenedDecimalChild(inner, _, pPrime, sPrime)) =>
+          Cast(
+            ae.copy(aggregateFunction = m.copy(child = inner)),
+            DecimalType(pPrime, sPrime), Option(conf.sessionLocalTimeZone))
+
         case _ => ae
       }
     }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
index cca9bcd673d6..4e06fcb36767 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
@@ -71,7 +71,9 @@ object TreePattern extends Enumeration  {
   val MAP_FROM_ARRAYS: Value = Value
   val MAP_FROM_ENTRIES: Value = Value
   val MAP_OBJECTS: Value = Value
+  val MAX: Value = Value
   val MEASURE: Value = Value
+  val MIN: Value = Value
   val MULTI_ALIAS: Value = Value
   val NEW_INSTANCE: Value = Value
   val NOT: Value = Value
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
index b65ce3a0f017..0850929d3d24 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecimalAggregatesSuite.scala
@@ -23,7 +23,7 @@ import org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.expressions.aggregate.{Average, Sum}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{Average, MaxBy, MaxMinByK, MinBy, Sum}
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
@@ -613,4 +613,80 @@ class DecimalAggregatesSuite extends PlanTest with ScalaCheckDrivenPropertyCheck
       s"evalMode should be preserved as TRY after rewrite, got " +
         avgs.map(_.evalMode).mkString(","))
   }
+  test("SPARK-57023: MIN(CAST(dec(7,2) AS dec(12,2))) peels via widened-Cast fast path") {
+    val widened = $"d7_2".cast(DecimalType(12, 2))
+    val originalQuery = widenRel.select(min(widened).as("min_widened"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = widenRel
+      .select(
+        Cast(
+          min($"d7_2"),
+          DecimalType(12, 2),
+          Option(conf.sessionLocalTimeZone))
+          .as("min_widened"))
+      .analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("SPARK-57023: MAX(CAST(dec(7,2) AS dec(12,2))) peels via widened-Cast fast path") {
+    val widened = $"d7_2".cast(DecimalType(12, 2))
+    val originalQuery = widenRel.select(max(widened).as("max_widened"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = widenRel
+      .select(
+        Cast(
+          max($"d7_2"),
+          DecimalType(12, 2),
+          Option(conf.sessionLocalTimeZone))
+          .as("max_widened"))
+      .analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("SPARK-57023: MIN(CAST(dec(7,2) AS dec(12,4))) does NOT peel (scale change)") {
+    val rescaled = $"d7_2".cast(DecimalType(12, 4))
+    val originalQuery = widenRel.select(min(rescaled).as("min_rescaled"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = originalQuery.analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("SPARK-57023: MIN(CAST(dec(17,2) AS dec(10,2))) does NOT peel (narrowing)") {
+    val narrowed = $"d17_2".cast(DecimalType(10, 2))
+    val originalQuery = widenRel.select(min(narrowed).as("min_narrowed"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = originalQuery.analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("SPARK-57023: MIN/MAX(CheckOverflow) does NOT peel (CheckOverflow guard)") {
+    val co = CheckOverflow($"d7_2", DecimalType(7, 2), nullOnOverflow = true)
+    val widened = Cast(co, DecimalType(12, 2))
+    val originalQuery = widenRel.select(min(widened).as("min_co"), max(widened).as("max_co"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = originalQuery.analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("SPARK-57023: MinBy/MaxBy/MaxMinByK with widened-Cast value do NOT peel " +
+      "(rule pattern matches only Min/Max)") {
+    val widened = $"d7_2".cast(DecimalType(12, 2))
+    val ordering = $"i"
+    val minByExpr = MinBy(widened, ordering).toAggregateExpression()
+    val maxByExpr = MaxBy(widened, ordering).toAggregateExpression()
+    val maxMinByKExpr = MaxMinByK(widened, ordering, Literal(3)).toAggregateExpression()
+    val originalQuery = widenRel.select(
+      minByExpr.as("min_by_w"),
+      maxByExpr.as("max_by_w"),
+      maxMinByKExpr.as("mmbk_w"))
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer = originalQuery.analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
 }
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
index 1186901b3575..4448850b83d3 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk21-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
 
================================================================================================
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2178           2236         
 56          4.6         217.8       1.0X
-widened cast, peel off                             2369           2381         
  9          4.2         236.9       0.9X
-widened cast, peel on                              2105           2118         
 12          4.8         210.5       1.0X
+native (no cast, rule on)                          2111           2193         
 59          4.7         211.1       1.0X
+widened cast, peel off                             2364           2371         
  7          4.2         236.4       0.9X
+widened cast, peel on                              2074           2091         
 20          4.8         207.4       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A2 p=7 s=2 p'=17:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2103           2115         
 17          4.8         210.3       1.0X
-widened cast, peel off                             2366           2377         
  7          4.2         236.6       0.9X
-widened cast, peel on                              2100           2109         
 11          4.8         210.0       1.0X
+native (no cast, rule on)                          2088           2100         
 14          4.8         208.8       1.0X
+widened cast, peel off                             2314           2340         
 31          4.3         231.4       0.9X
+widened cast, peel on                              2084           2093         
 15          4.8         208.4       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2117           2138         
 29          4.7         211.7       1.0X
-widened cast, peel off                             2403           2416         
 13          4.2         240.3       0.9X
-widened cast, peel on                              2157           2164         
  7          4.6         215.7       1.0X
+native (no cast, rule on)                          2109           2118         
  9          4.7         210.9       1.0X
+widened cast, peel off                             2394           2405         
 22          4.2         239.4       0.9X
+widened cast, peel on                              2125           2146         
 13          4.7         212.5       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2151           2157         
  7          4.6         215.1       1.0X
-widened cast, peel off                             2420           2427         
 10          4.1         242.0       0.9X
-widened cast, peel on                              2152           2159         
  9          4.6         215.2       1.0X
+native (no cast, rule on)                          2109           2113         
  3          4.7         210.9       1.0X
+widened cast, peel off                             2409           2423         
 21          4.2         240.9       0.9X
+widened cast, peel on                              2116           2125         
 11          4.7         211.6       1.0X
 
 
 
================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
 
================================================================================================
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 B1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2130           2136         
  5          4.7         213.0       1.0X
-widened cast, peel off                             2358           2367         
 15          4.2         235.8       0.9X
-widened cast, peel on                              2140           2150         
  7          4.7         214.0       1.0X
+native (no cast, rule on)                          2087           2098         
  7          4.8         208.7       1.0X
+widened cast, peel off                             2292           2300         
 10          4.4         229.2       0.9X
+widened cast, peel on                              2125           2127         
  2          4.7         212.5       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B2 p=7 s=2 p'=12:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2147           2151         
  3          4.7         214.7       1.0X
-widened cast, peel off                             2359           2361         
  2          4.2         235.9       0.9X
-widened cast, peel on                              2126           2161         
 20          4.7         212.6       1.0X
+native (no cast, rule on)                          2145           2151         
  5          4.7         214.5       1.0X
+widened cast, peel off                             2312           2317         
  4          4.3         231.2       0.9X
+widened cast, peel on                              2090           2096         
 10          4.8         209.0       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 B3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2173           2185         
  9          4.6         217.3       1.0X
-widened cast, peel off                             2405           2413         
  7          4.2         240.5       0.9X
-widened cast, peel on                              2167           2177         
 12          4.6         216.7       1.0X
+native (no cast, rule on)                          2144           2147         
  3          4.7         214.4       1.0X
+widened cast, peel off                             2395           2420         
 14          4.2         239.5       0.9X
+widened cast, peel on                              2153           2161         
 13          4.6         215.3       1.0X
 
 OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11:                         Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          2173           2179         
  7          4.6         217.3       1.0X
-widened cast, peel off                             2393           2400         
 11          4.2         239.3       0.9X
-widened cast, peel on                              2172           2178         
  5          4.6         217.2       1.0X
+native (no cast, rule on)                          2145           2149         
  6          4.7         214.5       1.0X
+widened cast, peel off                             2369           2381         
  7          4.2         236.9       0.9X
+widened cast, peel on                              2152           2154         
  1          4.6         215.2       1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3258           3263         
  4          3.1         325.8       1.0X
+widened cast, peel off                             3301           3318         
 30          3.0         330.1       1.0X
+widened cast, peel on                              3202           3274         
 41          3.1         320.2       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3261           3265         
  3          3.1         326.1       1.0X
+widened cast, peel off                             3294           3323         
 16          3.0         329.4       1.0X
+widened cast, peel on                              3228           3239         
  6          3.1         322.8       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3440           3540         
 56          2.9         344.0       1.0X
+widened cast, peel off                             3557           3582         
 15          2.8         355.7       1.0X
+widened cast, peel on                              3450           3471         
 34          2.9         345.0       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3218           3223         
  3          3.1         321.8       1.0X
+widened cast, peel off                             3240           3252         
 19          3.1         324.0       1.0X
+widened cast, peel on                              3151           3192         
 23          3.2         315.1       1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3148           3177         
 16          3.2         314.8       1.0X
+widened cast, peel off                             3241           3249         
  9          3.1         324.1       1.0X
+widened cast, peel on                              3173           3173         
  0          3.2         317.3       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3178           3183           4          3.1         317.8       1.0X
+widened cast, peel off                             3267           3271           4          3.1         326.7       1.0X
+widened cast, peel on                              3152           3168          10          3.2         315.2       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3458           3531          42          2.9         345.8       1.0X
+widened cast, peel off                             3507           3560          30          2.9         350.7       1.0X
+widened cast, peel on                              3462           3525          36          2.9         346.2       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3179           3183           4          3.1         317.9       1.0X
+widened cast, peel off                             3233           3251          10          3.1         323.3       1.0X
+widened cast, peel on                              3164           3171          11          3.2         316.4       1.0X
 
 
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
index 60109cac85ec..4e901134bd02 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-jdk25-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
 ================================================================================================
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1194           1230          57          8.4         119.4       1.0X
-widened cast, peel off                             1421           1433          11          7.0         142.1       0.8X
-widened cast, peel on                              1181           1188           5          8.5         118.1       1.0X
+native (no cast, rule on)                          1200           1222          33          8.3         120.0       1.0X
+widened cast, peel off                             1437           1447           8          7.0         143.7       0.8X
+widened cast, peel on                              1197           1205           8          8.4         119.7       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A2 p=7 s=2 p'=17:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1174           1189          12          8.5         117.4       1.0X
-widened cast, peel off                             1401           1414           8          7.1         140.1       0.8X
-widened cast, peel on                              1169           1178           8          8.6         116.9       1.0X
+native (no cast, rule on)                          1189           1196           8          8.4         118.9       1.0X
+widened cast, peel off                             1426           1431           5          7.0         142.6       0.8X
+widened cast, peel on                              1189           1195           3          8.4         118.9       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1245           1254          10          8.0         124.5       1.0X
-widened cast, peel off                             1498           1503           5          6.7         149.8       0.8X
-widened cast, peel on                              1222           1232          10          8.2         122.2       1.0X
+native (no cast, rule on)                          1223           1224           2          8.2         122.3       1.0X
+widened cast, peel off                             1497           1501           3          6.7         149.7       0.8X
+widened cast, peel on                              1213           1219           4          8.2         121.3       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 A4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1234           1238           3          8.1         123.4       1.0X
-widened cast, peel off                             1473           1478           7          6.8         147.3       0.8X
-widened cast, peel on                              1242           1255          16          8.1         124.2       1.0X
+native (no cast, rule on)                          1214           1219           5          8.2         121.4       1.0X
+widened cast, peel off                             1464           1469           3          6.8         146.4       0.8X
+widened cast, peel on                              1227           1233           6          8.2         122.7       1.0X
 
 
 ================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
 ================================================================================================
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 B1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1178           1185           9          8.5         117.8       1.0X
-widened cast, peel off                             1434           1440           8          7.0         143.4       0.8X
-widened cast, peel on                              1232           1235           3          8.1         123.2       1.0X
+native (no cast, rule on)                          1195           1200           5          8.4         119.5       1.0X
+widened cast, peel off                             1392           1395           3          7.2         139.2       0.9X
+widened cast, peel on                              1189           1195           5          8.4         118.9       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B2 p=7 s=2 p'=12:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1222           1229           7          8.2         122.2       1.0X
-widened cast, peel off                             1434           1444          10          7.0         143.4       0.9X
-widened cast, peel on                              1216           1223           6          8.2         121.6       1.0X
+native (no cast, rule on)                          1192           1195           3          8.4         119.2       1.0X
+widened cast, peel off                             1401           1406           4          7.1         140.1       0.9X
+widened cast, peel on                              1191           1195           5          8.4         119.1       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+AMD EPYC 7763 64-Core Processor
 B3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1267           1274           6          7.9         126.7       1.0X
-widened cast, peel off                             1505           1509           4          6.6         150.5       0.8X
-widened cast, peel on                              1272           1277           7          7.9         127.2       1.0X
+native (no cast, rule on)                          1213           1218           7          8.2         121.3       1.0X
+widened cast, peel off                             1423           1443          40          7.0         142.3       0.9X
+widened cast, peel on                              1213           1214           2          8.2         121.3       1.0X
 
 OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
-B4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          1269           1275           5          7.9         126.9       1.0X
-widened cast, peel off                             1494           1501           9          6.7         149.4       0.8X
-widened cast, peel on                              1268           1274           6          7.9         126.8       1.0X
+native (no cast, rule on)                          1214           1218           5          8.2         121.4       1.0X
+widened cast, peel off                             1422           1422           1          7.0         142.2       0.9X
+widened cast, peel on                              1209           1214           3          8.3         120.9       1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1292           1298           5          7.7         129.2       1.0X
+widened cast, peel off                             1353           1356           3          7.4         135.3       1.0X
+widened cast, peel on                              1291           1292           3          7.7         129.1       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1297           1302           5          7.7         129.7       1.0X
+widened cast, peel off                             1351           1354           3          7.4         135.1       1.0X
+widened cast, peel on                              1287           1290           3          7.8         128.7       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1290           1294           4          7.8         129.0       1.0X
+widened cast, peel off                             1368           1372           5          7.3         136.8       0.9X
+widened cast, peel on                              1292           1294           2          7.7         129.2       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1281           1285           4          7.8         128.1       1.0X
+widened cast, peel off                             1348           1351           4          7.4         134.8       1.0X
+widened cast, peel on                              1283           1290           7          7.8         128.3       1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1275           1281           5          7.8         127.5       1.0X
+widened cast, peel off                             1346           1349           2          7.4         134.6       0.9X
+widened cast, peel on                              1279           1280           2          7.8         127.9       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1273           1275           2          7.9         127.3       1.0X
+widened cast, peel off                             1352           1356           2          7.4         135.2       0.9X
+widened cast, peel on                              1287           1291           4          7.8         128.7       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1284           1289           7          7.8         128.4       1.0X
+widened cast, peel off                             1378           1385           7          7.3         137.8       0.9X
+widened cast, peel on                              1295           1300           4          7.7         129.5       1.0X
+
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          1273           1276           3          7.9         127.3       1.0X
+widened cast, peel off                             1342           1347           5          7.4         134.2       0.9X
+widened cast, peel on                              1274           1278           3          7.9         127.4       1.0X
 
 
diff --git a/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt b/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
index d9c2c9662826..8e28043f8aa3 100644
--- a/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
+++ b/sql/core/benchmarks/DecimalAggregatesBenchmark-results.txt
@@ -3,36 +3,36 @@ DecimalAggregates SUM widened-cast peel (Aggregate)
 ================================================================================================
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 A1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3068           3095          35          3.3         306.8       1.0X
-widened cast, peel off                             3396           3410          19          2.9         339.6       0.9X
-widened cast, peel on                              3107           3115          10          3.2         310.7       1.0X
+native (no cast, rule on)                          2814           2840          22          3.6         281.4       1.0X
+widened cast, peel off                             3042           3052           7          3.3         304.2       0.9X
+widened cast, peel on                              2740           2764          26          3.6         274.0       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 A2 p=7 s=2 p'=17:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3104           3120          23          3.2         310.4       1.0X
-widened cast, peel off                             3386           3407          27          3.0         338.6       0.9X
-widened cast, peel on                              3094           3106          17          3.2         309.4       1.0X
+native (no cast, rule on)                          2721           2728           4          3.7         272.1       1.0X
+widened cast, peel off                             3033           3061          18          3.3         303.3       0.9X
+widened cast, peel on                              2792           2799           7          3.6         279.2       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 A3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3039           3053          21          3.3         303.9       1.0X
-widened cast, peel off                             3336           3340           5          3.0         333.6       0.9X
-widened cast, peel on                              3034           3048          14          3.3         303.4       1.0X
+native (no cast, rule on)                          2843           2864          34          3.5         284.3       1.0X
+widened cast, peel off                             3103           3119          19          3.2         310.3       0.9X
+widened cast, peel on                              2852           2859           7          3.5         285.2       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 A4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3037           3049          16          3.3         303.7       1.0X
-widened cast, peel off                             3324           3340          16          3.0         332.4       0.9X
-widened cast, peel on                              3027           3031           4          3.3         302.7       1.0X
+native (no cast, rule on)                          2852           2863           9          3.5         285.2       1.0X
+widened cast, peel off                             3138           3143           8          3.2         313.8       0.9X
+widened cast, peel on                              2814           2823           8          3.6         281.4       1.0X
 
 
 ================================================================================================
@@ -40,35 +40,109 @@ DecimalAggregates AVG widened-cast peel (Aggregate)
 ================================================================================================
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 B1 p=7 s=2 p'=8:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3038           3041           2          3.3         303.8       1.0X
-widened cast, peel off                             3274           3283          18          3.1         327.4       0.9X
-widened cast, peel on                              3056           3074          15          3.3         305.6       1.0X
+native (no cast, rule on)                          2777           2787          10          3.6         277.7       1.0X
+widened cast, peel off                             3019           3033          18          3.3         301.9       0.9X
+widened cast, peel on                              2781           2799          33          3.6         278.1       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
-B2 p=7 s=2 p'=12:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B2 p=7 s=2 p'=11:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3029           3033           3          3.3         302.9       1.0X
-widened cast, peel off                             3288           3291           2          3.0         328.8       0.9X
-widened cast, peel on                              3031           3036           6          3.3         303.1       1.0X
+native (no cast, rule on)                          2808           2818           9          3.6         280.8       1.0X
+widened cast, peel off                             3067           3121          34          3.3         306.7       0.9X
+widened cast, peel on                              2776           2785          16          3.6         277.6       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
+AMD EPYC 7763 64-Core Processor
 B3 p=5 s=0 p'=6:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3022           3030           5          3.3         302.2       1.0X
-widened cast, peel off                             3275           3307          28          3.1         327.5       0.9X
-widened cast, peel on                              3025           3028           3          3.3         302.5       1.0X
+native (no cast, rule on)                          2837           2857          16          3.5         283.7       1.0X
+widened cast, peel off                             3087           3100          21          3.2         308.7       0.9X
+widened cast, peel on                              2834           2846          18          3.5         283.4       1.0X
 
 OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
-AMD EPYC 9V74 80-Core Processor
-B4 p=5 s=0 p'=15:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+AMD EPYC 7763 64-Core Processor
+B4 p=5 s=0 p'=11:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-native (no cast, rule on)                          3024           3039          21          3.3         302.4       1.0X
-widened cast, peel off                             3279           3298          17          3.1         327.9       0.9X
-widened cast, peel on                              3016           3023           6          3.3         301.6       1.0X
+native (no cast, rule on)                          2850           2856           7          3.5         285.0       1.0X
+widened cast, peel off                             3107           3113           6          3.2         310.7       0.9X
+widened cast, peel on                              2831           2841          14          3.5         283.1       1.0X
+
+
+================================================================================================
+DecimalAggregates MIN widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3880           3888          13          2.6         388.0       1.0X
+widened cast, peel off                             3974           4008          20          2.5         397.4       1.0X
+widened cast, peel on                              3920           3934           8          2.6         392.0       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3877           3887          11          2.6         387.7       1.0X
+widened cast, peel off                             3959           3978          20          2.5         395.9       1.0X
+widened cast, peel on                              3880           3951          40          2.6         388.0       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3632           3651          14          2.8         363.2       1.0X
+widened cast, peel off                             3623           3637          15          2.8         362.3       1.0X
+widened cast, peel on                              3609           3621          14          2.8         360.9       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+C4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3849           3855          10          2.6         384.9       1.0X
+widened cast, peel off                             3856           3873          10          2.6         385.6       1.0X
+widened cast, peel on                              3835           3846           8          2.6         383.5       1.0X
+
+
+================================================================================================
+DecimalAggregates MAX widened-cast peel (Aggregate)
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D1 p=10 s=2 p'=18:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3810           3832          13          2.6         381.0       1.0X
+widened cast, peel off                             3854           3875          34          2.6         385.4       1.0X
+widened cast, peel on                              3785           3821          36          2.6         378.5       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D2 p=10 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3830           3833           3          2.6         383.0       1.0X
+widened cast, peel off                             3908           3910           3          2.6         390.8       1.0X
+widened cast, peel on                              3808           3845          21          2.6         380.8       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D3 p=18 s=2 p'=28:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3611           3628          19          2.8         361.1       1.0X
+widened cast, peel off                             3664           3685          18          2.7         366.4       1.0X
+widened cast, peel on                              3620           3628           9          2.8         362.0       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 7763 64-Core Processor
+D4 p=10 s=2 p'=38:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+native (no cast, rule on)                          3831           3844          12          2.6         383.1       1.0X
+widened cast, peel off                             3904           3919          10          2.6         390.4       1.0X
+widened cast, peel on                              3792           3816          24          2.6         379.2       1.0X
 
 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
index e006787dbfa1..af76955e8f05 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DecimalAggregatesBenchmark.scala
@@ -51,6 +51,11 @@ import org.apache.spark.sql.types.Decimal
  *   B -- Aggregate AVG widened-cast sweep (`pPrime + 4 <= MAX_DOUBLE_DIGITS`
  *        so the rule fires only inside the existing AVG Double-regime
  *        envelope; wider casts stay on the Decimal-exact path).
+ *   C -- Aggregate MIN widened-cast sweep (no regime guard: the MIN arm
+ *        peels for any `pPrime >= p` same-scale widening; per the design,
+ *        the main expected saving is for a BigInteger-domain outer type,
+ *        i.e. `pPrime > MAX_LONG_DIGITS = 18`).
+ *   D -- Aggregate MAX widened-cast sweep (mirrors C).
  *
  * NOTE on Window arm: the optimizer does not extend widened-Cast peel to
  * the Window arm (see DecimalAggregates rule comment) because the analyzer
@@ -121,6 +126,40 @@ object DecimalAggregatesBenchmark extends SqlBasedBenchmark {
     ("B4 p=5 s=0 p'=11", 5, 0, 11)   // pPrime upper bound, zero scale
   )
 
+  /**
+   * Aggregate MIN cases: (label, p, s, widened p').
+   *
+   * MIN/MAX widened-cast peel has NO regime guard -- it peels for any
+   * `pPrime >= p` same-scale widening (`Optimizer.scala WidenedDecimalChild`).
+   * The main saving path is `pPrime > MAX_LONG_DIGITS = 18`, where the
+   * unrewritten plan would create a BigInteger-domain outer Decimal for
+   * every row, while the rewritten plan compares the inner Long-domain
+   * values and casts only the single aggregate result.
+   *
+   * Coverage:
+   *   - C1: inner Long, outer Long  -- weakest saving (sibling-compatible
+   *         baseline; the row-cast still goes through `changePrecision`
+   *         but stays in Long).
+   *   - C2: inner Long, outer BigInteger -- the main saving regime.
+   *   - C3: inner at Long boundary (p=18), outer BigInteger -- isolates
+   *         the outer-domain cost.
+   *   - C4: inner Long, outer at MAX_PRECISION=38 -- deepest BigInteger.
+   */
+  private val MinAggCases: Seq[(String, Int, Int, Int)] = Seq(
+    ("C1 p=10 s=2 p'=18", 10, 2, 18), // inner Long, outer Long (boundary)
+    ("C2 p=10 s=2 p'=28", 10, 2, 28), // inner Long, outer BigInteger (main saving)
+    ("C3 p=18 s=2 p'=28", 18, 2, 28), // inner Long max, outer BigInteger
+    ("C4 p=10 s=2 p'=38", 10, 2, 38)  // inner Long, outer MAX_PRECISION
+  )
+
+  /** Aggregate MAX cases: mirror C above. */
+  private val MaxAggCases: Seq[(String, Int, Int, Int)] = Seq(
+    ("D1 p=10 s=2 p'=18", 10, 2, 18),
+    ("D2 p=10 s=2 p'=28", 10, 2, 28),
+    ("D3 p=18 s=2 p'=28", 18, 2, 28),
+    ("D4 p=10 s=2 p'=38", 10, 2, 38)
+  )
+
   /** Clamp generator to `10^(p-s) - 1` so rand() * bound fits `DECIMAL(p, s)`. */
   private def unscaledBound(p: Int, s: Int): Long = {
     require(p - s >= 0, s"p=$p s=$s p-s must be non-negative")
@@ -207,5 +246,31 @@ object DecimalAggregatesBenchmark extends SqlBasedBenchmark {
           iters, apl)
       }
     }
+
+    // Section C -- Aggregate MIN widened-cast.
+    runBenchmark("DecimalAggregates MIN widened-cast peel (Aggregate)") {
+      MinAggCases.foreach { case (label, p, s, pPrime) =>
+        require(pPrime >= p, s"$label: p'=$pPrime must be >= p=$p (widening)")
+        require(pPrime <= 38, s"$label: p'=$pPrime exceeds MAX_PRECISION=38")
+        setupAggTable(spark, aN, p, s)
+        runThreeWay(label, aN,
+          nativeSql = "select min(x) from t",
+          widenedSql = s"select min(cast(x as decimal($pPrime, $s))) from t",
+          iters, apl)
+      }
+    }
+
+    // Section D -- Aggregate MAX widened-cast.
+    runBenchmark("DecimalAggregates MAX widened-cast peel (Aggregate)") {
+      MaxAggCases.foreach { case (label, p, s, pPrime) =>
+        require(pPrime >= p, s"$label: p'=$pPrime must be >= p=$p (widening)")
+        require(pPrime <= 38, s"$label: p'=$pPrime exceeds MAX_PRECISION=38")
+        setupAggTable(spark, aN, p, s)
+        runThreeWay(label, aN,
+          nativeSql = "select max(x) from t",
+          widenedSql = s"select max(cast(x as decimal($pPrime, $s))) from t",
+          iters, apl)
+      }
+    }
   }
 }
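
The benchmark comments above rely on MIN/MAX commuting with a same-scale widening cast. A minimal sketch of that property outside Spark follows, assuming `java.math.BigDecimal` as a stand-in for Spark's `Decimal`. The names `WidenedMinMaxSketch`, `widen`, `castThenMin` and `minThenCast` are hypothetical illustrations and are not part of the commit or of Spark.

```scala
import java.math.{BigDecimal => JBigDecimal}

object WidenedMinMaxSketch {
  // Hypothetical stand-in for CAST(dec(p, s) AS dec(p', s)) with p' >= p.
  // Scale is unchanged and the value always fits, so the cast is the
  // identity on values; only the declared precision grows.
  def widen(v: JBigDecimal, pPrime: Int, s: Int): JBigDecimal = {
    require(v.scale() == s, s"scale must stay $s")
    require(v.precision() <= pPrime, s"value does not fit precision $pPrime")
    v
  }

  // Unrewritten shape: cast every row, then aggregate.
  def castThenMin(xs: Seq[JBigDecimal], pPrime: Int, s: Int): Option[JBigDecimal] =
    xs.map(widen(_, pPrime, s)).reduceOption((a, b) => a.min(b))

  // Rewritten shape: aggregate the narrow values, cast the single result.
  // An empty input yields None, which models the NULL result here.
  def minThenCast(xs: Seq[JBigDecimal], pPrime: Int, s: Int): Option[JBigDecimal] =
    xs.reduceOption((a, b) => a.min(b)).map(widen(_, pPrime, s))
}
```

Both shapes return the same value for any input. The rewritten one calls `widen` once instead of once per row, which is the per-row cost the C and D benchmark sections try to measure.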


