[
https://issues.apache.org/jira/browse/SPARK-57023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kent Yao updated SPARK-57023:
-----------------------------
Description:
h3. Problem
When the input of a decimal aggregate is wrapped in a widening, scale-preserving
{{Cast}} (e.g. {{Cast(d as Decimal(p+k, s))}} with {{k >= 0}} and the scale {{s}}
unchanged), {{MIN}}/{{MAX}} compute their result in the wider type, even though
the cast is order-preserving and the min/max value is bit-identical in the
narrower type. This bloats the generated code and the shuffle payload, and costs
one {{Decimal.changePrecision}} call per row.
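To illustrate why the cast can be dropped, here is a minimal sketch that models a decimal as an unscaled integer plus precision and scale. The {{Dec}} class and the {{widen}} helper are hypothetical stand-ins for this example, not Spark's {{Decimal}} API; the assumption is only that a scale-preserving widening leaves the unscaled digits untouched.

```scala
object WideningDemo {
  // Toy decimal: value = unscaled / 10^scale. Not Spark's Decimal class.
  final case class Dec(unscaled: Long, precision: Int, scale: Int)

  // Hypothetical model of Cast(d as Decimal(p + k, s)) with k >= 0:
  // only the declared precision grows, the digits and the scale stay the same.
  def widen(d: Dec, k: Int): Dec = {
    require(k >= 0, "widening must not shrink the precision")
    d.copy(precision = d.precision + k)
  }

  // With a shared scale, ordering by unscaled value is ordering by numeric value.
  def minMax(xs: Seq[Dec]): (Long, Long) = {
    require(xs.nonEmpty && xs.map(_.scale).distinct.size == 1)
    val digits = xs.map(_.unscaled)
    (digits.min, digits.max)
  }

  // 123.45, -0.01, 999.99 and 7.00 as Decimal(5, 2)
  val input: Seq[Dec] =
    Seq(Dec(12345L, 5, 2), Dec(-1L, 5, 2), Dec(99999L, 5, 2), Dec(700L, 5, 2))

  def main(args: Array[String]): Unit = {
    val narrow = minMax(input)
    val wide = minMax(input.map(widen(_, 5)))
    // The extrema are the same digits whether or not every row was widened first.
    assert(narrow == wide)
    println(s"narrow=$narrow wide=$wide")
  }
}
```

In this model the per-row widening changes nothing that {{MIN}}/{{MAX}} look at, which is the property the peel relies on.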
{{SUM}} and {{AVG}} already peel this pattern in {{DecimalAggregates}} (see
SPARK-3933 for the original arm and SPARK-56983 for the {{evalMode}}-aware
follow-up). {{MIN}}/{{MAX}} were never extended.
h3. Proposal
Extend {{DecimalAggregates}} with a {{MIN}}/{{MAX}} arm that peels the widening
{{Cast}} when:
* the child's precision fits in the cast's target precision and the scale is unchanged, and
* {{evalMode}} of the surrounding aggregate is preserved (mirrors SPARK-56983
semantics).
Refactor the existing SUM/AVG arms to share a {{WidenedDecimalChild}} extractor
(private object) so the new MIN/MAX arm reuses the predicate.
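A rough sketch of what such a shared extractor could look like, written against a self-contained toy expression tree rather than Catalyst. Every type here ({{Expr}}, {{Attr}}, {{Cast}}, {{Min}}, {{Max}}, {{DecimalType}}) is a simplified stand-in defined in the snippet; {{evalMode}} handling is omitted, and re-applying the cast once on the aggregate result to keep the output type unchanged is an assumption of this sketch, not something the ticket specifies.

```scala
object PeelSketch {
  sealed trait DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType
  case object IntegerType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  final case class Min(child: Expr) extends Expr { def dataType: DataType = child.dataType }
  final case class Max(child: Expr) extends Expr { def dataType: DataType = child.dataType }

  // Matches Cast(child as Decimal(tp, ts)) where child is already a decimal
  // with the same scale and a precision that fits, and yields the child.
  object WidenedDecimalChild {
    def unapply(e: Expr): Option[Expr] = e match {
      case Cast(child, DecimalType(tp, ts)) =>
        child.dataType match {
          case DecimalType(cp, cs) if cs == ts && cp <= tp => Some(child)
          case _ => None
        }
      case _ => None
    }
  }

  // Hypothetical MIN/MAX arm: aggregate over the narrow child, then cast the
  // single result back so the expression's output type does not change.
  def peel(e: Expr): Expr = e match {
    case Min(WidenedDecimalChild(c)) => Cast(Min(c), e.dataType)
    case Max(WidenedDecimalChild(c)) => Cast(Max(c), e.dataType)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val d = Attr("d", DecimalType(5, 2))
    println(peel(Min(Cast(d, DecimalType(10, 2))))) // cast moves outside Min
    println(peel(Max(Cast(d, DecimalType(10, 3))))) // scale changes: left as is
  }
}
```

Keeping the predicate in one extractor means each aggregate arm only states what to rewrite, while the question of when a cast is safe to peel is answered in a single place.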
h3. Scope (non-goals)
* No changes to ANSI overflow semantics: the peel applies only where it is
bit-identical, i.e. when the scale is preserved and the child precision fits.
* No new SQL surface, no new SQLConf.
h3. References
* SPARK-3933 — original DecimalAggregates SUM/AVG peel
* SPARK-56983 — evalMode-preserving variant (sibling implementation)
Labels: decimal optimizer (was: )
Priority: Minor (was: Major)
> Peel scale-preserving widening decimal Cast in front of MIN/MAX
> ---------------------------------------------------------------
>
> Key: SPARK-57023
> URL: https://issues.apache.org/jira/browse/SPARK-57023
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Kent Yao
> Priority: Minor
> Labels: decimal, optimizer