kim created SPARK-54750:
---------------------------
Summary: [SQL] ROUND returns NULL for Decimal values with low
runtime precision
Key: SPARK-54750
URL: https://issues.apache.org/jira/browse/SPARK-54750
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0, 4.1.0, 3.3.4, 3.0.0
Reporter: kim
h4. Summary
{{ROUND(PERCENTILE_APPROX(...), N)}} incorrectly returns {{NULL}} for certain
Decimal values (e.g., {{{}2.15{}}}) while working correctly for others (e.g.,
{{{}2.151{}}}).
h4. Reproduction
{code:java}
-- Returns NULL (BUG)
SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
-- Returns 2.151 (OK)
SELECT ROUND(PERCENTILE_APPROX(2151 / 1000.0, 0.95), 3) as p95 {code}
h4. Affected Versions
* {*}Tested and confirmed{*}: 3.0.x, 3.3.4
* {*}Code review confirmed{*}: 4.0.0-preview, master branch
The buggy code has been present since at least Spark 3.0 and remains unfixed in
the current master branch.
h4. Root Cause Analysis
The bug is in {{RoundBase.nullSafeEval}} (mathExpressions.scala):
{code:java}
case DecimalType.Fixed(p, s) =>
val decimal = input1.asInstanceOf[Decimal]
if (_scale >= 0) {
decimal.toPrecision(decimal.precision, s, mode)
// BUG: uses decimal.precision
} {code}
{*}Problem{*}: The code uses {{decimal.precision}} (runtime input value's
precision) instead of {{p}} (schema-defined output precision).
{*}Why it fails{*}:
# {{PERCENTILE_APPROX}} internally computes with Double, then converts back to
Decimal using {{{}Decimal(doubleValue){}}}.
# This creates a "compact" Decimal with minimal precision (e.g., {{2.15}} →
precision=3).
# When {{ROUND(..., 3)}} tries to expand to 3 decimal places, {{2.15}} becomes
{{2.150}} (needs precision=4).
# The buggy code limits precision to {{{}decimal.precision=3{}}}, causing {{4
> 3}} overflow check to fail.
# Result: {{{}NULL{}}}.
{*}Why some values work{*}:
* {{2.151}} has runtime precision=4, which is sufficient for the output
{{2.151}} (also precision=4).
* {{2.15}} has runtime precision=3, but needs precision=4 for output
{{{}2.150{}}}.
h4. Fix:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
{code:java}
- decimal.toPrecision(decimal.precision, s, mode)
+ decimal.toPrecision(p, s, mode) {code}
h4. Workaround
Before this issue is fixed, users can work around the bug by casting to
{{DOUBLE}} before applying {{{}ROUND{}}}:
{code:java}
-- BUG: Returns NULL
SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
-- WORKAROUND: Cast to DOUBLE first, returns 2.15
SELECT ROUND(CAST(PERCENTILE_APPROX(2150 / 1000.0, 0.95) AS DOUBLE), 3) as
p95{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]