[ 
https://issues.apache.org/jira/browse/SPARK-54750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-54750:
-----------------------------------
    Labels: pull-request-available  (was: )

> [SQL] ROUND returns NULL for Decimal values with low runtime precision
> ----------------------------------------------------------------------
>
>                 Key: SPARK-54750
>                 URL: https://issues.apache.org/jira/browse/SPARK-54750
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.3.4, 4.1.0, 4.0.0
>            Reporter: kim
>            Priority: Major
>              Labels: pull-request-available
>
> h4. Summary
> {{ROUND(PERCENTILE_APPROX(...), N)}} incorrectly returns {{NULL}} for certain 
> Decimal values (e.g., {{{}2.15{}}}) while working correctly for others (e.g., 
> {{{}2.151{}}}).
> h4. Reproduction
> {code:java}
> -- Returns NULL (BUG)
> SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
> -- Returns 2.151 (OK)
> SELECT ROUND(PERCENTILE_APPROX(2151 / 1000.0, 0.95), 3) as p95 {code}
> h4. Affected Versions
>  * {*}Tested and confirmed{*}: 3.0.x, 3.3.4
>  * {*}Code review confirmed{*}: 4.0.0-preview, master branch 
> The buggy code has been present since at least Spark 3.0 and remains unfixed 
> in the current master branch.
> h4. Root Cause Analysis
> The bug is in {{RoundBase.nullSafeEval}} (mathExpressions.scala):
>  
> {code:java}
> case DecimalType.Fixed(p, s) =>  
> val decimal = input1.asInstanceOf[Decimal]  
> if (_scale >= 0) {    
>           decimal.toPrecision(decimal.precision, s, mode)  
>           // BUG: uses decimal.precision  
> } {code}
> {*}Problem{*}: The code uses {{decimal.precision}} (runtime input value's 
> precision) instead of {{p}} (schema-defined output precision).
> {*}Why it fails{*}:
>  # {{PERCENTILE_APPROX}} internally computes with Double, then converts back 
> to Decimal using {{{}Decimal(doubleValue){}}}.
>  # This creates a "compact" Decimal with minimal precision (e.g., {{2.15}} → 
> precision=3).
>  # When {{ROUND(..., 3)}} tries to expand to 3 decimal places, {{2.15}} 
> becomes {{2.150}} (needs precision=4).
>  # The buggy code limits precision to {{{}decimal.precision=3{}}}, causing 
> {{4 > 3}} overflow check to fail.
>  # Result: {{{}NULL{}}}.
> {*}Why some values work{*}:
>  * {{2.151}} has runtime precision=4, which is sufficient for the output 
> {{2.151}} (also precision=4).
>  * {{2.15}} has runtime precision=3, but needs precision=4 for output 
> {{{}2.150{}}}.
> h4. Fix:  
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
>  
> nullSafeEval: 
> {code:java}
> - decimal.toPrecision(decimal.precision, s, mode)
> + decimal.toPrecision(p, s, mode) {code}
>  
> doGenCode:
>  
> {code:java}
> - ${ev.value} = ${ce.value}.toPrecision(${ce.value}.precision(), $s,
> + ${ev.value} = ${ce.value}.toPrecision($p, $s,{code}
>  
> h4.  
> h4. Workaround
> Before this issue is fixed, users can work around the bug by casting to 
> {{DOUBLE}} before applying {{{}ROUND{}}}:
> {code:java}
> -- BUG: Returns NULL
> SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
> -- WORKAROUND: Cast to DOUBLE first, returns 2.15
> SELECT ROUND(CAST(PERCENTILE_APPROX(2150 / 1000.0, 0.95) AS DOUBLE), 3) as 
> p95{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to