[ 
https://issues.apache.org/jira/browse/SPARK-54750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kim updated SPARK-54750:
------------------------
    Description: 
h4. Summary

{{ROUND(PERCENTILE_APPROX(...), N)}} incorrectly returns {{NULL}} for certain 
Decimal values (e.g., {{{}2.15{}}}) while working correctly for others (e.g., 
{{{}2.151{}}}).
h4. Reproduction
{code:java}
-- Returns NULL (BUG)

SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95

-- Returns 2.151 (OK)

SELECT ROUND(PERCENTILE_APPROX(2151 / 1000.0, 0.95), 3) as p95 {code}
h4. Affected Versions
 * {*}Tested and confirmed{*}: 3.0.x, 3.3.4
 * {*}Code review confirmed{*}: 4.0.0-preview, master branch 

The buggy code has been present since at least Spark 3.0 and remains unfixed in 
the current master branch.
h4. Root Cause Analysis

The bug is in {{RoundBase.nullSafeEval}} (mathExpressions.scala):

 
{code:java}
case DecimalType.Fixed(p, s) =>  
val decimal = input1.asInstanceOf[Decimal]  
if (_scale >= 0) {    
          decimal.toPrecision(decimal.precision, s, mode)  
          // BUG: uses decimal.precision  
} {code}
{*}Problem{*}: The code uses {{decimal.precision}} (runtime input value's 
precision) instead of {{p}} (schema-defined output precision).

{*}Why it fails{*}:
 # {{PERCENTILE_APPROX}} internally computes with Double, then converts back to 
Decimal using {{{}Decimal(doubleValue){}}}.
 # This creates a "compact" Decimal with minimal precision (e.g., {{2.15}} → 
precision=3).
 # When {{ROUND(..., 3)}} tries to expand to 3 decimal places, {{2.15}} becomes 
{{2.150}} (needs precision=4).
 # The buggy code limits precision to {{{}decimal.precision=3{}}}, causing {{4 
> 3}} overflow check to fail.
 # Result: {{{}NULL{}}}.

{*}Why some values work{*}:
 * {{2.151}} has runtime precision=4, which is sufficient for the output 
{{2.151}} (also precision=4).
 * {{2.15}} has runtime precision=3, but needs precision=4 for output 
{{{}2.150{}}}.

h4. Fix:  
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 
nullSafeEval: 
{code:java}
- decimal.toPrecision(decimal.precision, s, mode)
+ decimal.toPrecision(p, s, mode) {code}
 
doGenCode:
 
{code:java}
- ${ev.value} = ${ce.value}.toPrecision(${ce.value}.precision(), $s,
+ ${ev.value} = ${ce.value}.toPrecision($p, $s,{code}
 
h4.  
h4. Workaround

Before this issue is fixed, users can work around the bug by casting to 
{{DOUBLE}} before applying {{{}ROUND{}}}:
{code:java}
-- BUG: Returns NULL
SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95

-- WORKAROUND: Cast to DOUBLE first, returns 2.15
SELECT ROUND(CAST(PERCENTILE_APPROX(2150 / 1000.0, 0.95) AS DOUBLE), 3) as 
p95{code}
 

 

 

 

  was:
h4. Summary

{{ROUND(PERCENTILE_APPROX(...), N)}} incorrectly returns {{NULL}} for certain 
Decimal values (e.g., {{{}2.15{}}}) while working correctly for others (e.g., 
{{{}2.151{}}}).
h4. Reproduction
{code:java}
-- Returns NULL (BUG)

SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95

-- Returns 2.151 (OK)

SELECT ROUND(PERCENTILE_APPROX(2151 / 1000.0, 0.95), 3) as p95 {code}
h4. Affected Versions
 * {*}Tested and confirmed{*}: 3.0.x, 3.3.4
 * {*}Code review confirmed{*}: 4.0.0-preview, master branch 

The buggy code has been present since at least Spark 3.0 and remains unfixed in 
the current master branch.
h4. Root Cause Analysis

The bug is in {{RoundBase.nullSafeEval}} (mathExpressions.scala):

 
{code:java}
case DecimalType.Fixed(p, s) =>  
val decimal = input1.asInstanceOf[Decimal]  
if (_scale >= 0) {    
          decimal.toPrecision(decimal.precision, s, mode)  
          // BUG: uses decimal.precision  
} {code}
{*}Problem{*}: The code uses {{decimal.precision}} (runtime input value's 
precision) instead of {{p}} (schema-defined output precision).

{*}Why it fails{*}:
 # {{PERCENTILE_APPROX}} internally computes with Double, then converts back to 
Decimal using {{{}Decimal(doubleValue){}}}.
 # This creates a "compact" Decimal with minimal precision (e.g., {{2.15}} → 
precision=3).
 # When {{ROUND(..., 3)}} tries to expand to 3 decimal places, {{2.15}} becomes 
{{2.150}} (needs precision=4).
 # The buggy code limits precision to {{{}decimal.precision=3{}}}, causing {{4 
> 3}} overflow check to fail.
 # Result: {{{}NULL{}}}.

{*}Why some values work{*}:
 * {{2.151}} has runtime precision=4, which is sufficient for the output 
{{2.151}} (also precision=4).
 * {{2.15}} has runtime precision=3, but needs precision=4 for output 
{{{}2.150{}}}.

h4. Fix:  
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

 
{code:java}
- decimal.toPrecision(decimal.precision, s, mode)
+ decimal.toPrecision(p, s, mode) {code}
h4. Workaround

Before this issue is fixed, users can work around the bug by casting to 
{{DOUBLE}} before applying {{{}ROUND{}}}:
{code:java}
-- BUG: Returns NULL
SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95

-- WORKAROUND: Cast to DOUBLE first, returns 2.15
SELECT ROUND(CAST(PERCENTILE_APPROX(2150 / 1000.0, 0.95) AS DOUBLE), 3) as 
p95{code}
 

 

 

 


> [SQL] ROUND returns NULL for Decimal values with low runtime precision
> ----------------------------------------------------------------------
>
>                 Key: SPARK-54750
>                 URL: https://issues.apache.org/jira/browse/SPARK-54750
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.3.4, 4.1.0, 4.0.0
>            Reporter: kim
>            Priority: Major
>
> h4. Summary
> {{ROUND(PERCENTILE_APPROX(...), N)}} incorrectly returns {{NULL}} for certain 
> Decimal values (e.g., {{{}2.15{}}}) while working correctly for others (e.g., 
> {{{}2.151{}}}).
> h4. Reproduction
> {code:java}
> -- Returns NULL (BUG)
> SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
> -- Returns 2.151 (OK)
> SELECT ROUND(PERCENTILE_APPROX(2151 / 1000.0, 0.95), 3) as p95 {code}
> h4. Affected Versions
>  * {*}Tested and confirmed{*}: 3.0.x, 3.3.4
>  * {*}Code review confirmed{*}: 4.0.0-preview, master branch 
> The buggy code has been present since at least Spark 3.0 and remains unfixed 
> in the current master branch.
> h4. Root Cause Analysis
> The bug is in {{RoundBase.nullSafeEval}} (mathExpressions.scala):
>  
> {code:java}
> case DecimalType.Fixed(p, s) =>  
> val decimal = input1.asInstanceOf[Decimal]  
> if (_scale >= 0) {    
>           decimal.toPrecision(decimal.precision, s, mode)  
>           // BUG: uses decimal.precision  
> } {code}
> {*}Problem{*}: The code uses {{decimal.precision}} (runtime input value's 
> precision) instead of {{p}} (schema-defined output precision).
> {*}Why it fails{*}:
>  # {{PERCENTILE_APPROX}} internally computes with Double, then converts back 
> to Decimal using {{{}Decimal(doubleValue){}}}.
>  # This creates a "compact" Decimal with minimal precision (e.g., {{2.15}} → 
> precision=3).
>  # When {{ROUND(..., 3)}} tries to expand to 3 decimal places, {{2.15}} 
> becomes {{2.150}} (needs precision=4).
>  # The buggy code limits precision to {{{}decimal.precision=3{}}}, causing 
> {{4 > 3}} overflow check to fail.
>  # Result: {{{}NULL{}}}.
> {*}Why some values work{*}:
>  * {{2.151}} has runtime precision=4, which is sufficient for the output 
> {{2.151}} (also precision=4).
>  * {{2.15}} has runtime precision=3, but needs precision=4 for output 
> {{{}2.150{}}}.
> h4. Fix:  
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
>  
> nullSafeEval: 
> {code:java}
> - decimal.toPrecision(decimal.precision, s, mode)
> + decimal.toPrecision(p, s, mode) {code}
>  
> doGenCode:
>  
> {code:java}
> - ${ev.value} = ${ce.value}.toPrecision(${ce.value}.precision(), $s,
> + ${ev.value} = ${ce.value}.toPrecision($p, $s,{code}
>  
> h4.  
> h4. Workaround
> Before this issue is fixed, users can work around the bug by casting to 
> {{DOUBLE}} before applying {{{}ROUND{}}}:
> {code:java}
> -- BUG: Returns NULL
> SELECT ROUND(PERCENTILE_APPROX(2150 / 1000.0, 0.95), 3) as p95
> -- WORKAROUND: Cast to DOUBLE first, returns 2.15
> SELECT ROUND(CAST(PERCENTILE_APPROX(2150 / 1000.0, 0.95) AS DOUBLE), 3) as 
> p95{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to