[ https://issues.apache.org/jira/browse/SPARK-51945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tarique Anwer updated SPARK-51945:
----------------------------------
Description:

In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function increases the precision of a Decimal(28,20) column to Decimal(29,20) when rounding to 20 decimal places. This behavior differs from Spark 3.2 (Databricks Runtime 10.4 LTS), where the output remains Decimal(28,20). The precision increase appears unnecessary and is potentially a bug: a Decimal(28,20) value has no 21st decimal digit, so rounding it to 20 places is a no-op and can never produce the carry-over (e.g., 99999999.99999999999999999999 → 100000000.00000000000000000000) that would require an extra integer digit.

*Steps to Reproduce*
 * Create a DataFrame with Decimal(28,20) values.
 * Apply round(col, 20) to the column.
 * Check the output schema and values.

{*}Example Code{*}:
{code:python}
from decimal import Decimal

from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField

schema = StructType([
    StructField("input", DecimalType(28, 20), True),
    StructField("scale", IntegerType(), True),
])

df = spark.createDataFrame([
    (Decimal("99999999.9"), 1),
    (Decimal("99999999.99999999999999999990"), 20),
    (Decimal("99999999.99999999999999999994"), 20),
    (Decimal("99999999.99999999999999999995"), 20),
    (Decimal("99999999.99999999999999999996"), 20),
    (Decimal("99999999.99999999999999999999"), 20),
], schema)

df.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)

df_1 = df.select(round(col("input"), 20).alias("input"))
df_1.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)

df_2 = df.withColumn("input", round(col("input"), 20))
df_2.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}

I'm not entirely sure, but is this possibly related to https://issues.apache.org/jira/browse/SPARK-39226?

cc: [~cloud_fan]
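To make the carry-over argument above concrete, here is a minimal sketch in plain Python decimal arithmetic (not Spark, assuming HALF_UP rounding, which matches round's documented behavior): rounding a scale-20 value to 20 places leaves it unchanged, while rounding to 19 places is where a carry can actually add an integer digit.

{code:python}
# Sketch of the carry-over argument (plain Python decimal, not Spark).
from decimal import Decimal, ROUND_HALF_UP

v = Decimal("99999999.99999999999999999999")  # fits Decimal(28,20)

# Rounding to the value's own scale (20) is a no-op: no carry is possible.
same_scale = v.quantize(Decimal("1." + "0" * 20), rounding=ROUND_HALF_UP)
print(same_scale)  # 99999999.99999999999999999999 (unchanged, still 28 digits)

# Rounding to 19 places is where a carry can occur and add an integer digit.
one_less = v.quantize(Decimal("1." + "0" * 19), rounding=ROUND_HALF_UP)
print(one_less)  # 100000000.0000000000000000000 (carry adds an integer digit)
{code}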
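On the SPARK-39226 question: that ticket changed the result precision of round-like functions on decimals. Assuming (my reading for illustration, not verified against the Spark source) that the new rule reserves one extra integer digit for a possible carry regardless of the target scale, the observed Decimal(29,20) follows directly, and the case where the target scale equals the source scale would be exactly the one where the extra digit is provably unneeded. A hypothetical sketch of that assumed rule:

{code:python}
# Hypothetical sketch of the result-type rule SPARK-39226 appears to introduce
# (an assumption for illustration, not the verified Spark source): reserve one
# extra integer digit for carry-over, regardless of the target scale.
def assumed_round_result_type(p, s, target_scale):
    new_scale = min(s, max(target_scale, 0))
    return (p - s + 1 + new_scale, new_scale)

print(assumed_round_result_type(28, 20, 20))  # (29, 20) -- matches this report
print(assumed_round_result_type(28, 20, 19))  # (28, 19) -- carry genuinely possible
{code}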
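Until this is resolved, one possible workaround (a sketch continuing from the df in the example above, not an official fix) is to cast the rounded column back to the original type; here that is lossless because rounding to the column's own scale cannot change the value.

{code:python}
# Workaround sketch: cast the rounded column back to the original type.
# Safe here only because rounding to the column's own scale cannot carry;
# in general such a cast could overflow and yield NULL (or fail under ANSI mode).
from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType

df_3 = df.withColumn("input", round(col("input"), 20).cast(DecimalType(28, 20)))
df_3.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}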
> Precision Increase from Decimal(28,20) to Decimal(29,20) When Rounding to 20 Decimal Places in Spark 3.5
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51945
>                 URL: https://issues.apache.org/jira/browse/SPARK-51945
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Tarique Anwer
>            Priority: Major