[ https://issues.apache.org/jira/browse/SPARK-51945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018865#comment-18018865 ]

André Souprayane commented on SPARK-51945:
------------------------------------------

Hi Anwer,

What is the use case, please? If the rounding is done at the same scale as 
the initial value, the value will remain the same, no?
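
For what it's worth, a quick sanity check with Python's decimal module (outside Spark) illustrates the point:

```python
from decimal import Decimal

# Rounding a Decimal(28,20) value at its own scale (20) is a no-op:
# there is no 21st fractional digit that could trigger a carry-over.
v = Decimal("99999999.99999999999999999999")  # 28 digits, scale 20
rounded = v.quantize(Decimal("1e-20"))        # round to 20 decimal places
assert rounded == v
```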

 

André

 

> Precision Increase from Decimal(28,20) to Decimal(29,20) When Rounding to 20 
> Decimal Places in Spark 3.5
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51945
>                 URL: https://issues.apache.org/jira/browse/SPARK-51945
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Tarique Anwer
>            Priority: Major
>
> In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function 
> increases the precision of a Decimal(28,20) column to Decimal(29,20) when 
> rounding to 20 decimal places. This behavior differs from Spark 3.2 
> (Databricks Runtime 10.4 LTS), where the output remains Decimal(28,20). The 
> precision increase appears unnecessary and potentially a bug, as a 
> Decimal(28,20) column cannot have a 21st decimal digit to trigger a 
> carry-over requiring extra precision (e.g., 99999999.99999999999999999999 → 
> 100000000.00000000000000000000).
> *Steps to Reproduce*
>  * Create a DataFrame with Decimal(28,20) values.
>  * Apply round(col, 20) to the column.
>  * Check the output schema and values.
> {*}Example Code{*}:
>
> {code:python}
> from pyspark.sql.functions import col, round
> from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField
> from decimal import Decimal
>
> schema = StructType(
>     [
>         StructField("input", DecimalType(28, 20), True),
>         StructField("scale", IntegerType(), True),
>     ]
> )
> df = spark.createDataFrame(
>     [
>         (Decimal("99999999.9"), 1),
>         (Decimal("99999999.99999999999999999990"), 20),
>         (Decimal("99999999.99999999999999999994"), 20),
>         (Decimal("99999999.99999999999999999995"), 20),
>         (Decimal("99999999.99999999999999999996"), 20),
>         (Decimal("99999999.99999999999999999999"), 20),
>     ],
>     schema,
> )
> df.printSchema()
> # root
> #  |-- input: decimal(28,20) (nullable = true)
> #  |-- scale: integer (nullable = true)
> df_1 = df.select(round(col("input"), 20).alias("input"))
> df_1.printSchema()
> # root
> #  |-- input: decimal(29,20) (nullable = true)
> df_2 = df.withColumn("input", round(col("input"), 20))
> df_2.printSchema()
> # root
> #  |-- input: decimal(29,20) (nullable = true)
> #  |-- scale: integer (nullable = true)
> {code}
>  
>  
> I'm not entirely sure, but is this possibly related to 
> https://issues.apache.org/jira/browse/SPARK-39226?
> cc: [~cloud_fan] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
