[
https://issues.apache.org/jira/browse/SPARK-51945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018865#comment-18018865
]
André Souprayane commented on SPARK-51945:
------------------------------------------
Hi Anwer,
What is the use case, please? If the rounding is done with the same scale as
the initial value, the value will remain the same, no?
André
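To make the question concrete, here is a small sketch of one plausible sizing rule (an assumption for illustration, not the actual Spark source): if rounding a Decimal(p, s) to scale d reserves one extra integral digit for a possible carry (e.g. 9.99 → 10.0), the result type would be Decimal(p - s + d + 1, d). For the report's Decimal(28, 20) rounded to 20 places that gives (29, 20) — even though, when d == s, no digit beyond position s exists, so no carry can ever happen.

```python
# Hypothetical sizing rule for round() on decimals (assumption, not Spark source):
# reserve one extra integral digit for a possible carry when rounding
# Decimal(p, s) to target scale d.
def rounded_decimal_type(p: int, s: int, d: int) -> tuple:
    d = max(0, min(d, s))  # effective scale cannot exceed the input scale
    return (p - s + d + 1, d)

# The report's case: Decimal(28, 20) rounded to 20 places.
print(rounded_decimal_type(28, 20, 20))  # (29, 20)

# When d == s, the value cannot change at all, so the extra precision digit
# is provably unused -- which is the point of the bug report.
print(rounded_decimal_type(28, 20, 1))   # (10, 1)
```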
> Precision Increase from Decimal(28,20) to Decimal(29,20) When Rounding to 20
> Decimal Places in Spark 3.5
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-51945
> URL: https://issues.apache.org/jira/browse/SPARK-51945
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: Tarique Anwer
> Priority: Major
>
> In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function
> increases the precision of a Decimal(28,20) column to Decimal(29,20) when
> rounding to 20 decimal places. This behavior differs from Spark 3.2
> (Databricks Runtime 10.4 LTS), where the output remains Decimal(28,20). The
> precision increase appears unnecessary and potentially a bug, as a
> Decimal(28,20) column cannot have a 21st decimal digit to trigger a
> carry-over requiring extra precision (e.g., 99999999.99999999999999999999 →
> 100000000.00000000000000000000).
> *Steps to Reproduce*
> * Create a DataFrame with Decimal(28,20) values.
> * Apply round(col, 20) to the column.
> * Check the output schema and values.
> *Example Code*:
>
> {code:java}
> from decimal import Decimal
> from pyspark.sql.functions import col, round
> from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField
>
> schema = StructType([
>     StructField("input", DecimalType(28, 20), True),
>     StructField("scale", IntegerType(), True),
> ])
>
> df = spark.createDataFrame(
>     [
>         (Decimal("99999999.9"), 1),
>         (Decimal("99999999.99999999999999999990"), 20),
>         (Decimal("99999999.99999999999999999994"), 20),
>         (Decimal("99999999.99999999999999999995"), 20),
>         (Decimal("99999999.99999999999999999996"), 20),
>         (Decimal("99999999.99999999999999999999"), 20),
>     ],
>     schema,
> )
>
> df.printSchema()
> # root
> #  |-- input: decimal(28,20) (nullable = true)
> #  |-- scale: integer (nullable = true)
>
> df_1 = df.select(round(col("input"), 20).alias("input"))
> df_1.printSchema()
> # root
> #  |-- input: decimal(29,20) (nullable = true)
>
> df_2 = df.withColumn("input", round(col("input"), 20))
> df_2.printSchema()
> # root
> #  |-- input: decimal(29,20) (nullable = true)
> #  |-- scale: integer (nullable = true)
> {code}
>
>
> I'm not entirely sure, but is this possibly related to
> https://issues.apache.org/jira/browse/SPARK-39226?
> cc: [~cloud_fan]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)