[ https://issues.apache.org/jira/browse/SPARK-51945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tarique Anwer updated SPARK-51945:
----------------------------------
Description:

In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function increases the precision of a Decimal(28,20) column to Decimal(29,20) when rounding to 20 decimal places. This behavior differs from Spark 3.2 (Databricks Runtime 10.4 LTS), where the output remains Decimal(28,20). The precision increase appears unnecessary and is potentially a bug: a Decimal(28,20) value has no 21st decimal digit, so rounding it to 20 places is a no-op and can never produce the carry-over (e.g., 99999999.99999999999999999999 → 100000000.00000000000000000000) that would require an extra integer digit.

*Steps to Reproduce*
 * Create a DataFrame with Decimal(28,20) values.
 * Apply round(col, 20) to the column.
 * Check the output schema and values.

{*}Example Code{*}:
{code:python}
from decimal import Decimal

from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField

schema = StructType([
    StructField("input", DecimalType(28, 20), True),
    StructField("scale", IntegerType(), True),
])

df = spark.createDataFrame([
    (Decimal("99999999.9"), 1),
    (Decimal("99999999.99999999999999999990"), 20),
    (Decimal("99999999.99999999999999999994"), 20),
    (Decimal("99999999.99999999999999999995"), 20),
    (Decimal("99999999.99999999999999999996"), 20),
    (Decimal("99999999.99999999999999999999"), 20),
], schema)

df.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)

df_1 = df.select(round(col("input"), 20).alias("input"))
df_1.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)

df_2 = df.withColumn("input", round(col("input"), 20))
df_2.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}

I'm not entirely sure, but is this possibly related to https://issues.apache.org/jira/browse/SPARK-39226?

cc: [~cloud_fan]
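To make the carry-over argument above concrete, here is a minimal sketch in plain Python decimal arithmetic (not Spark, assuming HALF_UP rounding, which matches round's documented behavior): rounding a scale-20 value to 20 places leaves it unchanged, while rounding to 19 places is where a carry can actually add an integer digit.

{code:python}
# Sketch of the carry-over argument (plain Python decimal, not Spark).
from decimal import Decimal, ROUND_HALF_UP

v = Decimal("99999999.99999999999999999999")  # fits Decimal(28,20)

# Rounding to the value's own scale (20) is a no-op: no carry is possible.
same_scale = v.quantize(Decimal("1." + "0" * 20), rounding=ROUND_HALF_UP)
print(same_scale)  # 99999999.99999999999999999999 (unchanged, still 28 digits)

# Rounding to 19 places is where a carry can occur and add an integer digit.
one_less = v.quantize(Decimal("1." + "0" * 19), rounding=ROUND_HALF_UP)
print(one_less)  # 100000000.0000000000000000000 (carry adds an integer digit)
{code}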
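On the SPARK-39226 question: that ticket changed the result precision of round-like functions on decimals. Assuming (my reading for illustration, not verified against the Spark source) that the new rule reserves one extra integer digit for a possible carry regardless of the target scale, the observed Decimal(29,20) follows directly, and the case where the target scale equals the source scale would be exactly the one where the extra digit is provably unneeded. A hypothetical sketch of that assumed rule:

{code:python}
# Hypothetical sketch of the result-type rule SPARK-39226 appears to introduce
# (an assumption for illustration, not the verified Spark source): reserve one
# extra integer digit for carry-over, regardless of the target scale.
def assumed_round_result_type(p, s, target_scale):
    new_scale = min(s, max(target_scale, 0))
    return (p - s + 1 + new_scale, new_scale)

print(assumed_round_result_type(28, 20, 20))  # (29, 20) -- matches this report
print(assumed_round_result_type(28, 20, 19))  # (28, 19) -- carry genuinely possible
{code}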
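Until this is resolved, one possible workaround (a sketch continuing from the df in the example above, not an official fix) is to cast the rounded column back to the original type; here that is lossless because rounding to the column's own scale cannot change the value.

{code:python}
# Workaround sketch: cast the rounded column back to the original type.
# Safe here only because rounding to the column's own scale cannot carry;
# in general such a cast could overflow and yield NULL (or fail under ANSI mode).
from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType

df_3 = df.withColumn("input", round(col("input"), 20).cast(DecimalType(28, 20)))
df_3.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}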
> Precision Increase from Decimal(28,20) to Decimal(29,20) When Rounding to 20 Decimal Places in Spark 3.5
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51945
>                 URL: https://issues.apache.org/jira/browse/SPARK-51945
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Tarique Anwer
>            Priority: Major