xsys created SPARK-40439:
----------------------------

             Summary: DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame
                 Key: SPARK-40439
                 URL: https://issues.apache.org/jira/browse/SPARK-40439
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
            Reporter: xsys
h3. Describe the bug

We are trying to store the DECIMAL value {{33333333333.2222222222}}, which has more precision than what is defined in the schema: {{DECIMAL(20,10)}}. This leads to a {{NULL}} value being stored if the table is created using DataFrames via {{spark-shell}}. However, it leads to the following exception if the table is created via {{spark-sql}}:
{code:java}
Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10) cannot be represented as Decimal(20, 10){code}
h3. Steps to reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}, execute the following:
{code:java}
create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
insert into decimal_extra_precision select 33333333333.2222222222;{code}
h3. Expected behavior

We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{DECIMAL(20,10)}} and {{33333333333.2222222222}}).

Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned decimal value evaluates to {{NULL}}:
{code:java}
scala> import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))

scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]

scala> df.show()
+----+
|  c1|
+----+
|null|
+----+

scala> df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.

scala> spark.sql("select * from decimal_extra_precision;")
res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
{code}
h3. Root Cause

The exception is raised from [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373] ({{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
{code:java}
  private[sql] def toPrecision(
      precision: Int,
      scale: Int,
      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
      nullOnOverflow: Boolean = true,
      context: SQLQueryContext = null): Decimal = {
    val copy = clone()
    if (copy.changePrecision(precision, scale, roundMode)) {
      copy
    } else {
      if (nullOnOverflow) {
        null
      } else {
        throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
          this, precision, scale, context)
      }
    }
  }{code}
{{toPrecision}} is invoked from [Cast|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756] ({{Cast.scala}}).
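To make the overflow condition concrete, the failing check can be exercised directly against {{Decimal}} in {{spark-shell}}. This is a minimal sketch using the public two-argument {{changePrecision}} variant ({{toPrecision}} itself is {{private[sql]}} and not callable from the shell):
{code:java}
import org.apache.spark.sql.types.Decimal

// 33333333333.2222222222 needs precision 21 (11 integral + 10 fractional digits),
// so it cannot fit into DECIMAL(20,10). changePrecision signals this by returning
// false, which toPrecision maps to null (nullOnOverflow = true) or to the
// ArithmeticException above (nullOnOverflow = false).
val d = Decimal(BigDecimal("33333333333.2222222222"))
d.changePrecision(20, 10)  // expected: false
{code}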
However, our attempt to insert {{33333333333.2222222222}} after setting {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an independent issue).
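For completeness, a sketch of checking the flag's effect on the cast path in {{spark-shell}}. Note that table inserts in Spark 3.x are additionally governed by {{spark.sql.storeAssignmentPolicy}} (ANSI by default), so the insert may still throw with {{spark.sql.ansi.enabled}} off, which could be the independent issue noted above:
{code:java}
// Sketch: with ANSI mode off, an overflowing decimal cast is expected to
// evaluate to NULL instead of throwing (the nullOnOverflow = true path above).
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select cast(33333333333.2222222222 as decimal(20,10)) as c1").show()
// expected output:
// +----+
// |  c1|
// +----+
// |null|
// +----+

// With ANSI mode on, the same cast is expected to raise the
// java.lang.ArithmeticException shown in the bug description.
spark.conf.set("spark.sql.ansi.enabled", "true")
{code}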