[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607314#comment-17607314 ]
xsys edited comment on SPARK-40439 at 9/20/22 5:23 PM:
-------------------------------------------------------

[~hyukjin.kwon]: Thank you for your response! Setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} works. However, it may be non-trivial for users to discover that {{spark.sql.storeAssignmentPolicy}} is the relevant setting. For instance, after inspecting the code, I thought {{nullOnOverflow}} was controlled by {{spark.sql.ansi.enabled}}, so I tried to achieve the desired behavior by altering that flag, but to no avail. Could we mention setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} in the error message?

> DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40439
>                 URL: https://issues.apache.org/jira/browse/SPARK-40439
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> We are trying to store a DECIMAL value {{33333333333.2222222222}} with more precision than what is defined in the schema: {{DECIMAL(20,10)}}. This leads to a {{NULL}} value being stored if the table is created using DataFrames via {{spark-shell}}.
> However, it leads to the following exception if the table is created via {{spark-sql}}:
> {code:java}
> Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
> java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10) cannot be represented as Decimal(20, 10){code}
> h3. Steps to reproduce
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}, execute the following:
> {code:java}
> create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
> insert into decimal_extra_precision select 33333333333.2222222222;{code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{DECIMAL(20,10)}} and {{33333333333.2222222222}}).
> Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned decimal value evaluates to a {{NULL}}:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
>
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
>
> scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27
>
> scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))
>
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
>
> scala> df.show()
> +----+
> |  c1|
> +----+
> |null|
> +----+
>
> scala> 
df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
> 22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
>
> scala> spark.sql("select * from decimal_extra_precision;")
> res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> {code}
> h3. Root Cause
> The exception is raised from [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373] ({{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
> {code:java}
> private[sql] def toPrecision(
>     precision: Int,
>     scale: Int,
>     roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
>     nullOnOverflow: Boolean = true,
>     context: SQLQueryContext = null): Decimal = {
>   val copy = clone()
>   if (copy.changePrecision(precision, scale, roundMode)) {
>     copy
>   } else {
>     if (nullOnOverflow) {
>       null
>     } else {
>       throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
>         this, precision, scale, context)
>     }
>   }
> }{code}
> The above function is invoked from [Cast.scala|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756]. However, our attempt to insert {{33333333333.2222222222}} after setting {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an independent issue).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
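For completeness: the spark-sql INSERT path goes through store-assignment casts governed by {{spark.sql.storeAssignmentPolicy}}, not {{spark.sql.ansi.enabled}}, which is why toggling the ANSI flag alone does not change the behavior. The workaround discussed in the comment above would look roughly like the following in a {{spark-sql}} session (a sketch, assuming the reproduction table from this issue; {{LEGACY}} restores the pre-3.0 behavior of storing {{NULL}} on overflow):

{code:java}
-- The default policy since Spark 3.0 is ANSI, which raises
-- ArithmeticException when a DECIMAL value overflows the target type.
SET spark.sql.storeAssignmentPolicy=LEGACY;

-- With LEGACY, the overflowing value is stored as NULL, matching the
-- DataFrame (spark-shell) behavior.
insert into decimal_extra_precision select 33333333333.2222222222;
{code}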
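The overflow condition itself can be reproduced with plain {{java.math.BigDecimal}}, independent of Spark. Below is a minimal sketch of the check that {{toPrecision}} performs; the helper name and signature are illustrative, not Spark internals:

{code:java}
import java.math.{BigDecimal => JBigDecimal}
import java.math.RoundingMode

// Illustrative re-implementation of the overflow check in Decimal.toPrecision:
// round the value to the target scale, then test whether the result still
// fits the target precision. None models the NULL-on-overflow path.
def fitToPrecision(d: JBigDecimal, precision: Int, scale: Int,
                   nullOnOverflow: Boolean): Option[JBigDecimal] = {
  val rounded = d.setScale(scale, RoundingMode.HALF_UP)
  if (rounded.precision <= precision) Some(rounded)
  else if (nullOnOverflow) None // DataFrame write: value silently becomes NULL
  else throw new ArithmeticException( // spark-sql write: insert fails
    s"$rounded cannot be represented as Decimal($precision, $scale)")
}

val v = new JBigDecimal("33333333333.2222222222")
// 11 integral digits + 10 fractional digits = precision 21, one more
// than DECIMAL(20,10) allows.
assert(v.precision == 21)
assert(fitToPrecision(v, 20, 10, nullOnOverflow = true).isEmpty)
{code}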