[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607314#comment-17607314 ]
xsys edited comment on SPARK-40439 at 9/20/22 5:23 PM:
-------------------------------------------------------

[~hyukjin.kwon]: Thank you for your response! Setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} works. However, it may be non-trivial for users to discover that {{spark.sql.storeAssignmentPolicy}} is the relevant setting. For instance, after inspecting the code, I thought {{nullOnOverflow}} was controlled by {{spark.sql.ansi.enabled}}, so I tried to achieve the desired behavior by altering that flag, but to no avail. Could we mention setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} in the error message?

> DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40439
>                 URL: https://issues.apache.org/jira/browse/SPARK-40439
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> We are trying to store a DECIMAL value {{33333333333.2222222222}} with more precision than what is defined in the schema: {{DECIMAL(20,10)}}. This leads to a {{NULL}} value being stored if the table is created using DataFrames via {{spark-shell}}.
> However, it leads to the following exception if the table is created via {{spark-sql}}:
> {code:java}
> Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
> java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10) cannot be represented as Decimal(20, 10){code}
> h3. Steps to reproduce
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}, execute the following:
> {code:java}
> create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
> insert into decimal_extra_precision select 33333333333.2222222222;{code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{DECIMAL(20,10)}} and {{33333333333.2222222222}}).
> Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned decimal value evaluates to a {{NULL}}:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
>
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
>
> scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27
>
> scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))
>
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
>
> scala> df.show()
> +----+
> |  c1|
> +----+
> |null|
> +----+
>
> scala> 
df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
> 22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
>
> scala> spark.sql("select * from decimal_extra_precision;")
> res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> {code}
> h3. Root Cause
> The exception is raised from [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373] ({{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
> {code:java}
> private[sql] def toPrecision(
>     precision: Int,
>     scale: Int,
>     roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
>     nullOnOverflow: Boolean = true,
>     context: SQLQueryContext = null): Decimal = {
>   val copy = clone()
>   if (copy.changePrecision(precision, scale, roundMode)) {
>     copy
>   } else {
>     if (nullOnOverflow) {
>       null
>     } else {
>       throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
>         this, precision, scale, context)
>     }
>   }
> }{code}
> The above function is invoked from [Cast.scala|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756]. However, our attempt to insert {{33333333333.2222222222}} after setting {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an independent issue).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
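For completeness: the spark-sql INSERT path goes through store-assignment casts governed by {{spark.sql.storeAssignmentPolicy}}, not {{spark.sql.ansi.enabled}}, which is why toggling the ANSI flag alone does not change the behavior. The workaround discussed in the comment above would look roughly like the following in a {{spark-sql}} session (a sketch, assuming the reproduction table from this issue; {{LEGACY}} restores the pre-3.0 behavior of storing {{NULL}} on overflow):

{code:java}
-- The default policy since Spark 3.0 is ANSI, which raises
-- ArithmeticException when a DECIMAL value overflows the target type.
SET spark.sql.storeAssignmentPolicy=LEGACY;

-- With LEGACY, the overflowing value is stored as NULL, matching the
-- DataFrame (spark-shell) behavior.
insert into decimal_extra_precision select 33333333333.2222222222;
{code}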
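The overflow condition itself can be reproduced with plain {{java.math.BigDecimal}}, independent of Spark. Below is a minimal sketch of the check that {{toPrecision}} performs; the helper name and signature are illustrative, not Spark internals:

{code:java}
import java.math.{BigDecimal => JBigDecimal}
import java.math.RoundingMode

// Illustrative re-implementation of the overflow check in Decimal.toPrecision:
// round the value to the target scale, then test whether the result still
// fits the target precision. None models the NULL-on-overflow path.
def fitToPrecision(d: JBigDecimal, precision: Int, scale: Int,
                   nullOnOverflow: Boolean): Option[JBigDecimal] = {
  val rounded = d.setScale(scale, RoundingMode.HALF_UP)
  if (rounded.precision <= precision) Some(rounded)
  else if (nullOnOverflow) None // DataFrame write: value silently becomes NULL
  else throw new ArithmeticException( // spark-sql write: insert fails
    s"$rounded cannot be represented as Decimal($precision, $scale)")
}

val v = new JBigDecimal("33333333333.2222222222")
// 11 integral digits + 10 fractional digits = precision 21, one more
// than DECIMAL(20,10) allows.
assert(v.precision == 21)
assert(fitToPrecision(v, 20, 10, nullOnOverflow = true).isEmpty)
{code}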