xsys created SPARK-40439:
----------------------------

             Summary: DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame
                 Key: SPARK-40439
                 URL: https://issues.apache.org/jira/browse/SPARK-40439
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
            Reporter: xsys
h3. Describe the bug

We are trying to store the DECIMAL value {{33333333333.2222222222}}, which has more precision than what is defined in the schema: {{DECIMAL(20,10)}}. This leads to a {{NULL}} value being stored if the table is created using DataFrames via {{spark-shell}}. However, it leads to the following exception if the table is created via {{spark-sql}}:
{code:java}
Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10) cannot be represented as Decimal(20, 10){code}
h3. Steps to reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}, execute the following:
{code:java}
create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
insert into decimal_extra_precision select 33333333333.2222222222;{code}
h3. Expected behavior

We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{DECIMAL(20,10)}} and {{33333333333.2222222222}}).

Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned decimal value evaluates to {{NULL}}:
{code:java}
scala> import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))

scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]

scala> df.show()
+----+
|  c1|
+----+
|null|
+----+

scala> df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.

scala> spark.sql("select * from decimal_extra_precision;")
res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
{code}
h3. Root Cause

The exception is raised from [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373] ({{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
{code:java}
  private[sql] def toPrecision(
      precision: Int,
      scale: Int,
      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
      nullOnOverflow: Boolean = true,
      context: SQLQueryContext = null): Decimal = {
    val copy = clone()
    if (copy.changePrecision(precision, scale, roundMode)) {
      copy
    } else {
      if (nullOnOverflow) {
        null
      } else {
        throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
          this, precision, scale, context)
      }
    }
  }{code}
{{toPrecision}} is invoked from [Cast|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756] ({{Cast.scala}}).
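To make the overflow condition concrete, the failing check can be exercised directly against {{Decimal}} in {{spark-shell}}. This is a minimal sketch using the public two-argument {{changePrecision}} variant ({{toPrecision}} itself is {{private[sql]}} and not callable from the shell):
{code:java}
import org.apache.spark.sql.types.Decimal

// 33333333333.2222222222 needs precision 21 (11 integral + 10 fractional digits),
// so it cannot fit into DECIMAL(20,10). changePrecision signals this by returning
// false, which toPrecision maps to null (nullOnOverflow = true) or to the
// ArithmeticException above (nullOnOverflow = false).
val d = Decimal(BigDecimal("33333333333.2222222222"))
d.changePrecision(20, 10)  // expected: false
{code}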
However, our attempt to insert {{33333333333.2222222222}} after setting {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an independent issue).
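For completeness, a sketch of checking the flag's effect on the cast path in {{spark-shell}}. Note that table inserts in Spark 3.x are additionally governed by {{spark.sql.storeAssignmentPolicy}} (ANSI by default), so the insert may still throw with {{spark.sql.ansi.enabled}} off, which could be the independent issue noted above:
{code:java}
// Sketch: with ANSI mode off, an overflowing decimal cast is expected to
// evaluate to NULL instead of throwing (the nullOnOverflow = true path above).
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select cast(33333333333.2222222222 as decimal(20,10)) as c1").show()
// expected output:
// +----+
// |  c1|
// +----+
// |null|
// +----+

// With ANSI mode on, the same cast is expected to raise the
// java.lang.ArithmeticException shown in the bug description.
spark.conf.set("spark.sql.ansi.enabled", "true")
{code}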