[ https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xsys updated SPARK-40439:
-------------------------
Description: 
h3. Describe the bug
We are trying to store the DECIMAL value {{33333333333.2222222222}}, which has more precision than the schema allows: {{DECIMAL(20,10)}}. If the table is created using DataFrames via {{spark-shell}}, a {{NULL}} value is silently stored. However, if the table is created via {{spark-sql}}, the same insert raises the following exception:
{code:java}
Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10}) cannot be represented as Decimal(20, 10){code}
h3. Steps to reproduce
On Spark 3.2.1 (commit {{4f25b3f712}}), execute the following in {{spark-sql}}:
{code:java}
create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
insert into decimal_extra_precision select 33333333333.2222222222;{code}
h3. Expected behavior
We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{DECIMAL(20,10)}} and {{33333333333.2222222222}}).
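The overflow itself can be confirmed independently of Spark with a short Python {{decimal}} check (an illustration, not Spark's code): the literal carries 21 significant digits at scale 10, so it requires {{DECIMAL(21,10)}} and cannot fit in {{DECIMAL(20,10)}}, which leaves room for only 10 integer digits.

```python
from decimal import Decimal

# The literal from the bug report: 11 integer digits + 10 fractional digits.
v = Decimal("33333333333.2222222222")
sign, digits, exponent = v.as_tuple()

precision = len(digits)  # total number of significant digits
scale = -exponent        # digits after the decimal point

# Needs DECIMAL(21,10); DECIMAL(20,10) is one digit short.
print(precision, scale)  # 21 10
```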
Here is a simplified example in {{spark-shell}}, where insertion of the same decimal value evaluates to a {{NULL}}:
{code:java}
scala> import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))

scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]

scala> df.show()
+----+
|  c1|
+----+
|null|
+----+

scala> df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.

scala> spark.sql("select * from decimal_extra_precision;")
res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
{code}
h3. Root Cause
The exception is raised from [Decimal.toPrecision|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373]; {{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]:
{code:java}
private[sql] def toPrecision(
    precision: Int,
    scale: Int,
    roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
    nullOnOverflow: Boolean = true,
    context: SQLQueryContext = null): Decimal = {
  val copy = clone()
  if (copy.changePrecision(precision, scale, roundMode)) {
    copy
  } else {
    if (nullOnOverflow) {
      null
    } else {
      throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
        this, precision, scale, context)
    }
  }
}{code}
This method is invoked from [Cast.scala|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756]. However, our attempt to insert {{33333333333.2222222222}} after setting {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an independent issue).
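The divergence comes down to that {{nullOnOverflow}} branch. A hypothetical Python mirror of the logic (the function name and the digit-count check are our simplification, not Spark's actual implementation) reproduces both observed behaviors: a {{NULL}} when the flag is true, and an exception when it is false:

```python
from decimal import Decimal, ROUND_HALF_UP

def to_precision(value: Decimal, precision: int, scale: int,
                 null_on_overflow: bool = True):
    """Simplified mirror of Decimal.toPrecision: round to `scale`,
    then null out (or fail) if the result exceeds `precision` digits."""
    rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if len(rounded.as_tuple().digits) <= precision:
        return rounded
    if null_on_overflow:
        return None  # DataFrame path: the value silently becomes NULL
    # spark-sql path: analogous to the java.lang.ArithmeticException
    raise ArithmeticError(
        f"{value} cannot be represented as Decimal({precision}, {scale})")

v = Decimal("33333333333.2222222222")
print(to_precision(v, 20, 10))               # None
print(to_precision(Decimal("1.5"), 20, 10))  # 1.5000000000
```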
> DECIMAL value with more precision than what is defined in the schema raises
> exception in SparkSQL but evaluates to NULL for DataFrame
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40439
>                 URL: https://issues.apache.org/jira/browse/SPARK-40439
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>


--
This message was sent by Atlassian Jira
(v8.20.10#820010)