[ https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37191.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 34462
[https://github.com/apache/spark/pull/34462]

> Allow merging DecimalTypes with different precision values 
> -----------------------------------------------------------
>
>                 Key: SPARK-37191
>                 URL: https://issues.apache.org/jira/browse/SPARK-37191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
>            Reporter: Ivan
>            Assignee: Ivan
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When merging DecimalTypes with different precision but the same scale, one 
> would get the following error:
> {code:java}
> Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
>       at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
>       at scala.Option.map(Option.scala:230)
>       at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
>       at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
>       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>       at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
>       at org.apache.spark.sql.types.StructType.merge(StructType.scala:550)
> {code}
>  
> We could allow merging DecimalType values with different precision as long as 
> the scale is the same for both types. This is safe for data correctness, 
> because the narrower type is simply widened to the larger precision, for 
> example DECIMAL(12, 2) -> DECIMAL(17, 2). The same does not hold for 
> upcasting when the scales differ: whether a value fits after rescaling 
> depends on the actual data.
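> 
> As an illustrative sketch only (not Spark's actual implementation, and the 
> helper name mergeDecimal is hypothetical), the proposed rule for same-scale 
> decimals can be expressed as:
> {code:java}
> import org.apache.spark.sql.types.DecimalType
> 
> // Sketch of the proposed rule: decimals with equal scale merge to the
> // wider precision; unequal scales remain an error.
> def mergeDecimal(a: DecimalType, b: DecimalType): DecimalType = {
>   require(a.scale == b.scale,
>     s"Failed to merge decimal types with incompatible scale ${a.scale} and ${b.scale}")
>   // Same scale: the type with the larger precision can represent every
>   // value of the narrower one, e.g. DECIMAL(12, 2) -> DECIMAL(17, 2),
>   // so widening is lossless.
>   DecimalType(math.max(a.precision, b.precision), a.scale)
> }
> {code}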
>  
> Repro code:
> {code:java}
> import org.apache.spark.sql.types._
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> schema1.merge(schema2)
> {code}
>  
> This also affects Parquet schema merging, which is where the issue was 
> originally discovered:
> {code:java}
> import java.math.BigDecimal
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
> spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")
> // Reading the DataFrame fails
> spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()
> >>>
> Failed merging schema:
> root
>  |-- col: decimal(17,2) (nullable = true)
> Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 12 and 17
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
