[ https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437064#comment-17437064 ]
Apache Spark commented on SPARK-37191:
--------------------------------------

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/34462

> Allow merging DecimalTypes with different precision values
> -----------------------------------------------------------
>
>                 Key: SPARK-37191
>                 URL: https://issues.apache.org/jira/browse/SPARK-37191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
>            Reporter: Ivan
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When merging DecimalTypes with different precision but the same scale, the merge fails with the following error:
> {code:java}
> Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
>   at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
>   at scala.Option.map(Option.scala:230)
>   at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
>   at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
>   at org.apache.spark.sql.types.StructType.merge(StructType.scala:550)
> {code}
>
> We could allow merging DecimalType values with different precision when the scale is the same for both types: there should be no data correctness issue, because one of the types is simply widened, for example DECIMAL(12, 2) -> DECIMAL(17, 2). The same does not hold when the scales differ - whether such an upcast is safe depends on the actual values.
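A minimal sketch of the proposed merge rule (hypothetical names, not Spark's actual StructType.merge implementation): two decimal types merge when their scales match, by taking the larger precision, and fail otherwise.

```scala
// Sketch only; Dec and merge are hypothetical names, not part of Spark's API.
object DecimalMergeSketch {
  final case class Dec(precision: Int, scale: Int)

  // Same scale: widen to the larger precision (lossless).
  // Different scale: refuse, since safety depends on the stored values.
  def merge(a: Dec, b: Dec): Option[Dec] =
    if (a.scale == b.scale) Some(Dec(math.max(a.precision, b.precision), a.scale))
    else None

  def main(args: Array[String]): Unit = {
    println(merge(Dec(17, 2), Dec(12, 2))) // Some(Dec(17,2))
    println(merge(Dec(17, 2), Dec(12, 3))) // None
  }
}
```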
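The "depends on the actual values" point can be seen with plain java.math.BigDecimal, no Spark required: widening precision at a fixed scale leaves a value untouched, while changing the scale forces rounding.

```scala
import java.math.{BigDecimal => JBigDecimal}
import java.math.RoundingMode

object ScaleDemo {
  def main(args: Array[String]): Unit = {
    val v = new JBigDecimal("123456789.11") // fits DECIMAL(12, 2)

    // DECIMAL(12, 2) -> DECIMAL(17, 2): same scale, more precision.
    // The value needs no change at all, so the widening is lossless.
    assert(v.precision <= 17 && v.scale == 2)

    // Changing the scale (2 -> 1) forces rounding and drops a digit,
    // so whether it is "safe" depends on each stored value.
    val rescaled = v.setScale(1, RoundingMode.HALF_UP)
    println(rescaled) // prints 123456789.1
  }
}
```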
>
> Repro code:
> {code:java}
> import org.apache.spark.sql.types._
>
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> schema1.merge(schema2)
> {code}
>
> This also affects Parquet schema merging, which is where this issue was originally discovered:
> {code:java}
> import java.math.BigDecimal
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
>
> val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
>
> spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
> spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")
>
> // Reading the DataFrame fails
> spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()
>
> >>> Failed merging schema:
> root
>  |-- col: decimal(17,2) (nullable = true)
> Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 12 and 17
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org