[GitHub] [spark] cloud-fan commented on a change in pull request #34462: [SPARK-37191][SQL] Allow merging DecimalTypes with different precision values

GitBox Tue, 02 Nov 2021 14:44:41 -0700


cloud-fan commented on a change in pull request #34462:
URL: https://github.com/apache/spark/pull/34462#discussion_r740747727




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
##########
@@ -643,15 +643,14 @@ object StructType extends AbstractDataType {
         DecimalType.Fixed(rightPrecision, rightScale)) =>
         if ((leftPrecision == rightPrecision) && (leftScale == rightScale)) {

Review comment:
       how about this
   ```
   if(leftScale == rightScale) {
     DecimalType(leftPrecision.max(rightPrecision), leftScale)
   } else {
     throw 
QueryExecutionErrors.cannotMergeDecimalTypesWithIncompatibleScaleError
   }
   ```
   
   I think we don't need to have two different error messages. We just need to 
point out the root cause, which is the incompatible scale.

##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -388,6 +391,27 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
     }
   }
 
+  test("SPARK-37191: Schema merging for DecimalType with different precision 
but the same scale") {
+    import testImplicits._
+
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+
+      val data1 = spark.sparkContext.parallelize(Seq(Row(new 
BigDecimal("123456789.11"))), 1)
+      val schema1 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
+
+      val data2 = spark.sparkContext.parallelize(Seq(Row(new 
BigDecimal("1234567890000.11"))), 1)
+      val schema2 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
+
+      spark.createDataFrame(data1, schema1).write.parquet(path)
+      spark.createDataFrame(data2, schema2).write.mode("append").parquet(path)
+
+      val res = spark.read.option("mergeSchema", "true").parquet(path)
+      assert(res.schema("col").dataType == DecimalType(17, 2))
+      res.foreach(_ => ()) // must not throw exception

Review comment:
       shall we use `checkAnswer` to check the result?

##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala
##########
@@ -388,6 +391,27 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
     }
   }
 
+  test("SPARK-37191: Schema merging for DecimalType with different precision 
but the same scale") {
+    import testImplicits._
+
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+
+      val data1 = spark.sparkContext.parallelize(Seq(Row(new 
BigDecimal("123456789.11"))), 1)
+      val schema1 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
+
+      val data2 = spark.sparkContext.parallelize(Seq(Row(new 
BigDecimal("1234567890000.11"))), 1)
+      val schema2 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
+
+      spark.createDataFrame(data1, schema1).write.parquet(path)
+      spark.createDataFrame(data2, schema2).write.mode("append").parquet(path)
+
+      val res = spark.read.option("mergeSchema", "true").parquet(path)
+      assert(res.schema("col").dataType == DecimalType(17, 2))
+      res.foreach(_ => ()) // must not throw exception

Review comment:
       maybe we should put the test in `ParquetQuerySuite`, so that we can test 
all the parquet readers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #34462: [SPARK-37191][SQL] Allow merging DecimalTypes with different precision values

Reply via email to