Kimahriman commented on a change in pull request #32448:
URL: https://github.com/apache/spark/pull/32448#discussion_r650522493



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala
##########
@@ -150,95 +150,36 @@ class StructTypeSuite extends SparkFunSuite with SQLHelper {
     assert(fromDDL(interval).toDDL === interval)
   }
 
-  test("find missing (nested) fields") {
-    val schema = StructType.fromDDL("c1 INT, c2 STRUCT<c3: INT, c4: STRUCT<c5: INT, c6: INT>>")
+  test("SPARK-35290: Struct merging case insensitive") {
+    val schema1 = StructType.fromDDL("a1 INT, a2 STRING, nested STRUCT<b1: INT, b2: STRING>")
+    val schema2 = StructType.fromDDL("A2 STRING, a3 DOUBLE, nested STRUCT<B2: STRING, b3: DOUBLE>")
     val resolver = SQLConf.get.resolver
 
-    val source1 = StructType.fromDDL("c1 INT")
-    val missing1 = StructType.fromDDL("c2 STRUCT<c3: INT, c4: STRUCT<c5: INT, c6: INT>>")
-    assert(StructType.findMissingFields(source1, schema, resolver)
-      .exists(_.sameType(missing1)))
+    assert(schema1.merge(schema2, resolver).sameType(StructType.fromDDL(
+      "a1 INT, a2 STRING, nested STRUCT<b1: INT, b2: STRING, b3: DOUBLE>, a3 DOUBLE"
+    )))
 
-    val source2 = StructType.fromDDL("c1 INT, c3 STRING")
-    val missing2 = StructType.fromDDL("c2 STRUCT<c3: INT, c4: STRUCT<c5: INT, c6: INT>>")
-    assert(StructType.findMissingFields(source2, schema, resolver)
-      .exists(_.sameType(missing2)))
-
-    val source3 = StructType.fromDDL("c1 INT, c2 STRUCT<c3: INT>")
-    val missing3 = StructType.fromDDL("c2 STRUCT<c4: STRUCT<c5: INT, c6: INT>>")
-    assert(StructType.findMissingFields(source3, schema, resolver)
-      .exists(_.sameType(missing3)))
-
-    val source4 = StructType.fromDDL("c1 INT, c2 STRUCT<c3: INT, c4: STRUCT<c6: INT>>")
-    val missing4 = StructType.fromDDL("c2 STRUCT<c4: STRUCT<c5: INT>>")
-    assert(StructType.findMissingFields(source4, schema, resolver)
-      .exists(_.sameType(missing4)))
+    assert(schema2.merge(schema1, resolver).sameType(StructType.fromDDL(
+      "a2 STRING, a3 DOUBLE, nested STRUCT<b2: STRING, b3: DOUBLE, b1: INT>, a1 INT"

Review comment:
       Ah yeah, it does: `sameType` only does a case-insensitive comparison, so it didn't matter that my handwritten expected type had the wrong case. I'll switch to `===` instead, and that seems to fix the check.
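
To make the pitfall concrete, here is a minimal self-contained sketch in plain Scala (the `Field` case class and this `sameType` are simplified stand-ins, not Spark's actual `DataType` API): a case-insensitive structural comparison passes even when the expected value was written with the wrong field-name case, while strict equality catches the mismatch.

```scala
// Toy model: "sameType" ignores field-name case, plain equality does not.
// Not Spark's implementation; names here are illustrative only.
final case class Field(name: String, dataType: String)

def sameType(a: Seq[Field], b: Seq[Field]): Boolean =
  a.length == b.length && a.zip(b).forall { case (x, y) =>
    x.name.equalsIgnoreCase(y.name) && x.dataType == y.dataType
  }

val expected = Seq(Field("b2", "STRING"), Field("b3", "DOUBLE"))
val actual   = Seq(Field("B2", "STRING"), Field("b3", "DOUBLE"))

// sameType passes despite the B2/b2 case mismatch; equality flags it.
val masked = sameType(expected, actual)   // true
val caught = expected == actual           // false
```

This is why switching the assertion from `sameType` to `===` surfaces a wrong-case expected value instead of silently accepting it.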

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala
##########
@@ -181,7 +100,8 @@ object ResolveUnion extends Rule[LogicalPlan] {
            // like that. We will sort columns in the struct expression to make sure two sides of
            // union have consistent schema.
            aliased += foundAttr
-            Alias(addFields(foundAttr, target), foundAttr.name)()
+            val targetType = target.merge(source, conf.resolver)

Review comment:
       Hmmm, good question. I'm not sure how that would work without adding extra logic to `StructType.merge` to ignore conflicts. Now that you bring it up, I'm starting to think `StructType.merge` isn't the best fit here, since it does care about the `DataType`. I just noticed it doesn't handle compatible-but-different types: merging a float with a double raises an error, whereas a normal union just coerces them. I might rework this to not use `StructType.merge` after all...
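
As a hedged sketch of the float/double concern (plain Scala; `DT`, `strictMerge`, and `widen` are illustrative names, not Spark's API): a strict merge rejects any type mismatch outright, while union-style resolution widens compatible numeric types to a common type.

```scala
// Illustrative model only, not Spark's StructType.merge or union coercion.
sealed trait DT
case object IntT    extends DT
case object FloatT  extends DT
case object DoubleT extends DT

// Strict merge in the spirit of StructType.merge: identical types or fail.
def strictMerge(a: DT, b: DT): DT =
  if (a == b) a
  else throw new IllegalArgumentException(s"Failed to merge $a and $b")

// Union-style resolution: promote to a wider common numeric type if one exists.
def widen(a: DT, b: DT): Option[DT] = (a, b) match {
  case (x, y) if x == y                      => Some(x)
  case (FloatT, DoubleT) | (DoubleT, FloatT) => Some(DoubleT)
  case (IntT, DoubleT)   | (DoubleT, IntT)   => Some(DoubleT)
  case (IntT, FloatT)    | (FloatT, IntT)    => Some(FloatT)
  case _                                     => None
}

val widened = widen(FloatT, DoubleT)                   // Some(DoubleT)
val merged  = scala.util.Try(strictMerge(FloatT, DoubleT))  // Failure(...)
```

The gap between the two behaviors is exactly why reusing a strict merge inside union resolution can reject inputs the existing union path already accepts.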




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
