Re: [PR] [SPARK-56550][SQL] Support source with fewer columns/fields in INSERT INTO WITH SCHEMA EVOLUTION [spark]

via GitHub Sun, 26 Apr 2026 22:55:10 -0700


szehon-ho commented on code in PR #55427:
URL: https://github.com/apache/spark/pull/55427#discussion_r3145120612



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -407,25 +413,36 @@ object TableOutputResolver extends SQLConfHelper with Logging {
       }
     }
 
-    inputCols.zip(actualExpectedCols).flatMap { case (inputCol, expectedCol) =>
+    val matched = inputCols.zip(actualExpectedCols).flatMap { case (inputCol, expectedCol) =>
       val newColPath = colPath :+ expectedCol.name
       (inputCol.dataType, expectedCol.dataType) match {
         case (inputType: StructType, expectedType: StructType) =>
           resolveStructType(
             tableName, inputCol, inputType, expectedCol, expectedType,
-            byName = false, conf, addError, newColPath, fillDefaultValue = false)
+            byName = false, conf, addError, newColPath, fillDefaultValue)
         case (inputType: ArrayType, expectedType: ArrayType) =>
           resolveArrayType(
             tableName, inputCol, inputType, expectedCol, expectedType,
-            byName = false, conf, addError, newColPath, fillDefaultValue = false)
+            byName = false, conf, addError, newColPath, fillDefaultValue)
         case (inputType: MapType, expectedType: MapType) =>
           resolveMapType(
             tableName, inputCol, inputType, expectedCol, expectedType,
-            byName = false, conf, addError, newColPath, fillDefaultValue = false)
+            byName = false, conf, addError, newColPath, fillDefaultValue)
         case _ =>
           checkField(tableName, expectedCol, inputCol, byName = false, conf, addError, newColPath)
       }
     }
+
+    val defaults = if (fillDefaultValue) {
+      actualExpectedCols.drop(inputCols.size).flatMap { expectedCol =>
+        getDefaultValueExprOrNullLit(expectedCol, conf.useNullsForMissingDefaultColumnValues)
+          .map(expr => applyColumnMetadata(expr, expectedCol))
+      }
+    } else {
+      Nil
+    }

Review Comment:
   Addressed in two follow-up commits on the branch:
   
   1. **Trailing default fill (this thread):** Replaced the `flatMap` with an 
explicit check: if `getDefaultValueExprOrNullLit` returns nothing for a trailing 
column, we now throw `INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA` with the same 
path quoting as the by-name path. Added regression tests (top-level and nested 
by-position) with `USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES=false`.
   
   2. **Further hardening (same file):** `resolveColumnsByPosition` now asserts 
that `matched ++ defaults` has full arity. `reorderColumnsByName` and the nested 
`resolveStructType` / `resolveArrayType` / `resolveMapType` take an 
`enforceFullOutput` flag, so INSERT output resolution throws on any incomplete 
nested resolution instead of returning `Nil`/`None` from `flatMap`. MERGE 
`resolveUpdate` keeps `enforceFullOutput=false`, so its `getOrElse` fallback 
semantics are unchanged.
   
   Ran `DataSourceV2SQLSuiteV1Filter` (Insert schema evolution / source tests) 
and `MergeIntoDataFrameSuite` (nested struct merge tests).
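
To illustrate the change described in item 1 and the arity assertion in item 2, here is a minimal, self-contained sketch. It is not the code from the PR: `Column`, `MissingDataException`, `ByPositionSketch`, `defaultOrNull`, and `quotePath` are hypothetical stand-ins, and expressions are modelled as plain strings rather than catalyst expressions. `defaultOrNull` only approximates the role `getDefaultValueExprOrNullLit` plays in the diff above; its real lookup rules are assumed, not reproduced.

```scala
// Hypothetical stand-in for an expected table column; not a Spark type.
final case class Column(name: String, default: Option[String])

// Hypothetical stand-in for the INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA error.
final case class MissingDataException(path: String)
  extends RuntimeException(s"INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA: $path")

object ByPositionSketch {
  // Assumed behaviour: use the explicit default if present, otherwise a NULL
  // literal when the config allows it, otherwise nothing.
  def defaultOrNull(col: Column, useNullsForMissingDefaults: Boolean): Option[String] =
    col.default.orElse(if (useNullsForMissingDefaults) Some("NULL") else None)

  // Quote each path segment with backticks and join with dots.
  def quotePath(path: Seq[String]): String =
    path.map(p => s"`$p`").mkString(".")

  def resolve(
      inputCols: Seq[String],
      expectedCols: Seq[Column],
      colPath: Seq[String],
      useNullsForMissingDefaults: Boolean): Seq[String] = {
    // Columns present in the source are matched by position.
    val matched = inputCols.zip(expectedCols).map { case (expr, col) =>
      s"$expr AS ${col.name}"
    }
    // Trailing columns: an explicit match replaces flatMap, so a missing
    // default raises an error instead of silently dropping the column.
    val defaults = expectedCols.drop(inputCols.size).map { col =>
      defaultOrNull(col, useNullsForMissingDefaults) match {
        case Some(expr) => s"$expr AS ${col.name}"
        case None => throw MissingDataException(quotePath(colPath :+ col.name))
      }
    }
    val resolved = matched ++ defaults
    // Full-arity check; this sketch assumes the source never has more
    // columns than the target, which is validated elsewhere.
    assert(resolved.size == expectedCols.size,
      s"expected ${expectedCols.size} output columns, got ${resolved.size}")
    resolved
  }
}
```

With the earlier `flatMap`, a trailing column that had no default and no NULL fallback would simply vanish from the output. In this sketch the same situation fails fast and reports the quoted path of the missing column.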


