Re: [PR] [SPARK-52772][SQL] Inconsistent table attribute handling during updates [spark]

via GitHub Mon, 14 Jul 2025 19:42:05 -0700


cloud-fan commented on code in PR #51466:
URL: https://github.com/apache/spark/pull/51466#discussion_r2206162914



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala:
##########
@@ -222,6 +227,37 @@ case class InsertIntoHadoopFsRelationCommand(
     Seq.empty[Row]
   }
 
+  /**
+   * The JSON writer [[org.apache.spark.sql.catalyst.json.JacksonGenerator]] has a special feature
+   * that changes the null handling of top-level columns that have a default value such that a
+   * explicit null is written.  This is detected today by looking for the metadata key
+   * [[ResolveDefaultColumnsUtils#EXISTS_DEFAULT_COLUMN_METADATA_KEY]] on the query attribute.
+   * This function copies this key from the table attribute to the query attribute only
+   * when a table metadata is available, only for JSON output, and only when the configuration
+   * requests the special feature.
+   *
+   * We should instead pass the table description down to the writers instead of using query
+   * attribute metadata, but this is a nontrivial change.
+   */
+  private def markColumnsWithDefaultForJson(outputColumns: Seq[Attribute]): Seq[Attribute] = {
+    if (catalogTable.isEmpty || !fileFormat.isInstanceOf[JsonFileFormat] ||

Review Comment:
   Why JSON only? I think it's fine to propagate the schema definition for all file sources.
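
   To make the suggestion concrete, here is a minimal sketch of format-agnostic propagation. It uses simplified stand-in types (`Attr`, a plain `Map` for metadata) rather than Spark's real `Attribute` and `Metadata` classes, and the names `PropagateDefaults`, `markColumnsWithDefault`, and `ExistsDefaultKey` are hypothetical. Matching table columns to query attributes by position is also an assumption; the actual change may need to match by name.

```scala
// Simplified stand-in for a query/table attribute; NOT Spark's Attribute class.
final case class Attr(name: String, metadata: Map[String, String] = Map.empty)

object PropagateDefaults {
  // Stand-in for ResolveDefaultColumnsUtils.EXISTS_DEFAULT_COLUMN_METADATA_KEY
  // (the literal value here is an assumption for illustration).
  val ExistsDefaultKey = "EXISTS_DEFAULT"

  /**
   * Copies the exists-default marker from each table column onto the query
   * attribute at the same position, with no check on the file format.
   * Returns the query attributes unchanged when no table metadata is
   * available or when the column counts differ.
   */
  def markColumnsWithDefault(
      tableColumns: Option[Seq[Attr]],
      outputColumns: Seq[Attr]): Seq[Attr] = tableColumns match {
    case Some(cols) if cols.length == outputColumns.length =>
      outputColumns.zip(cols).map { case (out, col) =>
        col.metadata.get(ExistsDefaultKey) match {
          case Some(v) => out.copy(metadata = out.metadata + (ExistsDefaultKey -> v))
          case None => out
        }
      }
    case _ => outputColumns
  }
}
```

   With this shape the only remaining guard is whether table metadata exists; each writer can then decide for itself whether it cares about the key.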



