cloud-fan commented on code in PR #51466:
URL: https://github.com/apache/spark/pull/51466#discussion_r2206162914
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala:
##########
@@ -222,6 +227,37 @@ case class InsertIntoHadoopFsRelationCommand(
Seq.empty[Row]
}
+ /**
+ * The JSON writer [[org.apache.spark.sql.catalyst.json.JacksonGenerator]] has a special feature
+ * that changes the null handling of top-level columns that have a default value such that a
+ * explicit null is written. This is detected today by looking for the metadata key
+ * [[ResolveDefaultColumnsUtils#EXISTS_DEFAULT_COLUMN_METADATA_KEY]] on the query attribute.
+ * This function copies this key from the table attribute to the query attribute only
+ * when a table metadata is available, only for JSON output, and only when the configuration
+ * requests the special feature.
+ *
+ * We should instead pass the table description down to the writers instead of using query
+ * attribute metadata, but this is a nontrivial change.
+ */
+ private def markColumnsWithDefaultForJson(outputColumns: Seq[Attribute]): Seq[Attribute] = {
+ if (catalogTable.isEmpty || !fileFormat.isInstanceOf[JsonFileFormat] ||
Review Comment:
Why JSON only? I think it's fine to propagate the schema definition for all file sources.
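
To make the suggestion concrete, here is a minimal, self-contained sketch of what the helper could look like without the `JsonFileFormat` check. It is not the PR's code. `Attr`, `MarkDefaults`, `markColumnsWithDefault`, and `featureEnabled` are hypothetical stand-ins for Spark's `Attribute`, the enclosing command, the helper, and the configuration flag. Matching table columns to query attributes by exact name, and the literal value of the metadata key, are also assumptions.

```scala
// Hypothetical stand-in for Spark's Attribute: a column name plus string metadata.
final case class Attr(name: String, metadata: Map[String, String] = Map.empty)

object MarkDefaults {
  // Stand-in for ResolveDefaultColumnsUtils.EXISTS_DEFAULT_COLUMN_METADATA_KEY.
  // The literal value used here is an assumption.
  val ExistsDefaultKey: String = "EXISTS_DEFAULT"

  /**
   * Copies the exists-default metadata entry from each table column to the
   * query attribute with the same name. There is no file-format check: the
   * copy happens whenever table metadata is present and the feature is on.
   */
  def markColumnsWithDefault(
      tableColumns: Option[Seq[Attr]],
      featureEnabled: Boolean,
      outputColumns: Seq[Attr]): Seq[Attr] = {
    tableColumns match {
      case Some(cols) if featureEnabled =>
        // Table columns that carry a default, keyed by column name.
        val defaults: Map[String, String] =
          cols.flatMap(c => c.metadata.get(ExistsDefaultKey).map(v => c.name -> v)).toMap
        outputColumns.map { attr =>
          defaults.get(attr.name) match {
            case Some(v) => attr.copy(metadata = attr.metadata + (ExistsDefaultKey -> v))
            case None => attr
          }
        }
      // No table metadata, or feature disabled: leave the query attributes untouched.
      case _ => outputColumns
    }
  }
}
```

With this shape, writers that ignore the key are unaffected, and any writer that later wants the default-column information finds it on the query attributes.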