[ https://issues.apache.org/jira/browse/SPARK-34378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279839#comment-17279839 ]
Erik Krogen commented on SPARK-34378:
-------------------------------------
Internally we build this feature on top of SPARK-34365, so I will wait until
that JIRA is finalized before posting a PR here.
> Support extra optional Avro fields in AvroSerializer
> ----------------------------------------------------
>
> Key: SPARK-34378
> URL: https://issues.apache.org/jira/browse/SPARK-34378
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.1
> Reporter: Erik Krogen
> Priority: Major
>
> Currently, when writing Avro data with a custom schema ({{avroSchema}}),
> serialization fails if the Avro schema contains any field that has no
> matching field in the Catalyst schema. This is much stricter than the
> deserialization path, where Avro fields not present in the Catalyst schema
> are ignored, and Catalyst fields not present in the Avro schema are allowed
> as long as they are nullable. I believe it would be more user-friendly to
> allow extra Avro fields as long as they are optional. This makes it easier
> for users to write data with Avro schemas that may be outside of their
> control.
> If there is concern about the safety of this approach (i.e. there are use
> cases where users want strict matching), we can make it configurable.
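
To make the proposed rule concrete, here is a minimal sketch in plain Python
(no Spark or Avro library involved). It is not the AvroSerializer
implementation. The helper names are hypothetical, and it assumes that
"optional" means the Avro field's type is a union containing "null" and that
fields are matched by exact name; the issue itself does not spell out either
point.

```python
import json


def is_optional(avro_field):
    """Assumption: a field is 'optional' if its type is a union with "null"."""
    field_type = avro_field["type"]
    return isinstance(field_type, list) and "null" in field_type


def check_extra_fields(avro_schema_json, catalyst_field_names):
    """Hypothetical helper: classify Avro fields missing from the Catalyst schema.

    Returns (allowed, rejected): extra fields that are optional, and extra
    fields that are required and would still have to cause a failure.
    """
    schema = json.loads(avro_schema_json)
    extra = [f for f in schema["fields"] if f["name"] not in catalyst_field_names]
    allowed = [f["name"] for f in extra if is_optional(f)]
    rejected = [f["name"] for f in extra if not is_optional(f)]
    return allowed, rejected


# Illustrative Avro record schema; the field names are made up for this example.
AVRO_SCHEMA = json.dumps({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "comment", "type": ["null", "string"], "default": None},
        {"name": "version", "type": "int"},
    ],
})

# The Catalyst schema in this example only has "id" and "name".
allowed, rejected = check_extra_fields(AVRO_SCHEMA, {"id", "name"})
print("extra optional fields (would be allowed):", allowed)
print("extra required fields (would still fail):", rejected)
```

Under the proposal, only the required extra field would need to fail the
write. With the current behavior described above, either extra field fails it.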