[ https://issues.apache.org/jira/browse/SPARK-31071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-31071:
-----------------------------------

    Assignee: L. C. Hsieh

> Spark Encoders.bean() should allow marking non-null fields in its Spark schema
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31071
>                 URL: https://issues.apache.org/jira/browse/SPARK-31071
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Kyrill Alyoshin
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> The Spark _Encoders.bean()_ method should allow fields of the generated
> StructType schema to be *non-nullable*.
> Currently, any non-primitive type is automatically _nullable_. This is
> hard-coded in the _org.apache.spark.sql.catalyst.JavaTypeInference_ class.
> This can lead to rather interesting situations. For example, let's say I
> want to save a dataframe in Avro format with my own Avro schema, one that
> was not generated by Spark. Let's also say that my Avro schema has a field
> that is non-null (i.e., not a union type). It appears *impossible* to store
> a dataframe using such an Avro schema: Spark assumes that the field is
> nullable (as it is in its own schema), which conflicts with the Avro schema
> semantics and throws an exception.
> I propose changing the _JavaTypeInference_ class to observe the JSR-305
> _Nonnull_ annotation (and its children) on the provided bean class during
> StructType schema generation. This would give bean authors much finer
> control over the resulting Spark schema.
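
To make the proposal concrete, here is a minimal sketch of the nullability rule the issue asks for, written with plain Java reflection only. It is not Spark's code and does not reproduce Spark's internals. The names `NullabilitySketch`, `Person`, and `inferNullability` are hypothetical, and the `Nonnull` annotation is declared locally as a stand-in for the JSR-305 `javax.annotation.Nonnull` so that the example has no dependencies. The sketch also assumes the annotation is placed on the getter; the issue does not say where it should be read from.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class NullabilitySketch {

    /** Local stand-in for JSR-305's javax.annotation.Nonnull (assumption, not the real annotation). */
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Nonnull {}

    /** Hypothetical bean used only for illustration. */
    public static class Person {
        private String name;
        private String nickname;
        private int age;

        @Nonnull
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public String getNickname() { return nickname; }
        public void setNickname(String nickname) { this.nickname = nickname; }

        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    /**
     * Maps each readable bean property to a "nullable" flag.
     * Primitives are never nullable (the behaviour the issue describes today);
     * reference types are nullable unless the getter carries @Nonnull
     * (the behaviour the issue proposes).
     */
    public static Map<String, Boolean> inferNullability(Class<?> beanClass) {
        BeanInfo info;
        try {
            // Stop at Object so the synthetic "class" property is excluded.
            info = Introspector.getBeanInfo(beanClass, Object.class);
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        Map<String, Boolean> nullable = new TreeMap<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            if (getter == null) {
                continue; // write-only property, nothing to infer
            }
            boolean primitive = getter.getReturnType().isPrimitive();
            boolean markedNonnull = getter.isAnnotationPresent(Nonnull.class);
            nullable.put(pd.getName(), !(primitive || markedNonnull));
        }
        return nullable;
    }

    public static void main(String[] args) {
        inferNullability(Person.class).forEach(
            (field, isNullable) -> System.out.println(field + " nullable=" + isNullable));
    }
}
```

With this rule, `age` (primitive) and `name` (annotated) come out non-nullable, while `nickname` stays nullable. A real change would have to apply the same decision when building the StructType fields inside Spark's bean type inference, which this sketch does not attempt.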