Kyrill Alyoshin created SPARK-31071:
---------------------------------------

             Summary: Spark Encoders.bean() should allow setting non-null 
fields in its Spark schema
                 Key: SPARK-31071
                 URL: https://issues.apache.org/jira/browse/SPARK-31071
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Kyrill Alyoshin


Spark's _Encoders.bean()_ method should allow fields in the generated 
StructType schema to be *non-nullable*.

Currently, any field of a non-primitive type is automatically _nullable_. This 
is hard-coded in the _org.apache.spark.sql.catalyst.JavaTypeInference_ class, 
and it can lead to rather interesting situations. For example, let's say I want 
to save a dataframe in Avro format using my own Avro schema, one that was not 
generated by Spark. Let's also say that my Avro schema has a field that is 
non-null (i.e., not a union type with null). It appears *impossible* to store a 
dataframe using such an Avro schema: Spark assumes that the field is nullable 
(as it is in Spark's own schema), which conflicts with the Avro schema 
semantics, and the write throws an exception.

I propose changing the _JavaTypeInference_ class to honor the JSR-305 
_Nonnull_ annotation (and its children) on the provided bean class during 
StructType schema generation. This would give bean authors much finer control 
over the nullability of the resulting Spark schema.
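
To make the proposal concrete, here is a minimal, Spark-free sketch of the 
nullability rule I have in mind. It is not the actual Spark implementation: the 
class _NullabilitySketch_, the method _inferNullability_, the _Person_ bean and 
the nested _Nonnull_ annotation (a stand-in for the JSR-305 annotation, so that 
the example has no dependencies) are all hypothetical names used only for 
illustration. It also checks only for the annotation directly on the getter and 
does not handle annotations derived from it.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class NullabilitySketch {

    // Hypothetical stand-in for the JSR-305 Nonnull annotation.
    // It must be retained at runtime so that reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Nonnull {}

    // Hypothetical example bean: one annotated reference property,
    // one plain reference property, and one primitive property.
    public static class Person {
        private String name;
        private String nickname;
        private int age;

        @Nonnull
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public String getNickname() { return nickname; }
        public void setNickname(String nickname) { this.nickname = nickname; }

        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Maps each readable bean property to the nullability it would get
    // under the proposed rule: primitives are never nullable, and a
    // reference type is nullable unless its getter carries @Nonnull.
    public static Map<String, Boolean> inferNullability(Class<?> beanClass)
            throws IntrospectionException {
        // Stop at Object so the synthetic "class" property is excluded.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        Map<String, Boolean> nullableByField = new TreeMap<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            if (getter == null) {
                continue; // write-only property, nothing to read
            }
            boolean nullable = !pd.getPropertyType().isPrimitive()
                    && !getter.isAnnotationPresent(Nonnull.class);
            nullableByField.put(pd.getName(), nullable);
        }
        return nullableByField;
    }

    public static void main(String[] args) throws Exception {
        // Prints the property-name -> nullable map for the example bean.
        System.out.println(inferNullability(Person.class));
    }
}
```

With this rule, _age_ (a primitive) and _name_ (annotated) would come out as 
non-nullable, while _nickname_ would stay nullable, which is what today's 
behavior gives for every non-primitive field.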



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
