[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23173.
----------------------------------
       Resolution: Fixed
         Assignee: Michał Świtakowski
    Fix Version/s: 2.4.0
                   2.3.1

Fixed in https://github.com/apache/spark/pull/20694

> from_json can produce nulls for fields which are marked as non-nullable
> -----------------------------------------------------------------------
>
>                 Key: SPARK-23173
>                 URL: https://issues.apache.org/jira/browse/SPARK-23173
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Herman van Hovell
>            Assignee: Michał Świtakowski
>            Priority: Major
>              Labels: release-notes
>             Fix For: 2.3.1, 2.4.0
>
>
> The {{from_json}} function uses a schema to convert a string into a Spark SQL
> struct. This schema can contain non-nullable fields. The underlying
> {{JsonToStructs}} expression does not check whether the resulting struct
> respects the nullability of the schema. This leads to very weird problems in
> consuming expressions. In our case, Parquet writing would produce an illegal
> Parquet file.
> There are roughly two solutions here:
> # Assume that each field in the schema passed to {{from_json}} is nullable,
> and ignore the nullability information set in the passed schema.
> # Validate the object at runtime, and fail execution if the data is null
> where we are not expecting this.
> I am currently slightly in favor of option 1, since it is the more
> performant option and a lot easier to do.
> WDYT? cc [~rxin] [~marmbrus] [~hyukjin.kwon] [~brkyvz]
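
The two options in the issue description can be illustrated with a small, Spark-free sketch in plain Python. This is not the Spark implementation and not the code from the linked pull request; `Field`, `parse`, `as_nullable`, and `validate` are hypothetical names invented for this example. `parse` mimics the reported behaviour by filling absent fields with `None` without checking the schema.

```python
import json
from dataclasses import dataclass, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field with a nullability flag."""
    name: str
    nullable: bool = True


def parse(text: str, schema: List[Field]) -> Dict[str, Any]:
    """Mimic the reported bug: absent fields become None, and the
    nullable flag of the schema is never consulted."""
    obj = json.loads(text)
    return {f.name: obj.get(f.name) for f in schema}


def as_nullable(schema: List[Field]) -> List[Field]:
    """Option 1: ignore the declared nullability and treat every field
    as nullable, so downstream consumers never assume non-null."""
    return [replace(f, nullable=True) for f in schema]


def validate(row: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Option 2: check each row at runtime and fail on a null in a
    field declared non-nullable."""
    for f in schema:
        if not f.nullable and row[f.name] is None:
            raise ValueError(f"null in non-nullable field {f.name!r}")
    return row


schema = [Field("id", nullable=False), Field("name")]
row = parse('{"name": "a"}', schema)  # "id" is missing from the input

# Option 1: the relaxed schema accepts the row as-is.
relaxed = as_nullable(schema)
validate(row, relaxed)

# Option 2: the strict schema rejects the row.
try:
    validate(row, schema)
except ValueError as err:
    print("rejected:", err)
```

Option 1 is a one-time schema rewrite, while option 2 adds a per-row check, which matches the reporter's remark that option 1 is the more performant and simpler of the two.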