[ https://issues.apache.org/jira/browse/SPARK-37569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37569:
------------------------------------

    Assignee: Apache Spark

> View Analysis incorrectly marks nested fields as nullable
> ---------------------------------------------------------
>
>                 Key: SPARK-37569
>                 URL: https://issues.apache.org/jira/browse/SPARK-37569
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Shardul Mahadik
>            Assignee: Apache Spark
>            Priority: Major
>
> Consider the following view, in which all fields are non-nullable (required):
> {code:java}
> spark.sql("""
>     CREATE OR REPLACE VIEW v2 AS
>     SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> """)
> {code}
> The catalog shows that the view schema was correctly stored as non-nullable:
> {code:java}
> scala> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default", "v2"))
> CatalogTable(
> Database: default
> Table: v2
> Owner: smahadik
> Created Time: Tue Dec 07 09:00:42 PST 2021
> Last Access: UNKNOWN
> Created By: Spark 3.3.0-SNAPSHOT
> Type: VIEW
> View Text: SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> View Original Text: SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> View Catalog and Namespace: spark_catalog.default
> View Query Output Columns: [id, nested]
> Table Properties: [transient_lastDdlTime=1638896442]
> Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Storage Properties: [serialization.format=1]
> Schema: root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  |    |-- a: long (nullable = false)
> )
> {code}
> However, reading the view incorrectly marks the nested column {{a}} as nullable:
> {code:java}
> scala> spark.table("v2").printSchema
> root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  |    |-- a: long (nullable = true)
> {code}
> This is caused by
> [this line|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3546]
> in Analyzer.scala. Going through the history of changes for this block of
> code, it seems that {{asNullable}} is a remnant of a time before we added
> [checks|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3543]
> to ensure that the from and to types of the cast are compatible. Since
> nullability is already checked, it should be safe to add the cast without
> converting the target data type to nullable.
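
To illustrate the effect described in the report, here is a minimal sketch of what a recursive "make everything nullable" conversion does to the type of the {{nested}} column. The helper forceNullable is hypothetical and written only against the public org.apache.spark.sql.types API; it is an assumption about the general behaviour of such a conversion, not the code in Analyzer.scala. It should be pasteable into spark-shell, or into any Scala script with spark-sql on the classpath.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not Spark's internal implementation): recursively
// marks every nested field, array element and map value as nullable.
def forceNullable(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map { f =>
      f.copy(dataType = forceNullable(f.dataType), nullable = true)
    })
  case ArrayType(elementType, _) =>
    ArrayType(forceNullable(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(forceNullable(keyType), forceNullable(valueType), valueContainsNull = true)
  case other => other
}

// Type of the `nested` column as stored in the catalog: field `a` is required.
val nested = StructType(Seq(StructField("a", LongType, nullable = false)))

// Type after the conversion: field `a` has become nullable.
val widened = forceNullable(nested).asInstanceOf[StructType]

println(nested("a").nullable)   // false
println(widened("a").nullable)  // true
```

If a cast targets the widened type, the inner field {{a}} would be reported as nullable, which matches the printSchema output in the report. Casting to the stored type unchanged would keep {{a}} non-nullable, which is the change the reporter suggests.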