Shardul Mahadik created SPARK-37569:
---------------------------------------

             Summary: View Analysis incorrectly marks nested fields as nullable
                 Key: SPARK-37569
                 URL: https://issues.apache.org/jira/browse/SPARK-37569
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Shardul Mahadik


Consider the following view, in which all fields are non-nullable (required):
{code:java}
spark.sql("""
    CREATE OR REPLACE VIEW v2 AS
    SELECT id, named_struct('a', id) AS nested
    FROM RANGE(10)
""")
{code}
We can see that the view schema has been correctly stored as non-nullable:
{code:java}
scala> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default", "v2"))
CatalogTable(
Database: default
Table: v2
Owner: smahadik
Created Time: Tue Dec 07 09:00:42 PST 2021
Last Access: UNKNOWN
Created By: Spark 3.3.0-SNAPSHOT
Type: VIEW
View Text: SELECT id, named_struct('a', id) AS nested
    FROM RANGE(10)
View Original Text: SELECT id, named_struct('a', id) AS nested
    FROM RANGE(10)
View Catalog and Namespace: spark_catalog.default
View Query Output Columns: [id, nested]
Table Properties: [transient_lastDdlTime=1638896442]
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties: [serialization.format=1]
Schema: root
 |-- id: long (nullable = false)
 |-- nested: struct (nullable = false)
 |    |-- a: long (nullable = false)
)
{code}
However, when reading this view, Spark incorrectly marks the nested column {{a}} as nullable:
{code:java}
scala> spark.table("v2").printSchema
root
 |-- id: long (nullable = false)
 |-- nested: struct (nullable = false)
 |    |-- a: long (nullable = true)
{code}
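To compare the two schemas programmatically rather than by eye, something like the following sketch could be pasted into the same spark-shell session. It assumes the view is named {{v2}} in the {{default}} database as above; {{nestedANullable}} is a helper name introduced here purely for illustration.
{code:java}
import org.apache.spark.sql.types.StructType

// Hypothetical helper: nullability of field `a` inside the `nested` struct.
def nestedANullable(schema: StructType): Boolean =
  schema("nested").dataType match {
    case s: StructType => s("a").nullable
    case other => sys.error(s"expected a struct, got $other")
  }

// Schema as stored in the catalog.
val stored = spark.sessionState.catalog.externalCatalog.getTable("default", "v2").schema
// Schema after the analyzer has resolved the view.
val resolved = spark.table("v2").schema

println(s"stored:   nested.a nullable = ${nestedANullable(stored)}")
println(s"resolved: nested.a nullable = ${nestedANullable(resolved)}")
{code}
If the two printed values differ, the nullability was changed during view resolution rather than at view creation time.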
This is caused by [this line|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3546] in Analyzer.scala. Going through the history of changes for this block of code, it seems that {{asNullable}} is a remnant of a time before we added [checks|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3543] to ensure that the from and to types of the cast were compatible. As nullability is already checked, it should be safe to add a cast without converting the target datatype to nullable.
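
The effect of casting to a nullable-converted type can be illustrated outside the analyzer. As far as I can tell, {{asNullable}} is not callable from user code, so the sketch below uses a hand-written stand-in, {{relaxNullability}} (a hypothetical helper, not Spark's implementation), together with the public {{Column.cast}} API. It is only meant to show that a cast takes nested nullability from its target type; it does not exercise the analyzer code path linked above.
{code:java}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types._

// Hypothetical stand-in for asNullable: recursively marks every nested
// field, array element and map value as nullable.
def relaxNullability(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map(f =>
      f.copy(dataType = relaxNullability(f.dataType), nullable = true)))
  case ArrayType(elem, _) =>
    ArrayType(relaxNullability(elem), containsNull = true)
  case MapType(k, v, _) =>
    MapType(relaxNullability(k), relaxNullability(v), valueContainsNull = true)
  case other => other
}

// Same shape as the view: a non-nullable id and a struct wrapping it.
val df = spark.range(10).select(col("id"), struct(col("id").as("a")).as("nested"))
val exactType = df.schema("nested").dataType

// Cast to the relaxed type: nested.a takes its nullability from the target type.
val relaxed = df.select(col("id"), col("nested").cast(relaxNullability(exactType)).as("nested"))
// Cast to the exact type: nested.a keeps the nullability of the source.
val exact = df.select(col("id"), col("nested").cast(exactType).as("nested"))

relaxed.printSchema()
exact.printSchema()
{code}
Comparing the two printed schemas should show the difference in {{nested.a}} that the proposed change (casting without the nullable conversion) is meant to avoid.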


