[jira] [Assigned] (SPARK-37569) View Analysis incorrectly marks nested fields as nullable

Wenchen Fan (Jira) Sun, 12 Dec 2021 20:53:07 -0800


     [ https://issues.apache.org/jira/browse/SPARK-37569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wenchen Fan reassigned SPARK-37569:
-----------------------------------

    Assignee: Shardul Mahadik

> View Analysis incorrectly marks nested fields as nullable
> ---------------------------------------------------------
>
>                 Key: SPARK-37569
>                 URL: https://issues.apache.org/jira/browse/SPARK-37569
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Shardul Mahadik
>            Assignee: Shardul Mahadik
>            Priority: Major
>
> Consider the following view, in which all fields are non-nullable (required):
> {code:java}
> spark.sql("""
>     CREATE OR REPLACE VIEW v2 AS
>     SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> """)
> {code}
> We can see that the view schema has been correctly stored as non-nullable:
> {code:java}
> scala> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default", "v2"))
> CatalogTable(
> Database: default
> Table: v2
> Owner: smahadik
> Created Time: Tue Dec 07 09:00:42 PST 2021
> Last Access: UNKNOWN
> Created By: Spark 3.3.0-SNAPSHOT
> Type: VIEW
> View Text: SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> View Original Text: SELECT id, named_struct('a', id) AS nested
>     FROM RANGE(10)
> View Catalog and Namespace: spark_catalog.default
> View Query Output Columns: [id, nested]
> Table Properties: [transient_lastDdlTime=1638896442]
> Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Storage Properties: [serialization.format=1]
> Schema: root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  |    |-- a: long (nullable = false)
> )
> {code}
> However, when reading this view, Spark incorrectly marks the nested column
> {{a}} as nullable:
> {code:java}
> scala> spark.table("v2").printSchema
> root
>  |-- id: long (nullable = false)
>  |-- nested: struct (nullable = false)
>  |    |-- a: long (nullable = true)
> {code}
> This is caused by [this line|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3546]
> in Analyzer.scala. Going through the history of changes for this block of
> code, it seems that {{asNullable}} is a remnant of a time before we added
> [checks|https://github.com/apache/spark/blob/fb40c0e19f84f2de9a3d69d809e9e4031f76ef90/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3543]
> to ensure that the from and to types of the cast are compatible. As
> nullability is already checked, it should be safe to add a cast without
> converting the target data type to nullable.
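
The effect described above can be illustrated with a small, self-contained toy model. This is only a sketch and not Spark's implementation: the classes and the helpers as_nullable, cast and schema_lines are hypothetical stand-ins written for this note. It assumes that a cast takes its top-level nullability from the child expression and its nested nullability from the target type, which would explain why only the nested field {{a}} changes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class LongType:
    """Leaf type; it carries no nested nullability flags."""


@dataclass(frozen=True)
class StructType:
    fields: Tuple["Field", ...]


@dataclass(frozen=True)
class Field:
    name: str
    dtype: Union[LongType, StructType]
    nullable: bool


def as_nullable(dtype):
    """Toy stand-in for asNullable: mark every nested field as nullable."""
    if isinstance(dtype, StructType):
        return StructType(tuple(
            Field(f.name, as_nullable(f.dtype), True) for f in dtype.fields
        ))
    return dtype


def cast(child, target):
    """Assumed cast model: top-level nullability follows the child,
    nested nullability comes entirely from the target type."""
    return Field(child.name, target, child.nullable)


def schema_lines(fields, depth=0):
    """Render fields roughly the way printSchema does."""
    lines = []
    for f in fields:
        prefix = " |" + "    |" * depth + "-- "
        kind = "struct" if isinstance(f.dtype, StructType) else "long"
        flag = str(f.nullable).lower()
        lines.append(f"{prefix}{f.name}: {kind} (nullable = {flag})")
        if isinstance(f.dtype, StructType):
            lines.extend(schema_lines(f.dtype.fields, depth + 1))
    return lines


# Schema stored in the catalog for view v2: every field is non-nullable.
stored = (
    Field("id", LongType(), False),
    Field("nested", StructType((Field("a", LongType(), False),)), False),
)

# Reported behavior: the cast target is the nullable version of the stored type.
buggy = tuple(cast(f, as_nullable(f.dtype)) for f in stored)
# Proposed behavior: the cast target is the stored type, unchanged.
fixed = tuple(cast(f, f.dtype) for f in stored)

print("\n".join(["root"] + schema_lines(buggy)))
print("\n".join(["root"] + schema_lines(fixed)))
```

In this model the first printed schema shows {{a}} as nullable while {{id}} and {{nested}} stay non-nullable, and the second matches the stored schema exactly.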




