[ 
https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105012#comment-14105012
 ] 

Takuya Ueshin commented on SPARK-3036:
--------------------------------------

Ah, that's right. It was my mistake.
A newer version will be able to read data written by an older version.

We reference the DataType tree to build the converter tree, and the converter 
tree needs to match the Parquet schema. So I thought a difference between the 
two, such as an older version's Parquet schema versus the DataType from the 
metadata, would cause an incompatibility. But the converter tree is the same 
regardless of whether the map value is "required" or "optional", i.e. 
regardless of valueContainsNull.
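
To illustrate the point, here is a minimal sketch in Python. It is not 
Spark's actual code: the names MapType, IntType, parquet_schema and 
converter_tree are hypothetical stand-ins. It only assumes that the converter 
tree has one node per field and does not look at nullability, which is the 
behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical stand-in for a 32-bit integer data type."""


@dataclass(frozen=True)
class MapType:
    """Hypothetical stand-in for a map data type."""
    key: object
    value: object
    value_contains_null: bool


def parquet_schema(name, dt):
    """Render the Parquet schema text for one optional int-to-int map column."""
    # Only the repetition of the value field depends on value_contains_null.
    repetition = "optional" if dt.value_contains_null else "required"
    return "\n".join([
        "message root {",
        f"  optional group {name} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        "      required int32 key;",
        f"      {repetition} int32 value;",
        "    }",
        "  }",
        "}",
    ])


def converter_tree(dt):
    """Return the shape of the converter tree; nullability is ignored."""
    if isinstance(dt, MapType):
        return ("map", converter_tree(dt.key), converter_tree(dt.value))
    return ("int32",)


# "old" models data written before the change, "new" models data written after.
old = MapType(IntType(), IntType(), value_contains_null=False)
new = MapType(IntType(), IntType(), value_contains_null=True)

# The schema text differs, but the converter tree shape does not.
schemas_differ = parquet_schema("a", old) != parquet_schema("a", new)
trees_match = converter_tree(old) == converter_tree(new)
```

In this sketch schemas_differ and trees_match are both True, which is why 
the required/optional difference alone should not stop a newer reader from 
reading older files.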

> Add MapType containing null value support to Parquet.
> -----------------------------------------------------
>
>                 Key: SPARK-3036
>                 URL: https://issues.apache.org/jira/browse/SPARK-3036
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Priority: Blocker
>
> Current Parquet schema for {{MapType}} is as follows regardless of 
> {{valueContainsNull}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       required int32 value;
>     }
>   }
> }
> {noformat}
> and if the map contains a {{null}} value, a runtime exception is thrown.
> To handle a {{MapType}} containing {{null}} values, the schema should be as 
> follows if {{valueContainsNull}} is {{true}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       optional int32 value;
>     }
>   }
> }
> {noformat}
> FYI:
> Hive's Parquet writer *always* uses the latter schema, but its reader can 
> read from both schemas.
> NOTICE:
> This change will break backward compatibility when the schema is read from 
> Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).



