[ https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-3036:
------------------------------------
    Assignee: Takuya Ueshin

> Add MapType containing null value support to Parquet.
> -----------------------------------------------------
>
>                 Key: SPARK-3036
>                 URL: https://issues.apache.org/jira/browse/SPARK-3036
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Blocker
>
> The current Parquet schema for {{MapType}} is as follows, regardless of
> {{valueContainsNull}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       required int32 value;
>     }
>   }
> }
> {noformat}
> If the map contains a {{null}} value, a runtime exception is thrown.
> To handle a {{MapType}} containing {{null}} values, the schema should be as
> follows when {{valueContainsNull}} is {{true}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       optional int32 value;
>     }
>   }
> }
> {noformat}
> FYI:
> Hive's Parquet writer *always* uses the latter schema, but its reader can
> read both schemas.
> NOTICE:
> This change will break backward compatibility when the schema is read from
> Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
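
The schema rule proposed in the issue can be illustrated with a minimal sketch. The helper below is hypothetical and is not Spark's actual converter code. It only renders the Parquet message text for an int32-to-int32 map and assumes that the value field's repetition is the sole part that depends on {{valueContainsNull}}:

```python
def map_schema(field_name: str, value_contains_null: bool) -> str:
    """Render a Parquet message schema for an int32 -> int32 map field.

    Hypothetical helper for illustration only; it mirrors the two schemas
    quoted in the issue and is not part of Spark or Parquet.
    """
    # Keys are always required; only the value's repetition follows the flag.
    value_repetition = "optional" if value_contains_null else "required"
    lines = [
        "message root {",
        f"  optional group {field_name} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        "      required int32 key;",
        f"      {value_repetition} int32 value;",
        "    }",
        "  }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print both variants: the current schema and the proposed nullable one.
    print(map_schema("a", value_contains_null=False))
    print(map_schema("a", value_contains_null=True))
```

With the flag set to false the sketch reproduces the current schema. With it set to true, the only difference is that the value field becomes optional, which is the form Hive's writer always produces according to the issue.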