[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.
    [ https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105012#comment-14105012 ]

Takuya Ueshin commented on SPARK-3036:
--------------------------------------

Ah, that's right. It was my mistake.
A newer version will be able to read data written by an older version.

We refer to the DataType tree to build the converter tree, and the converter tree needs to match the Parquet schema. So I thought that a difference between them, such as an older version's Parquet schema versus the DataType from metadata, would cause an incompatibility. But the converter tree is the same regardless of whether the map value is "required" or "optional", i.e. regardless of valueContainsNull.

> Add MapType containing null value support to Parquet.
> -----------------------------------------------------
>
>                 Key: SPARK-3036
>                 URL: https://issues.apache.org/jira/browse/SPARK-3036
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Priority: Blocker
>
> Current Parquet schema for {{MapType}} is as follows regardless of {{valueContainsNull}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       required int32 value;
>     }
>   }
> }
> {noformat}
> and if the map contains {{null}} value, it throws runtime exception.
> To handle {{MapType}} containing {{null}} value, the schema should be as follows if {{valueContainsNull}} is {{true}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       optional int32 value;
>     }
>   }
> }
> {noformat}
> FYI:
> Hive's Parquet writer *always* uses the latter schema, but reader can read from both schema.
> NOTICE:
> This change will break backward compatibility when the schema is read from Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
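
The two schemas quoted in the issue differ only in the repetition of the map value. As a minimal sketch of that rule (plain Python, not Spark's or Parquet's API; the function name `map_schema` and its parameters are hypothetical and exist only for illustration), the schema text can be derived from a `value_contains_null` flag like this:

```python
def map_schema(column: str, value_contains_null: bool) -> str:
    """Render the Parquet message schema text for an int32 -> int32 map column.

    Hypothetical helper: it only reproduces the two schema listings quoted
    in the issue and is not part of Spark or Parquet.
    """
    # Keys are always "required"; values become "optional" only when the
    # map is allowed to hold nulls.
    value_repetition = "optional" if value_contains_null else "required"
    return "\n".join([
        "message root {",
        f"  optional group {column} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        "      required int32 key;",
        f"      {value_repetition} int32 value;",
        "    }",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    # Print the schema proposed for maps that may contain null values.
    print(map_schema("a", value_contains_null=True))
```

Calling `map_schema("a", value_contains_null=False)` yields the current schema with a required value, so only one line of the schema changes between the two cases.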
[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.
    [ https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104556#comment-14104556 ]

Michael Armbrust commented on SPARK-3036:
-----------------------------------------

Can you explain more about what you mean when you say we are breaking backwards compatibility? It seems like a newer version of Spark SQL should always be able to read data written by an older version, as long as we support both versions. Choosing between them when writing, based on valueContainsNull, seems like the best solution.

I think it is okay (though undesirable) for older versions of Spark SQL to be unable to read data written by newer versions, as this is unavoidable as we add features.
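
The approach discussed above (pick the schema at write time from valueContainsNull, and let the reader accept both) can be modelled with a toy example. This is only a sketch under stated assumptions: `write_map` and `read_map` are hypothetical names, the "file" is a plain dictionary, and nothing here calls Spark or Parquet.

```python
from typing import Dict, Optional


def write_map(entries: Dict[int, Optional[int]], value_contains_null: bool) -> dict:
    """Toy writer: choose the value repetition from value_contains_null."""
    repetition = "optional" if value_contains_null else "required"
    for key, value in entries.items():
        # A "required" value cannot be null; this mirrors the runtime
        # exception described in the issue for the current schema.
        if value is None and repetition == "required":
            raise ValueError(f"null value for key {key} in a required field")
    return {"value_repetition": repetition, "entries": list(entries.items())}


def read_map(stored: dict) -> Dict[int, Optional[int]]:
    """Toy reader: handles both repetitions the same way.

    Data written with "required" values is simply the case in which no
    value happens to be None, so one code path covers both schemas.
    """
    return dict(stored["entries"])


if __name__ == "__main__":
    old_style = write_map({1: 10, 2: 20}, value_contains_null=False)
    new_style = write_map({1: 10, 2: None}, value_contains_null=True)
    print(read_map(old_style))
    print(read_map(new_style))
```

In this model a reader that understands optional values also reads data written with required values, while a writer restricted to the required schema fails as soon as a null value appears.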
[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.
    [ https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102033#comment-14102033 ]

Apache Spark commented on SPARK-3036:
-------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2032