[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.

2014-08-20 Thread Takuya Ueshin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105012#comment-14105012
 ] 

Takuya Ueshin commented on SPARK-3036:
--

Ah, that's right. It was my mistake.
Newer versions will be able to read data written by older versions.

We refer to the DataType tree to build the converter tree, and the converter 
tree needs to match the Parquet schema. So I thought that a difference 
between them, such as an older version's Parquet schema versus the DataType 
read from metadata, would cause the incompatibility. But the converter tree is 
the same regardless of whether the map value is "required" or "optional", i.e. 
regardless of valueContainsNull.
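The observation that the converter tree does not depend on the map value's repetition can be illustrated with a toy model. This is a hypothetical Python sketch, not Spark's converter classes: `build_converter_tree` and `map_schema` are invented names, and a schema node is simplified to a `(repetition, name, body)` tuple. In this model the converter tree records only field names and primitive types, so the old ("required") and proposed ("optional") map schemas yield identical trees.

```python
# Toy model (hypothetical, not Spark's real classes): a schema node is
# (repetition, name, body), where body is either a primitive type name
# such as "int32" or a list of child nodes.

def build_converter_tree(node):
    """Mirror the schema's shape, recording names and primitive types only.

    Repetition ("required"/"optional"/"repeated") is deliberately ignored
    in this model, matching the claim that the converter tree is the same
    whether the map value is required or optional.
    """
    _repetition, name, body = node
    if isinstance(body, str):
        return (name, body)  # leaf converter for a primitive column
    return (name, tuple(build_converter_tree(child) for child in body))


def map_schema(value_repetition):
    """The MAP column from SPARK-3036, with a configurable value repetition."""
    return ("optional", "a", [
        ("repeated", "map", [
            ("required", "key", "int32"),
            (value_repetition, "value", "int32"),
        ]),
    ])


old_tree = build_converter_tree(map_schema("required"))
new_tree = build_converter_tree(map_schema("optional"))
print(old_tree == new_tree)  # True
```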

> Add MapType containing null value support to Parquet.
> -
>
> Key: SPARK-3036
> URL: https://issues.apache.org/jira/browse/SPARK-3036
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Takuya Ueshin
> Priority: Blocker
>
> Current Parquet schema for {{MapType}} is as follows regardless of 
> {{valueContainsNull}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       required int32 value;
>     }
>   }
> }
> {noformat}
> and if the map contains a {{null}} value, a runtime exception is thrown.
> To handle a {{MapType}} containing {{null}} values, the schema should be as 
> follows if {{valueContainsNull}} is {{true}}:
> {noformat}
> message root {
>   optional group a (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required int32 key;
>       optional int32 value;
>     }
>   }
> }
> {noformat}
> FYI:
> Hive's Parquet writer *always* uses the latter schema, but its reader can read 
> both schemas.
> NOTICE:
> This change will break backward compatibility when the schema is read from 
> Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).
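The two schemas in the description differ only in the repetition of the map's value field. A minimal Python sketch of the proposed writer-side choice, assuming a hypothetical helper `map_field_schema` that renders the schema as text (Spark itself builds parquet-mr schema objects, not strings): the key is always `required`, and the value becomes `optional` only when `valueContainsNull` is true.

```python
def map_field_schema(name, key_type, value_type, value_contains_null):
    """Hypothetical helper: render a Parquet message type for one MAP column.

    key_type/value_type are Parquet primitive type names such as "int32".
    Map keys cannot be null, so the key field is always "required"; the
    value field is "optional" only if the map may contain null values.
    """
    value_repetition = "optional" if value_contains_null else "required"
    lines = [
        "message root {",
        f"  optional group {name} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        f"      required {key_type} key;",
        f"      {value_repetition} {value_type} value;",
        "    }",
        "  }",
        "}",
    ]
    return "\n".join(lines)


print(map_field_schema("a", "int32", "int32", value_contains_null=True))
```

Calling it with `value_contains_null=False` reproduces the current schema, so writers that never store null values keep emitting exactly what older versions produced.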



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.

2014-08-20 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104556#comment-14104556
 ] 

Michael Armbrust commented on SPARK-3036:
-

Can you explain more about what you mean when you say we are breaking backwards 
compatibility?  It seems like newer versions of Spark SQL should always be able 
to read data written by older versions as long as we support both versions of 
the schema. Choosing between them when writing, based on valueContainsNull, 
seems like the best solution.

I think it is okay (though undesirable) for older versions of Spark SQL to be 
unable to read data written by newer versions, as this is unavoidable as 
we add features.
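On the read side, supporting both schemas amounts to deriving valueContainsNull from the file's schema instead of assuming it. A rough Python sketch under stated assumptions: `infer_value_contains_null` is a hypothetical name, and it matches the schema as text with a regular expression purely for illustration; a real reader would inspect parsed schema objects.

```python
import re

# Hypothetical helper (not a Spark or parquet-mr API): find the "value" field
# of a map in a textual Parquet schema and derive valueContainsNull from its
# repetition. A real reader would walk parsed schema objects instead.
_VALUE_FIELD = re.compile(r"^\s*(required|optional)\s+\w+\s+value;\s*$", re.MULTILINE)


def infer_value_contains_null(schema_text):
    match = _VALUE_FIELD.search(schema_text)
    if match is None:
        raise ValueError("no 'value' field found in schema")
    # "required": values are never null (the schema older versions write).
    # "optional": values may be null (the proposed schema when
    # valueContainsNull is true; Hive's writer always uses it).
    return match.group(1) == "optional"


OLD_SCHEMA = """message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      required int32 value;
    }
  }
}"""

NEW_SCHEMA = OLD_SCHEMA.replace("required int32 value;", "optional int32 value;")

print(infer_value_contains_null(OLD_SCHEMA))  # False
print(infer_value_contains_null(NEW_SCHEMA))  # True
```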







[jira] [Commented] (SPARK-3036) Add MapType containing null value support to Parquet.

2014-08-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102033#comment-14102033
 ] 

Apache Spark commented on SPARK-3036:
-

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2032



