Gaurav Shah created SPARK-17527:
-----------------------------------

             Summary: mergeSchema with `_OPTIONAL_` metadata fails
                 Key: SPARK-17527
                 URL: https://issues.apache.org/jira/browse/SPARK-17527
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: Mac OS X 10.11.6, Ubuntu 14, Ubuntu 16;
Spark 2.0.0, spark-catalyst 2.0.0
            Reporter: Gaurav Shah


Spark 2.0.0 added "_OPTIONAL_" field metadata in the following commit:
https://github.com/apache/spark/commit/4637fc08a3733ec313218fb7e4d05064d9a6262d

However, merging metadata for data written by Spark 1.6.x and Spark 2.0 fails
with the following error:

{code}
Exception in thread "main" java.lang.RuntimeException: could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
{code}
The only difference between the conflicting values is that the field metadata
now contains an extra "_OPTIONAL_" entry:

{code:javascript}
{
  "name": "catalog",
  "type": {
    "type": "struct",
    "fields": [
      {
        "name": "category",
        "type": "string",
        "nullable": true,
        "metadata": {}
      },
      {
        "name": "department",
        "type": "string",
        "nullable": true,
        "metadata": {}
      }
    ]
  },
  "nullable": true,
  "metadata": {
    "_OPTIONAL_": true
  }
}
{code}

vs
{code:javascript}
{
  "name": "catalog",
  "type": {
    "type": "struct",
    "fields": [
      {
        "name": "category",
        "type": "string",
        "nullable": true,
        "metadata": {}
      },
      {
        "name": "department",
        "type": "string",
        "nullable": true,
        "metadata": {}
      }
    ]
  },
  "nullable": true,
  "metadata": {}
}
{code}
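
To make the failure mode concrete, here is a minimal Python sketch. It is not Spark or Parquet code. It assumes that the metadata merge compares the serialized schema strings stored under the key from the exception and rejects any mismatch. The helper names (catalog_field, strict_merge, strip_optional) are hypothetical, and the schema is reduced to the single "catalog" field shown above.

```python
import json

# Key taken from the exception message in this report.
ROW_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"


def catalog_field(metadata):
    """Build the `catalog` struct field from the report with the given metadata."""
    def leaf(name):
        return {"name": name, "type": "string", "nullable": True, "metadata": {}}

    return {
        "name": "catalog",
        "type": {"type": "struct", "fields": [leaf("category"), leaf("department")]},
        "nullable": True,
        "metadata": metadata,
    }


def strict_merge(footers):
    """Simplified model of a strict key/value merge (an assumption, not the
    real implementation): fail as soon as one key maps to two different strings."""
    merged = {}
    for footer in footers:
        for key, value in footer.items():
            if key in merged and merged[key] != value:
                raise RuntimeError(
                    f"could not merge metadata: key {key} has conflicting values"
                )
            merged[key] = value
    return merged


def strip_optional(node):
    """Return a copy of a schema JSON tree without any key named `_OPTIONAL_`."""
    if isinstance(node, dict):
        return {k: strip_optional(v) for k, v in node.items() if k != "_OPTIONAL_"}
    if isinstance(node, list):
        return [strip_optional(v) for v in node]
    return node


# The two serialized schemas from the report, differing only in field metadata.
with_optional = json.dumps(catalog_field({"_OPTIONAL_": True}), sort_keys=True)
without_optional = json.dumps(catalog_field({}), sort_keys=True)

try:
    strict_merge([
        {ROW_METADATA_KEY: without_optional},
        {ROW_METADATA_KEY: with_optional},
    ])
    conflict = False
except RuntimeError:
    conflict = True

# Once `_OPTIONAL_` is ignored, the two schemas are structurally identical.
same_after_strip = strip_optional(json.loads(with_optional)) == json.loads(without_optional)
```

Under these assumptions the strict merge raises, while the two schemas compare equal once "_OPTIONAL_" is ignored. That matches the observation that this entry is the only difference between the conflicting values.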


