Tomasz Bartczak created SPARK-23478:
---------------------------------------

             Summary: Inconsistent behaviour of union when columns have conflicting metadata
                 Key: SPARK-23478
                 URL: https://issues.apache.org/jira/browse/SPARK-23478
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Tomasz Bartczak


When two DataFrames are unioned and their corresponding columns carry different 
metadata, the metadata of the result depends on the order of the union:
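{code:python}
from pyspark.sql.functions import col

# Assumes an active SparkSession bound to `spark`, as in the pyspark shell.
df = spark.createDataFrame([{'a':1}])
a = df  # column 'a' with empty metadata
b = df.select(col('a').alias('a',metadata={'description':'xxx'}))  # same column, with metadata
print("a.union(b) gives {}".format(a.union(b).schema.fields[0].metadata))
print("b.union(a) gives {}".format(b.union(a).schema.fields[0].metadata))
{code}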
gives:
{code}
a.union(b) gives {}
b.union(a) gives {'description': 'xxx'}
{code}
 

I also wonder whether this kind of union should be allowed at all: when fields 
with different metadata are nested inside a struct, the union fails, as shown in 
https://issues.apache.org/jira/projects/SPARK/issues/SPARK-23477
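
Until the behaviour is settled, a caller could check for conflicting metadata before calling union. The following is only a minimal sketch in plain Python, not something Spark provides: the helper name `metadata_conflicts` and the (name, metadata) pair representation are assumptions made for illustration. It compares columns by position, since union matches columns by position.

```python
def metadata_conflicts(left_fields, right_fields):
    """Return positional metadata mismatches between two schemas.

    Hypothetical helper: each argument is a list of (name, metadata_dict)
    pairs, one pair per column, in column order.
    """
    conflicts = []
    pairs = zip(left_fields, right_fields)
    for index, ((l_name, l_meta), (r_name, r_meta)) in enumerate(pairs):
        # Columns are compared by position, mirroring how union lines them up.
        if l_meta != r_meta:
            conflicts.append((index, l_name, l_meta, r_name, r_meta))
    return conflicts


# Plain-Python stand-ins for the schemas of `a` and `b` from the report.
a_fields = [("a", {})]
b_fields = [("a", {"description": "xxx"})]

conflicts = metadata_conflicts(a_fields, b_fields)
print(conflicts)
```

With real DataFrames, the pairs could be built as `[(f.name, f.metadata) for f in df.schema.fields]`, and a non-empty result could be treated as a reason to fail early or to align the metadata explicitly before the union.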


