Tomasz Bartczak created SPARK-23478:
---------------------------------------
Summary: Inconsistent behaviour of union when columns have conflicting metadata
Key: SPARK-23478
URL: https://issues.apache.org/jira/browse/SPARK-23478
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1
Reporter: Tomasz Bartczak

When two DataFrames are unioned and the corresponding columns carry different metadata, the metadata of the result depends on the order of the union. In the example below, the result keeps the metadata of the left-hand DataFrame:

{code:java}
# Assumes an existing SparkSession named `spark` (e.g. the pyspark shell).
from pyspark.sql.functions import col

df = spark.createDataFrame([{'a': 1}])
a = df  # column 'a' without metadata
b = df.select(col('a').alias('a', metadata={'description': 'xxx'}))  # same column, with metadata

print("a.union(b) gives {}".format(a.union(b).schema.fields[0].metadata))
print("b.union(a) gives {}".format(b.union(a).schema.fields[0].metadata))
{code}

gives:

{code:java}
a.union(b) gives {}
b.union(a) gives {'description': 'xxx'}
{code}

I also wonder whether this kind of union should be allowed at all: when the fields with different metadata are nested inside a struct, the union fails, as shown in https://issues.apache.org/jira/projects/SPARK/issues/SPARK-23477
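
As a possible workaround until the behaviour is settled, one could check for conflicting metadata before calling union. The following is only a sketch, not part of Spark: `metadata_conflicts` and `fake_df` are hypothetical helper names. It relies only on `schema.fields`, `name` and `metadata`, and pairs columns by position because `union` resolves columns by position. The `fake_df` stand-in exists so the snippet runs without a SparkSession.

```python
from types import SimpleNamespace


def metadata_conflicts(left, right):
    """Return (position, left_name, left_metadata, right_metadata) for every
    column position whose metadata differs between the two inputs.

    Hypothetical helper: works on anything exposing `.schema.fields`, where
    each field has `.name` and `.metadata` (as PySpark DataFrames do).
    """
    pairs = zip(left.schema.fields, right.schema.fields)
    return [
        (i, lf.name, dict(lf.metadata), dict(rf.metadata))
        for i, (lf, rf) in enumerate(pairs)
        if lf.metadata != rf.metadata
    ]


def fake_df(*columns):
    """Hypothetical stand-in for a DataFrame, built from (name, metadata) pairs."""
    fields = [SimpleNamespace(name=n, metadata=m) for n, m in columns]
    return SimpleNamespace(schema=SimpleNamespace(fields=fields))


a = fake_df(("a", {}))
b = fake_df(("a", {"description": "xxx"}))
print(metadata_conflicts(a, b))  # [(0, 'a', {}, {'description': 'xxx'})]
```

With the real DataFrames `a` and `b` from the report, `metadata_conflicts(a, b)` should flag column 0 in the same way, so the caller can decide which metadata to keep before the union.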