[ https://issues.apache.org/jira/browse/SPARK-20660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michel Lemay updated SPARK-20660:
---------------------------------
    Description:
Union on two DataFrames with different column orders is not supported and leads to hard-to-find issues.
Here is an example showing the issue.
{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

var inputSchema = StructType(StructField("key", StringType, nullable=true) :: StructField("value", IntegerType, nullable=true) :: Nil)
var a = spark.createDataFrame(sc.parallelize((1 to 10)).map(x => Row(x.toString, 555)), inputSchema)
var b = a.select($"value", $"key") // any transformation changing column order will show the problem.
a.union(b).show

// in order to make it work, we need to reorder columns
val bCols = a.columns.map(aCol => b(aCol))
a.union(b.select(bCols:_*)).show
{code}

  was:
Union on two DataFrames with different column orders is not supported and leads to hard-to-find issues.
Here is an example showing the issue.
{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

var inputSchema = StructType(StructField("key", StringType, nullable=true) :: StructField("value", IntegerType, nullable=true) :: Nil)
var a = spark.createDataFrame(sc.parallelize((1 to 10)).map(x => Row(x.toString, 555)), inputSchema)
var b = a.select($"value", $"key") // any transformation changing column order will show the problem.
a.union(c).show

// in order to make it work, we need to reorder columns
val bCols = a.columns.map(aCol => b(aCol))
a.union(b.select(bCols:_*)).show
{code}

> Not able to merge Dataframes with different column orders
> ---------------------------------------------------------
>
>                 Key: SPARK-20660
>                 URL: https://issues.apache.org/jira/browse/SPARK-20660
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Michel Lemay
>            Priority: Minor
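The workaround in the description can be wrapped in a small helper. `Dataset.union` matches columns by position rather than by name, so the sketch below reorders the right-hand DataFrame to the left-hand column order before the union. The names `UnionByColumnOrder` and `unionAligned` are hypothetical and not part of Spark, and the local `SparkSession` setup is an assumption made only to keep the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionByColumnOrder {
  // Hypothetical helper (not a Spark API): select `right`'s columns in
  // `left`'s order by name, then apply the positional union.
  def unionAligned(left: DataFrame, right: DataFrame): DataFrame = {
    // Fail early if the two sides do not carry the same column names.
    require(
      left.columns.toSet == right.columns.toSet,
      s"column names differ: ${left.columns.mkString(",")} vs ${right.columns.mkString(",")}")
    val reordered = left.columns.map(name => right.col(name))
    left.union(right.select(reordered: _*))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for running the sketch standalone.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-aligned")
      .getOrCreate()
    import spark.implicits._

    val a = (1 to 3).map(x => (x.toString, 555)).toDF("key", "value")
    val b = a.select($"value", $"key") // same data, different column order

    val merged = unionAligned(a, b)
    merged.printSchema()
    merged.show()

    spark.stop()
  }
}
```

Spark 2.3.0 and later also provide `Dataset.unionByName`, which resolves columns by name; on the affected version listed here (2.1.0) a helper along these lines is one way to get the same behavior.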