[ https://issues.apache.org/jira/browse/SPARK-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075129#comment-16075129 ]
Dongjoon Hyun commented on SPARK-21316:
---------------------------------------

Union assumes the column order is the same for both datasets: columns are matched by position, not by name. If you are interested in `unionByName`, please see SPARK-21043.

> Dataset Union output is not consistent with the column sequence
> ---------------------------------------------------------------
>
>                 Key: SPARK-21316
>                 URL: https://issues.apache.org/jira/browse/SPARK-21316
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Kaushal Prajapati
>            Priority: Critical
>              Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If I take the union of 2 datasets with a similar schema, the output should remain
> the same even if I change the sequence of columns while creating the dataset.
> I am attaching the code snippet for details.
> {code:java}
> import java.io.Serializable;
>
> // JavaBean used with Encoders.bean(): a public no-arg constructor
> // plus a getter/setter pair for each property.
> public class Person implements Serializable {
>     private String name;
>     private String age;
>
>     public Person() {}
>
>     public Person(String name, String age) {
>         this.name = name;
>         this.age = age;
>     }
>
>     public String getName() { return name; }
>     public void setName(String name) { this.name = name; }
>     public String getAge() { return age; }
>     public void setAge(String age) { this.age = age; }
> }
> {code}
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         // Local session in place of the reporter's SparkConnection.getSpark() helper
>         SparkSession spark = SparkSession.builder()
>                 .appName("SPARK-21316")
>                 .master("local[*]")
>                 .getOrCreate();
>
>         List<Person> list1 = new ArrayList<>();
>         list1.add(new Person("kaushal", "25"));
>         list1.add(new Person("aman", "26"));
>
>         List<Person> list2 = new ArrayList<>();
>         list2.add(new Person("sapan", "25"));
>         list2.add(new Person("yati", "26"));
>
>         Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
>         Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));
>         ds1.show();
>         ds2.show();
>
>         // union() matches columns by position: the left side is reordered to
>         // (name, age), while ds2 keeps the encoder's (age, name) layout.
>         ds1.select("name", "age").as(Encoders.bean(Person.class)).union(ds2).show();
>     }
> }
> {code}
> Output:
> {code:java}
> +---+-------+
> |age|   name|
> +---+-------+
> | 25|kaushal|
> | 26|   aman|
> +---+-------+
>
> +---+-----+
> |age| name|
> +---+-----+
> | 25|sapan|
> | 26| yati|
> +---+-----+
>
> +-------+-----+
> |   name|  age|
> +-------+-----+
> |kaushal|   25|
> |   aman|   26|
> |     25|sapan|
> |     26| yati|
> +-------+-----+
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
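To make the positional-versus-by-name distinction concrete, here is a toy model in plain Java with no Spark dependency. `UnionSemantics`, `Table`, `unionByPosition` and `unionByName` are hypothetical names invented for this sketch, not Spark APIs or Spark's implementation. The positional variant mimics what `Dataset.union` does, and the name-based variant mimics what `unionByName` (SPARK-21043) provides. The column orders come from the report: the left side is `ds1.select("name", "age")`, and the right side keeps the (age, name) layout shown by the first two `show()` outputs. In Spark itself, one workaround on 2.1.0 is to give both sides the same column order, e.g. `ds1.select("name", "age").union(ds2.select("name", "age"))`. From Spark 2.3.0, `ds1.unionByName(ds2)` is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (not Spark code) contrasting a union that matches columns by
 * position with one that matches them by name.
 */
public class UnionSemantics {

    /** Hypothetical minimal table: ordered column names plus rows of string values. */
    static final class Table {
        final List<String> columns;
        final List<List<String>> rows;

        Table(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Appends b's rows to a's as-is; b's values land under a's column names. */
    static Table unionByPosition(Table a, Table b) {
        if (a.columns.size() != b.columns.size()) {
            throw new IllegalArgumentException("column counts differ");
        }
        List<List<String>> rows = new ArrayList<>(a.rows);
        rows.addAll(b.rows); // no reordering: b's column names are ignored
        return new Table(a.columns, rows);
    }

    /** Reorders each of b's rows into a's column order by name, then appends. */
    static Table unionByName(Table a, Table b) {
        if (a.columns.size() != b.columns.size()) {
            throw new IllegalArgumentException("column counts differ");
        }
        List<List<String>> rows = new ArrayList<>(a.rows);
        for (List<String> row : b.rows) {
            List<String> aligned = new ArrayList<>();
            for (String col : a.columns) {
                int idx = b.columns.indexOf(col); // where the same-named column sits in b
                if (idx < 0) {
                    throw new IllegalArgumentException("missing column: " + col);
                }
                aligned.add(row.get(idx));
            }
            rows.add(aligned);
        }
        return new Table(a.columns, rows);
    }

    public static void main(String[] args) {
        // Like ds1.select("name", "age"): columns in (name, age) order
        Table left = new Table(Arrays.asList("name", "age"),
                Arrays.asList(Arrays.asList("kaushal", "25"), Arrays.asList("aman", "26")));
        // Like ds2 as laid out by the bean encoder: columns in (age, name) order
        Table right = new Table(Arrays.asList("age", "name"),
                Arrays.asList(Arrays.asList("25", "sapan"), Arrays.asList("26", "yati")));

        System.out.println(unionByPosition(left, right).rows);
        System.out.println(unionByName(left, right).rows);
    }
}
```

Running `main` prints `[[kaushal, 25], [aman, 26], [25, sapan], [26, yati]]` for the positional union, the same mix-up as the third table in the report, and `[[kaushal, 25], [aman, 26], [sapan, 25], [yati, 26]]` for the name-based one.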