[ https://issues.apache.org/jira/browse/SPARK-18006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588113#comment-15588113 ]
Sean Owen commented on SPARK-18006:
-----------------------------------

What is the bug here? Both things you union have the same schema, with the same column ordering, matching the schema and ordering of the union.

> When union, spark SQL didn't complain about schema mismatch
> -----------------------------------------------------------
>
>                 Key: SPARK-18006
>                 URL: https://issues.apache.org/jira/browse/SPARK-18006
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.1
>            Reporter: Shawn Zhang
>            Priority: Minor
>
> When unioning two Dataset<Row>s, Spark checks that they have the same number
> of columns. But if the column order differs, strange results are generated.
> The output of the following code shows that the columns have been switched
> by Spark.
>
> ================= Code =============
> package test;
>
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
>
> public class SchemaBug {
>
>     public static class User {
>         long uid;
>         long dateline;
>
>         public User(long uid, long dateline) {
>             this.uid = uid;
>             this.dateline = dateline;
>         }
>
>         public long getUid() { return uid; }
>         public void setUid(long uid) { this.uid = uid; }
>         public long getDateline() { return dateline; }
>         public void setDateline(long dateline) { this.dateline = dateline; }
>     }
>
>     public static void main(String[] args) {
>         SparkSession sparkSession = SparkSession
>                 .builder()
>                 .appName("test")
>                 .config("spark.sql.warehouse.dir", "file:///")
>                 .getOrCreate();
>
>         // Explicit schema in (uid, dateline) order.
>         StructType userSchema2 = new StructType(new StructField[]{
>                 new StructField("uid", DataTypes.LongType, false, Metadata.empty()),
>                 new StructField("dateline", DataTypes.LongType, false, Metadata.empty()),
>         });
>
>         List<User> userList = new ArrayList<>();
>         userList.add(new User(1, System.currentTimeMillis()));
>         userList.add(new User(2, System.currentTimeMillis()));
>
>         // ds1: schema inferred from the User bean's getters.
>         Dataset<Row> ds1 = sparkSession.createDataFrame(userList, User.class);
>         // ds2: no rows, explicit (uid, dateline) schema.
>         Dataset<Row> ds2 = sparkSession.createDataFrame(new ArrayList<Row>(), userSchema2);
>
>         ds2.union(ds1).show();
>     }
> }
>
> =========== Program Output ===============
> |          uid|dateline|
> |1476867071496|       1|
> |1476867071496|       2|
>
> =========== Expected Output ===============
> |     dateline|uid|
> |1476867071496|  1|
> |1476867071496|  2|

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
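Why the columns look switched: Dataset.union resolves columns by position, not by name, and the result keeps the left side's column names. The program output above, where the uid column holds the timestamps, is consistent with the bean-derived ds1 having its columns in (dateline, uid) order (bean schema inference in practice lists properties alphabetically; treat that as an assumption here). The plain-Java model below is a hypothetical sketch with no Spark dependency; the class UnionModel and its methods are made up for illustration. It contrasts a positional union with aligning the right-hand columns by name first. In Spark itself the corresponding workaround would be to reorder ds1 into ds2's column order (for example via select with functions.col) before the union; Spark 2.3 and later also offer Dataset.unionByName.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a table: column names plus rows of Object[].
// Not Spark code; it only mimics how positional vs. by-name union treats columns.
public class UnionModel {

    // Positional union (how Dataset.union matches columns): only the column
    // count is checked here, right-hand rows are appended unchanged, and the
    // result keeps the left side's column names.
    static List<Object[]> unionByPosition(String[] leftCols, List<Object[]> left,
                                          String[] rightCols, List<Object[]> right) {
        if (leftCols.length != rightCols.length) {
            throw new IllegalArgumentException("union needs the same number of columns");
        }
        List<Object[]> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Reorders each right-hand row so its values follow leftCols by name,
    // mimicking a select of the right side's columns in the left side's order.
    static List<Object[]> alignByName(String[] leftCols, String[] rightCols,
                                      List<Object[]> right) {
        List<String> names = Arrays.asList(rightCols);
        int[] source = new int[leftCols.length];
        for (int i = 0; i < leftCols.length; i++) {
            source[i] = names.indexOf(leftCols[i]);
            if (source[i] < 0) {
                throw new IllegalArgumentException("missing column: " + leftCols[i]);
            }
        }
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : right) {
            Object[] aligned = new Object[source.length];
            for (int i = 0; i < source.length; i++) {
                aligned[i] = row[source[i]];
            }
            out.add(aligned);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] ds2Cols = {"uid", "dateline"}; // explicit schema, no rows
        String[] ds1Cols = {"dateline", "uid"}; // assumed bean-derived order
        List<Object[]> ds2 = new ArrayList<>();
        List<Object[]> ds1 = new ArrayList<>();
        ds1.add(new Object[]{1476867071496L, 1L});
        ds1.add(new Object[]{1476867071496L, 2L});

        List<Object[]> positional = unionByPosition(ds2Cols, ds2, ds1Cols, ds1);
        List<Object[]> byName = unionByPosition(ds2Cols, ds2, ds2Cols,
                alignByName(ds2Cols, ds1Cols, ds1));

        System.out.println("positional: " + String.join("|", ds2Cols));
        for (Object[] row : positional) System.out.println(Arrays.toString(row));
        System.out.println("by name:    " + String.join("|", ds2Cols));
        for (Object[] row : byName) System.out.println(Arrays.toString(row));
    }
}
```

In the positional result the timestamp lands under uid, as in the report; after name-based alignment the ids are back under uid. Because both columns are long, a type check would not catch the swap either, so only matching by name (or reordering explicitly) avoids it.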