[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoju Wu resolved SPARK-23493.
-------------------------------
    Resolution: Not A Bug

> insert-into depends on columns order, otherwise incorrect data inserted
> -----------------------------------------------------------------------
>
>                 Key: SPARK-23493
>                 URL: https://issues.apache.org/jira/browse/SPARK-23493
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Xiaoju Wu
>            Priority: Minor
>
> insert-into only works correctly when the partitionBy key columns come last:
>
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col1")
>   .saveAsTable(table)
>
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> spark
>   .createDataset(data2)
>   .toDF("col1", "col2", "col3")
>   .write
>   .insertInto(table)
>
> sql("select * from " + table).show()
>
> +---------+----+----+
> |     col2|col3|col1|
> +---------+----+----+
> |test#test| 0.0|   8|
> |    test1| 1.0|   7|
> |    test3| 0.0|   9|
> |        8|null|   0|
> |        9|null|   0|
> |        7|null|   1|
> +---------+----+----+
>
> Inserting with SQL shows the same issue:
>
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col1")
>   .saveAsTable(table)
>
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> sql("insert into " + table + " values(7,'test2',1.0)")
> sql("select * from " + table).show()
>
> +---------+----+----+
> |     col2|col3|col1|
> +---------+----+----+
> |test#test| 0.0|   8|
> |    test1| 1.0|   7|
> |    test3| 0.0|   9|
> |        7|null|   1|
> +---------+----+----+
>
> No exception is thrown, since insertInto runs on its own rather than together with partitionBy, but the data are inserted incorrectly. The issue is related to column order.
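The behavior above follows from insertInto resolving columns by position rather than by name: saveAsTable with partitionBy("col1") stores the table with the partition column moved to the end, giving the schema (col2, col3, col1), while the incoming rows are still laid out as (col1, col2, col3). A minimal pure-Scala sketch of this positional matching (no Spark required; the object and method names are illustrative, with values taken from the example above):

```scala
object PositionalInsert {
  // After saveAsTable with partitionBy("col1"), the partition column is
  // moved to the end of the table schema: (col2, col3, col1).
  val tableSchema: Seq[String] = Seq("col2", "col3", "col1")

  // An incoming row from data2, still ordered as (col1, col2, col3).
  val incomingRow: Seq[Any] = Seq(7, "test2", 1.0)

  // insertInto pairs the i-th value with the i-th table column,
  // regardless of the DataFrame's column names.
  def positionalMap(schema: Seq[String], row: Seq[Any]): Map[String, Any] =
    schema.zip(row).toMap
}
```

With the schema (col2, col3, col1), the row (7, "test2", 1.0) is read as col2 = 7, col3 = "test2" (which fails the cast to double and becomes null), and col1 = 1.0 (written as partition value 1), matching the |7|null|1| row above.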
> If I change to partitionBy("col3"), which is the last column, it works:
>
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col3")
>   .saveAsTable(table)
>
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> spark
>   .createDataset(data2)
>   .toDF("col1", "col2", "col3")
>   .write
>   .insertInto(table)
>
> sql("select * from " + table).show()
>
> +----+---------+----+
> |col1|     col2|col3|
> +----+---------+----+
> |   8|test#test| 0.0|
> |   9|    test3| 0.0|
> |   8|test#test| 0.0|
> |   9|    test3| 0.0|
> |   7|    test1| 1.0|
> |   7|    test2| 1.0|
> +----+---------+----+

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
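For anyone hitting this, a common workaround (a sketch, not part of the original report) is to realign the DataFrame's columns to the table's schema by name before the positional insert, along the lines of df.select(spark.table(table).columns.map(col): _*).write.insertInto(table). The realignment itself is just a name-based lookup, shown here without Spark (object and method names are illustrative):

```scala
object ReorderForInsert {
  // Reorder a row, given as sourceSchema-ordered values, into the target
  // table's column order, so a positional insert carries the right values.
  def reorder(targetSchema: Seq[String],
              sourceSchema: Seq[String],
              row: Seq[Any]): Seq[Any] = {
    val byName = sourceSchema.zip(row).toMap
    targetSchema.map(byName) // look up each target column's value by name
  }
}
```

Applied to the example, the row (7, "test2", 1.0) in (col1, col2, col3) order is realigned to ("test2", 1.0, 7) for the table schema (col2, col3, col1), so every value lands in its intended column.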