Jinhua Fu created SPARK-25717: --------------------------------- Summary: Insert overwrite a recreated external and partitioned table may result in incorrect query results Key: SPARK-25717 URL: https://issues.apache.org/jira/browse/SPARK-25717 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.2 Reporter: Jinhua Fu
Consider the following scenario: {code:java} spark.range(100).createTempView("temp") (0 until 3).foreach { _ => spark.sql("drop table if exists tableA") spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'") spark.sql("insert overwrite table tableA partition(p=1) select * from temp") spark.sql("select count(1) from tableA where p=1").show } {code} We expect the count always be 100, but the actual results are as follows: {code:java} +--------+ |count(1)| +--------+ | 100| +--------+ +--------+ |count(1)| +--------+ | 200| +--------+ +--------+ |count(1)| +--------+ | 300| +--------+ {code} when spark executes an `insert overwrite` command, it gets the historical partition first, and then delete it from fileSystem. But for recreated external and partitioned table, the partitions were all deleted by the `drop table` command. So the historical data is preserved which lead to the query results incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org