[ https://issues.apache.org/jira/browse/HIVE-18927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangzhihao updated HIVE-18927:
------------------------------
Description:
[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/] describes a way to reproduce this issue:
{noformat}
# Add some files to the file system, with no partition in the metastore to track them.
hdfs dfs -put test.txt test/p=p1

# Insert overwrite the partition (p = 'p1').
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;

# Verify that test.txt is not removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x 3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r-- 3 hdfs supergroup 8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that [Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652] calls {{replaceFiles}} only if {{oldPath}} exists. Since the metastore has no partition for these files, {{oldPath}} is null, so the existing files are never cleaned up. We should also clean {{destf}} in [Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817] to fix the issue.

was:
[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/] describe a way to produce this issue:
{noformat}
# Add some files into file system but no partition in metastore to track it.
hdfs dfs -put test.txt test/p=p1
# Insert overwrite the partition(p = p1)
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;
# verify the text.txt is not removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x 3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r-- 3 hdfs supergroup 8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that [Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652] will try to {{replaceFiles}} only if {{oldPath}} exists. Since metastore have no partition for the files, the {{oldPath}} is null and thus the files get no chance to be cleaned. We should also clean {{destf}} in method [Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817] to fix the issue.


> Hive "insert overwrite" doesn't replace the destination files if no partition
> in metastore for the files
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18927
> URL: https://issues.apache.org/jira/browse/HIVE-18927
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: wangzhihao
> Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
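For illustration, the control flow described above ({{Hive.loadPartition}} only cleans old files when {{oldPath}} is non-null) can be modeled in a few lines of plain Java. This is a hedged sketch, not Hive code: {{LoadPartitionSketch}}, {{insertOverwrite}}, {{deleteChildren}}, {{reproduce}} and the {{cleanDestfWhenUntracked}} flag are hypothetical names, the local file system stands in for HDFS, and the real {{Hive.loadPartition}}/{{Hive.replaceFiles}} methods take many more parameters and operate on a Hadoop {{FileSystem}}. The model assumes that a null {{oldPath}} means the metastore has no partition for {{destf}}, and shows that also cleaning {{destf}} in that case removes the stray {{test.txt}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified model of the INSERT OVERWRITE partition load from
// HIVE-18927. These are NOT Hive APIs; the local file system stands in for HDFS.
class LoadPartitionSketch {

    // Returns the direct children of dir (empty if dir does not exist).
    static List<Path> children(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return new ArrayList<>();
        }
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    // Deletes the files directly under dir (the model only uses flat directories).
    static void deleteChildren(Path dir) throws IOException {
        for (Path child : children(dir)) {
            Files.delete(child);
        }
    }

    // destf:   partition directory on the file system.
    // oldPath: partition location known to the metastore, or null if the
    //          metastore has no partition for destf.
    // cleanDestfWhenUntracked: false models the reported behavior,
    //          true models the proposed fix (also clean destf).
    static void insertOverwrite(Path destf, Path oldPath, List<Path> stagedOutput,
                                boolean cleanDestfWhenUntracked) throws IOException {
        if (oldPath != null) {
            deleteChildren(oldPath);   // tracked partition: old files are removed first
        } else if (cleanDestfWhenUntracked) {
            deleteChildren(destf);     // proposed fix: clean destf anyway
        }
        // Reported behavior: with oldPath == null nothing is deleted before the move.
        Files.createDirectories(destf);
        for (Path f : stagedOutput) {
            Files.move(f, destf.resolve(f.getFileName()));
        }
    }

    // Replays the scenario from the issue in a fresh temp directory (left on
    // disk afterwards) and returns the sorted file names in test/p=p1.
    static List<String> reproduce(boolean cleanDestfWhenUntracked) {
        try {
            Path root = Files.createTempDirectory("hive18927");
            Path destf = Files.createDirectories(root.resolve("test").resolve("p=p1"));
            // Like `hdfs dfs -put test.txt test/p=p1`: no metastore partition exists.
            Files.writeString(destf.resolve("test.txt"), "old data");
            // Query output staged outside the partition directory.
            Path staging = Files.createDirectories(root.resolve("staging"));
            Path output = Files.writeString(staging.resolve("000000_0"), "123\n");

            insertOverwrite(destf, null, List.of(output), cleanDestfWhenUntracked);

            List<String> names = new ArrayList<>();
            for (Path p : children(destf)) {
                names.add(p.getFileName().toString());
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("as reported:  " + reproduce(false)); // [000000_0, test.txt]
        System.out.println("proposed fix: " + reproduce(true));  // [000000_0]
    }
}
```

Run with Java 11 or later (for {{Files.writeString}}). The sketch changes only the {{oldPath == null}} branch and leaves the metastore-tracked path as it was, which mirrors the suggestion to clean {{destf}} rather than to change how {{oldPath}} is resolved.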