[ https://issues.apache.org/jira/browse/HIVE-18927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangzhihao updated HIVE-18927:
------------------------------
    Description: 
[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/]
describes a way to reproduce this issue:
{noformat}
# Add a file to the file system without a partition in the metastore to track it.
hdfs dfs -put test.txt test/p=p1

# Insert overwrite the partition (p = 'p1').
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;

# Verify that test.txt is not removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x   3 hdfs supergroup     194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r--   3 hdfs supergroup          8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that
[Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652]
calls {{replaceFiles}} only if {{oldPath}} exists. Since the metastore has
no partition for these files, {{oldPath}} is null, so the existing files are
never cleaned up. To fix the issue, the method
[Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817]
should also clean {{destf}}.
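
The control flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hive's actual code: the class {{OverwriteSketch}}, its methods, and the in-memory map standing in for the file system are all hypothetical, and "clean {{destf}}" is assumed to mean removing the files already present in the destination directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the overwrite path described above; NOT Hive's actual code. */
public class OverwriteSketch {

    /** Simulated file system (directory -> file names) holding the stray test.txt. */
    static Map<String, List<String>> strayPartitionDir() {
        Map<String, List<String>> fs = new HashMap<>();
        fs.put("test/p=p1", new ArrayList<>(Arrays.asList("test.txt")));
        return fs;
    }

    /** Behavior as reported: old files are cleaned only when oldPath is known. */
    static List<String> loadAsReported(Map<String, List<String>> fs, String oldPath,
                                       String destf, List<String> newFiles) {
        if (oldPath != null) {
            fs.remove(oldPath); // stands in for replaceFiles cleaning oldPath
        }
        fs.computeIfAbsent(destf, k -> new ArrayList<>()).addAll(newFiles);
        return fs.get(destf);
    }

    /** Proposed behavior: destf is cleaned as well, even when oldPath is null. */
    static List<String> loadAsProposed(Map<String, List<String>> fs, String oldPath,
                                       String destf, List<String> newFiles) {
        if (oldPath != null) {
            fs.remove(oldPath);
        }
        fs.remove(destf); // assumption: "clean destf" removes its existing files
        fs.computeIfAbsent(destf, k -> new ArrayList<>()).addAll(newFiles);
        return fs.get(destf);
    }

    public static void main(String[] args) {
        List<String> newFiles = Arrays.asList("000000_0");
        // The metastore has no partition, so oldPath is null in both calls.
        System.out.println(loadAsReported(strayPartitionDir(), null, "test/p=p1", newFiles));
        // prints [test.txt, 000000_0]
        System.out.println(loadAsProposed(strayPartitionDir(), null, "test/p=p1", newFiles));
        // prints [000000_0]
    }
}
```

With {{oldPath}} null, the first call leaves {{test.txt}} next to the new file, matching the {{hdfs dfs -ls}} output above; the second call leaves only the newly written file.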


> Hive "insert overwrite" doesn't replace the destination files if no partition 
> in metastore for the files
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18927
>                 URL: https://issues.apache.org/jira/browse/HIVE-18927
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: wangzhihao
>            Priority: Major
>
> [This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/]
> describes a way to reproduce this issue:
> {noformat}
> # Add a file to the file system without a partition in the metastore to track it.
> hdfs dfs -put test.txt test/p=p1
> # Insert overwrite the partition (p = 'p1').
> DROP TABLE IF EXISTS partition_test;
> CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
> INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;
> # Verify that test.txt is not removed.
> hdfs dfs -ls test/p=p1
> Found 2 items
> -rwxr-xr-x   3 hdfs supergroup     194965 2015-05-05 00:15 test/p=p1/000000_0
> -rw-r--r--   3 hdfs supergroup          8 2015-05-05 00:10 test/p=p1/test.txt
> {noformat}
> The reason is that
> [Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652]
> calls {{replaceFiles}} only if {{oldPath}} exists. Since the metastore has
> no partition for these files, {{oldPath}} is null, so the existing files are
> never cleaned up. To fix the issue, the method
> [Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817]
> should also clean {{destf}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
