[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075406#comment-16075406 ]
Sergio Peña commented on HIVE-17001:
------------------------------------

[~zsombor.klara] I didn't understand the test case.
{noformat}
# One partition dt='p1' with row ("a",1) is added
insert into test_part partition(dt = 'p1') values ("a", 1);

# Partition metadata is removed only (no data because it is an external table)
alter table test_part drop partition (dt='p1');

# Data is moved
dfs -mv ${system:test.tmp.dir}/test/dt=p1/000000_0 ${system:test.tmp.dir}/test/dt=p1/000000_1;

# Partition is re-created with dt='p1' with row ("b",2)
insert overwrite table test_part partition(dt = 'p1') values ("b", 2);

# This is correct, only one row is seen because the row ("a",1) was moved to another location manually.
# Where is the issue here?
select * from test_part;
{noformat}

> Insert overwrite table doesn't clean partition directory on HDFS if partition
> is missing from HMS
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17001
>                 URL: https://issues.apache.org/jira/browse/HIVE-17001
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Metastore
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new
> data files.
> For a partitioned table we will clean any folder of existing partitions on
> HDFS, however if the partition folder exists only on HDFS and the partition
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.
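
The behaviour described in the quoted issue can be pictured with a small toy model. This is only a sketch under stated assumptions, not Hive's implementation: `ToyWarehouse`, `insert_overwrite`, `drop_partition_metadata`, `select_all`, `reproduce`, and the `clean_unregistered` flag are all hypothetical names invented for illustration. The model assumes that an overwrite clears a partition directory only when the partition is registered in the metastore, and it skips the file rename step from the test case.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Toy stand-in for HMS metadata plus HDFS directories (hypothetical, not Hive code)."""

    metastore: set = field(default_factory=set)  # partition names registered in "HMS"
    fs: dict = field(default_factory=dict)       # partition directory -> rows stored on "HDFS"

    def insert_overwrite(self, part, rows, clean_unregistered=False):
        # Assumed behaviour from the issue description: the directory is
        # cleared only if the partition is known to the metastore.
        # clean_unregistered=True models clearing it unconditionally.
        if part in self.metastore or clean_unregistered:
            self.fs[part] = []
        self.fs.setdefault(part, []).extend(rows)
        self.metastore.add(part)

    def drop_partition_metadata(self, part):
        # Removes the metadata only; the files stay in place (external-table style).
        self.metastore.discard(part)

    def select_all(self):
        # Reads every row found under each registered partition directory.
        return [row for part in sorted(self.metastore) for row in self.fs.get(part, [])]


def reproduce(clean_unregistered):
    wh = ToyWarehouse()
    wh.insert_overwrite("p1", [("a", 1)])   # partition p1 created with row ("a", 1)
    wh.drop_partition_metadata("p1")        # metadata gone, data still on disk
    wh.insert_overwrite("p1", [("b", 2)], clean_unregistered=clean_unregistered)
    return wh.select_all()


if __name__ == "__main__":
    print(reproduce(clean_unregistered=False))  # stale row survives the overwrite
    print(reproduce(clean_unregistered=True))   # only the new row remains
```

In this model, `reproduce(False)` returns both rows because the leftover directory is never cleared, while `reproduce(True)` returns only the newly written row, which is the outcome the issue description expects from an overwrite.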