[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076209#comment-16076209 ]

Barna Zsombor Klara commented on HIVE-17001:

[~spena] No, the file is not moved out of the partition in the test. It is only renamed from 00_0 to 00_1 and stays in the same partition directory (unless I made a typo I'm not seeing right now).
[~ngangam] Yes, I made a mistake in the description: the insert should be an INSERT OVERWRITE TABLE. Let me correct it. The behaviour is the same, though: the data file is not overwritten with INSERT OVERWRITE either.

> Insert overwrite table doesn't clean partition directory on HDFS if partition
> is missing from HMS
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Metastore
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new
> data files.
> For a partitioned table we will clean any folder of existing partitions on
> HDFS, however if the partition folder exists only on HDFS and the partition
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
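The reproduction steps in the description can be traced with a toy Python model. This is a hypothetical sketch, not Hive's implementation: a dict stands in for the partition directories on HDFS, a set stands in for the partitions registered in HMS, and the `clean_unregistered` flag (an invented name) switches between the behaviour reported in this issue and the expected one. It assumes a managed table, so DROP PARTITION removes the directory as well as the metadata.

```python
from typing import Dict, List, Set

# Hypothetical model, not Hive code:
#   fs  maps a partition directory to its data files (each file is a list of rows)
#   hms is the set of partitions registered in the metastore
FileSystem = Dict[str, List[List[str]]]


def insert_into(fs: FileSystem, hms: Set[str], part: str, rows: List[str]) -> None:
    """INSERT INTO: add a new data file and register the partition."""
    fs.setdefault(part, []).append(list(rows))
    hms.add(part)


def drop_partition(fs: FileSystem, hms: Set[str], part: str) -> None:
    """DROP PARTITION on a managed table: remove the metadata and the directory."""
    hms.discard(part)
    fs.pop(part, None)


def insert_overwrite(fs: FileSystem, hms: Set[str], part: str,
                     rows: List[str], clean_unregistered: bool) -> None:
    """INSERT OVERWRITE: clear the partition directory, then write new data.

    With clean_unregistered=False the directory is only cleared when the
    partition is known to HMS, i.e. the behaviour reported in this issue.
    """
    if part in hms or clean_unregistered:
        fs.pop(part, None)
    insert_into(fs, hms, part, rows)


def select_all(fs: FileSystem, hms: Set[str]) -> List[str]:
    """SELECT *: read every data file of every registered partition."""
    return [row for part in sorted(hms) for data in fs.get(part, []) for row in data]


def reproduce(clean_unregistered: bool) -> List[str]:
    fs: FileSystem = {}
    hms: Set[str] = set()
    insert_into(fs, hms, "ds=p1", ["a"])            # step 2
    backup = [list(data) for data in fs["ds=p1"]]   # step 3: copy the data elsewhere
    drop_partition(fs, hms, "ds=p1")                # step 4
    fs["ds=p1"] = backup                            # step 5: restore the file (its name does not matter here)
    insert_overwrite(fs, hms, "ds=p1", ["b"], clean_unregistered)  # step 6
    return select_all(fs, hms)                      # step 7


print(reproduce(clean_unregistered=False))  # ['a', 'b']  (reported: 2 records)
print(reproduce(clean_unregistered=True))   # ['b']       (expected: 1 record)
```

In the model, the only difference between the two runs is whether step 6 clears a directory that HMS does not know about, which is the gap the issue describes.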
[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075421#comment-16075421 ]

Naveen Gangam commented on HIVE-17001:

[~zsombor.klara] Quick question on the issue. I am a bit confused by the mismatch between the jira summary and the reproducer: the summary says "insert overwrite", but the reproducer does not use "insert overwrite". So I am wondering whether the reproducer is intended to be as written.

I am not sure this is a bug. Say you execute the following:
INSERT INTO test PARTITION(ds='p1') values ('a');
INSERT INTO test PARTITION(ds='p1') values ('a');
The resulting partition directory should contain 2 data files, and a select * on the table should return 2 rows. This is by design.

The testcase in this jira is semantically similar to the case above, where you have some existing data in a partition and you are inserting additional data. Would you agree? Normally, step 4 of the reproducer would have deleted the partition's data, had it existed. But I think it is also legal to manage some or all of the partition data externally.

Am I making sense? Thanks
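The append semantics described in this comment can be illustrated with a small hypothetical Python model (not Hive code) in which a partition directory is just a list of data files. INSERT INTO only appends a file, so every file already in the directory, whether written by Hive or placed there externally, is still read afterwards.

```python
from typing import Dict, List

# Hypothetical model: partition directory -> list of data files -> rows.
FileSystem = Dict[str, List[List[str]]]


def insert_into(fs: FileSystem, part: str, rows: List[str]) -> None:
    """INSERT INTO appends one new data file; existing files are untouched."""
    fs.setdefault(part, []).append(list(rows))


def select_partition(fs: FileSystem, part: str) -> List[str]:
    """A scan returns the rows of every data file in the partition directory."""
    return [row for data in fs.get(part, []) for row in data]


# Two INSERT INTO statements: two data files, two rows.
twice: FileSystem = {}
insert_into(twice, "ds=p1", ["a"])
insert_into(twice, "ds=p1", ["a"])
print(len(twice["ds=p1"]), select_partition(twice, "ds=p1"))  # 2 ['a', 'a']

# A file placed in the directory outside Hive, then one INSERT INTO:
# the result has the same shape as the reproducer with plain INSERT INTO.
external: FileSystem = {"ds=p1": [["a"]]}
insert_into(external, "ds=p1", ["b"])
print(select_partition(external, "ds=p1"))  # ['a', 'b']
```

Under this model, seeing both rows after a plain INSERT INTO is expected; the disagreement in the thread only matters once step 6 is an INSERT OVERWRITE.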
[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075406#comment-16075406 ]

Sergio Peña commented on HIVE-17001:

[~zsombor.klara] I didn't understand the test case.
{noformat}
# One partition dt='p1' with row ("a",1) is added
insert into test_part partition(dt = 'p1') values ("a", 1);

# Partition metadata is removed only (no data because it is an external table)
alter table test_part drop partition (dt='p1');

# Data is moved
dfs -mv ${system:test.tmp.dir}/test/dt=p1/00_0 ${system:test.tmp.dir}/test/dt=p1/00_1;

# Partition is re-created with dt='p1' with row ("b",2)
insert overwrite table test_part partition(dt = 'p1') values ("b", 2);

# This is correct, only one row is seen because the row ("a",1) was moved to another location manually.
# Where is the issue here?
select * from test_part;
{noformat}
[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070265#comment-16070265 ]

Barna Zsombor Klara commented on HIVE-17001:

The failures are either unrelated to this patch or already covered by other JIRAs: HIVE-16931, HIVE-16959, HIVE-15165, HIVE-16908.
[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069970#comment-16069970 ]

Hive QA commented on HIVE-17001:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875214/HIVE-17001.01.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 14 failed/errored test(s), 10825 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=239)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout (batchId=285)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5850/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5850/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5850/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875214 - PreCommit-HIVE-Build