[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-06 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076209#comment-16076209
 ] 

Barna Zsombor Klara commented on HIVE-17001:


[~spena] No, the file is not moved outside in the test just renamed from 
00_0 to 00_1, it is there in the same partition directory (unless I 
made a typo I'm not seeing right now).

[~ngangam] Yes I made a mistake in the description, the insert should be an 
insert overwrite table, let me correct it. But the behaviour is the same, the 
datafile is not overwritten with insert overwrite either.

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-05 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075421#comment-16075421
 ] 

Naveen Gangam commented on HIVE-17001:
--

[~zsombor.klara] Quick qs on the issue. I am a bit confused between the jira 
summary and the reproducer. Summary says "insert overwrite" but the reproducer 
does not use "insert overwrite". So I am wondering if the reproducer is 
intended to be the same as written.

I am not sure if this is a bug. Say, you execute the following
INSERT INTO test PARTITION(ds='p1') values ('a');
INSERT INTO test PARTITION(ds='p1') values ('a');

The resultant partition directory should contain 2 data files and a select * on 
the table should return 2 rows. This is by design.
The testcase in this jira is semantically similar to the case above, where you 
have some existing data in a partition and you are inserting additional data. 
Would you agree?

Normally, step 4 of the reproducer should have deleted the data for the 
partition, had it existed. But I think it is legal to manage some or all of the 
partition data externally, as well. 

Am I making sense? Thanks

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075406#comment-16075406
 ] 

Sergio Peña commented on HIVE-17001:


[~zsombor.klara] I didn't understand the test case.

{noformat}
# One partition dt='p1' with row ("a",1) is added
insert into test_part partition(dt = 'p1') values ("a", 1);

# Partition metadata is removed only (no data because it is an external table)
alter table test_part drop partition (dt='p1');

# Data is moved
dfs -mv ${system:test.tmp.dir}/test/dt=p1/00_0 
${system:test.tmp.dir}/test/dt=p1/00_1;

# Partition is re-created with dt='p1" with row ("b",2)
insert overwrite table test_part partition(dt = 'p1') values ("b", 2);

# This is correct, only one row is seen because the row ("a",1) was moved to 
another location manually.
# Where is the issue here?
select * from test_part;
{noformat}



> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-06-30 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070265#comment-16070265
 ] 

Barna Zsombor Klara commented on HIVE-17001:


Failures should not be related or are already covered in other Jiras:
HIVE-16931, HIVE-16959, HIVE-15165, HIVE-16908.

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-06-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069970#comment-16069970
 ] 

Hive QA commented on HIVE-17001:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875214/HIVE-17001.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10825 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=239)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout (batchId=285)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5850/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5850/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5850/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875214 - PreCommit-HIVE-Build

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)