[jira] [Updated] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-08-28 Thread Hui An (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HIVE-22077:
--
Attachment: HIVE-22077.patch.1
Status: Patch Available  (was: Open)

> Inserting overwrite partitions clause does not clean directories while 
> partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.3.4, 1.1.1, 4.0.0
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Attachments: HIVE-22077.patch.1
>
>
> INSERT OVERWRITE on a static partition may not clean the related HDFS location if
> the partition's info is not stored in the metastore.
> Steps to reproduce this issue:
> 
> 1. Create a managed table:
> 
> {code:sql}
>  CREATE TABLE `test`(
>    `id` string)
>  PARTITIONED BY (
>    `dayno` string)
>  ROW FORMAT SERDE
>    'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
>  STORED AS INPUTFORMAT
>    'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>  OUTPUTFORMAT
>    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>  LOCATION
>    'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
>  TBLPROPERTIES (
>    'transient_lastDdlTime'='1564731656')
> {code}
> 
> 2. Create the partition's directory and put some data in it:
> 
> {code:bash}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> 
> 3. Run INSERT OVERWRITE on partition dayno=20190802:
> 
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
> 
> 4. The test.data file under the partition directory is not deleted.
> 
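
The gap between observed and expected behaviour in the steps above can be mimicked on a local filesystem. What follows is a toy model only, not Hive code. The `metastore` set, the functions `insert_overwrite_observed`, `insert_overwrite_expected` and `reproduce`, and the output file name `000000_0` are all hypothetical names chosen for illustration. The sketch assumes that cleanup is skipped when the partition is unknown to the metastore, which is what the report describes; it makes no claim about where in Hive this happens.

```python
import tempfile
from pathlib import Path


def insert_overwrite_observed(table_dir, metastore, partition, rows):
    """Toy model of the reported behaviour: old files are removed only
    when the partition is already registered in the (hypothetical) metastore."""
    part_dir = Path(table_dir) / partition
    if partition in metastore and part_dir.is_dir():
        for old in part_dir.iterdir():
            old.unlink()
    part_dir.mkdir(parents=True, exist_ok=True)
    (part_dir / "000000_0").write_text("\n".join(rows))
    metastore.add(partition)


def insert_overwrite_expected(table_dir, metastore, partition, rows):
    """Toy model of the expected behaviour: the partition directory is
    cleaned whether or not the metastore knows about the partition."""
    part_dir = Path(table_dir) / partition
    part_dir.mkdir(parents=True, exist_ok=True)
    for old in part_dir.iterdir():
        old.unlink()
    (part_dir / "000000_0").write_text("\n".join(rows))
    metastore.add(partition)


def reproduce(insert_fn):
    """Run steps 1-4 against a temporary directory and list the files left."""
    with tempfile.TemporaryDirectory() as table_dir:
        metastore = set()  # step 1: table exists, no partition registered
        part_dir = Path(table_dir) / "dayno=20190802"
        part_dir.mkdir()  # step 2: directory created behind the metastore's back
        (part_dir / "test.data").write_text("stale")
        insert_fn(table_dir, metastore, "dayno=20190802", ["some value"])  # step 3
        return sorted(p.name for p in part_dir.iterdir())  # step 4


print(reproduce(insert_overwrite_observed))  # ['000000_0', 'test.data']
print(reproduce(insert_overwrite_expected))  # ['000000_0']
```

In the first run the stale `test.data` survives next to the newly written file, which matches step 4; in the second run only the new file remains.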



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-08-27 Thread Hui An (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HIVE-22077:
--
Description: 
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data in it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}

4. The test.data file under the partition directory is not deleted.


  was:
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   |
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data in it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}

4. The test.data file under the partition directory is not deleted.




[jira] [Updated] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-08-05 Thread Hui An (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HIVE-22077:
--
Description: 
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   |
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data in it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}

4. The test.data file under the partition directory is not deleted.


  was:
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to Reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   |
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data under it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}

4. The test.data file under the partition directory is not deleted.




[jira] [Updated] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-08-02 Thread Hui An (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HIVE-22077:
--
Description: 
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to Reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   |
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data under it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}

4. The test.data file under the partition directory is not deleted.


  was:
INSERT OVERWRITE on a static partition may not clean the related HDFS location if
the partition's info is not stored in the metastore.
Steps to Reproduce this issue:

1. Create a managed table:


{code:sql}
 CREATE TABLE `test`(   
   `id` string) 
 PARTITIONED BY (   
   `dayno` string)  
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION   |
   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
 TBLPROPERTIES (
   'transient_lastDdlTime'='1564731656')   
{code}

2. Create the partition's directory and put some data under it:


{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}

3. Run INSERT OVERWRITE on partition dayno=20190802:


{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT 1;
{code}

4. The test.data file under the partition directory is not deleted.


