[ 
https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27951:
----------------------------------
    Status: Patch Available  (was: Open)

> HCatalog dynamic partitioning fails with a "partition already exists" error 
> when the parent partition path exists
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27951
>                 URL: https://issues.apache.org/jira/browse/HIVE-27951
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Critical
>              Labels: pull-request-available
>
> If a table already has a partition (part1=x1, part2=y1), inserting into a 
> new partition (part1=x1, part2=y2) that shares the parent path part1=x1 
> makes the HCatalog FileOutputCommitterContainer throw a "path already 
> exists" error.
>  
> To reproduce:
> create table source(id int, part1 string, part2 string);
> create table target(id int) partitioned by (part1 string, part2 string);
> insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
>  
> pig -useHCatalog
> A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
> B = filter A by (part2 == 'y1');
> -- the following succeeds
> store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> -- the following fails with a duplicate publish error
> C = filter A by (part2 == 'y2');
> store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
>  
>  
> ```
> Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
>  
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
> ```
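
The behaviour the report expects can be illustrated with a minimal sketch on the local filesystem. This is not the Hive code and not the attached patch: the class PartitionPublishSketch and the helpers publishPartition and stage are hypothetical names, and java.nio.file stands in for the Hadoop FileSystem API. The assumption shown is that only the full leaf partition directory counts as a duplicate, while an existing parent such as part1=x1 is reused.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PartitionPublishSketch {

    // Hypothetical helper, not Hive code: moves one staged leaf partition
    // directory under the table directory.
    static Path publishPartition(Path tableRoot, Path stagedLeaf, List<String> partSpec)
            throws IOException {
        Path target = tableRoot;
        for (String component : partSpec) {
            target = target.resolve(component); // e.g. part1=x1, then part2=y2
        }
        // Only the full leaf path is treated as a duplicate publish.
        if (Files.exists(target)) {
            throw new FileAlreadyExistsException(
                    target.toString(), null, "duplicate publish not possible");
        }
        // Parent levels (part1=x1) may already exist; that is not an error.
        Files.createDirectories(target.getParent());
        return Files.move(stagedLeaf, target);
    }

    // Hypothetical helper: creates a staged output directory with one data file.
    static Path stage(Path scratch, String name) throws IOException {
        Path dir = Files.createDirectories(scratch.resolve(name));
        Files.writeString(dir.resolve("part-00000"), "row\n");
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hcat-sketch");
        Path table = Files.createDirectories(base.resolve("target"));
        Path scratch = Files.createDirectories(base.resolve("_scratch"));

        // First store: creates part1=x1/part2=y1.
        publishPartition(table, stage(scratch, "job1"), List.of("part1=x1", "part2=y1"));
        // Second store: part1=x1 already exists, but the leaf part2=y2 is new.
        publishPartition(table, stage(scratch, "job2"), List.of("part1=x1", "part2=y2"));
        System.out.println(Files.exists(table.resolve("part1=x1/part2=y2/part-00000")));

        // Publishing the same leaf again is the case that should be rejected.
        try {
            publishPartition(table, stage(scratch, "job3"), List.of("part1=x1", "part2=y2"));
        } catch (FileAlreadyExistsException e) {
            System.out.println("duplicate rejected");
        }
    }
}
```

In this sketch the second publish succeeds because the existence check is made on part1=x1/part2=y2 rather than on part1=x1, which is the distinction the stack trace above suggests is being missed.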



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
