[ 
https://issues.apache.org/jira/browse/HCATALOG-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491157#comment-13491157
 ] 

Feng Peng commented on HCATALOG-545:
------------------------------------

We propose adding a thread-local variable to FileOutputCommitterContainer that 
collects the committed tables/partitions, so that they can be rolled back when 
a failure occurs. Any suggestions?
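
To make the idea concrete, here is a rough sketch, not a patch. It assumes 
Java 8+ and that all committer instances in the cleanup task run on the same 
thread. CommitTracker, Rollback, recordCommit and rollbackAll are hypothetical 
names used only for illustration; none of them exist in HCatalog today.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not part of HCatalog): tracks commits made on the current thread. */
final class CommitTracker {
    /** Undo action for one committed partition, e.g. dropping it from the metastore. */
    interface Rollback {
        void undo() throws Exception;
    }

    // One stack per thread, shared by every committer instance that runs
    // on that thread (assumed here to be the cleanup task's thread).
    private static final ThreadLocal<Deque<Rollback>> COMMITTED =
        ThreadLocal.withInitial(ArrayDeque::new);

    private CommitTracker() {}

    /** Call right after a partition has been committed successfully. */
    static void recordCommit(Rollback rollback) {
        COMMITTED.get().push(rollback);
    }

    /** Undo all recorded commits, most recent first; returns any undo failures. */
    static List<Exception> rollbackAll() {
        List<Exception> failures = new ArrayList<>();
        Deque<Rollback> stack = COMMITTED.get();
        while (!stack.isEmpty()) {
            try {
                stack.pop().undo();
            } catch (Exception e) {
                // Keep going so one failed undo does not leave the rest committed.
                failures.add(e);
            }
        }
        COMMITTED.remove();
        return failures;
    }

    /** Call once the whole job has committed, to forget the recorded commits. */
    static void clear() {
        COMMITTED.remove();
    }
}
```

Each FileOutputCommitterContainer would call recordCommit after adding a 
partition, and the failure path would call rollbackAll before the cleanup task 
exits, so that the retry does not hit the "partition exists" error.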

Thanks!
Feng

                
> Improve failure recovery for FileOutputCommitterContainer
> ---------------------------------------------------------
>
>                 Key: HCATALOG-545
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-545
>             Project: HCatalog
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.4, 0.5
>            Reporter: Feng Peng
>
> When an M/R job creates partitions in multiple Hive tables, all partitions are 
> committed in the same cleanup task via multiple instances of 
> FileOutputCommitterContainer.
> Currently, when one of the FileOutputCommitterContainer instances fails, the 
> cleanup task exits with failure and retries. However, the retry is blocked by 
> a "partition exists" error caused by the partial commits. 
> Instead, the cleanup task should roll back all previous commits to the 
> different tables in case of failure so that the next retry can proceed.
> Also, if all retries of the cleanup task fail, no partial commit should be 
> left in the Hive metastore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira