[
https://issues.apache.org/jira/browse/HCATALOG-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491157#comment-13491157
]
Feng Peng commented on HCATALOG-545:
------------------------------------
We propose adding a thread-local variable to FileOutputCommitterContainer
that collects the committed tables/partitions so that they can be rolled back
when a failure occurs. Any suggestions?
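
To make the idea concrete, here is a minimal sketch of such a thread-local
registry. It is only an illustration, not existing HCatalog code: the class
name CommitRollbackRegistry and its methods are hypothetical, and the undo
action is passed in as a plain Runnable so that the sketch does not assume any
particular metastore call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not part of HCatalog): tracks commits made on the current thread. */
final class CommitRollbackRegistry {

    /** One recorded commit together with the action that undoes it. */
    private static final class Entry {
        final String name;
        final Runnable undo;

        Entry(String name, Runnable undo) {
            this.name = name;
            this.undo = undo;
        }
    }

    // Per-thread stack, so every committer instance running in the same
    // cleanup-task thread records into the same collection.
    private static final ThreadLocal<Deque<Entry>> COMMITTED =
            ThreadLocal.withInitial(ArrayDeque::new);

    private CommitRollbackRegistry() {
    }

    /** Call after a table/partition has been committed successfully. */
    static void recordCommit(String name, Runnable undo) {
        COMMITTED.get().push(new Entry(name, undo));
    }

    /**
     * Undoes all recorded commits, most recent first, and returns the names
     * that were rolled back. A failing undo is skipped so that the remaining
     * commits still get rolled back.
     */
    static List<String> rollbackAll() {
        List<String> rolledBack = new ArrayList<>();
        Deque<Entry> stack = COMMITTED.get();
        while (!stack.isEmpty()) {
            Entry entry = stack.pop();
            try {
                entry.undo.run();
                rolledBack.add(entry.name);
            } catch (RuntimeException ex) {
                // Assumption: best effort; real code would log the failure.
            }
        }
        COMMITTED.remove();
        return rolledBack;
    }

    /** Call once the whole cleanup task has succeeded. */
    static void clear() {
        COMMITTED.remove();
    }
}
```

Under this sketch, each committer would call recordCommit after adding its
partitions, and the failure path would call rollbackAll before the cleanup
task exits, so the retry no longer hits the partial commits.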
Thanks!
Feng
> Improve failure recovery for FileOutputCommitterContainer
> ---------------------------------------------------------
>
> Key: HCATALOG-545
> URL: https://issues.apache.org/jira/browse/HCATALOG-545
> Project: HCatalog
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.4, 0.5
> Reporter: Feng Peng
>
> When an M/R job creates partitions in multiple Hive tables, all partitions are
> committed in the same cleanup task via multiple instances of the
> FileOutputCommitterContainer.
> Currently, when one of the FileOutputCommitterContainer instances fails, the
> cleanup task exits with failure and retries. However, the retry is blocked by
> a "partition exists" error caused by the partial commits.
> Instead, the cleanup task should roll back all previous commits to the
> different tables in case of failure so that the next retry can continue.
> Also, if all retries of the cleanup task fail, no partial commit should be
> left in the Hive metastore.