[ https://issues.apache.org/jira/browse/HCATALOG-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449077#comment-13449077 ]
Rohini Palaniswamy commented on HCATALOG-451:
---------------------------------------------
The patch is being held up on trunk because we want to throw an exception
when cleanupJob() is called. I don't see the point in insisting that
cleanupJob() throw an exception rather than simply performing the cleanup
action from whichever of abortJob() or cleanupJob() is invoked. Every
component in the Hadoop stack already carries multiple code paths (shims) to
work with multiple versions of Hadoop; it is no worse to carry code that
works with multiple versions of Pig. To throw an exception from cleanupJob()
and insist that only abortJob() be invoked, we would first need pig-0.10.1
released in open source with PIG-2712, then fix hive-0.10 to use the ANTLR
version that pig-0.10 uses, and then change the HCatalog dependency to
pig-0.10.1. It is not worth holding the patch until all of that happens.
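
A minimal sketch of what I mean (the class and helper names here are
hypothetical, not HCatalog's actual FileOutputCommitterContainer): do the
same cleanup from whichever hook fires, guarded so being called twice is
harmless, instead of throwing from cleanupJob().

{code:java}
import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical sketch: tolerate both call sequences instead of throwing.
// Older Pig releases (without PIG-2712) call the deprecated cleanupJob() on
// success and failure alike; newer callers use commitJob()/abortJob().
public class VersionTolerantCommitter extends OutputCommitter {

  private boolean cleanedUp = false;

  @Override
  public void commitJob(JobContext context) throws IOException {
    // ... commit-side work (e.g. partition registration) would go here ...
    doCleanup(context);
  }

  @Override
  public void abortJob(JobContext context, JobStatus.State state)
      throws IOException {
    // ... abort-side work (e.g. removing the output directory) ...
    doCleanup(context);
  }

  // Old callers land here on both code paths; do the cleanup, don't throw.
  @Override
  @Deprecated
  public void cleanupJob(JobContext context) throws IOException {
    doCleanup(context);
  }

  // Shared, idempotent cleanup so invocation from two hooks is harmless.
  private synchronized void doCleanup(JobContext context) throws IOException {
    if (cleanedUp) {
      return;
    }
    cleanedUp = true;
    // ... delete temporary/scratch directories, release resources ...
  }

  // Remaining OutputCommitter methods are required; no-ops in this sketch.
  @Override public void setupJob(JobContext context) throws IOException { }
  @Override public void setupTask(TaskAttemptContext context) throws IOException { }
  @Override public boolean needsTaskCommit(TaskAttemptContext context) throws IOException { return false; }
  @Override public void commitTask(TaskAttemptContext context) throws IOException { }
  @Override public void abortTask(TaskAttemptContext context) throws IOException { }
}
{code}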
We should commit this patch, since nothing actually blocks it and it is a
critical fix, and open a separate JIRA to throw the exception once we move to
pig-0.10.1. Otherwise this patch will soon go out of sync and more time will
have to be spent rebasing it.
> Partitions are created even when Jobs are aborted
> -------------------------------------------------
>
> Key: HCATALOG-451
> URL: https://issues.apache.org/jira/browse/HCATALOG-451
> Project: HCatalog
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.4, 0.5
> Environment: Hadoop 1.0.2, non-dynamic partitions.
> Reporter: Mithun Radhakrishnan
> Assignee: Vandana Ayyalasomayajula
> Fix For: 0.4.1
>
> Attachments: HCAT-451-trunk.02.patch, HCATALOG-451.0.patch,
> HCATALOG-451-branch-0.4.02.patch, HCATALOG-451-branch-0.4.03.patch,
> HCATALOG-451-branch-0.4.patch
>
>
> If an MR job using HCatOutputFormat fails, and
> FileOutputCommitterContainer::abortJob() is called, one would expect that
> partitions aren't created/registered with HCatalog.
> When dynamic partitioning is used, this behaves correctly. But when static
> partitions are used, partitions are created regardless of whether the job
> succeeded or failed.
> (This manifests as a failure when the job is retried: the retry fails to
> launch because the partitions already exist from the previous failed run.)
> This is a result of bad code in FileOutputCommitterContainer::cleanupJob(),
> which seems to do an unconditional partition-add. It can be fixed by checking
> whether the output directory exists before adding partitions (in the
> !dynamicPartitioning case), since that directory is removed in abortJob();
> see the sketch below.
> We'll have a patch for this shortly. As an aside, we ought to move the
> partition-creation into commitJob(), where it logically belongs; cleanupJob()
> is deprecated and is common to both the success and failure code paths.
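>
> A rough sketch of the check described above (the class, method, and
> parameter names here are illustrative, not the actual patch):
> {code:java}
> import java.io.IOException;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.JobContext;
>
> // Hypothetical sketch: only register static partitions when the job output
> // directory still exists; abortJob() deletes it when the job fails.
> class PartitionRegistrationGuard {
>
>   void maybeRegisterPartitions(JobContext context, Path outputPath,
>       boolean dynamicPartitioningUsed) throws IOException {
>     if (dynamicPartitioningUsed) {
>       return; // the dynamic-partition path already behaves correctly
>     }
>     FileSystem fs = outputPath.getFileSystem(context.getConfiguration());
>     if (!fs.exists(outputPath)) {
>       // abortJob() removed the output directory, i.e. the job failed:
>       // skip creating/registering the partition in the metastore.
>       return;
>     }
>     registerPartitions(context);
>   }
>
>   // Hypothetical stand-in for the actual metastore partition-add call.
>   void registerPartitions(JobContext context) throws IOException {
>     // ... metastore add_partition(...) would be invoked here ...
>   }
> }
> {code}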