[ https://issues.apache.org/jira/browse/HCATALOG-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418528#comment-13418528 ]
Vandana Ayyalasomayajula commented on HCATALOG-451:
---------------------------------------------------
I did the following to reproduce the issue:
1. Create table: create table testPtn( id string, age int) PARTITIONED BY
(datestamp string);
2. Load some data with Pig: B = LOAD '/tmp/input.txt' USING
PigStorage('\u0001') AS ( id:chararray, age:int);
3. Store the data in partition: store B into 'testPtn' USING
org.apache.hcatalog.pig.HCatStorer('datestamp=20120530');
4. While the job was still running, I killed the hadoop job.
Then, when I do:
hive> show partitions testPtn;
OK
datestamp=20120530
Time taken: 1.431 seconds
Since the job was aborted, the partition should never have been registered with
the metastore. But it is: the committer's cleanup still calls the code that
registers the partition, without checking whether its output directory actually
exists on the filesystem.
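
For anyone scripting the reproduction, the same check can be made against the
metastore directly instead of the hive CLI. This is only an illustrative sketch
(the class name and the use of the "default" database are assumptions), using
HiveMetaStoreClient:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class ShowPartitionsCheck {
    public static void main(String[] args) throws Exception {
        // Connects to the metastore configured in hive-site.xml.
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        // Programmatic equivalent of "show partitions testPtn".
        for (String name : client.listPartitionNames("default", "testPtn", (short) -1)) {
            System.out.println(name); // still prints datestamp=20120530 after the kill
        }
        client.close();
    }
}

It lists datestamp=20120530 even though the MapReduce job never committed.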
> Partitions are created even when Jobs are aborted
> -------------------------------------------------
>
> Key: HCATALOG-451
> URL: https://issues.apache.org/jira/browse/HCATALOG-451
> Project: HCatalog
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.4
> Environment: Hadoop 1.0.2, non-dynamic partitions.
> Reporter: Mithun Radhakrishnan
> Fix For: 0.4.1
>
>
> If an MR job using HCatOutputFormat fails, and
> FileOutputCommitterContainer::abortJob() is called, one would expect that
> partitions aren't created/registered with HCatalog.
> When using dynamic-partitions, one sees that this behaves correctly. But when
> static-partitions are used, partitions are created regardless of whether the
> Job succeeded or failed.
> (This manifests as a failure when the job is retried: the retry fails to
> launch because the partitions already exist from the previous failed run.)
> This is a result of bad code in FileOutputCommitter::cleanupJob(), which
> seems to do an unconditional partition-add. This can be fixed by adding a
> check for the output directory before adding partitions (in the
> !dynamicPartitioning case), since the directory is removed in abortJob().
> We'll have a patch for this shortly. As an aside, we ought to move the
> partition-creation into commitJob(), where it logically belongs. cleanupJob()
> is deprecated and common to both success and failure code paths.
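
For illustration only (not the actual patch, and the helper name is made up),
the guard described above could look roughly like this: in the
!dynamicPartitioning case, the partition-add in cleanupJob() would run only if
the partition's output directory still exists, which it does not after
abortJob() has deleted it.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionGuardSketch {
    // Illustrative helper: register the static partition only if its output
    // directory survived the job, i.e. abortJob() did not delete it.
    public static boolean shouldRegisterPartition(Configuration conf, Path partitionPath)
            throws IOException {
        FileSystem fs = partitionPath.getFileSystem(conf);
        return fs.exists(partitionPath);
    }
}

Moving the registration into commitJob(), as suggested above, would make the
guard unnecessary on the abort path, since commitJob() is only invoked for
successful jobs.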