ConfX created MAPREDUCE-7448:
--------------------------------

             Summary: Inconsistent Behavior for FileOutputCommitter V1 to 
commit successfully many times
                 Key: MAPREDUCE-7448
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7448
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened

With {{mapreduce.fileoutputcommitter.cleanup.skipped=true}}, version 1 of 
{{FileOutputCommitter}} can commit the same job successfully several times, 
which is unexpected: a duplicate commit should fail for version 1.
h2. Where's the problem

In {{FileOutputCommitter.commitJobInternal}},
{noformat}
if (algorithmVersion == 1) {
  for (FileStatus stat: getAllCommittedTaskPaths(context)) {
    mergePaths(fs, stat, finalOutput, context);
  }
}

if (skipCleanup) {
  LOG.info("Skip cleanup the _temporary folders under job's output " +
      "directory in commitJob.");
...{noformat}
Here, if cleanup is skipped, the _temporary folder is not deleted and the 
_SUCCESS file is also not created, which causes the next {{mergePaths}} call 
not to fail.
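
To make the mechanism concrete, here is a minimal toy model of the behavior described above. It is not Hadoop code: the class {{CommitModel}} and its methods are hypothetical, it uses only {{java.nio.file}}, and it assumes that a duplicate commit fails only because the pending directory is gone when it is listed. Under that assumption, skipping cleanup is enough to let the second commit through.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of a V1-style job commit. Hypothetical names; NOT Hadoop code. */
class CommitModel {

  /** Moves files from out/_temporary into out, then optionally cleans up. */
  static void commitJob(Path out, boolean skipCleanup) throws IOException {
    Path pending = out.resolve("_temporary");
    List<Path> committed;
    // Listing a missing directory throws NoSuchFileException. In this model,
    // that is the only thing that makes a duplicate commit fail.
    try (Stream<Path> files = Files.list(pending)) {
      committed = files.collect(Collectors.toList());
    }
    for (Path f : committed) {
      Files.move(f, out.resolve(f.getFileName()),
          StandardCopyOption.REPLACE_EXISTING);
    }
    if (!skipCleanup) {
      Files.delete(pending); // directory is empty after the merge
    }
  }

  /** Returns true if a second commitJob call on the same output succeeds. */
  static boolean duplicateCommitSucceeds(boolean skipCleanup)
      throws IOException {
    Path out = Files.createTempDirectory("commit-model");
    Path pending = Files.createDirectory(out.resolve("_temporary"));
    Files.createFile(pending.resolve("part-00000"));
    commitJob(out, skipCleanup); // first commit
    try {
      commitJob(out, skipCleanup); // duplicate commit
      return true;
    } catch (NoSuchFileException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("cleanup performed: duplicate commit succeeds = "
        + duplicateCommitSucceeds(false));
    System.out.println("cleanup skipped:   duplicate commit succeeds = "
        + duplicateCommitSucceeds(true));
  }
}
```

In this model the duplicate commit is rejected when cleanup runs and accepted when cleanup is skipped, which mirrors the inconsistency reported here.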
h2. How to reproduce
 # set {{mapreduce.fileoutputcommitter.cleanup.skipped}}={{true}}
 # run {{org.apache.hadoop.mapred.TestFileOutputCommitter#testCommitterWithDuplicatedCommitV1}}

You should observe:
{noformat}
java.lang.AssertionError: Duplicate commit successful: wrong behavior for version 1.
    at org.junit.Assert.fail(Assert.java:89)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitInternal(TestFileOutputCommitter.java:295)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitV1(TestFileOutputCommitter.java:269){noformat}
For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.



