ConfX created MAPREDUCE-7448:
--------------------------------

             Summary: Inconsistent Behavior for FileOutputCommitter V1 to commit successfully many times
                 Key: MAPREDUCE-7448
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7448
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened

With {{mapreduce.fileoutputcommitter.cleanup.skipped=true}}, version 1 of {{FileOutputCommitter}} can commit the same job several times without failing, which is unexpected.

h2. Where's the problem

In {{FileOutputCommitter.commitJobInternal}}:
{noformat}
if (algorithmVersion == 1) {
  for (FileStatus stat: getAllCommittedTaskPaths(context)) {
    mergePaths(fs, stat, finalOutput, context);
  }
}

if (skipCleanup) {
  LOG.info("Skip cleanup the _temporary folders under job's output " +
      "directory in commitJob.");
...{noformat}
If cleanup is skipped, the _temporary folder is not deleted and the _SUCCESS file is also not created, so the {{mergePaths}} loop does not fail the next time the job is committed.

h2. How to reproduce
 # Set {{mapreduce.fileoutputcommitter.cleanup.skipped}}={{true}}.
 # Run {{org.apache.hadoop.mapred.TestFileOutputCommitter#testCommitterWithDuplicatedCommitV1}}.

You should observe:
{noformat}
java.lang.AssertionError: Duplicate commit successful: wrong behavior for version 1.
    at org.junit.Assert.fail(Assert.java:89)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitInternal(TestFileOutputCommitter.java:295)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitV1(TestFileOutputCommitter.java:269){noformat}
For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.
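
To illustrate the reported behavior outside Hadoop, here is a minimal sketch built only on {{java.nio.file}}. It is a toy model, not Hadoop code: the class {{DuplicateCommitSketch}} and the methods {{commit}} and {{secondCommitSucceeds}} are hypothetical names. It assumes that a repeated V1 commit normally fails because the temporary directory no longer exists when it is listed, and it does not model the _SUCCESS marker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateCommitSketch {

    // Hypothetical stand-in for a V1 job commit: move every entry under tmp
    // into out, then optionally delete tmp. This is NOT the Hadoop implementation.
    static void commit(Path out, Path tmp, boolean skipCleanup) throws IOException {
        List<Path> committed;
        // Listing a missing directory throws NoSuchFileException. In this model,
        // that is the only thing that makes a repeated commit fail.
        try (Stream<Path> entries = Files.list(tmp)) {
            committed = entries.collect(Collectors.toList());
        }
        for (Path p : committed) {
            Files.move(p, out.resolve(p.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        if (!skipCleanup) {
            Files.delete(tmp); // tmp is empty once its entries have been moved
        }
    }

    // Commits twice against a fresh output directory and reports whether the
    // second commit went through.
    static boolean secondCommitSucceeds(boolean skipCleanup) throws IOException {
        Path out = Files.createTempDirectory("out");
        Path tmp = Files.createDirectory(out.resolve("_temporary"));
        Files.createFile(tmp.resolve("part-00000"));

        commit(out, tmp, skipCleanup); // first commit
        try {
            commit(out, tmp, skipCleanup); // duplicate commit
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("cleanup performed, second commit succeeds: "
                + secondCommitSucceeds(false));
        System.out.println("cleanup skipped, second commit succeeds: "
                + secondCommitSucceeds(true));
    }
}
```

In this model the duplicate commit is rejected only when the temporary directory was removed by the first commit. When cleanup is skipped, the directory still exists but is empty, so the loop runs zero times and nothing signals that the job was already committed.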