steveloughran commented on code in PR #6038: URL: https://github.com/apache/hadoop/pull/6038#discussion_r1324843146
##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java:
##########
@@ -158,6 +158,11 @@ public FileOutputCommitter(Path outputPath,
         "output directory:" + skipCleanup + ", ignore cleanup failures: " +
         ignoreCleanupFailures);
 
+    if (algorithmVersion == 1 && skipCleanup) {
+      LOG.warn("Skip cleaning up when using FileOutputCommitter V1 can lead to unexpected behaviors. " +
+          "For example, committing several times may be allowed falsely.");

Review Comment:
   "Skip cleaning up when using FileOutputCommitter V1 may corrupt the output".
   
   There's another option here: we just ignore the setting on v1 jobs? It's only there because directory deletion is O(files) on GCS, and it targets v2 because that same file-by-file operation means that directory rename is never atomic there; you may as well use the already unsafe v2 algorithm.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
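
A minimal sketch of the second option raised in the review comment, where the skip-cleanup setting is ignored on v1 jobs instead of only being warned about. The class and method names here are hypothetical and are not part of Hadoop; only the `algorithmVersion == 1 && skipCleanup` condition is taken from the diff, and the real committer would log through its own `LOG` rather than `System.err`.

```java
// Hypothetical illustration only: not the actual FileOutputCommitter code.
public final class SkipCleanupPolicy {

  private SkipCleanupPolicy() {
  }

  /**
   * Returns the skip-cleanup value that should take effect.
   * In this sketch the option is honoured only when the algorithm
   * version is not 1; on v1 it is ignored so cleanup always runs.
   */
  public static boolean effectiveSkipCleanup(int algorithmVersion,
      boolean skipCleanup) {
    if (algorithmVersion == 1 && skipCleanup) {
      // Assumption: a warning is still useful so that users notice
      // their setting has no effect on v1 jobs.
      System.err.println(
          "Ignoring skip-cleanup option: not supported with the v1 algorithm");
      return false;
    }
    return skipCleanup;
  }

  public static void main(String[] args) {
    // v1 with skip requested: the setting is ignored.
    System.out.println(effectiveSkipCleanup(1, true));
    // v2 with skip requested: the setting is honoured.
    System.out.println(effectiveSkipCleanup(2, true));
  }
}
```

With this shape the constructor would assign the returned value to its skip-cleanup field, so a v1 job can never reach commit with cleanup disabled.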