steveloughran commented on code in PR #6038:
URL: https://github.com/apache/hadoop/pull/6038#discussion_r1324843146


##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java:
##########
@@ -158,6 +158,11 @@ public FileOutputCommitter(Path outputPath,
         "output directory:" + skipCleanup + ", ignore cleanup failures: " +
         ignoreCleanupFailures);
 
+    if (algorithmVersion == 1 && skipCleanup) {
+        LOG.warn("Skip cleaning up when using FileOutputCommitter V1 can lead to unexpected behaviors. " +
+                "For example, committing several times may be allowed falsely.");

Review Comment:
   "Skip cleaning up when using FileOutputCommitter V1 may corrupt the output".
   
   There's another option here: should we just ignore the setting on v1 jobs?
   The option only exists because directory deletion is O(files) on GCS, and it targets v2 because that same file-by-file behaviour means directory rename is never atomic there either; you may as well use the already-unsafe v2 algorithm.
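   A minimal sketch of the "ignore the setting on v1" option. It assumes the constructor has already resolved `algorithmVersion` and `skipCleanup` as in the hunk above. `SkipCleanupPolicy` and `effectiveSkipCleanup` are hypothetical names, not Hadoop APIs. `java.util.logging` stands in for the committer's `LOG` so the example runs without Hadoop on the classpath.

```java
import java.util.logging.Logger;

/**
 * Hypothetical helper: decides whether the committer should really skip
 * cleanup. Names are illustrative only, not part of FileOutputCommitter.
 */
public class SkipCleanupPolicy {
  private static final Logger LOG =
      Logger.getLogger(SkipCleanupPolicy.class.getName());

  /** Returns the skip-cleanup value the committer should actually use. */
  static boolean effectiveSkipCleanup(int algorithmVersion, boolean skipCleanup) {
    if (algorithmVersion == 1 && skipCleanup) {
      // Skipping cleanup under v1 may corrupt the output, so the request is
      // ignored rather than honoured; only a warning is logged.
      LOG.warning("Ignoring skip-cleanup setting: it is only honoured with"
          + " FileOutputCommitter algorithm version 2");
      return false;
    }
    // v2 (or cleanup not requested to be skipped): keep the configured value.
    return skipCleanup;
  }

  public static void main(String[] args) {
    System.out.println(effectiveSkipCleanup(1, true));  // false: ignored on v1
    System.out.println(effectiveSkipCleanup(2, true));  // true: honoured on v2
    System.out.println(effectiveSkipCleanup(1, false)); // false: nothing to skip
  }
}
```

   With this approach, v1 jobs behave as if the option were unset, so there is no warning to misread. The trade-off is that a user who sets the option on a v1 job sees it silently downgraded to a log line.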



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
