GitHub user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/20704
  
    @megaserg: if you are writing to GCS or Azure, algorithm 2 is fine. If S3
    is the target, it's only safe to use with a consistent store (Hadoop 3.0 +
    S3Guard, or Amazon's consistent EMR), and even then you take a major
    performance hit from the copy that S3 uses to emulate rename. The S3A
    committers in Hadoop 3.1 deliver high-performance commit semantics, and
    the Netflix committers don't (directly) need a consistent store, though
    you will need one to chain work together.
    
    BTW, how do you verify that the v2 algorithm is actually being used? Set
    `mapreduce.fileoutputcommitter.algorithm.version` to 3 and expect a stack
    trace from the version-switch code. If there is no stack trace, the
    FileOutputCommitter never read the setting. That's what I do to make sure
    the FileOutputCommitter isn't actually being picked up.
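    To illustrate why the "version = 3" probe works, here is a small Python
    model, a sketch rather than Hadoop code. Only the key name
    `mapreduce.fileoutputcommitter.algorithm.version` is the real Hadoop
    setting (Spark jobs usually pass it as
    `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version`). The
    function names, the default of 1, and the stand-in "other committer" are
    hypothetical, and the error message only approximates the real one. The
    point is that only a committer that actually reads and validates the key
    will fail on an unsupported version.

```python
from typing import Callable, Dict, Optional

# Real Hadoop configuration key; everything else here is a hypothetical model.
ALGORITHM_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version"

Conf = Dict[str, str]


def file_output_committer(conf: Conf) -> int:
    """Model of FileOutputCommitter setup: read and validate the algorithm version."""
    # Assumed default of 1 for this sketch; the real default depends on the Hadoop release.
    version = int(conf.get(ALGORITHM_VERSION_KEY, "1"))
    if version not in (1, 2):
        # Stands in for the IOException the Java committer throws (the "stack trace").
        raise OSError("Only 1 or 2 algorithm version is supported")
    return version


def other_committer(conf: Conf) -> Optional[int]:
    """Model of a committer (e.g. an S3A committer) that never reads the key."""
    return None


def file_output_committer_in_use(
    conf: Conf, committer: Callable[[Conf], Optional[int]]
) -> bool:
    """The version = 3 probe: a failure means the FileOutputCommitter read the setting."""
    probe = dict(conf)  # copy, so the caller's configuration is left untouched
    probe[ALGORITHM_VERSION_KEY] = "3"
    try:
        committer(probe)
    except OSError:
        return True
    return False


if __name__ == "__main__":
    conf = {ALGORITHM_VERSION_KEY: "2"}
    print(file_output_committer_in_use(conf, file_output_committer))  # True
    print(file_output_committer_in_use(conf, other_committer))  # False
```

    In a real Spark job the equivalent is to run once with the value set to 3:
    a failed commit means the FileOutputCommitter is in play, while a clean
    run means some other committer is doing the work.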

