steveloughran edited a comment on pull request #32518:
URL: https://github.com/apache/spark/pull/32518#issuecomment-839943750


   Be aware that https://issues.apache.org/jira/browse/HADOOP-17483 turns the 
magic committer on everywhere, so this patch will make the magic committer the 
default on S3. I am perfectly happy with this.
   
   Note also that [MAPREDUCE-7431](https://github.com/apache/hadoop/pull/2971) 
is adding a committer for ABFS and GCS: for maximum performance on ABFS, and for 
both performance and correctness on GCS. (It'll work on HDFS too, FWIW.)
   
   The Spark config changes needed here will be needed there too.
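
   As a rough sketch of what those config changes look like, the snippet below 
assembles the settings for one filesystem scheme. The two binding class names 
are the ones shipped in Spark's hadoop-cloud module, and 
`mapreduce.outputcommitter.factory.scheme.<scheme>` is the Hadoop key that 
selects a committer factory per scheme. The helper names `committer_conf` and 
`to_submit_args` are hypothetical, and the factory class for any new committer 
is something you would have to take from its own documentation.

```python
# Hypothetical helper: collect the Spark settings that route commits through
# Hadoop's PathOutputCommitter factory mechanism for one filesystem scheme.
BINDING_SETTINGS = {
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
}


def committer_conf(scheme, factory_class):
    """Return the binding settings plus the per-scheme factory setting."""
    conf = dict(BINDING_SETTINGS)
    key = f"spark.hadoop.mapreduce.outputcommitter.factory.scheme.{scheme}"
    conf[key] = factory_class
    return conf


def to_submit_args(conf):
    """Render the settings as spark-submit style `--conf key=value` pairs."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # The S3A factory class; other schemes need their own factory class.
    s3a = committer_conf(
        "s3a", "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"
    )
    print(" ".join(to_submit_args(s3a)))
```

   Only the per-scheme factory line should differ between stores; the two 
binding settings stay the same, which is why the same Spark-side changes apply 
to each committer.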
   
   Now, one of the reasons the binding factory stuff is in the Spark codebase 
is that Spark was still using some of the old MRv1 algorithms to create and 
invoke committers, rather than the V2 APIs, _which automatically go through the 
factory mechanism_. So the real solution here would be to find those bits of 
the Spark code which use `org.apache.hadoop.mapred.FileOutputCommitter` and 
other classes in the same package, and see if they can be replaced with the 
equivalents in `org.apache.hadoop.mapreduce.lib.output`.
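
   One way to start that search is a small script that lists every source line 
referring to the old `org.apache.hadoop.mapred` package. This is only a sketch: 
the function name `find_old_api_uses` and the choice of file suffixes are 
assumptions, and a textual match will not catch classes reached through 
wildcard imports elsewhere or through reflection.

```python
import re
from pathlib import Path

# Matches the old package only: "mapred." is followed by a dot, so
# "org.apache.hadoop.mapreduce..." does not match.
OLD_API = re.compile(r"\borg\.apache\.hadoop\.mapred\.")


def find_old_api_uses(root, suffixes=(".scala", ".java")):
    """Return (path, line number, line text) for each old-API reference."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if OLD_API.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_old_api.py /path/to/source/checkout
    for path, lineno, line in find_old_api_uses(sys.argv[1]):
        print(f"{path}:{lineno}: {line}")
```

   Each hit is then a candidate for review: does that call site create or 
invoke a committer directly, and is there an equivalent in 
`org.apache.hadoop.mapreduce.lib.output` it could use instead?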
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



