steveloughran edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-839943750
Be aware that https://issues.apache.org/jira/browse/HADOOP-17483 turns the magic committer on everywhere, so this patch will make the magic committer the default on S3. I am perfectly happy with this.

Note also that [MAPREDUCE-7431](https://github.com/apache/hadoop/pull/2971) is adding a committer for ABFS and GCS: maximum performance on ABFS, and both performance and correctness on GCS. (It'll work on HDFS too, FWIW.) The Spark config changes needed here will be needed there too.

Now, one of the reasons the binding factory stuff is in the Spark codebase is that Spark was still using some of the old MRv1 algorithms to create and invoke committers, rather than the V2 APIs, _which automatically go through the factory mechanism_.

So the real solution here would be to find those bits of the Spark code which use `org.apache.hadoop.mapred.FileOutputCommitter` and other stuff in the same package, and see if they can be replaced with a move to the stuff in `org.apache.hadoop.mapreduce.lib.output`.
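For reference, a rough sketch of the Spark-side binding settings being discussed, written as a small Python helper that renders `spark-defaults.conf` lines. The property names are the ones I understand the Spark cloud-integration docs to describe for the `spark-hadoop-cloud` module; the helper names (`cloud_committer_conf`, `to_spark_defaults`) are hypothetical and exist only for illustration. The factory class for the new ABFS/GCS committer is not stated in this comment, so the sketch takes it as a parameter rather than guessing it.

```python
# Hypothetical helper: collects the Spark properties that bind Spark's commit
# protocol to Hadoop's PathOutputCommitter factory mechanism.
# Assumption: the spark-hadoop-cloud module is on the classpath.

S3A_FACTORY = "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"


def cloud_committer_conf(scheme="s3a", factory=S3A_FACTORY, s3a_committer="magic"):
    """Return a dict of Spark properties for one filesystem scheme."""
    conf = {
        # Route Spark's commit protocol through the PathOutputCommitter binding.
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        # Parquet needs a ParquetOutputCommitter subclass; this one delegates.
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
        # Per-scheme committer factory, passed to Hadoop via the spark.hadoop. prefix.
        f"spark.hadoop.mapreduce.outputcommitter.factory.scheme.{scheme}": factory,
    }
    if scheme == "s3a":
        # Which S3A committer the factory should create.
        conf["spark.hadoop.fs.s3a.committer.name"] = s3a_committer
    return conf


def to_spark_defaults(conf):
    """Render properties as 'key value' lines, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(to_spark_defaults(cloud_committer_conf()))
```

For another store the same function would be called with that store's scheme and whatever factory class its committer ships with; only the `s3a` branch sets a committer name.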