[GitHub] [spark] steveloughran edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

GitBox Wed, 12 May 2021 10:11:11 -0700


steveloughran edited a comment on pull request #32518:
URL: https://github.com/apache/spark/pull/32518#issuecomment-839943750



   Be aware that https://issues.apache.org/jira/browse/HADOOP-17483 turns the 
magic committer on everywhere, so this patch will make the magic committer the 
default on s3. I am perfectly happy with this.
   
   Note also that [MAPREDUCE-7431](https://github.com/apache/hadoop/pull/2971) 
is adding a committer for ABFS and GCS: maximum performance on ABFS, and both 
performance and correctness on GCS. (It'll work on HDFS too, FWIW.)
   
   The same changes to the spark config will be needed there too.
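
For reference, the spark settings in question look roughly like this for the s3a case. This is a sketch based on the Spark cloud-integration documentation for the S3A committers, not a tested configuration; the factory class for an ABFS/GCS committer would be a different one, and the scheme suffix on the factory key would change accordingly.

```
# spark-defaults.conf fragment (s3a magic committer; illustrative only)
spark.hadoop.fs.s3a.committer.name                            magic
spark.hadoop.fs.s3a.committer.magic.enabled                   true
# bind the s3a:// scheme to the S3A committer factory
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a     org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
# spark-side bindings from the spark-hadoop-cloud module
spark.sql.sources.commitProtocolClass                         org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class                      org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```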
   
   Now, one of the reasons that binding factory stuff is in the spark codebase 
is that spark was still using some of the old MRv1 algorithms to create and 
invoke committers, rather than the V2 APIs, _which automatically go through the 
factory mechanism_. So the real solution here would be to find those bits of 
the spark code which use `org.apache.hadoop.mapred.FileOutputCommitter` and 
other classes in the same package, and see if they can be replaced with their 
equivalents in `org.apache.hadoop.mapreduce.lib.output`.
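
As a rough illustration of what "going through the factory mechanism" means, here is a simplified, self-contained model of the lookup by destination scheme. It is a sketch, not Hadoop's code: the class `CommitterFactoryLookup` and the method `resolveFactory` are hypothetical names, and a plain `Map` stands in for a Hadoop `Configuration`. The configuration keys and factory class names are the ones Hadoop uses as far as I know; the precedence shown (explicit factory first, then per-scheme, then the default) is my understanding of the behaviour and should be checked against the Hadoop source.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified model of scheme-based committer factory lookup.
// Illustration only: operates on a Map, not on a Hadoop Configuration.
class CommitterFactoryLookup {
    // Key naming one factory for every destination (assumed to take precedence).
    static final String FACTORY_CLASS_KEY = "mapreduce.outputcommitter.factory.class";
    // Pattern for the per-filesystem-scheme factory key.
    static final String FACTORY_SCHEME_PATTERN = "mapreduce.outputcommitter.factory.scheme.%s";
    // Fallback factory, which yields the classic FileOutputCommitter.
    static final String DEFAULT_FACTORY =
        "org.apache.hadoop.mapreduce.lib.output.FileOutputCommitterFactory";

    // Returns the factory class name that would be chosen for the output path.
    static String resolveFactory(String outputPath, Map<String, String> conf) {
        String explicit = trimmed(conf.get(FACTORY_CLASS_KEY));
        if (!explicit.isEmpty()) {
            return explicit;
        }
        // No explicit factory: look for one bound to the destination's scheme.
        String scheme = URI.create(outputPath).getScheme();
        if (scheme != null) {
            String perScheme =
                trimmed(conf.get(String.format(FACTORY_SCHEME_PATTERN, scheme)));
            if (!perScheme.isEmpty()) {
                return perScheme;
            }
        }
        return DEFAULT_FACTORY;
    }

    // Treats a missing value the same as an empty one.
    private static String trimmed(String value) {
        return value == null ? "" : value.trim();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.outputcommitter.factory.scheme.s3a",
                 "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory");
        // Hypothetical destinations, used only to show the two outcomes.
        System.out.println(resolveFactory("s3a://example-bucket/output", conf));
        System.out.println(resolveFactory("hdfs://namenode/output", conf));
    }
}
```

Code which creates committers through the older `org.apache.hadoop.mapred` classes never performs a lookup like this, which is why spark needs its own binding classes to get there.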
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
