Github user megaserg commented on the issue: https://github.com/apache/spark/pull/20704

Thank you @dongjoon-hyun! This was also affecting our Spark job performance! We set `mapreduce.fileoutputcommitter.algorithm.version=2` in our Spark job config, as recommended e.g. at http://spark.apache.org/docs/latest/cloud-integration.html, and we run with a user-provided Hadoop 2.9.0.

However, since this 2.6.5 JAR was in `spark/jars`, it took priority on the classpath over the Hadoop-distributed 2.9.0 JAR. The 2.6.5 JAR silently ignored the `mapreduce.fileoutputcommitter.algorithm.version` setting and used the default, slow algorithm (I believe `hadoop-mapreduce-client-core` had only the one, slow algorithm until 2.7.0). I believe this affects everyone who uses any MapReduce settings with Spark 2.3.0.

Great job! Can we double-check that this JAR is no longer present in the "without-hadoop" Spark distribution?
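The failure mode described above can be sketched as follows. This is an illustrative Python model, not Hadoop's actual code: the config key is the one from this thread, and the "2.7.0 introduced algorithm version 2" boundary is the claim being illustrated, so treat the version check as an assumption of the sketch.

```python
# Illustrative sketch of how a FileOutputCommitter-style class resolves the
# commit algorithm from its configuration, and why an old client-core JAR
# shadowing a newer one silently falls back to the slow path.

ALGO_KEY = "mapreduce.fileoutputcommitter.algorithm.version"

def resolve_commit_algorithm(conf: dict, hadoop_minor: int) -> int:
    """Return the commit algorithm the committer would actually use.

    hadoop_minor models which hadoop-mapreduce-client-core JAR won the
    classpath race (6 for 2.6.x, 9 for 2.9.x, etc.).
    """
    if hadoop_minor < 7:
        # Pre-2.7 client-core never reads ALGO_KEY at all: the setting is
        # silently ignored and the original rename-heavy algorithm is used.
        return 1
    # 2.7+ honours the setting, defaulting to 1 when it is absent.
    return int(conf.get(ALGO_KEY, 1))

conf = {ALGO_KEY: "2"}  # what the Spark job config asks for
# 2.6.5 JAR wins the classpath: the setting is ignored, algorithm 1 runs.
print(resolve_commit_algorithm(conf, hadoop_minor=6))  # 1
# 2.9.0 JAR wins the classpath: algorithm 2 is used as requested.
print(resolve_commit_algorithm(conf, hadoop_minor=9))  # 2
```

The point of the sketch: the job configuration is identical in both runs; only which JAR shadows which on the classpath decides whether the setting takes effect.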