[ https://issues.apache.org/jira/browse/SPARK-23551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438208#comment-16438208 ]
Sergey Serebryakov commented on SPARK-23551:
--------------------------------------------

Thank you [~dongjoon]! This was also affecting our Spark job performance!

We're using {{mapreduce.fileoutputcommitter.algorithm.version=2}} in our Spark job config, as recommended e.g. here: [http://spark.apache.org/docs/latest/cloud-integration.html]. We're using user-provided Hadoop 2.9.0. However, since this 2.6.5 JAR was in spark/jars, it took priority on the classpath over the 2.9.0 JAR distributed with Hadoop. The 2.6.5 JAR silently ignored the {{mapreduce.fileoutputcommitter.algorithm.version}} setting and used the default, slow algorithm (I believe hadoop-mapreduce-client-core had only one, slow, algorithm until 2.7.0).

I believe this affects everyone who uses any mapreduce settings with Spark 2.3.0. Great job!

> Exclude `hadoop-mapreduce-client-core` dependency from `orc-mapreduce`
> ----------------------------------------------------------------------
>
>                 Key: SPARK-23551
>                 URL: https://issues.apache.org/jira/browse/SPARK-23551
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 2.3.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.3.1, 2.4.0
>
>
> This issue aims to prevent the `orc-mapreduce` dependency from confusing IDEs and Maven.
> *BEFORE*
> Please note the 2.6.4 version under Spark Project SQL.
> {code}
> $ mvn dependency:tree -Phadoop-2.7 -Dincludes=org.apache.hadoop:hadoop-mapreduce-client-core
> ...
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Spark Project Catalyst 2.4.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) @ spark-catalyst_2.11 ---
> [INFO] org.apache.spark:spark-catalyst_2.11:jar:2.4.0-SNAPSHOT
> [INFO] \- org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT:compile
> [INFO]    \- org.apache.hadoop:hadoop-client:jar:2.7.3:compile
> [INFO]       \- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.3:compile
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Spark Project SQL 2.4.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) @ spark-sql_2.11 ---
> [INFO] org.apache.spark:spark-sql_2.11:jar:2.4.0-SNAPSHOT
> [INFO] \- org.apache.orc:orc-mapreduce:jar:nohive:1.4.3:compile
> [INFO]    \- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.6.4:compile
> {code}
> *AFTER*
> {code}
> $ mvn dependency:tree -Phadoop-2.7 -Dincludes=org.apache.hadoop:hadoop-mapreduce-client-core
> ...
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Spark Project Catalyst 2.4.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) @ spark-catalyst_2.11 ---
> [INFO] org.apache.spark:spark-catalyst_2.11:jar:2.4.0-SNAPSHOT
> [INFO] \- org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT:compile
> [INFO]    \- org.apache.hadoop:hadoop-client:jar:2.7.3:compile
> [INFO]       \- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.3:compile
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Spark Project SQL 2.4.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) @ spark-sql_2.11 ---
> [INFO] org.apache.spark:spark-sql_2.11:jar:2.4.0-SNAPSHOT
> [INFO] \- org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT:compile
> [INFO]    \- org.apache.hadoop:hadoop-client:jar:2.7.3:compile
> [INFO]       \- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.3:compile
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org