[ https://issues.apache.org/jira/browse/SPARK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266422#comment-17266422 ]
Apache Spark commented on SPARK-33212:
--------------------------------------

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/31203

> Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
> -------------------------------------------------------------------------
>
>                 Key: SPARK-33212
>                 URL: https://issues.apache.org/jira/browse/SPARK-33212
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Spark Submit, SQL, YARN
>    Affects Versions: 3.0.1
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.2.0
>
>
> Hadoop 3.x+ offers shaded client jars: hadoop-client-api and
> hadoop-client-runtime, which shade 3rd party dependencies such as Guava,
> protobuf, jetty etc. This Jira switches Spark to use these jars instead of
> hadoop-common, hadoop-client etc. Benefits include:
>  * It unblocks Spark from upgrading to Hadoop 3.2.2/3.3.0+. The newer
> versions of Hadoop have migrated to Guava 27.0+, and in order to resolve Guava
> conflicts, Spark depends on Hadoop not leaking its dependencies.
>  * It makes the Spark/Hadoop dependency cleaner. Currently Spark uses both
> client-side and server-side Hadoop APIs from modules such as hadoop-common,
> hadoop-yarn-server-common etc. Moving to hadoop-client-api allows us to use
> only the public/client API from the Hadoop side.
>  * It provides better isolation from Hadoop dependencies. In the future Spark
> can evolve without worrying about dependencies pulled in from the Hadoop side
> (which used to be a lot).
> *There are some behavior changes introduced with this JIRA, when people use
> Spark compiled with Hadoop 3.x:*
>  - Users now need to make sure the class path contains the `hadoop-client-api`
> and `hadoop-client-runtime` jars when they deploy Spark with the
> `hadoop-provided` option. In addition, it is highly recommended that they put
> these two jars before other Hadoop jars in the class path. Otherwise,
> conflicts such as those from Guava could happen if classes are loaded from the
> other, non-shaded Hadoop jars.
>  - Since the new shaded Hadoop clients no longer include 3rd party
> dependencies, users who used to depend on these now need to explicitly put
> the jars in their class path.
> Ideally the above should go to release notes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
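
Editorial note on the class path ordering recommended in the issue description: the following is a minimal Python sketch, not part of Spark or Hadoop, of how a deployment script might move the two shaded client jars ahead of every other entry. The helper names (`order_classpath`, `build_classpath`) and the example paths are hypothetical; only the jar name prefixes `hadoop-client-api` and `hadoop-client-runtime` come from the issue text.

```python
import os

# Jar name prefixes of the shaded Hadoop clients named in the issue.
SHADED_PREFIXES = ("hadoop-client-api", "hadoop-client-runtime")


def is_shaded_client(path):
    """Return True if the jar's file name starts with a shaded-client prefix."""
    return os.path.basename(path).startswith(SHADED_PREFIXES)


def order_classpath(entries):
    """Hypothetical helper: stable partition with shaded client jars first.

    The relative order inside each group is preserved.
    """
    shaded = [e for e in entries if is_shaded_client(e)]
    others = [e for e in entries if not is_shaded_client(e)]
    return shaded + others


def build_classpath(entries, sep=":"):
    """Join the reordered entries into a single class path string."""
    return sep.join(order_classpath(entries))


if __name__ == "__main__":
    # Example paths are assumptions for illustration only.
    jars = [
        "/opt/hadoop/hadoop-common-3.2.2.jar",
        "/opt/hadoop/hadoop-client-runtime-3.2.2.jar",
        "/opt/hadoop/hadoop-client-api-3.2.2.jar",
    ]
    print(build_classpath(jars))
```

The resulting string could then be exported through whatever mechanism the deployment already uses to supply Hadoop jars to a `hadoop-provided` Spark build; that wiring is left out here because it depends on the environment.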