[ https://issues.apache.org/jira/browse/SPARK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289618#comment-17289618 ]
Xiaochen Ouyang edited comment on SPARK-33212 at 2/24/21, 3:28 AM:
-------------------------------------------------------------------

Hi [~csun], we submitted a Spark application with the command `spark-submit --master yarn --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark*.jar`.

1. Got an AmIpFilter ClassNotFoundException, because 'hadoop-client-minicluster.jar' is not on the classpath. So we removed the '<scope>test</scope>' line from the parent pom.xml and from resource-manager/yarn/pom.xml.
2. Rebuilt the Spark project, deployed the binary jars, and submitted the application again.
3. Got a new exception:

2021-02-24 08:36:54,391 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a javax.servlet.Filter

The key reason is that the Spark driver's classloader expects the loaded `AmIpFilter` class to implement javax.servlet.Filter, but in the shaded jar the class imports the relocated interface `org.apache.hadoop.shaded.javax.servlet.Filter`. As a result, `AmIpFilter` cannot be instantiated via reflection in the Spark driver process.
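To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. All names below are hypothetical stand-ins, not the real Hadoop/Jetty types: `Filter` plays the role of javax.servlet.Filter as seen by the Spark driver, and `ShadedFilter` plays the role of the relocated org.apache.hadoop.shaded.javax.servlet.Filter.

{code:java}
// Stand-in demo (hypothetical names): a class compiled against a relocated
// copy of an interface is NOT assignable to the original interface, even
// though both interfaces have the same simple name.
public class ShadingMismatch {

    /** Stand-in for the original javax.servlet.Filter the driver checks against. */
    interface Filter {}

    /** Stand-in for the relocated copy baked into the shaded Hadoop jar. */
    interface ShadedFilter {}

    /** Stand-in for AmIpFilter: it implements only the relocated interface. */
    static class AmIpFilter implements ShadedFilter {}

    public static void main(String[] args) throws Exception {
        // Load the filter class by name, as the servlet container does.
        Class<?> loaded = Class.forName("ShadingMismatch$AmIpFilter");

        // Check it against the *unshaded* interface. Because the shaded jar
        // rewrote the import, the check fails; this is the same shape as the
        // "... is not a javax.servlet.Filter" IllegalStateException above.
        if (!Filter.class.isAssignableFrom(loaded)) {
            throw new IllegalStateException(
                "class " + loaded.getName() + " is not a " + Filter.class.getName());
        }
    }
}
{code}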
> Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
> -------------------------------------------------------------------------
>
>                 Key: SPARK-33212
>                 URL: https://issues.apache.org/jira/browse/SPARK-33212
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Spark Submit, SQL, YARN
>    Affects Versions: 3.0.1
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.2.0
>
>
> Hadoop 3.x+ offers shaded client jars: hadoop-client-api and hadoop-client-runtime, which shade 3rd-party dependencies such as Guava, protobuf, Jetty, etc. This Jira switches Spark to use these jars instead of hadoop-common, hadoop-client, etc. Benefits include:
> * It unblocks Spark from upgrading to Hadoop 3.2.2/3.3.0+. The newer versions of Hadoop have migrated to Guava 27.0+, and in order to resolve the Guava conflicts, Spark depends on Hadoop not leaking its dependencies.
> * It makes the Spark/Hadoop dependency relationship cleaner. Currently Spark uses both client-side and server-side Hadoop APIs from modules such as hadoop-common, hadoop-yarn-server-common, etc. Moving to hadoop-client-api allows us to use only the public/client API from the Hadoop side.
> * It provides better isolation from Hadoop dependencies. In the future, Spark can evolve without worrying about dependencies pulled in from the Hadoop side (which used to be a lot).
> *There are some behavior changes introduced with this JIRA, when people use Spark compiled with Hadoop 3.x:*
> - Users now need to make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars when they deploy Spark with the `hadoop-provided` option. In addition, it is highly recommended that they put these two jars before other Hadoop jars in the class path; otherwise, conflicts such as those from Guava could happen if classes are loaded from the other, non-shaded Hadoop jars (see the diagnostic sketch after this description).
> - Since the new shaded Hadoop clients no longer include 3rd-party dependencies, users who used to depend on these transitively now need to explicitly put those jars on their class path.
> Ideally the above should go into the release notes.
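The class-path ordering recommendation in the quoted description can be checked empirically. Below is a small hypothetical helper (not part of Spark or Hadoop; the class name and the default argument are chosen only for illustration) that prints which jar a given class is actually loaded from. Running it with the same class path as the Spark driver and a class such as org.apache.hadoop.conf.Configuration shows whether Hadoop classes resolve from the shaded hadoop-client-api jar or from an older, non-shaded hadoop-common jar; passing com.google.common.base.Preconditions shows which Guava copy wins.

{code:java}
// Hypothetical diagnostic: print the code source (jar) a class is loaded from,
// so you can verify which jar on the class path "wins" for a contested class.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0]
                // Default: a class present in both hadoop-client-api and
                // non-shaded hadoop-common, so the winner is informative.
                : "org.apache.hadoop.conf.Configuration";
        Class<?> c = Class.forName(name);
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK/bootstrap classes have no code source, hence the null check.
        System.out.println(name + " -> "
                + (src == null ? "bootstrap/JDK" : src.getLocation()));
    }
}
{code}

If the printed location is a non-shaded Hadoop jar, the shaded client jars are probably not first on the class path.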