[ https://issues.apache.org/jira/browse/SPARK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289618#comment-17289618 ]
Xiaochen Ouyang edited comment on SPARK-33212 at 2/24/21, 3:28 AM:
-------------------------------------------------------------------

Hi [~csun], we submitted a Spark application with the command `spark-submit --master yarn --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark*.jar`.

1. Got an AmIpFilter ClassNotFoundException, because 'hadoop-client-minicluster.jar' is not on the classpath. So we removed the '<scope>test</scope>' line from the parent pom.xml and from resource-manager/yarn/pom.xml.
2. Rebuilt the Spark project, deployed the binary jars, and submitted the application again.
3. Got a new exception:

2021-02-24 08:36:54,391 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a javax.servlet.Filter

The key reason is that the Spark driver's classloader expects the loaded `AmIpFilter` class to implement javax.servlet.Filter, but in the shaded jar the class imports the relocated interface `org.apache.hadoop.shaded.javax.servlet.Filter`. As a result, `AmIpFilter` cannot be instantiated via reflection in the Spark driver process.
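To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. All names below are hypothetical stand-ins, not the real Hadoop/Jetty types: `Filter` plays the role of javax.servlet.Filter as seen by the Spark driver, and `ShadedFilter` plays the role of the relocated org.apache.hadoop.shaded.javax.servlet.Filter.

{code:java}
// Stand-in demo (hypothetical names): a class compiled against a relocated
// copy of an interface is NOT assignable to the original interface, even
// though both interfaces have the same simple name.
public class ShadingMismatch {

    /** Stand-in for the original javax.servlet.Filter the driver checks against. */
    interface Filter {}

    /** Stand-in for the relocated copy baked into the shaded Hadoop jar. */
    interface ShadedFilter {}

    /** Stand-in for AmIpFilter: it implements only the relocated interface. */
    static class AmIpFilter implements ShadedFilter {}

    public static void main(String[] args) throws Exception {
        // Load the filter class by name, as the servlet container does.
        Class<?> loaded = Class.forName("ShadingMismatch$AmIpFilter");

        // Check it against the *unshaded* interface. Because the shaded jar
        // rewrote the import, the check fails; this is the same shape as the
        // "... is not a javax.servlet.Filter" IllegalStateException above.
        if (!Filter.class.isAssignableFrom(loaded)) {
            throw new IllegalStateException(
                "class " + loaded.getName() + " is not a " + Filter.class.getName());
        }
    }
}
{code}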
> Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
> -------------------------------------------------------------------------
>
>                 Key: SPARK-33212
>                 URL: https://issues.apache.org/jira/browse/SPARK-33212
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Spark Submit, SQL, YARN
>    Affects Versions: 3.0.1
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.2.0
>
>
> Hadoop 3.x+ offers shaded client jars: hadoop-client-api and hadoop-client-runtime, which shade 3rd-party dependencies such as Guava, protobuf, Jetty, etc. This Jira switches Spark to use these jars instead of hadoop-common, hadoop-client, etc. Benefits include:
> * It unblocks Spark from upgrading to Hadoop 3.2.2/3.3.0+. The newer versions of Hadoop have migrated to Guava 27.0+, and in order to resolve the Guava conflicts, Spark depends on Hadoop not leaking its dependencies.
> * It makes the Spark/Hadoop dependency relationship cleaner. Currently Spark uses both client-side and server-side Hadoop APIs from modules such as hadoop-common, hadoop-yarn-server-common, etc. Moving to hadoop-client-api allows us to use only the public/client API from the Hadoop side.
> * It provides better isolation from Hadoop dependencies. In the future, Spark can evolve without worrying about dependencies pulled in from the Hadoop side (which used to be a lot).
> *There are some behavior changes introduced with this JIRA, when people use Spark compiled with Hadoop 3.x:*
> - Users now need to make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars when they deploy Spark with the `hadoop-provided` option. In addition, it is highly recommended that they put these two jars before other Hadoop jars in the class path; otherwise, conflicts such as those from Guava could happen if classes are loaded from the other, non-shaded Hadoop jars (see the diagnostic sketch after this description).
> - Since the new shaded Hadoop clients no longer include 3rd-party dependencies, users who used to depend on these transitively now need to explicitly put those jars on their class path.
> Ideally the above should go into the release notes.
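The class-path ordering recommendation in the quoted description can be checked empirically. Below is a small hypothetical helper (not part of Spark or Hadoop; the class name and the default argument are chosen only for illustration) that prints which jar a given class is actually loaded from. Running it with the same class path as the Spark driver and a class such as org.apache.hadoop.conf.Configuration shows whether Hadoop classes resolve from the shaded hadoop-client-api jar or from an older, non-shaded hadoop-common jar; passing com.google.common.base.Preconditions shows which Guava copy wins.

{code:java}
// Hypothetical diagnostic: print the code source (jar) a class is loaded from,
// so you can verify which jar on the class path "wins" for a contested class.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0]
                // Default: a class present in both hadoop-client-api and
                // non-shaded hadoop-common, so the winner is informative.
                : "org.apache.hadoop.conf.Configuration";
        Class<?> c = Class.forName(name);
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK/bootstrap classes have no code source, hence the null check.
        System.out.println(name + " -> "
                + (src == null ? "bootstrap/JDK" : src.getLocation()));
    }
}
{code}

If the printed location is a non-shaded Hadoop jar, the shaded client jars are probably not first on the class path.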