I think you should be able to drop "yarn-standalone" altogether. We recently updated SparkPi to take one argument (num slices, which you set to 10); previously it took two arguments, the master and num slices.
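For reference, here is a rough sketch of what the same run looks like through bin/spark-submit, which is the recommended entry point in 1.0 instead of yarn.Client. The flag names below are from the 1.0 spark-submit --help and the jar path is just the one from your build, so treat this as a sketch to adapt rather than something verified against your cluster:

# same resources as the yarn.Client invocation further down the thread
./bin/spark-submit \
  --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  10

The trailing 10 is the single num-slices argument; there is no longer a master argument to pass.
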
Glad you got it figured out.


2014-05-22 13:41 GMT-07:00 Jon Bender <jonathan.ben...@gmail.com>:

> Andrew,
>
> Brilliant! I built on Java 7 but was still running our cluster on Java 6.
> Upgraded the cluster and it worked (with slight tweaks to the args; it
> seems the app args come first and yarn-standalone comes last):
>
> SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
> ./bin/spark-class org.apache.spark.deploy.yarn.Client \
>   --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>   --class org.apache.spark.examples.SparkPi \
>   --args 10 \
>   --args yarn-standalone \
>   --num-workers 3 \
>   --master-memory 4g \
>   --worker-memory 2g \
>   --worker-cores 1
>
> I'll make sure to use spark-submit from here on out.
>
> Thanks very much!
> Jon
>
>
> On Thu, May 22, 2014 at 12:40 PM, Andrew Or <and...@databricks.com> wrote:
>
>> Hi Jon,
>>
>> Your configuration looks largely correct. I have very recently confirmed
>> that the way you launch SparkPi also works for me.
>>
>> I have run into the same problem a bunch of times. My best guess is that
>> this is a Java version issue. If the Spark assembly jar is built with
>> Java 7, it cannot be opened by Java 6, because the two versions use
>> different packaging schemes. This is a known issue:
>> https://issues.apache.org/jira/browse/SPARK-1520.
>>
>> The workaround is either to make sure that all your executor nodes are
>> running Java 7 (and, very importantly, that JAVA_HOME points to that
>> version, which you can set through
>>
>>   export SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java7/home"
>>
>> in spark-env.sh), or simply to build the jar with Java 6. An additional
>> debugging step is to review the launch environment of all the containers,
>> which is detailed in the last paragraph of this section:
>> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#debugging-your-application.
>> This may not be necessary, but I have personally found it immensely
>> useful.
>>
>> One last thing: launching Spark applications through
>> org.apache.spark.deploy.yarn.Client is deprecated in Spark 1.0. You
>> should use bin/spark-submit instead. You can find information about its
>> usage in the docs linked above, or simply through the --help option.
>>
>> Cheers,
>> Andrew
>>
>>
>> 2014-05-22 11:38 GMT-07:00 Jon Bender <jonathan.ben...@gmail.com>:
>>
>>> Hey all,
>>>
>>> I'm working through the basic SparkPi example on a YARN cluster, and I'm
>>> wondering why my containers don't pick up the spark assembly classes.
>>>
>>> I built the latest spark code against CDH5.0.0 and then ran the
>>> following:
>>>
>>> SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>>> ./bin/spark-class org.apache.spark.deploy.yarn.Client \
>>>   --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>>>   --class org.apache.spark.examples.SparkPi \
>>>   --args yarn-standalone \
>>>   --num-workers 3 \
>>>   --master-memory 4g \
>>>   --worker-memory 2g \
>>>   --worker-cores 1
>>>
>>> The job dies, and in the stderr from the containers I see:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>>>
>>> My yarn-site.xml contains the following classpath:
>>>
>>> <property>
>>>   <name>yarn.application.classpath</name>
>>>   <value>
>>>     /etc/hadoop/conf/,
>>>     /usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,
>>>     /usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,
>>>     /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
>>>     /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
>>>     /usr/lib/avro/*
>>>   </value>
>>> </property>
>>>
>>> I've confirmed that the spark-assembly JAR has this class. Does it
>>> actually need to be defined in yarn.application.classpath, or should the
>>> Spark client take care of ensuring the necessary JARs are added during
>>> job submission?
>>>
>>> Any tips would be greatly appreciated!
>>>
>>> Cheers,
>>> Jon
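
(For anyone who lands on this thread with the same NoClassDefFoundError: a
quick way to check for the Java 6 vs Java 7 packaging mismatch described
above is to list the assembly jar with the same JVM the NodeManagers run.
This is only a sketch based on the behavior described in SPARK-1520 and
reuses the jar path from the build above:

# run on a cluster node, using its default (Java 6) JVM
jar tf assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  | grep ApplicationMaster

If that fails or finds nothing under Java 6, while the same command under
Java 7 lists org/apache/spark/deploy/yarn/ApplicationMaster.class, the
assembly was packaged in a form Java 6 cannot read, and the containers will
fail exactly as above even though the class really is in the jar.)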