Hi,

On Fri, Dec 12, 2014 at 7:01 AM, ryaminal <tacmot...@gmail.com> wrote:
>
> Now our solution is to make a very simple YARN application which executes
> as its command "spark-submit --master yarn-cluster s3n://application/jar.jar
> ...". This seemed so simple and elegant, but it has some weird issues. We
> get "NoClassDefFoundErrors". When we ssh to the box and run the same
> spark-submit command, it works, but doing this through YARN leads to the
> NoClassDefFoundErrors mentioned.
>
I do something similar: I start Spark using spark-submit from a non-Spark server application. Make sure that HADOOP_CONF_DIR is set correctly when running spark-submit from your program so that the YARN configuration can be found. Also keep in mind that some spark-submit parameters behave differently with a yarn-cluster master than with local[*]. For example, system properties set using `--conf` will be available in your Spark application only in local[*] mode; for YARN you need to wrap them with `--conf "spark.executor.extraJavaOptions=..."`.

Tobias
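For illustration, a minimal launcher sketch along those lines (the config path and the `my.prop` property name are made-up placeholders, not from your setup):

```shell
#!/bin/sh
# Sketch: invoking spark-submit against YARN from another process.
# HADOOP_CONF_DIR must point at the directory containing yarn-site.xml
# so spark-submit can locate the ResourceManager; the exact path below
# is an assumption about a typical installation.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# In local[*] mode a plain "--conf my.prop=value" style system property
# would reach the application directly, but on YARN the executors run in
# separate JVMs, so the property has to be forwarded via
# spark.executor.extraJavaOptions ("my.prop" is a hypothetical name):
spark-submit \
  --master yarn-cluster \
  --conf "spark.executor.extraJavaOptions=-Dmy.prop=value" \
  s3n://application/jar.jar
```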