Re: Spark on YARN question

2014-09-02 Thread Matt Narrell
I’ve put my Spark JAR into HDFS and set the SPARK_JAR variable to point to the HDFS location of the jar. I’m not using any specialized configuration files (like spark-env.sh), but rather setting things either by environment variable per node, by passing application arguments to the job, or …
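
A minimal sketch of that setup, assuming a Spark 1.x assembly jar already uploaded to HDFS (the HDFS path, class, and jar names below are examples, not from the thread):

    # Point Spark on YARN at the assembly jar in HDFS instead of a local copy
    export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly.jar

    # Per-job settings passed on the command line rather than via spark-env.sh
    ./bin/spark-submit \
      --master yarn \
      --class com.example.MyApp \
      --conf spark.executor.memory=2g \
      my-app.jar arg1 arg2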

Re: Spark on YARN question

2014-09-02 Thread Andrew Or
Hi Greg, You should not even need to install Spark manually on each of the worker nodes or put it into HDFS yourself. Spark on YARN will ship all necessary jars (i.e. the assembly + additional jars) to each of the containers for you. You can specify additional jars that your application depends on …
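
A hedged illustration of Andrew's point (jar names are placeholders): with --master yarn, spark-submit uploads the Spark assembly and any jars listed via --jars into each container's working directory, so nothing needs to be pre-installed on the workers:

    ./bin/spark-submit \
      --master yarn \
      --class com.example.MyApp \
      --jars /path/to/dep1.jar,/path/to/dep2.jar \
      my-app.jar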

Re: Spark on YARN question

2014-09-02 Thread Greg Hill
> Hi Greg, You should not even need to install Spark manually on each of the worker nodes or put it into HDFS yourself. Spark on YARN will ship all necessary jars (i.e. the assembly + additional jars) to each of the containers …

Re: Spark on YARN question

2014-09-02 Thread Dimension Data, LLC.
Hello friends: I have a follow-up to Andrew's well-articulated answer below (thank you for that). (1) I've seen both of these invocations in various places: (a) '--master yarn' (b) '--master yarn-client', the latter of which doesn't appear in …

Re: Spark on YARN question

2014-09-02 Thread Andrew Or
Hi Didata, (1) Correct. The default deploy mode is `client`, so both masters `yarn` and `yarn-client` run Spark in client mode. If you explicitly specify the master as `yarn-cluster`, Spark will run in cluster mode. If you implicitly specify one deploy mode through the master (e.g. yarn-client) but …
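
To make the mapping concrete (class and jar names are placeholders; this reflects the Spark 1.x master strings discussed in the thread):

    # Equivalent: deploy mode defaults to client, so the driver runs locally
    ./bin/spark-submit --master yarn        --class com.example.MyApp my-app.jar
    ./bin/spark-submit --master yarn-client --class com.example.MyApp my-app.jar

    # Cluster mode: the driver runs inside the YARN ApplicationMaster
    ./bin/spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar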