I'm working on setting up Spark on YARN using the HDP technical preview -
http://hortonworks.com/kb/spark-1-0-1-technical-preview-hdp-2-1-3/
I have installed the Spark JARs on all the slave nodes and configured YARN to
find the JARs. It seems like everything is working, unless I'm missing something.
I've put my Spark JAR into HDFS and set the SPARK_JAR variable to point to
the HDFS location of the jar. I'm not using any specialized configuration
files (like spark-env.sh), but rather setting things either by environment
variable per node, by passing application arguments to the job, or
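For reference, here is a rough sketch of that kind of setup (the HDFS path and
jar file names below are placeholders, not the exact ones from the HDP preview):

    # Point Spark on YARN at the assembly jar already uploaded to HDFS
    export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly.jar

    # Submit an application; --master yarn defaults to client deploy mode
    ./bin/spark-submit \
      --master yarn \
      --class org.apache.spark.examples.SparkPi \
      lib/spark-examples.jar 10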
Hi Greg,
You should not even need to manually install Spark on each of the worker
nodes or put it into HDFS yourself. Spark on YARN will ship all necessary
jars (i.e. the assembly + additional jars) to each of the containers for
you. You can specify additional jars that your application depends on
through the --jars flag.
Hello friends:
I have a follow-up to Andrew's well-articulated answer (thank you for that).
(1) I've seen both of these invocations in various places:
(a) '--master yarn'
(b) '--master yarn-client'
the latter of which doesn't appear in
Hi Didata,
(1) Correct. The default deploy mode is `client`, so both masters `yarn`
and `yarn-client` run Spark in client mode. If you explicitly specify
master as `yarn-cluster`, Spark will run in cluster mode. If you implicitly
specify one deploy mode through the master (e.g. `yarn-client`) but
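To make (1) concrete, with a placeholder application jar, the first two
invocations below behave identically (client mode, driver running in your
local process), while the third runs the driver inside the YARN
ApplicationMaster (cluster mode):

    # Client mode - these two are equivalent
    ./bin/spark-submit --master yarn --class com.example.MyApp my-app.jar
    ./bin/spark-submit --master yarn-client --class com.example.MyApp my-app.jar

    # Cluster mode - driver runs in the ApplicationMaster on the cluster
    ./bin/spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar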