On 12/28/15, 5:16 PM, "Daniel Valdivia" <h...@danielvaldivia.com> wrote:

>Hi,
>
>I'm trying to submit a job to a small Spark cluster running in
>standalone mode; however, it seems like the jar file I'm submitting to
>the cluster is "not found" by the worker nodes.
>
>I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  The problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN, which does pass the JAR along to the Driver
automatically, and IMO it should probably be fixed in spark-submit.  It's
really confusing for newcomers.
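
For reference, this is roughly what that looks like (the paths, master
URL, and class name below are just placeholders for whatever your setup
actually uses):

    # put the application JAR somewhere every node can reach it
    hdfs dfs -put target/my-app.jar /user/greg/jars/my-app.jar

    # submit with an hdfs:// path instead of a local one
    spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      hdfs:///user/greg/jars/my-app.jar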

Another problem I ran into, and that you might as well, is that --packages
doesn't work with --deploy-mode cluster.  It downloads the packages to a
temporary location on the node running spark-submit, then passes those
paths to the node that runs the Driver, but since that isn't the same
machine, it can't find anything and fails.  The Driver process *should* be
the one doing the downloading, but it isn't.  I ended up having to create
a fat JAR with all of the dependencies to get around that one.
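
In sketch form (again, the coordinates, paths, and class name here are
only examples):

    # what fails with --deploy-mode cluster: the package JARs are fetched
    # on the machine running spark-submit, but the Driver ends up on some
    # other node that can't see those local paths
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      --packages com.databricks:spark-csv_2.10:1.3.0 \
      hdfs:///user/greg/jars/my-app.jar

    # workaround: build a fat/assembly JAR that already bundles those
    # dependencies (e.g. with sbt-assembly or the Maven shade plugin) and
    # submit that instead, with no --packages at all
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      hdfs:///user/greg/jars/my-app-assembly.jar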

Greg

