Hi Randy and Gino,

The issue is that standalone-cluster mode is not officially supported.
Please use standalone-client mode instead, i.e. specify --deploy-mode
client in spark-submit, or simply leave the flag out entirely, since it
defaults to client mode.
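
For reference, a minimal client-mode submission looks something like the
sketch below (the master URL, application class, and jar path are
placeholders for your own):

  ./bin/spark-submit \
    --master spark://<master-host>:7077 \
    --deploy-mode client \
    --class com.example.YourApp \
    /path/to/your-app.jar

Omitting --deploy-mode altogether is equivalent, since client is the
default.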

Unfortunately, this is not currently documented anywhere, and the existing
explanation for the distinction between cluster and client modes is highly
misleading. In general, cluster mode means the driver runs on one of the
worker nodes, just like the executors. The corollary is that the output of
the application is not forwarded to the command that launched the
application (spark-submit in this case), but is accessible instead through
the worker logs. In contrast, client mode means the command that launches
the application also launches the driver, while the executors still run on
the worker nodes. This means the spark-submit command also returns the
output of the application. For instance, it doesn't make sense to run the
Spark shell in cluster mode, because its stdin / stdout / stderr would not
be redirected to the spark-submit command.
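
To make the difference concrete, here is a sketch of the same hypothetical
application submitted both ways (the class and jar names are placeholders,
and since standalone-cluster mode through spark-submit is currently broken
as noted above, the second command illustrates the concept rather than a
working standalone setup):

  # Client mode: the driver runs inside spark-submit, so output printed
  # by the driver appears in this terminal.
  ./bin/spark-submit --deploy-mode client \
    --class com.example.YourApp /path/to/your-app.jar

  # Cluster mode: the driver runs on a worker node, so the same output
  # lands in that worker's logs instead of this terminal.
  ./bin/spark-submit --deploy-mode cluster \
    --class com.example.YourApp /path/to/your-app.jar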

If you are hosting your own cluster and can launch applications from within
the cluster, then there is little benefit to launching your application in
cluster mode, which is primarily intended to cut down the latency between
the driver and the executors in the first place. However, if you are still
intent on using standalone-cluster mode, you can fall back on the
deprecated way of launching org.apache.spark.deploy.Client directly through
bin/spark-class. Note that this is not recommended and only serves as a
temporary workaround until we fix standalone-cluster mode through
spark-submit.
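
As a rough sketch of that workaround (the master URL, jar location, and
class name below are placeholders, and the exact argument order may vary
across Spark versions; the class prints a usage message if the arguments
don't parse):

  ./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://<master-host>:7077 \
    hdfs://<namenode>/path/to/your-app.jar \
    com.example.YourApp

The jar URL must be reachable from the worker that ends up hosting the
driver, which is why an HDFS path is used here.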

I have filed the relevant issues:
https://issues.apache.org/jira/browse/SPARK-2259 and
https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this
out, and we will get to fixing these shortly.

Best,
Andrew


2014-06-20 6:06 GMT-07:00 Gino Bustelo <lbust...@gmail.com>:

> I've found that the jar will be copied to the worker from hdfs fine, but
> it is not added to the spark context for you. You have to know that the
> jar will end up in the driver's working dir, and so you just add the file
> name of the jar to the context in your program.
>
> In your example below, just add "test.jar" to the context.
>
> Btw, the context will not have the master URL either, so add that while
> you are at it.
>
> This is a big issue. I posted about it a week ago and got no replies.
> Hopefully it gets more attention as more people start hitting this.
> Basically, spark-submit on a standalone cluster with the cluster deploy
> mode is broken.
>
> Gino B.
>
> > On Jun 20, 2014, at 2:46 AM, randylu <randyl...@gmail.com> wrote:
> >
> > In addition, the jar file can be copied to the driver node
> > automatically.
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-cluster-mode-of-spark-1-0-0-tp7982p7984.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>