Hi Jan,

Most SparkContext constructors are there for legacy reasons. The point of
going through spark-submit is to set up all the classpaths and system
properties, and to resolve URIs properly *with respect to the deployment mode*.
For instance, jars are distributed differently between YARN cluster mode
and standalone client mode, and this is not something the Spark user should
have to worry about.

As an example, if you pass jars through the SparkContext constructor, it
won't actually work in cluster mode if the jars are paths local to the
machine you launch from. This is because in cluster mode the driver is
launched on the cluster, and the SparkContext will look for those jars on
the cluster node in vain.
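
To make that concrete, here is a rough sketch of the legacy constructor Jan
is referring to (the master URL, Spark home, and jar path below are just
placeholders for illustration):

    import org.apache.spark.SparkContext

    // Legacy constructor: master URL, app name, Spark home on the cluster
    // nodes, and jars to ship to the executors.
    // "/home/me/myapp-deps.jar" is a path on the machine running this code.
    val sc = new SparkContext(
      "spark://master-host:7077",      // standalone master, client mode
      "MyApp",
      "/opt/spark",                    // SPARK_HOME on the worker nodes
      Seq("/home/me/myapp-deps.jar")   // served to executors by the driver
    )

In standalone client mode this works because the driver runs on the machine
you launched from and can serve that jar to the executors. In a cluster
deploy mode the same path would be resolved on whichever node the driver
lands on, which is exactly the failure mode above.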

So the more concise answer to your question is: yes, technically you don't
need to go through spark-submit, but you'll have to deal with all the
bootstrapping complexity yourself.
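
If you do skip spark-submit, that bootstrapping essentially means building
the configuration yourself. A minimal sketch for the simple standalone
client case (again, the app name, master URL, jar path, and property value
are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // You become responsible for everything spark-submit would normally
    // set up: the master URL, the jars to ship, and any spark.* properties.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setMaster("spark://master-host:7077")    // standalone cluster, client mode
      .setJars(Seq("/home/me/myapp-deps.jar"))  // must exist on this machine
      .set("spark.executor.memory", "2g")       // what spark-submit would pass through

    val sc = new SparkContext(conf)

YARN and the cluster deploy modes need considerably more than this, and that
is precisely the complexity spark-submit takes care of for you.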

-Andrew

2015-07-10 3:37 GMT-07:00 algermissen1971 <algermissen1...@icloud.com>:

> Hi,
>
> I am a bit confused about the steps I need to take to start a Spark
> application on a cluster.
>
> So far I had this impression from the documentation that I need to
> explicitly submit the application using for example spark-submit.
>
> However, from the SparkContext constructor signature I get the impression
> that maybe I do not have to do that after all:
>
> In
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.SparkContext
> the first constructor has (among other things) a parameter 'jars' which
> indicates the "Collection of JARs to send to the cluster".
>
> To me this suggests that I can simply start the application anywhere and
> that it will deploy itself to the cluster in the same way a call to
> spark-submit would.
>
> Is that correct?
>
> If not, can someone explain why I can/need to provide the master, jars,
> etc. in the call to the SparkContext constructor, given that they
> essentially only duplicate what I would specify in the call to spark-submit?
>
> Jan
