Re: How does spark-submit handle Python scripts (and how to repeat it)?
Yes, I tried setting YARN_CONF_DIR, but with no luck. I will play around with the environment variables and system properties and post back in case of success. Thanks for your help so far!

On Thu, Apr 14, 2016 at 5:48 AM, Sun, Rui <rui@intel.com> wrote:

> Not sure about your error, have you setup YARN_CONF_DIR?
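Since the plan above is to compare environment variables between the two launch modes, here is a minimal Python sketch of such a comparison (the captured dicts are hypothetical, purely for illustration; only the variable-name prefixes are grounded in the thread):

```python
import os

def spark_related_env(environ):
    """Return the subset of an environment dict that Spark/YARN code
    typically reads: SPARK_*, PYSPARK_*, HADOOP_*, YARN_* variables."""
    prefixes = ("SPARK_", "PYSPARK_", "HADOOP_", "YARN_")
    return {k: v for k, v in environ.items() if k.startswith(prefixes)}

def diff_env(submit_env, direct_env):
    """Keys present when launched via spark-submit but missing in a
    direct launch -- candidates for what must be replicated manually."""
    return sorted(set(submit_env) - set(direct_env))

# Hypothetical captures from the two launch modes:
via_submit = {"SPARK_HOME": "/opt/spark", "YARN_CONF_DIR": "/etc/hadoop/conf",
              "PYSPARK_SUBMIT_ARGS": "--master yarn-client", "PATH": "/usr/bin"}
direct = {"SPARK_HOME": "/opt/spark", "PATH": "/usr/bin"}

print(diff_env(spark_related_env(via_submit), spark_related_env(direct)))
# -> ['PYSPARK_SUBMIT_ARGS', 'YARN_CONF_DIR']
```

Running the same dump inside both drivers (replacing the hypothetical dicts with `os.environ`) would show which variables the direct launch is missing.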
RE: How does spark-submit handle Python scripts (and how to repeat it)?
> how does submitting application via `spark-submit` with "yarn-client"
> mode differ from setting the same mode directly in `SparkConf`?

In SparkSubmit, there is less work for yarn-client than for yarn-cluster. Basically, it prepares some Spark configurations as system properties, for example, information on additional resources required by the application that need to be distributed to the cluster. These configurations are used later during SparkContext initialization.

So generally, for yarn-client you may be able to skip spark-submit and launch the Spark application directly, with the configurations set up before `new SparkContext`.

Not sure about your error, have you setup YARN_CONF_DIR?
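As a rough illustration of "preparing configurations into system properties", the sketch below renders a configuration map as `-D` flags for a child JVM. This is only an assumption about the mechanism's shape, not Spark's actual launcher code:

```python
def conf_to_jvm_flags(conf):
    """Render Spark configuration entries as -D system-property flags,
    the form in which a launcher can hand them to a child JVM."""
    return ["-D{}={}".format(k, v) for k, v in sorted(conf.items())]

conf = {
    "spark.master": "yarn-client",
    "spark.submit.deployMode": "client",
    "spark.app.name": "My App",
}
print(conf_to_jvm_flags(conf))
# -> ['-Dspark.app.name=My App', '-Dspark.master=yarn-client', '-Dspark.submit.deployMode=client']
```

In a real launch, flags like these would appear on the `java` command line before the main class, so that `SparkConf` can pick them up from system properties.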
Re: How does spark-submit handle Python scripts (and how to repeat it)?
> Julia can pick the env var, and set the system properties or directly fill
> the configurations into a SparkConf, and then create a SparkContext

That's the point: just setting the master to "yarn-client" doesn't work, even in Java/Scala. E.g. the following code in Scala:

    val conf = new SparkConf().setAppName("My App").setMaster("yarn-client")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 10).collect()
    sc.stop()

results in an error:

    Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032

I think for now we can even put Julia aside and concentrate on the following question: how does submitting an application via `spark-submit` in "yarn-client" mode differ from setting the same mode directly in `SparkConf`?
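The address in the error is telling: 0.0.0.0:8032 is YARN's built-in default for `yarn.resourcemanager.address`, which the client falls back to when it cannot find yarn-site.xml on its classpath. A minimal Python sketch of that lookup (illustrative only, not Hadoop's actual code; the hostname below is made up):

```python
import xml.etree.ElementTree as ET

def resourcemanager_address(yarn_site_xml):
    """Extract yarn.resourcemanager.address from a yarn-site.xml string.
    Falls back to YARN's built-in default, 0.0.0.0:8032 -- exactly the
    address in the error above, a sign the real config was not found."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "yarn.resourcemanager.address":
            return prop.findtext("value")
    return "0.0.0.0:8032"

yarn_site = """
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm-host.example.com:8032</value>
  </property>
</configuration>
"""
print(resourcemanager_address(yarn_site))           # -> rm-host.example.com:8032
print(resourcemanager_address("<configuration/>"))  # -> 0.0.0.0:8032
```

So seeing 0.0.0.0:8032 usually means the JVM creating the SparkContext never saw the directory that YARN_CONF_DIR (or HADOOP_CONF_DIR) is supposed to point at.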
RE: How does spark-submit handle Python scripts (and how to repeat it)?
Spark configurations specified at the command line for spark-submit should be passed to the JVM inside the Julia process. You can refer to
https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L267
and
https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L295

Generally:

    spark-submit JVM -> JuliaRunner -> env var like "JULIA_SUBMIT_ARGS" -> julia process -> new JVM with SparkContext

Julia can pick up the env var, and either set the system properties or directly fill the configurations into a SparkConf, and then create a SparkContext.

Yes, you are right: `spark-submit` creates a new Python/R process that connects back to that same JVM and creates the SparkContext in it. Refer to
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L47
and
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L65
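The receiving end of that pipeline can be sketched in a few lines. Below is an illustrative Python stand-in for what the Julia side would do with a hypothetical JULIA_SUBMIT_ARGS value (the option handling is deliberately minimal and only models `--master` and `--conf`):

```python
import shlex

def parse_submit_args(submit_args):
    """Parse a spark-submit-style argument string (as it might arrive in a
    hypothetical JULIA_SUBMIT_ARGS env var) into SparkConf key/value pairs."""
    conf = {}
    tokens = shlex.split(submit_args)
    i = 0
    while i < len(tokens):
        if tokens[i] == "--master":
            conf["spark.master"] = tokens[i + 1]
            i += 2
        elif tokens[i] == "--conf":
            key, _, value = tokens[i + 1].partition("=")
            conf[key] = value
            i += 2
        else:
            i += 1  # ignore options this sketch does not model
    return conf

args = '--master yarn-client --conf spark.executor.memory=2g'
print(parse_submit_args(args))
# -> {'spark.master': 'yarn-client', 'spark.executor.memory': '2g'}
```

The resulting pairs would then be set as system properties (or passed straight to SparkConf) before the in-process JVM creates the SparkContext.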
Re: How does spark-submit handle Python scripts (and how to repeat it)?
> One part is passing the command line options, like "--master", from the
> JVM launched by spark-submit to the JVM where SparkContext resides

Since I have full control over both the JVM and Julia parts, I can pass whatever options to both. But what exactly should be passed? Currently the pipeline looks like this:

    spark-submit JVM -> JuliaRunner -> julia process -> new JVM with SparkContext

I want the last JVM's SparkContext to understand that it should run on YARN. Obviously, I can't pass the `--master yarn` option to the JVM itself. Instead, I can pass the system property "spark.master" = "yarn-client", but this results in an error:

    Retrying connect to server: 0.0.0.0/0.0.0.0:8032

So it's definitely not enough. I tried to manually set all the system properties that `spark-submit` adds to the JVM (including "spark-submit=true", "spark.submit.deployMode=client", etc.), but that didn't help either. Source code is always good, but for a stranger like me it's a little bit hard to grasp the control flow in the SparkSubmit class.

> For pySpark & SparkR, when running scripts in client deployment modes
> (standalone client and yarn client), the JVM is the same (py4j/RBackend
> running as a thread in the JVM launched by spark-submit)

Can you elaborate on this? Does it mean that `spark-submit` creates a new Python/R process that connects back to that same JVM and creates the SparkContext in it?
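On the "connects back" question: PySpark's child Python process decides between reusing the JVM that spark-submit launched and starting its own by looking for a gateway port exported by PythonRunner. A schematic Python sketch of that decision (the real logic lives in pyspark's java_gateway module; this is a simplification):

```python
def choose_jvm_strategy(environ):
    """Mimic (schematically) PySpark's decision: if spark-submit's
    PythonRunner has exported a py4j gateway port, connect back to that
    already-running JVM; otherwise launch a fresh one."""
    port = environ.get("PYSPARK_GATEWAY_PORT")
    if port is not None:
        return ("connect-back", int(port))
    return ("launch-new-jvm", None)

print(choose_jvm_strategy({"PYSPARK_GATEWAY_PORT": "25333"}))  # -> ('connect-back', 25333)
print(choose_jvm_strategy({}))                                 # -> ('launch-new-jvm', None)
```

This is why, in client modes, the SparkContext's JVM is the same one spark-submit started: the script-language process is the child, not the other way around.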
RE: How does spark-submit handle Python scripts (and how to repeat it)?
There is much deployment-preparation work handling the different deployment modes for pyspark and SparkR in SparkSubmit. It is difficult to summarize it briefly; you'd better refer to the source code.

Supporting running Julia scripts in SparkSubmit is more than implementing a 'JuliaRunner'. One part is passing the command line options, like "--master", from the JVM launched by spark-submit to the JVM where SparkContext resides, in the case that the two JVMs are not the same. For pySpark & SparkR, when running scripts in client deployment modes (standalone client and yarn client), the JVM is the same (py4j/RBackend running as a thread in the JVM launched by spark-submit), so there is no need to pass the command line options around. However, in your case, the Julia interpreter launches an in-process JVM for SparkContext, which is a separate JVM from the one launched by spark-submit. So you need a way, typically an environment variable, like "SPARKR_SUBMIT_ARGS" for SparkR or "PYSPARK_SUBMIT_ARGS" for pyspark, to pass command line args to the in-process JVM in the Julia interpreter so that SparkConf can pick up the options.

From: Andrei [mailto:faithlessfri...@gmail.com]
Sent: Tuesday, April 12, 2016 3:48 AM
To: user <user@spark.apache.org>
Subject: How does spark-submit handle Python scripts (and how to repeat it)?

I'm working on a wrapper [1] around Spark for the Julia programming language [2], similar to PySpark. I've got it working with the Spark Standalone server by creating a local JVM and setting the master programmatically. However, this approach doesn't work with YARN (and probably Mesos), which require running via `spark-submit`.

In the `SparkSubmit` class I see that for Python a special class, `PythonRunner`, is launched, so I tried to write a similar `JuliaRunner`, which essentially does the following:

    val pb = new ProcessBuilder(Seq("julia", juliaScript))
    val process = pb.start()
    process.waitFor()

where `juliaScript` itself creates a new JVM and `SparkContext` inside it WITHOUT setting the master URL.
I then tried to launch this class using:

    spark-submit --master yarn \
        --class o.a.s.a.j.JuliaRunner \
        project.jar my_script.jl

I expected that `spark-submit` would set environment variables or something that SparkContext would then read to connect to the appropriate master. This didn't happen, however, and the process failed while trying to instantiate `SparkContext`, saying that the master is not specified.

So what am I missing? How can I use `spark-submit` to run the driver in a non-JVM language?

[1]: https://github.com/dfdx/Sparta.jl
[2]: http://julialang.org/
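Putting the advice from this thread together, the missing piece in the `JuliaRunner` above is forwarding the submit arguments to the child process. A minimal Python sketch of what the runner would construct (the name JULIA_SUBMIT_ARGS is hypothetical, mirroring PYSPARK_SUBMIT_ARGS; actually spawning julia is left out):

```python
def build_julia_launch(script, submit_args):
    """Sketch of what a JuliaRunner could do: forward the spark-submit
    arguments to the julia child process through an env var, so the
    in-process JVM created by the script can read them back into SparkConf."""
    command = ["julia", script]
    env = {"JULIA_SUBMIT_ARGS": submit_args}
    return command, env

command, env = build_julia_launch("my_script.jl", "--master yarn-client")
print(command)  # -> ['julia', 'my_script.jl']
print(env)      # -> {'JULIA_SUBMIT_ARGS': '--master yarn-client'}
```

In the Scala runner this corresponds to calling `pb.environment().put(...)` on the ProcessBuilder before `pb.start()`, so the Julia script can read the variable and configure its SparkConf accordingly.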