Re: Cannot submit a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Aris Vlasakakis
Andrew, thank you so much! That worked! I had to manually set the
spark.home configuration in the SparkConf object using
.set("spark.home","/cluster/path/to/spark/"), and then I was able to submit
from my laptop to the cluster!

Aris


On Thu, Jul 10, 2014 at 11:41 AM, Andrew Or  wrote:

> Setting SPARK_HOME is not super effective, because it is overridden very
> quickly by bin/spark-submit.
> Instead you should set the config "spark.home". Here's why:
>
> Each of your executors inherits its spark home from the application
> description, and this is created by your SparkContext on your local
> machine. By default, as you noticed, this uses your local spark home that
> is not applicable to your remote executors. There are two ways of
> controlling the spark home set in your application description: through the
> "spark.home" config or "SPARK_HOME" environment variable, with the former
> taking priority over the latter.
> However, since spark-submit overwrites whatever value you set SPARK_HOME
> to, there is really only one way: by setting "spark.home". Note that this
> config is only used to launch the executors, meaning the driver has already
> started by the time this config is consumed, so this does not create any
> ordering issue.
>
> Does that make sense?
>
> Andrew
>
>
> 2014-07-10 10:17 GMT-07:00 Aris Vlasakakis :
>
> Thank you very much Yana for replying!
>>
>> So right now the setup is a single-node machine which is my "cluster",
>> and YES you are right my submitting laptop has a different path to the
>> spark-1.0.0 installation than the "cluster" machine.
>>
>> I tried to set SPARK_HOME on my submitter laptop using the actual path of
>> the CLUSTER's directories...hoping this would help. It didn't change
>> anything. Let me show you the command from the submission side on my laptop
>> - keep in mind I set SPARK_HOME as the path for the cluster.
>>
>> SPARK_HOME=/home/data/Documents/spark-1.0.0/
>>  ~/Documents/spark-1.0.0/bin/spark-submit --verbose --class
>> org.apache.spark.examples.SparkPi --master spark://10.20.10.152:7077
>> ~/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>>
>> I get the same problems as originally. I am actually confused here - if I
>> set the SPARK_HOME on the same line as the spark-submit command, I don't
>> get how the submitter script did not get all confused just running
>> bin/spark-submit...but what do I know.
>>
>> Am I setting SPARK_HOME correctly?
>>
>>
>> On Wed, Jul 9, 2014 at 7:09 PM, Yana Kadiyska 
>> wrote:
>>
>>>  class java.io.IOException: Cannot run program
>>> "/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh"
>>> (in directory "."): error=2, No such file or directory
>>>
>>> By any chance, are your SPARK_HOME directories different on the
>>> machine where you're submitting from and the cluster? I'm on an older
>>> drop so not sure about the finer points of spark-submit but do
>>> remember a very similar issue when trying to run a Spark driver on a
>>> windows machine against a Spark Master on Ubuntu cluster (the
>>> SPARK_HOME directories were obviously different)
>>>
>>> On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis 
>>> wrote:
>>> > Hello everybody,
>>> >
>>> > I am trying to figure out how to submit a Spark application from one
>>> > separate physical machine to a Spark standalone cluster. I have an
>>> > application that I wrote in Python that works if I am on the 1-Node
>>> Spark
>>> > server itself, and from that spark installation I run bin/spark-submit
>>> with
>>> > 1) MASTER=local[*] or 2) MASTER=spark://localhost:7077.
>>> >
>>> > However, I want to be on a separate machine that submits a job to
>>> Spark. Am
>>> > I doing something wrong here? I think something is wrong because I am
>>> > working from two different spark "installations" -- as in, on the big
>>> server
>>> > I have one spark installation and I am running sbin/start-all.sh to
>>> run the
>>> > standalone server (and that works), and then on a separate laptop I
>>> have a
>>> > different installation of spark-1.0.0, but I am using the laptop's
>>> > bin/spark-submit script to submit to the remote Spark server (using
>>> > MASTER=spark://:7077)
>>> >
>>> > This "submit-to-remote cluster" does not work, even for the Scala
>>> examples
>>> > like SparkPi.
>>> >
>>> > Concrete Example: I want to submit the example SparkPi to the
>>> cluster,
>>> > from my laptop.
>>> >
>>> > Server is 10.20.10.152, running master and slave, I can look at the
>>> Master
>>> > web UI at http://10.20.10.152:8080. Great.
>>> >
>>> > From laptop (10.20.10.154), I try the following, using bin/run-example
>>> from
>>> > a locally built version of spark 1.0.0 (so that I have the script
>>> 

Re: Cannot submit a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Andrew Or
Setting SPARK_HOME is not super effective, because it is overridden very
quickly by bin/spark-submit.
Instead you should set the config "spark.home". Here's why:

Each of your executors inherits its spark home from the application
description, and this is created by your SparkContext on your local
machine. By default, as you noticed, this uses your local spark home that
is not applicable to your remote executors. There are two ways of
controlling the spark home set in your application description: through the
"spark.home" config or "SPARK_HOME" environment variable, with the former
taking priority over the latter.
However, since spark-submit overwrites whatever value you set SPARK_HOME
to, there is really only one way: by setting "spark.home". Note that this
config is only used to launch the executors, meaning the driver has already
started by the time this config is consumed, so this does not create any
ordering issue.
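Roughly, the lookup order boils down to this (a simplified sketch of the effective behavior, not a verbatim copy of the Spark source):

    import org.apache.spark.SparkConf

    // The spark home placed in the application description comes from the
    // "spark.home" config if present, otherwise from the SPARK_HOME
    // environment variable of the driver process.
    def resolveSparkHome(conf: SparkConf): Option[String] =
      conf.getOption("spark.home").orElse(sys.env.get("SPARK_HOME"))

Since spark-submit resets SPARK_HOME before your driver starts, only the first branch is reliable when you submit that way.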

Does that make sense?

Andrew


2014-07-10 10:17 GMT-07:00 Aris Vlasakakis :

> Thank you very much Yana for replying!
>
> So right now the setup is a single-node machine which is my "cluster",
> and YES you are right my submitting laptop has a different path to the
> spark-1.0.0 installation than the "cluster" machine.
>
> I tried to set SPARK_HOME on my submitter laptop using the actual path of
> the CLUSTER's directories...hoping this would help. It didn't change
> anything. Let me show you the command from the submission side on my laptop
> - keep in mind I set SPARK_HOME as the path for the cluster.
>
> SPARK_HOME=/home/data/Documents/spark-1.0.0/
>  ~/Documents/spark-1.0.0/bin/spark-submit --verbose --class
> org.apache.spark.examples.SparkPi --master spark://10.20.10.152:7077
> ~/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>
> I get the same problems as originally. I am actually confused here - if I
> set the SPARK_HOME on the same line as the spark-submit command, I don't
> get how the submitter script did not get all confused just running
> bin/spark-submit...but what do I know.
>
> Am I setting SPARK_HOME correctly?
>
>
> On Wed, Jul 9, 2014 at 7:09 PM, Yana Kadiyska 
> wrote:
>
>>  class java.io.IOException: Cannot run program
>> "/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh"
>> (in directory "."): error=2, No such file or directory
>>
>> By any chance, are your SPARK_HOME directories different on the
>> machine where you're submitting from and the cluster? I'm on an older
>> drop so not sure about the finer points of spark-submit but do
>> remember a very similar issue when trying to run a Spark driver on a
>> windows machine against a Spark Master on Ubuntu cluster (the
>> SPARK_HOME directories were obviously different)
>>
>> On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis 
>> wrote:
>> > Hello everybody,
>> >
>> > I am trying to figure out how to submit a Spark application from one
>> > separate physical machine to a Spark standalone cluster. I have an
>> > application that I wrote in Python that works if I am on the 1-Node
>> Spark
>> > server itself, and from that spark installation I run bin/spark-submit
>> with
>> > 1) MASTER=local[*] or 2) MASTER=spark://localhost:7077.
>> >
>> > However, I want to be on a separate machine that submits a job to
>> Spark. Am
>> > I doing something wrong here? I think something is wrong because I am
>> > working from two different spark "installations" -- as in, on the big
>> server
>> > I have one spark installation and I am running sbin/start-all.sh to run
>> the
>> > standalone server (and that works), and then on a separate laptop I
>> have a
>> > different installation of spark-1.0.0, but I am using the laptop's
>> > bin/spark-submit script to submit to the remote Spark server (using
>> > MASTER=spark://:7077)
>> >
>> > This "submit-to-remote cluster" does not work, even for the Scala
>> examples
>> > like SparkPi.
>> >
>> > Concrete Example: I want to submit the example SparkPi to the
>> cluster,
>> > from my laptop.
>> >
>> > Server is 10.20.10.152, running master and slave, I can look at the
>> Master
>> > web UI at http://10.20.10.152:8080. Great.
>> >
>> > From laptop (10.20.10.154), I try the following, using bin/run-example
>> from
>> > a locally built version of spark 1.0.0 (so that I have the script
>> > spark-submit!):
>> >
>> > bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi
>> > --master spark://10.20.10.152:7077
>> > examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>> >
>> >
>> > This fails, with the errors at the bottom of this email.
>> >
>> > Am I doing something wrong? How can I submit to a remote cluster? I
>> get the
>> > same problem with bin/spark-submit.
>> >

Re: Cannot submit a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Aris Vlasakakis
Thank you very much Yana for replying!

So right now the setup is a single-node machine which is my "cluster", and
YES you are right my submitting laptop has a different path to the
spark-1.0.0 installation than the "cluster" machine.

I tried to set SPARK_HOME on my submitter laptop using the actual path of
the CLUSTER's directories...hoping this would help. It didn't change
anything. Let me show you the command from the submission side on my laptop
- keep in mind I set SPARK_HOME as the path for the cluster.

SPARK_HOME=/home/data/Documents/spark-1.0.0/
 ~/Documents/spark-1.0.0/bin/spark-submit --verbose --class
org.apache.spark.examples.SparkPi --master spark://10.20.10.152:7077
~/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar

I get the same problems as originally. I am actually confused here - if I
set the SPARK_HOME on the same line as the spark-submit command, I don't
get how the submitter script did not get all confused just running
bin/spark-submit...but what do I know.

Am I setting SPARK_HOME correctly?
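In case it helps with the debugging, here is a small diagnostic sketch (plain SparkConf/env lookups, nothing exotic) I could run on the driver side to see what actually gets picked up:

    import org.apache.spark.SparkConf

    // What does the driver see? new SparkConf() loads any spark.* system
    // properties; SPARK_HOME is read from the driver's environment.
    val conf = new SparkConf()
    println("spark.home conf value = " + conf.getOption("spark.home"))
    println("SPARK_HOME env value  = " + sys.env.get("SPARK_HOME"))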


On Wed, Jul 9, 2014 at 7:09 PM, Yana Kadiyska 
wrote:

>  class java.io.IOException: Cannot run program
> "/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh"
> (in directory "."): error=2, No such file or directory
>
> By any chance, are your SPARK_HOME directories different on the
> machine where you're submitting from and the cluster? I'm on an older
> drop so not sure about the finer points of spark-submit but do
> remember a very similar issue when trying to run a Spark driver on a
> windows machine against a Spark Master on Ubuntu cluster (the
> SPARK_HOME directories were obviously different)
>
> On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis 
> wrote:
> > Hello everybody,
> >
> > I am trying to figure out how to submit a Spark application from one
> > separate physical machine to a Spark standalone cluster. I have an
> > application that I wrote in Python that works if I am on the 1-Node Spark
> > server itself, and from that spark installation I run bin/spark-submit
> with
> > 1) MASTER=local[*] or 2) MASTER=spark://localhost:7077.
> >
> > However, I want to be on a separate machine that submits a job to Spark.
> Am
> > I doing something wrong here? I think something is wrong because I am
> > working from two different spark "installations" -- as in, on the big
> server
> > I have one spark installation and I am running sbin/start-all.sh to run
> the
> > standalone server (and that works), and then on a separate laptop I have
> a
> > different installation of spark-1.0.0, but I am using the laptop's
> > bin/spark-submit script to submit to the remote Spark server (using
> > MASTER=spark://:7077)
> >
> > This "submit-to-remote cluster" does not work, even for the Scala
> examples
> > like SparkPi.
> >
> > Concrete Example: I want to submit the example SparkPi to the cluster,
> > from my laptop.
> >
> > Server is 10.20.10.152, running master and slave, I can look at the
> Master
> > web UI at http://10.20.10.152:8080. Great.
> >
> > From laptop (10.20.10.154), I try the following, using bin/run-example
> from
> > a locally built version of spark 1.0.0 (so that I have the script
> > spark-submit!):
> >
> > bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi
> > --master spark://10.20.10.152:7077
> > examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
> >
> >
> > This fails, with the errors at the bottom of this email.
> >
> > Am I doing something wrong? How can I submit to a remote cluster? I get
> the
> > same problem with bin/spark-submit.
> >
> >
> >  bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi
> > --master spark://10.20.10.152:7077
> > examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
> > Using properties file: null
> > Using properties file: null
> > Parsed arguments:
> >   master                  spark://10.20.10.152:7077
> >   deployMode              null
> >   executorMemory          null
> >   executorCores           null
> >   totalExecutorCores      null
> >   propertiesFile          null
> >   driverMemory            null
> >   driverCores             null
> >   driverExtraClassPath    null
> >   driverExtraLibraryPath  null
> >   driverExtraJavaOptions  null
> >   supervise               false
> >   queue                   null
> >   numExecutors            null
> >   files                   null
> >   pyFiles                 null
> >   archives                null
> >   mainClass               org.apache.spark.examples.SparkPi
> >   primaryResource         file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
> >   name                    org.apache.spark.examples.SparkPi
> >   childArgs               []
> >   jars                    null
> >   verbose                 true
> >
> > Default properties from null:
> >
> >
> >
> > Using properties file: null
> > Main class:
> > org.apache.

Re: Cannot submit a Spark Application to a remote cluster Spark 1.0

2014-07-09 Thread Yana Kadiyska
 class java.io.IOException: Cannot run program
"/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh"
(in directory "."): error=2, No such file or directory

By any chance, are your SPARK_HOME directories different on the
machine where you're submitting from and the cluster? I'm on an older
drop so not sure about the finer points of spark-submit but do
remember a very similar issue when trying to run a Spark driver on a
windows machine against a Spark Master on Ubuntu cluster (the
SPARK_HOME directories were obviously different)
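To illustrate why such a mismatch bites (hypothetical code, not Spark's actual source): the launch side builds the compute-classpath.sh path from the spark home advertised by the driver, so a path that only exists on the submitting machine cannot be found where the executor is launched:

    import java.io.File

    // Hypothetical illustration: join the driver-provided spark home with the
    // launch script; if that home only exists on the laptop, the result is
    // "No such file or directory" on the cluster side.
    def computeClasspathScript(driverSparkHome: String): File =
      new File(driverSparkHome, "bin/compute-classpath.sh")

    // e.g. computeClasspathScript("/Users/aris.vlasakakis/Documents/spark-1.0.0")
    // resolves to a path that exists on the laptop but not on the cluster.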

On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis  wrote:
> Hello everybody,
>
> I am trying to figure out how to submit a Spark application from one
> separate physical machine to a Spark standalone cluster. I have an
> application that I wrote in Python that works if I am on the 1-Node Spark
> server itself, and from that spark installation I run bin/spark-submit with
> 1) MASTER=local[*] or 2) MASTER=spark://localhost:7077.
>
> However, I want to be on a separate machine that submits a job to Spark. Am
> I doing something wrong here? I think something is wrong because I am
> working from two different spark "installations" -- as in, on the big server
> I have one spark installation and I am running sbin/start-all.sh to run the
> standalone server (and that works), and then on a separate laptop I have a
> different installation of spark-1.0.0, but I am using the laptop's
> bin/spark-submit script to submit to the remote Spark server (using
> MASTER=spark://:7077)
>
> This "submit-to-remote cluster" does not work, even for the Scala examples
> like SparkPi.
>
> Concrete Example: I want to submit the example SparkPi to the cluster,
> from my laptop.
>
> Server is 10.20.10.152, running master and slave, I can look at the Master
> web UI at http://10.20.10.152:8080. Great.
>
> From laptop (10.20.10.154), I try the following, using bin/run-example from
> a locally built version of spark 1.0.0 (so that I have the script
> spark-submit!):
>
> bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi
> --master spark://10.20.10.152:7077
> examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>
>
> This fails, with the errors at the bottom of this email.
>
> Am I doing something wrong? How can I submit to a remote cluster? I get the
> same problem with bin/spark-submit.
>
>
>  bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi
> --master spark://10.20.10.152:7077
> examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
> Using properties file: null
> Using properties file: null
> Parsed arguments:
>   master                  spark://10.20.10.152:7077
>   deployMode              null
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise               false
>   queue                   null
>   numExecutors            null
>   files                   null
>   pyFiles                 null
>   archives                null
>   mainClass               org.apache.spark.examples.SparkPi
>   primaryResource         file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>   name                    org.apache.spark.examples.SparkPi
>   childArgs               []
>   jars                    null
>   verbose                 true
>
> Default properties from null:
>
>
>
> Using properties file: null
> Main class:
> org.apache.spark.examples.SparkPi
> Arguments:
>
> System properties:
> SPARK_SUBMIT -> true
> spark.app.name -> org.apache.spark.examples.SparkPi
> spark.jars ->
> file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
> spark.master -> spark://10.20.10.152:7077
> Classpath elements:
> file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
>
>
> 14/07/09 16:16:08 INFO SecurityManager: Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 14/07/09 16:16:08 INFO SecurityManager: Changing view acls to:
> aris.vlasakakis
> 14/07/09 16:16:08 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions:
> Set(aris.vlasakakis)
> 14/07/09 16:16:08 INFO Slf4jLogger: Slf4jLogger started
> 14/07/09 16:16:08 INFO Remoting: Starting remoting
> 14/07/09 16:16:08 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://spark@10.20.10.154:50478]
> 14/07/09 16:16:08 INFO Remoting: Remoting now listens on addresses:
> [akka.tcp://spark@10.20.10.154:50478]
> 14/07/09 16:16:08 INFO SparkEnv: Registering MapOutputTracker
> 14/07/09 16:16:08 INFO SparkEnv: Registering BlockManagerMaster
> 14/07/09 16:16:08 INFO DiskBlockManager: C