Re: Can't submit job to stand alone cluster

2015-12-30 Thread SparkUser

Sorry, I need to clarify:

When you say:

   When the docs say "If your application is launched through Spark
   submit, then the application jar is automatically distributed to all
   worker nodes," it is actually saying that your executors get their
   jars from the driver. This is true whether you're running in client
   mode or cluster mode.


Don't you mean the master, not the driver? I thought the whole point of 
confusion is that people expect the driver to distribute jars but they 
have to be visible to the master on the file system local to the master?


I see a lot of people tripped up by this, and a nice mail from Greg Hill 
to the list cleared it up for me, but now I am confused again. I am a 
couple of days away from having a way to test this myself, so I am just "in 
theory" right now.


   On 12/29/2015 05:18 AM, Greg Hill wrote:

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access
to the JAR.  spark-submit does not pass the JAR along to the Driver,
but the Driver will pass it to the executors.  I ended up putting the
JAR in HDFS and passing an hdfs:// path to spark-submit.  This is a
subtle difference from Spark on YARN which does pass the JAR along to
the Driver automatically, and IMO should probably be fixed in
spark-submit.  It's really confusing for newcomers.



Thanks,

Jim


On 12/29/2015 04:36 PM, Daniel Valdivia wrote:

That makes things more clear! Thanks

Issue resolved

Sent from my iPhone

On Dec 29, 2015, at 2:43 PM, Annabel Melongo wrote:



Thanks Andrew for this awesome explanation :)


On Tuesday, December 29, 2015 5:30 PM, Andrew Or wrote:



Let me clarify a few things for everyone:

There are three *cluster managers*: standalone, YARN, and Mesos. Each 
cluster manager can run in two *deploy modes*, client or cluster. In 
client mode, the driver runs on the machine that submitted the 
application (the client). In cluster mode, the driver runs on one of 
the worker machines in the cluster.


When I say "standalone cluster mode" I am referring to the standalone 
cluster manager running in cluster deploy mode.


Here's how the resources are distributed in each mode (omitting Mesos):

*Standalone / YARN client mode. *The driver runs on the client
machine (i.e. machine that ran Spark submit) so it should already
have access to the jars. The executors then pull the jars from an
HTTP server started in the driver.

*Standalone cluster mode. *Spark submit does /not/ upload your
jars to the cluster, so all the resources you need must already
be on all of the worker machines. The executors, however,
actually just pull the jars from the driver as in client mode
   instead of finding them in their own local file systems.

*YARN cluster mode. *Spark submit /does/ upload your jars to the
cluster. In particular, it puts the jars in HDFS so your driver
can just read from there. As in other deployments, the executors
pull the jars from the driver.


When the docs say "If your application is launched through Spark 
submit, then the application jar is automatically distributed to all 
worker nodes," it is actually saying that your executors get their 
jars from the driver. This is true whether you're running in client 
mode or cluster mode.


If the docs are unclear (and they seem to be), then we should update 
them. I have filed SPARK-12565 to track this.


Please let me know if there's anything else I can help clarify.

Cheers,
-Andrew




2015-12-29 13:07 GMT-08:00 Annabel Melongo:


Andrew,

Now I see where the confusion lies. Standalone cluster mode, your
link, is nothing but a combination of client-mode and standalone
mode, my link, without YARN.

But I'm confused by this paragraph in your link:

If your application is launched through Spark submit, then the
application jar is automatically distributed to all worker nodes.
For any additional jars that your
application depends on, you should specify them through
the |--jars| flag using comma as a delimiter (e.g. |--jars
jar1,jar2|).

That can't be true; this is only the case when Spark runs on top
of YARN. Please correct me, if I'm wrong.

Thanks


On Tuesday, December 29, 2015 2:54 PM, Andrew Or wrote:



http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications

2015-12-29 11:48 GMT-08:00 Annabel Melongo:

  

Re: Can't submit job to stand alone cluster

2015-12-30 Thread Andrew Or
Hi Jim,

Just to clarify further:

   - *Driver *is the process with SparkContext. A driver represents an
   application (e.g. spark-shell, SparkPi) so there is exactly one driver in
   each application.


   - *Executor *is the process that runs the tasks scheduled by the driver.
   There should be at least one executor in each application.


   - *Master *is the process that handles scheduling of *applications*. It
   decides where drivers and executors are launched and how many cores and how
   much memory to give to each application. This only exists in standalone
   mode.


   - *Worker *is the process that actually launches the executor and driver
   JVMs (the latter only in cluster mode). It talks to the Master to decide
   what to launch and how much memory to give to the process. This only
   exists in standalone mode.

It is actually the *driver*, not the Master, that distributes jars to
executors. The Master is largely unconcerned with individual requirements
from an application apart from cores / memory constraints. This is because
we still need to distribute jars to executors in YARN and Mesos modes, so
the common process, the driver, has to do it.
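
(For concreteness: "the process with SparkContext" just means the JVM running
your application's main method. A minimal Scala sketch, reusing the
ClusterIncidents class name from this thread; the job body is hypothetical:)

  import org.apache.spark.{SparkConf, SparkContext}

  // Whichever JVM runs this main method and creates the SparkContext is
  // the driver: the client machine in client mode, a Worker-launched
  // process in cluster mode.
  object ClusterIncidents {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("ClusterIncidents"))
      try {
        // placeholder job so the executors have tasks to run
        println(sc.parallelize(1 to 1000).count())
      } finally {
        sc.stop()
      }
    }
  }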

I thought the whole point of confusion is that people expect the driver to
> distribute jars but they have to be visible to the master on the file
> system local to the master?


Actually the requirement is that the jars have to be visible to the machine
running the *driver*, not the Master. In client mode, your jars have to be
visible to the machine running spark-submit. In cluster mode, your jars
have to be visible to all machines running a Worker, since the driver can
be launched on any of them.
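
(For example, with the master URL used earlier in this thread and a
hypothetical HDFS path: staging the jar somewhere globally visible, as Greg
did, satisfies the cluster-mode requirement.)

  hadoop fs -put target/scala-2.10/cluster-incidents_2.10-1.0.jar /user/hadoop/jars/
  $SPARK_HOME/bin/spark-submit --master spark://sslabnode01:6066 \
    --deploy-mode cluster --class ClusterIncidents \
    hdfs:///user/hadoop/jars/cluster-incidents_2.10-1.0.jar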

The nice email from Greg is spot-on.

Does that make sense?

-Andrew


2015-12-30 11:23 GMT-08:00 SparkUser :

> Sorry need to clarify:
>
> When you say:
>
> *When the docs say "If your application is launched through Spark
> submit, then the application jar is automatically distributed to all worker
> nodes," it is actually saying that your executors get their jars from
> the driver. This is true whether you're running in client mode or cluster
> mode.*
>
>
> Don't you mean the master, not the driver? I thought the whole point of
> confusion is that people expect the driver to distribute jars but they have
> to be visible to the master on the file system local to the master?
>
> I see a lot of people tripped up by this and a nice mail from Greg Hill to
> the list cleared this up for me but now I am confused again. I am a couple
> days away from having a way to test this myself, so I am just "in theory"
> right now.
>
> On 12/29/2015 05:18 AM, Greg Hill wrote:
>
> Yes, you have misunderstood, but so did I.  So the problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access to
> the JAR.  spark-submit does not pass the JAR along to the Driver, but the
> Driver will pass it to the executors.  I ended up putting the JAR in HDFS
> and passing an hdfs:// path to spark-submit.  This is a subtle difference
> from Spark on YARN which does pass the JAR along to the Driver
> automatically, and IMO should probably be fixed in spark-submit.  It's
> really confusing for newcomers.
>
>
> Thanks,
>
> Jim
>
>
> On 12/29/2015 04:36 PM, Daniel Valdivia wrote:
>
> That makes things more clear! Thanks
>
> Issue resolved
>
> Sent from my iPhone
>
> On Dec 29, 2015, at 2:43 PM, Annabel Melongo <melongo_anna...@yahoo.com> wrote:
>
> Thanks Andrew for this awesome explanation :)
>
>
> On Tuesday, December 29, 2015 5:30 PM, Andrew Or <and...@databricks.com> wrote:
>
>
> Let me clarify a few things for everyone:
>
> There are three *cluster managers*: standalone, YARN, and Mesos. Each
> cluster manager can run in two *deploy modes*, client or cluster. In
> client mode, the driver runs on the machine that submitted the application
> (the client). In cluster mode, the driver runs on one of the worker
> machines in the cluster.
>
> When I say "standalone cluster mode" I am referring to the standalone
> cluster manager running in cluster deploy mode.
>
> Here's how the resources are distributed in each mode (omitting Mesos):
>
> *Standalone / YARN client mode. *The driver runs on the client machine
> (i.e. machine that ran Spark submit) so it should already have access to
> the jars. The executors then pull the jars from an HTTP server started in
> the driver.
>
> *Standalone cluster mode. *Spark submit does *not* upload your jars to
> the cluster, so all the resources you need must already be on all of the
> worker machines. The executors, however, actually just pull the jars from
> the driver as in client mode instead of finding them in their own local file
> systems.
>
> *YARN cluster mode. *Spark submit *does* upload your jars to the cluster.
> In particular, it puts the jars in HDFS so your driver can just read from
> there. As in other deployments, the executors pull the jars from the driver.

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Andrew Or
Hi Greg,

It's actually intentional for standalone cluster mode to not upload jars.
One of the reasons why YARN takes at least 10 seconds before running any
simple application is because there's a lot of random overhead (e.g.
putting jars in HDFS). If this missing functionality is not documented
somewhere then we should add that.

Also, the packages problem seems legitimate. Thanks for reporting it. I
have filed https://issues.apache.org/jira/browse/SPARK-12559.

-Andrew

2015-12-29 4:18 GMT-08:00 Greg Hill :

>
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:
>
> >Hi,
> >
> >I'm trying to submit a job to a small spark cluster running in stand
> >alone mode, however it seems like the jar file I'm submitting to the
>cluster is "not found" by the worker nodes.
> >
>I might have understood wrong, but I thought the Driver node would send
> >this jar file to the worker nodes, or should I manually send this file to
> >each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I.  So the problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access to
> the JAR.  spark-submit does not pass the JAR along to the Driver, but the
> Driver will pass it to the executors.  I ended up putting the JAR in HDFS
> and passing an hdfs:// path to spark-submit.  This is a subtle difference
> from Spark on YARN which does pass the JAR along to the Driver
> automatically, and IMO should probably be fixed in spark-submit.  It's
> really confusing for newcomers.
>
> Another problem I ran into, and you might too, is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
>
> Greg
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Can't submit job to stand alone cluster

2015-12-29 Thread Daniel Valdivia
That makes things more clear! Thanks

Issue resolved

Sent from my iPhone

> On Dec 29, 2015, at 2:43 PM, Annabel Melongo wrote:
> 
> Thanks Andrew for this awesome explanation 
> 
> 
> On Tuesday, December 29, 2015 5:30 PM, Andrew Or  
> wrote:
> 
> 
> Let me clarify a few things for everyone:
> 
> There are three cluster managers: standalone, YARN, and Mesos. Each cluster 
> manager can run in two deploy modes, client or cluster. In client mode, the 
> driver runs on the machine that submitted the application (the client). In 
> cluster mode, the driver runs on one of the worker machines in the cluster.
> 
> When I say "standalone cluster mode" I am referring to the standalone cluster 
> manager running in cluster deploy mode.
> 
> Here's how the resources are distributed in each mode (omitting Mesos):
> 
> Standalone / YARN client mode. The driver runs on the client machine (i.e. 
> machine that ran Spark submit) so it should already have access to the jars. 
> The executors then pull the jars from an HTTP server started in the driver.
> 
> Standalone cluster mode. Spark submit does not upload your jars to the 
> cluster, so all the resources you need must already be on all of the worker 
> machines. The executors, however, actually just pull the jars from the driver 
> as in client mode instead of finding them in their own local file systems.
> 
> YARN cluster mode. Spark submit does upload your jars to the cluster. In 
> particular, it puts the jars in HDFS so your driver can just read from there. 
> As in other deployments, the executors pull the jars from the driver.
> 
> When the docs say "If your application is launched through Spark submit, then 
> the application jar is automatically distributed to all worker nodes," it is 
> actually saying that your executors get their jars from the driver. This is 
> true whether you're running in client mode or cluster mode.
> 
> If the docs are unclear (and they seem to be), then we should update them. I 
> have filed SPARK-12565 to track this.
> 
> Please let me know if there's anything else I can help clarify.
> 
> Cheers,
> -Andrew
> 
> 
> 
> 
> 2015-12-29 13:07 GMT-08:00 Annabel Melongo :
> Andrew,
> 
> Now I see where the confusion lies. Standalone cluster mode, your link, is 
> nothing but a combination of client-mode and standalone mode, my link, 
> without YARN.
> 
> But I'm confused by this paragraph in your link:
> 
> If your application is launched through Spark submit, then the
> application jar is automatically distributed to all worker nodes. For any
> additional jars that your application depends on, you should specify them
> through the --jars flag using comma as a delimiter (e.g. --jars jar1,jar2).
> 
> That can't be true; this is only the case when Spark runs on top of YARN. 
> Please correct me, if I'm wrong.
> 
> Thanks
>   
> 
> 
> On Tuesday, December 29, 2015 2:54 PM, Andrew Or  
> wrote:
> 
> 
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
> 
> 2015-12-29 11:48 GMT-08:00 Annabel Melongo :
> Greg,
> 
> Can you please send me a doc describing the standalone cluster mode? 
> Honestly, I never heard about it.
> 
> The three different modes I've listed appear in the last paragraph of this 
> doc: Running Spark Applications
> 
> 
> 
> On Tuesday, December 29, 2015 2:42 PM, Andrew Or  
> wrote:
> 
> 
> The confusion here is the expression "standalone cluster mode". Either it's 
> stand-alone or it's cluster mode but it can't be both.
> 
> @Annabel That's not true. There is a standalone cluster mode where driver 
> runs on one of the workers instead of on the client machine. What you're 
> describing is standalone client mode.
> 
> 2015-12-29 11:32 GMT-08:00 Annabel Melongo :
> Greg,
> 
> The confusion here is the expression "standalone cluster mode". Either it's 
> stand-alone or it's cluster mode but it can't be both.
> 
>  With this in mind, here's how jars are uploaded:
> 1. Spark Stand-alone mode: client and driver run on the same machine; use 
> --packages option to submit a jar
> 2. Yarn Cluster-mode: client and driver run on separate machines; 
> additionally driver runs as a thread in ApplicationMaster; use --jars option 
> with a globally visible path to said jar
> 3. Yarn Client-mode: client and driver run on the same machine. driver is 
> NOT a thread in ApplicationMaster; use --packages to submit a jar
> 
> 
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or  
> wrote:
> 
> 
> Hi Greg,
> 
> It's actually intentional for standalone cluster mode to not upload jars.

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Andrew Or
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications

2015-12-29 11:48 GMT-08:00 Annabel Melongo :

> Greg,
>
> Can you please send me a doc describing the standalone cluster mode?
> Honestly, I never heard about it.
>
> The three different modes I've listed appear in the last paragraph of
> this doc: Running Spark Applications
>
>
>
>
> On Tuesday, December 29, 2015 2:42 PM, Andrew Or 
> wrote:
>
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.
>
>
> @Annabel That's not true. There *is* a standalone cluster mode where
> driver runs on one of the workers instead of on the client machine. What
> you're describing is standalone client mode.
>
> 2015-12-29 11:32 GMT-08:00 Annabel Melongo :
>
> Greg,
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.
>
>  With this in mind, here's how jars are uploaded:
> 1. Spark Stand-alone mode: client and driver run on the same machine;
> use --packages option to submit a jar
> 2. Yarn Cluster-mode: client and driver run on separate machines;
> additionally driver runs as a thread in ApplicationMaster; use --jars
> option with a globally visible path to said jar
> 3. Yarn Client-mode: client and driver run on the same machine. driver
> is *NOT* a thread in ApplicationMaster; use --packages to submit a jar
>
>
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or 
> wrote:
>
>
> Hi Greg,
>
> It's actually intentional for standalone cluster mode to not upload jars.
> One of the reasons why YARN takes at least 10 seconds before running any
> simple application is because there's a lot of random overhead (e.g.
> putting jars in HDFS). If this missing functionality is not documented
> somewhere then we should add that.
>
> Also, the packages problem seems legitimate. Thanks for reporting it. I
> have filed https://issues.apache.org/jira/browse/SPARK-12559.
>
> -Andrew
>
> 2015-12-29 4:18 GMT-08:00 Greg Hill :
>
>
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:
>
> >Hi,
> >
> >I'm trying to submit a job to a small spark cluster running in stand
> >alone mode, however it seems like the jar file I'm submitting to the
> >cluster is "not found" by the worker nodes.
> >
> >I might have understood wrong, but I thought the Driver node would send
> >this jar file to the worker nodes, or should I manually send this file to
> >each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I.  So the problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access to
> the JAR.  spark-submit does not pass the JAR along to the Driver, but the
> Driver will pass it to the executors.  I ended up putting the JAR in HDFS
> and passing an hdfs:// path to spark-submit.  This is a subtle difference
> from Spark on YARN which does pass the JAR along to the Driver
> automatically, and IMO should probably be fixed in spark-submit.  It's
> really confusing for newcomers.
>
> Another problem I ran into, and you might too, is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
>
> Greg
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>
>
>
>
>


Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
Thanks Andrew for this awesome explanation  

On Tuesday, December 29, 2015 5:30 PM, Andrew Or  
wrote:
 

 Let me clarify a few things for everyone:
There are three cluster managers: standalone, YARN, and Mesos. Each cluster 
manager can run in two deploy modes, client or cluster. In client mode, the 
driver runs on the machine that submitted the application (the client). In 
cluster mode, the driver runs on one of the worker machines in the cluster.
When I say "standalone cluster mode" I am referring to the standalone cluster 
manager running in cluster deploy mode.
Here's how the resources are distributed in each mode (omitting Mesos):

Standalone / YARN client mode. The driver runs on the client machine (i.e. 
machine that ran Spark submit) so it should already have access to the jars. 
The executors then pull the jars from an HTTP server started in the driver.
Standalone cluster mode. Spark submit does not upload your jars to the cluster, 
so all the resources you need must already be on all of the worker machines. 
The executors, however, actually just pull the jars from the driver as in 
client mode instead of finding them in their own local file systems.
YARN cluster mode. Spark submit does upload your jars to the cluster. In 
particular, it puts the jars in HDFS so your driver can just read from there. 
As in other deployments, the executors pull the jars from the driver.

When the docs say "If your application is launched through Spark submit, then 
the application jar is automatically distributed to all worker nodes," it is 
actually saying that your executors get their jars from the driver. This is 
true whether you're running in client mode or cluster mode.
If the docs are unclear (and they seem to be), then we should update them. I 
have filed SPARK-12565 to track this.
Please let me know if there's anything else I can help clarify.
Cheers,
-Andrew



2015-12-29 13:07 GMT-08:00 Annabel Melongo :

Andrew,
Now I see where the confusion lies. Standalone cluster mode, your link, is 
nothing but a combination of client-mode and standalone mode, my link, without 
YARN.
But I'm confused by this paragraph in your link:
        If your application is launched through Spark submit, then the
application jar is automatically distributed to all worker nodes. For any
additional jars that your application depends on, you should specify them
through the --jars flag using comma as a delimiter (e.g. --jars jar1,jar2).
That can't be true; this is only the case when Spark runs on top of YARN. 
Please correct me, if I'm wrong.
Thanks   

On Tuesday, December 29, 2015 2:54 PM, Andrew Or  
wrote:
 

 
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications

2015-12-29 11:48 GMT-08:00 Annabel Melongo :

Greg,
Can you please send me a doc describing the standalone cluster mode? Honestly, 
I never heard about it.
The three different modes I've listed appear in the last paragraph of this 
doc: Running Spark Applications


 

On Tuesday, December 29, 2015 2:42 PM, Andrew Or  
wrote:
 

 
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.

@Annabel That's not true. There is a standalone cluster mode where driver runs 
on one of the workers instead of on the client machine. What you're describing 
is standalone client mode.
2015-12-29 11:32 GMT-08:00 Annabel Melongo :

Greg,
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.
 With this in mind, here's how jars are uploaded:
 1. Spark Stand-alone mode: client and driver run on the same machine; use
    --packages option to submit a jar
 2. Yarn Cluster-mode: client and driver run on separate machines;
    additionally driver runs as a thread in ApplicationMaster; use --jars
    option with a globally visible path to said jar
 3. Yarn Client-mode: client and driver run on the same machine. driver is
    NOT a thread in ApplicationMaster; use --packages to submit a jar

On Tuesday, December 29, 2015 1:54 PM, Andrew Or  
wrote:
 

 Hi Greg,
It's actually intentional for standalone cluster mode to not upload jars. One 
of the reasons why YARN takes at least 10 seconds before running any simple 
application is because there's a lot of random overhead (e.g. putting jars in 
HDFS). If this missing functionality is not documented somewhere then we should 
add that.

Also, the packages problem seems legitimate. Thanks for reporting it. I have
filed https://issues.apache.org/jira/browse/SPARK-12559.

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Andrew Or
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.


@Annabel That's not true. There *is* a standalone cluster mode where driver
runs on one of the workers instead of on the client machine. What you're
describing is standalone client mode.

2015-12-29 11:32 GMT-08:00 Annabel Melongo :

> Greg,
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.
>
>  With this in mind, here's how jars are uploaded:
> 1. Spark Stand-alone mode: client and driver run on the same machine;
> use --packages option to submit a jar
> 2. Yarn Cluster-mode: client and driver run on separate machines;
> additionally driver runs as a thread in ApplicationMaster; use --jars
> option with a globally visible path to said jar
> 3. Yarn Client-mode: client and driver run on the same machine. driver
> is *NOT* a thread in ApplicationMaster; use --packages to submit a jar
>
>
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or 
> wrote:
>
>
> Hi Greg,
>
> It's actually intentional for standalone cluster mode to not upload jars.
> One of the reasons why YARN takes at least 10 seconds before running any
> simple application is because there's a lot of random overhead (e.g.
> putting jars in HDFS). If this missing functionality is not documented
> somewhere then we should add that.
>
> Also, the packages problem seems legitimate. Thanks for reporting it. I
> have filed https://issues.apache.org/jira/browse/SPARK-12559.
>
> -Andrew
>
> 2015-12-29 4:18 GMT-08:00 Greg Hill :
>
>
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:
>
> >Hi,
> >
> >I'm trying to submit a job to a small spark cluster running in stand
> >alone mode, however it seems like the jar file I'm submitting to the
> >cluster is "not found" by the worker nodes.
> >
> >I might have understood wrong, but I thought the Driver node would send
> >this jar file to the worker nodes, or should I manually send this file to
> >each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I.  So the problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access to
> the JAR.  spark-submit does not pass the JAR along to the Driver, but the
> Driver will pass it to the executors.  I ended up putting the JAR in HDFS
> and passing an hdfs:// path to spark-submit.  This is a subtle difference
> from Spark on YARN which does pass the JAR along to the Driver
> automatically, and IMO should probably be fixed in spark-submit.  It's
> really confusing for newcomers.
>
> Another problem I ran into, and you might too, is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
>
> Greg
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>
>


Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
Greg,
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.
 With this in mind, here's how jars are uploaded:
 1. Spark Stand-alone mode: client and driver run on the same machine; use
    --packages option to submit a jar
 2. Yarn Cluster-mode: client and driver run on separate machines;
    additionally driver runs as a thread in ApplicationMaster; use --jars
    option with a globally visible path to said jar
 3. Yarn Client-mode: client and driver run on the same machine. driver is
    NOT a thread in ApplicationMaster; use --packages to submit a jar

On Tuesday, December 29, 2015 1:54 PM, Andrew Or  
wrote:
 

 Hi Greg,
It's actually intentional for standalone cluster mode to not upload jars. One 
of the reasons why YARN takes at least 10 seconds before running any simple 
application is because there's a lot of random overhead (e.g. putting jars in 
HDFS). If this missing functionality is not documented somewhere then we should 
add that.

Also, the packages problem seems legitimate. Thanks for reporting it. I have 
filed https://issues.apache.org/jira/browse/SPARK-12559.
-Andrew
2015-12-29 4:18 GMT-08:00 Greg Hill :



On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:

>Hi,
>
>I'm trying to submit a job to a small spark cluster running in stand
>alone mode, however it seems like the jar file I'm submitting to the
> >cluster is "not found" by the worker nodes.
>
> >I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers.

> Another problem I ran into, and you might too, is that --packages doesn't
work with --deploy-mode cluster.  It downloads the packages to a temporary
location on the node running spark-submit, then passes those paths to the
node that is running the Driver, but since that isn't the same machine, it
can't find anything and fails.  The driver process *should* be the one
doing the downloading, but it isn't. I ended up having to create a fat JAR
with all of the dependencies to get around that one.

Greg


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





  

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
Greg,
Can you please send me a doc describing the standalone cluster mode? Honestly, 
I never heard about it.
The three different modes I've listed appear in the last paragraph of this 
doc: Running Spark Applications


 

On Tuesday, December 29, 2015 2:42 PM, Andrew Or  
wrote:
 

 
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.

@Annabel That's not true. There is a standalone cluster mode where driver runs 
on one of the workers instead of on the client machine. What you're describing 
is standalone client mode.
2015-12-29 11:32 GMT-08:00 Annabel Melongo :

Greg,
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.
 With this in mind, here's how jars are uploaded:
 1. Spark Stand-alone mode: client and driver run on the same machine; use
    --packages option to submit a jar
 2. Yarn Cluster-mode: client and driver run on separate machines;
    additionally driver runs as a thread in ApplicationMaster; use --jars
    option with a globally visible path to said jar
 3. Yarn Client-mode: client and driver run on the same machine. driver is
    NOT a thread in ApplicationMaster; use --packages to submit a jar

On Tuesday, December 29, 2015 1:54 PM, Andrew Or  
wrote:
 

 Hi Greg,
It's actually intentional for standalone cluster mode to not upload jars. One 
of the reasons why YARN takes at least 10 seconds before running any simple 
application is because there's a lot of random overhead (e.g. putting jars in 
HDFS). If this missing functionality is not documented somewhere then we should 
add that.

Also, the packages problem seems legitimate. Thanks for reporting it. I have 
filed https://issues.apache.org/jira/browse/SPARK-12559.
-Andrew
2015-12-29 4:18 GMT-08:00 Greg Hill :



On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:

>Hi,
>
>I'm trying to submit a job to a small spark cluster running in stand
>alone mode, however it seems like the jar file I'm submitting to the
>cluster is "not found" by the worker nodes.
>
>I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers.

Another problem I ran into, and you might too, is that --packages doesn't
work with --deploy-mode cluster.  It downloads the packages to a temporary
location on the node running spark-submit, then passes those paths to the
node that is running the Driver, but since that isn't the same machine, it
can't find anything and fails.  The driver process *should* be the one
doing the downloading, but it isn't. I ended up having to create a fat JAR
with all of the dependencies to get around that one.

Greg


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





   



  

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
Andrew,
Now I see where the confusion lies. Standalone cluster mode, your link, is 
nothing but a combination of client-mode and standalone mode, my link, without 
YARN.
But I'm confused by this paragraph in your link:
        If your application is launched through Spark submit, then the
application jar is automatically distributed to all worker nodes. For any
additional jars that your application depends on, you should specify them
through the --jars flag using comma as a delimiter (e.g. --jars jar1,jar2).
That can't be true; this is only the case when Spark runs on top of YARN. 
Please correct me, if I'm wrong.
Thanks   

On Tuesday, December 29, 2015 2:54 PM, Andrew Or  
wrote:
 

 
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications

2015-12-29 11:48 GMT-08:00 Annabel Melongo :

Greg,
Can you please send me a doc describing the standalone cluster mode? Honestly, 
I never heard about it.
The three different modes I've listed appear in the last paragraph of this 
doc: Running Spark Applications


 

On Tuesday, December 29, 2015 2:42 PM, Andrew Or  
wrote:
 

 
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.

@Annabel That's not true. There is a standalone cluster mode where driver runs 
on one of the workers instead of on the client machine. What you're describing 
is standalone client mode.
2015-12-29 11:32 GMT-08:00 Annabel Melongo :

Greg,
The confusion here is the expression "standalone cluster mode". Either it's 
stand-alone or it's cluster mode but it can't be both.
 With this in mind, here's how jars are uploaded:
 1. Spark Stand-alone mode: client and driver run on the same machine; use
    --packages option to submit a jar
 2. Yarn Cluster-mode: client and driver run on separate machines;
    additionally driver runs as a thread in ApplicationMaster; use --jars
    option with a globally visible path to said jar
 3. Yarn Client-mode: client and driver run on the same machine. driver is
    NOT a thread in ApplicationMaster; use --packages to submit a jar

On Tuesday, December 29, 2015 1:54 PM, Andrew Or  
wrote:
 

 Hi Greg,
It's actually intentional for standalone cluster mode to not upload jars. One 
of the reasons why YARN takes at least 10 seconds before running any simple 
application is because there's a lot of random overhead (e.g. putting jars in 
HDFS). If this missing functionality is not documented somewhere then we should 
add that.

Also, the packages problem seems legitimate. Thanks for reporting it. I have 
filed https://issues.apache.org/jira/browse/SPARK-12559.
-Andrew
2015-12-29 4:18 GMT-08:00 Greg Hill :



On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:

>Hi,
>
>I'm trying to submit a job to a small spark cluster running in stand
>alone mode, however it seems like the jar file I'm submitting to the
>cluster is "not found" by the worker nodes.
>
>I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers.

Another problem I ran into, and you might too, is that --packages doesn't
work with --deploy-mode cluster.  It downloads the packages to a temporary
location on the node running spark-submit, then passes those paths to the
node that is running the Driver, but since that isn't the same machine, it
can't find anything and fails.  The driver process *should* be the one
doing the downloading, but it isn't. I ended up having to create a fat JAR
with all of the dependencies to get around that one.

Greg


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





   



   



  

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Andrew Or
Let me clarify a few things for everyone:

There are three *cluster managers*: standalone, YARN, and Mesos. Each
cluster manager can run in two *deploy modes*, client or cluster. In client
mode, the driver runs on the machine that submitted the application (the
client). In cluster mode, the driver runs on one of the worker machines in
the cluster.

When I say "standalone cluster mode" I am referring to the standalone
cluster manager running in cluster deploy mode.

Here's how the resources are distributed in each mode (omitting Mesos):

*Standalone / YARN client mode. *The driver runs on the client machine
(i.e. machine that ran Spark submit) so it should already have access to
the jars. The executors then pull the jars from an HTTP server started in
the driver.

*Standalone cluster mode. *Spark submit does *not* upload your jars to the
cluster, so all the resources you need must already be on all of the worker
machines. The executors, however, actually just pull the jars from the
driver as in client mode instead of finding them in their own local file
systems.

*YARN cluster mode. *Spark submit *does* upload your jars to the cluster.
In particular, it puts the jars in HDFS so your driver can just read from
there. As in other deployments, the executors pull the jars from the driver.
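
(Concretely, with hypothetical paths, and assuming the standalone master's
default ports, 7077 for client-mode submission and 6066 for the REST endpoint
used in this thread: in client mode a local jar path is enough, while in
standalone cluster mode the path must resolve on whichever Worker launches
the driver, which is why an hdfs:// URI is the usual workaround.)

  # client mode: the driver runs where you invoke spark-submit,
  # so a local path works
  spark-submit --master spark://sslabnode01:7077 --deploy-mode client \
    --class ClusterIncidents ./target/scala-2.10/cluster-incidents_2.10-1.0.jar

  # cluster mode: the driver may start on any Worker, so pass a
  # globally visible path
  spark-submit --master spark://sslabnode01:6066 --deploy-mode cluster \
    --class ClusterIncidents hdfs:///jars/cluster-incidents_2.10-1.0.jar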


When the docs say "If your application is launched through Spark submit,
then the application jar is automatically distributed to all worker nodes," it
is actually saying that your executors get their jars from the driver. This
is true whether you're running in client mode or cluster mode.

If the docs are unclear (and they seem to be), then we should update them.
I have filed SPARK-12565 
to track this.

Please let me know if there's anything else I can help clarify.

Cheers,
-Andrew




2015-12-29 13:07 GMT-08:00 Annabel Melongo :

> Andrew,
>
> Now I see where the confusion lies. Standalone cluster mode, your link, is
> nothing but a combination of client-mode and standalone mode, my link,
> without YARN.
>
> But I'm confused by this paragraph in your link:
>
> If your application is launched through Spark submit, then the
> application jar is automatically distributed to all worker nodes. For any
> additional jars that your application depends on, you should specify them
> through the --jars flag using comma as a delimiter (e.g. --jars jar1,jar2).
>
> That can't be true; this is only the case when Spark runs on top of YARN.
> Please correct me, if I'm wrong.
>
> Thanks
>
>
>
> On Tuesday, December 29, 2015 2:54 PM, Andrew Or 
> wrote:
>
>
>
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
>
> 2015-12-29 11:48 GMT-08:00 Annabel Melongo :
>
> Greg,
>
> Can you please send me a doc describing the standalone cluster mode?
> Honestly, I never heard about it.
>
> The three different modes I've listed appear in the last paragraph of
> this doc: Running Spark Applications
>
>
>
>
> On Tuesday, December 29, 2015 2:42 PM, Andrew Or 
> wrote:
>
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.
>
>
> @Annabel That's not true. There *is* a standalone cluster mode where
> driver runs on one of the workers instead of on the client machine. What
> you're describing is standalone client mode.
>
> 2015-12-29 11:32 GMT-08:00 Annabel Melongo :
>
> Greg,
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode but it can't be both.
>
>  With this in mind, here's how jars are uploaded:
> 1. Spark Stand-alone mode: client and driver run on the same machine;
> use --packages option to submit a jar
> 2. Yarn Cluster-mode: client and driver run on separate machines;
> additionally driver runs as a thread in ApplicationMaster; use --jars
> option with a globally visible path to said jar
> 3. Yarn Client-mode: client and driver run on the same machine. driver
> is *NOT* a thread in ApplicationMaster; use --packages to submit a jar
>
>
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or 
> wrote:
>
>
> Hi Greg,
>
> It's actually intentional for standalone cluster mode to not upload jars.

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Greg Hill


On 12/28/15, 5:16 PM, "Daniel Valdivia"  wrote:

>Hi,
>
>I'm trying to submit a job to a small spark cluster running in stand
>alone mode, however it seems like the jar file I'm submitting to the
>cluster is "not found" by the worker nodes.
>
>I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers.

Another problem I ran into, and you might too, is that --packages doesn't
work with --deploy-mode cluster.  It downloads the packages to a temporary
location on the node running spark-submit, then passes those paths to the
node that is running the Driver, but since that isn't the same machine, it
can't find anything and fails.  The driver process *should* be the one
doing the downloading, but it isn't. I ended up having to create a fat JAR
with all of the dependencies to get around that one.

Greg
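
(For reference, one minimal sketch of the fat-JAR route Greg describes,
assuming the sbt-assembly plugin; versions are illustrative:)

  // project/plugins.sbt
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

  // build.sbt: mark Spark itself "provided" so it is not bundled
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"

Running `sbt assembly` then produces a single jar under target/scala-2.10/
that bundles the remaining dependencies, which sidesteps the --packages issue.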


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Can't submit job to stand alone cluster

2015-12-28 Thread Ted Yu
Have you verified that the following file does exist ?

/home/hadoop/git/scalaspark/./target/scala-2.10/cluster-incidents_2.10-1.0.jar

Thanks
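
(A quick hypothetical check, since in cluster deploy mode the path must exist
on whichever Worker is chosen to run the driver, not just on the submitting
machine; hostnames are illustrative:)

  for h in sslabnode01 sslabnode02; do
    ssh "$h" ls -l /home/hadoop/git/scalaspark/target/scala-2.10/cluster-incidents_2.10-1.0.jar
  done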

On Mon, Dec 28, 2015 at 3:16 PM, Daniel Valdivia 
wrote:

> Hi,
>
> I'm trying to submit a job to a small spark cluster running in stand alone
> mode, however it seems like the jar file I'm submitting to the cluster is
> "not found" by the workers nodes.
>
> I might have understood wrong, but I thought the Driver node would send
> this jar file to the worker nodes, or should I manually send this file to
> each worker node before I submit the job?
>
> what I'm doing:
>
>  $SPARK_HOME/bin/spark-submit --master spark://sslabnode01:6066
> --deploy-mode cluster  --class ClusterIncidents
> ./target/scala-2.10/cluster-incidents_2.10-1.0.jar
>
> The error I'm getting:
>
> Running Spark using the REST application submission protocol.
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/12/28 15:13:58 INFO RestSubmissionClient: Submitting a request to
> launch an application in spark://sslabnode01:6066.
> 15/12/28 15:13:59 INFO RestSubmissionClient: Submission successfully
> created as driver-20151228151359-0003. Polling submission state...
> 15/12/28 15:13:59 INFO RestSubmissionClient: Submitting a request for the
> status of submission driver-20151228151359-0003 in spark://sslabnode01:6066.
> 15/12/28 15:13:59 INFO RestSubmissionClient: State of driver
> driver-20151228151359-0003 is now ERROR.
> 15/12/28 15:13:59 INFO RestSubmissionClient: Driver is running on worker
> worker-20151218150246-10.15.235.241-52077 at 10.15.235.241:52077.
> 15/12/28 15:13:59 ERROR RestSubmissionClient: Exception from the cluster:
> java.io.FileNotFoundException:
> /home/hadoop/git/scalaspark/./target/scala-2.10/cluster-incidents_2.10-1.0.jar
> (No such file or directory)
> java.io.FileInputStream.open(Native Method)
> java.io.FileInputStream.<init>(FileInputStream.java:146)
>
> org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124)
>
> org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114)
> org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:202)
> org.spark-project.guava.io.Files.copy(Files.java:436)
>
> org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:514)
> org.apache.spark.util.Utils$.copyFile(Utils.scala:485)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:562)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:369)
> org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
>
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)
> 15/12/28 15:13:59 INFO RestSubmissionClient: Server responded with
> CreateSubmissionResponse:
> {
>   "action" : "CreateSubmissionResponse",
>   "message" : "Driver successfully submitted as
> driver-20151228151359-0003",
>   "serverSparkVersion" : "1.5.2",
>   "submissionId" : "driver-20151228151359-0003",
>   "success" : true
> }
>
> Thanks in advance
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Can't submit job to stand alone cluster

2015-12-28 Thread vivek.meghanathan
+ If it exists, check whether it has read permission for the user who tries to
run the job.

Regards
Vivek

On Tue, Dec 29, 2015 at 6:56 am, Ted Yu wrote:

Have you verified that the following file does exist ?

/home/hadoop/git/scalaspark/./target/scala-2.10/cluster-incidents_2.10-1.0.jar

Thanks

On Mon, Dec 28, 2015 at 3:16 PM, Daniel Valdivia wrote:
Hi,

I'm trying to submit a job to a small spark cluster running in stand alone 
mode, however it seems like the jar file I'm submitting to the cluster is "not 
found" by the workers nodes.

I might have understood wrong, but I thought the Driver node would send this jar 
file to the worker nodes, or should I manually send this file to each worker 
node before I submit the job?

what I'm doing:

 $SPARK_HOME/bin/spark-submit --master spark://sslabnode01:6066 --deploy-mode 
cluster  --class ClusterIncidents 
./target/scala-2.10/cluster-incidents_2.10-1.0.jar

The error I'm getting:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/28 15:13:58 INFO RestSubmissionClient: Submitting a request to launch an 
application in spark://sslabnode01:6066.
15/12/28 15:13:59 INFO RestSubmissionClient: Submission successfully created as 
driver-20151228151359-0003. Polling submission state...
15/12/28 15:13:59 INFO RestSubmissionClient: Submitting a request for the 
status of submission driver-20151228151359-0003 in spark://sslabnode01:6066.
15/12/28 15:13:59 INFO RestSubmissionClient: State of driver 
driver-20151228151359-0003 is now ERROR.
15/12/28 15:13:59 INFO RestSubmissionClient: Driver is running on worker 
worker-20151218150246-10.15.235.241-52077 at 10.15.235.241:52077.
15/12/28 15:13:59 ERROR RestSubmissionClient: Exception from the cluster:
java.io.FileNotFoundException: 
/home/hadoop/git/scalaspark/./target/scala-2.10/cluster-incidents_2.10-1.0.jar 
(No such file or directory)
java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:146)

org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124)

org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114)
org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:202)
org.spark-project.guava.io.Files.copy(Files.java:436)

org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:514)
org.apache.spark.util.Utils$.copyFile(Utils.scala:485)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:562)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:369)

org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)

org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)
15/12/28 15:13:59 INFO RestSubmissionClient: Server responded with 
CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151228151359-0003",
  "serverSparkVersion" : "1.5.2",
  "submissionId" : "driver-20151228151359-0003",
  "success" : true
}

Thanks in advance



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




Can't submit job to stand alone cluster

2015-12-28 Thread Daniel Valdivia
Hi,

I'm trying to submit a job to a small spark cluster running in stand alone 
mode, however it seems like the jar file I'm submitting to the cluster is "not 
found" by the workers nodes.

I might have understood wrong, but I thought the Driver node would send this jar 
file to the worker nodes, or should I manually send this file to each worker 
node before I submit the job?

what I'm doing:

 $SPARK_HOME/bin/spark-submit --master spark://sslabnode01:6066 --deploy-mode 
cluster  --class ClusterIncidents 
./target/scala-2.10/cluster-incidents_2.10-1.0.jar 

The error I'm getting:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/28 15:13:58 INFO RestSubmissionClient: Submitting a request to launch an 
application in spark://sslabnode01:6066.
15/12/28 15:13:59 INFO RestSubmissionClient: Submission successfully created as 
driver-20151228151359-0003. Polling submission state...
15/12/28 15:13:59 INFO RestSubmissionClient: Submitting a request for the 
status of submission driver-20151228151359-0003 in spark://sslabnode01:6066.
15/12/28 15:13:59 INFO RestSubmissionClient: State of driver 
driver-20151228151359-0003 is now ERROR.
15/12/28 15:13:59 INFO RestSubmissionClient: Driver is running on worker 
worker-20151218150246-10.15.235.241-52077 at 10.15.235.241:52077.
15/12/28 15:13:59 ERROR RestSubmissionClient: Exception from the cluster:
java.io.FileNotFoundException: 
/home/hadoop/git/scalaspark/./target/scala-2.10/cluster-incidents_2.10-1.0.jar 
(No such file or directory)
java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:146)

org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124)

org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114)
org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:202)
org.spark-project.guava.io.Files.copy(Files.java:436)

org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:514)
org.apache.spark.util.Utils$.copyFile(Utils.scala:485)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:562)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:369)

org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)

org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)
15/12/28 15:13:59 INFO RestSubmissionClient: Server responded with 
CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151228151359-0003",
  "serverSparkVersion" : "1.5.2",
  "submissionId" : "driver-20151228151359-0003",
  "success" : true
}

Thanks in advance



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org