Help connecting to the cluster

2014-03-07 Thread Yana Kadiyska
Hi Spark users,

Could someone help me out?

My company has a fully functioning Spark cluster with Shark running on
top of it (as part of the same cluster, on the same LAN). I'm
interested in running raw Spark code against it but am running into
the following issue -- it seems the machine hosting the driver
program needs to be reachable by the worker nodes (in my case the
workers cannot route to the machine hosting the driver). Below is a
snippet from my worker log:

14/03/03 20:45:28 INFO executor.StandaloneExecutorBackend: Connecting
to driver: akka://spark@driver_ip:49081/user/StandaloneScheduler
14/03/03 20:45:29 ERROR executor.StandaloneExecutorBackend: Driver
terminated or disconnected! Shutting down.

Does this sound right? It's not clear to me why a worker would try
to establish a connection to the driver -- the driver already
connected successfully, as I see the program listed in the log...why
is this connection not sufficient?

If you use Amazon EC2, can you run the driver from your personal
machine, or do you have to install an IDE on one of the Amazon
machines in order to debug code? I am not too excited about the EC2
option as our data is proprietary...but if that's the shortest path to
success at least it would get me started on some toy examples. At the
moment I'm not sure what my options are, other than running a VM
cluster or EC2.
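For reference, this is roughly how I am creating the context from my
machine -- the master URL and hostnames below are placeholders, not
our real ones:

import org.apache.spark.SparkContext

object ConnectTest {
  def main(args: Array[String]) {
    // The driver runs on my machine; it registers with the standalone
    // master fine, but the executors then try to connect back to this host.
    val sc = new SparkContext(
      "spark://shark-master.example.com:7077", // placeholder master URL
      "ConnectTest")

    // Trivial job just to force the executors to talk to the driver.
    val count = sc.parallelize(1 to 1000).count()
    println("count = " + count)

    sc.stop()
  }
}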

Any help/insight would be greatly appreciated.


Re: Help connecting to the cluster

2014-03-07 Thread Mayur Rustagi
The driver contains the DAG scheduler, which manages the stages of your
jobs and needs to talk back and forth with the workers. So you can run
the driver on any machine that can reach the master and the workers
(even your laptop), but the driver will also need to be reachable from
all of those machines.
I think 0.9.0 added the ability for the driver to be embedded in the
cluster; I am not sure if it's general or restricted to Spark Streaming.
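If the workers can open a connection back to your machine at all, one
thing worth trying is pinning the address and port the driver advertises,
so you can control what the executors connect to and what to open in the
firewall. A rough sketch -- the property names come from the standalone
configuration docs, and "my-driver-host"/the port are placeholders you
would replace with an address the workers can actually route to:

import org.apache.spark.{SparkConf, SparkContext}

// Advertise a host/port that the workers can reach, instead of whatever
// the driver machine happens to resolve its own hostname to.
val conf = new SparkConf()
  .setMaster("spark://shark-master.example.com:7077") // placeholder master
  .setAppName("ConnectTest")
  .set("spark.driver.host", "my-driver-host") // address reachable from workers
  .set("spark.driver.port", "51000")          // fixed port to open in firewall

val sc = new SparkContext(conf)

If there is simply no route from the workers to your box (NAT, firewall),
that won't help, and you really do need to run the driver on a machine
inside the same network.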


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi


