The driver contains the DAG scheduler, which manages the stages of jobs and
needs to talk back and forth with the workers. So you can run the driver on
any machine that can reach the master and workers (even your laptop). But the
driver will also need to be reachable from all of the cluster machines.
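A quick way to check that last requirement is to try opening a TCP connection from a worker node to the driver's host and port. Below is a minimal sketch, assuming Python is available on the worker. The function name `can_reach` is made up for illustration and is not part of Spark.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Run on a worker, something like `can_reach("driver_ip", 49081)` (placeholder host and port) returning False would suggest the worker cannot route to the driver.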
I think 0.9.0 added the ability for the driver to be embedded in the master; I
am not sure if it's general or restricted to Spark Streaming.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, Mar 7, 2014 at 12:29 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:
Hi Spark users,
could someone help me out?
My company has a fully functioning Spark cluster with Shark running on
top of it (as part of the same cluster, on the same LAN). I'm
interested in running raw Spark code against it but am running into
the following issue -- it seems like the machine hosting the driver
program needs to be reachable by the worker nodes (in my case the
workers cannot route to the machine hosting the driver). Below is a
snippet from my worker log:
14/03/03 20:45:28 INFO executor.StandaloneExecutorBackend: Connecting
to driver: akka://spark@driver_ip:49081/user/StandaloneScheduler
14/03/03 20:45:29 ERROR executor.StandaloneExecutorBackend: Driver
terminated or disconnected! Shutting down.
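The address the executor is trying to reach can be pulled out of a log line like the one above. Here is a small sketch, assuming the URL format shown in this log (`akka://spark@host:port/...`). The helper name `driver_address` is hypothetical.

```python
import re

# Matches the driver URL format seen in the worker log:
# akka://<system>@<host>:<port>/...
_DRIVER_URL = re.compile(r"akka://[^@\s]+@([^:/\s]+):(\d+)")


def driver_address(log_text):
    """Return (host, port) of the first driver URL in log_text, or None."""
    match = _DRIVER_URL.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

The host and port it returns are what the worker must be able to route to.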
Does this sound right? It's not clear to me why a worker would try
to establish a connection to the driver -- the driver already
connected successfully, as I see the program listed in the log. Why
is this connection not sufficient?
If you use Amazon EC2, can you run the driver from your personal
machine or do you have to install an IDE on one of the Amazon machines in
order to debug code? I am not too excited about the EC2 option as our
data is proprietary...but if that's the shortest path to success at
least it would get me started on some toy examples. At the moment I'm
not sure what my options are, other than running a VM cluster or EC2.
Any help/insight would be greatly appreciated.