One downside I can think of to having the driver node act as a temporary
member of the cluster is that there may be firewalls between the workers
and the driver machine that would prevent shuffles from working properly.
You'd then need to poke holes in those firewalls just to get the cluster
to run jobs properly.
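
If someone did want the driver host to double as a worker, one way to poke
those holes in a controlled fashion is to pin the ports Spark would
otherwise choose at random, so that firewall rules can be written for
them.  Here's a rough sketch in Scala; the property names vary between
Spark releases, so double-check the configuration docs for your version
before relying on them:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hedged sketch: fix the normally random ports so the firewall between
    // the driver host and the workers can be opened for exactly these.
    // The property names below may differ across Spark versions.
    val conf = new SparkConf()
      .setAppName("port-pinning-sketch")
      .set("spark.driver.port", "51000")        // executors -> driver control traffic
      .set("spark.blockManager.port", "51100")  // executor <-> executor block/shuffle fetches
    val sc = new SparkContext(conf)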

Additionally, if you rely on data locality to get decent performance from
your Spark jobs (e.g. having the Spark and HDFS services co-located on
every machine), then the additional temporary member might actually slow
the overall job down.  Since the driver machine is not co-located with the
data, its tasks would have to read their input over the network, and you
might observe the straggler effect on that machine.
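
You can actually see the locality story from the driver: partitions of an
HDFS-backed RDD carry preferred locations (the datanodes holding their
blocks), and a host that stores none of those blocks won't appear in that
list, so its tasks read their input remotely.  A small Scala sketch (the
HDFS path is made up, and getPreferredLocs is a developer API, so treat
this as illustrative only):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("locality-sketch"))

    // Hypothetical HDFS path; each partition inherits the locations of the
    // blocks backing it.
    val lines = sc.textFile("hdfs:///data/events.log")

    // Print the preferred hosts for the first few partitions.  A machine
    // that holds none of the blocks (e.g. the driver) won't show up here,
    // so any tasks scheduled on it would fetch their input over the network.
    lines.partitions.take(3).foreach { p =>
      println(sc.getPreferredLocs(lines, p.index))
    }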

I for one prefer the current delegation of responsibilities between the
driver and the workers.


On Tue, Apr 8, 2014 at 12:24 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> So I have a cluster in EC2 doing some work, and when I take a look here
>
> http://driver-node:4040/executors/
>
> I see that my driver node is snoozing on the job: No tasks, no memory
> used, and no RDD blocks cached.
>
> I'm assuming that it was a conscious design choice not to have the driver
> node partake in the cluster's workload.
>
> Why is that? It seems like a wasted resource.
>
> What's more, the slaves may rise up one day and overthrow the driver out
> of resentment.
>
> Nick
>
>
