This may be unrelated to the question itself, but FYI: with Spark 0.9 you can run your driver program on a worker node:
http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster

Best,

--
Nan Zhu

On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas wrote:

> Alright, so I guess I understand now why spark-ec2 allows you to select
> different instance types for the driver node and worker nodes. If the
> driver node is just driving and not doing any large collect()s or heavy
> processing, it can be much smaller than the worker nodes.
>
> With regards to data locality, that may not be an issue in my usage
> pattern if, in theory, I wanted to make the driver node also do work. I
> launch clusters using spark-ec2 and source data from S3, so I'm missing
> out on that data locality benefit from the get-go. The firewall may be
> an issue if spark-ec2 doesn't punch open the appropriate holes. And it
> may well not, since it doesn't seem to have an option to configure the
> driver node to also do work.
>
> Anyway, I'll definitely leave things the way they are. If I want a
> beefier cluster, it's probably much easier to just launch a cluster with
> more slaves using spark-ec2 than it is to set the driver node to a
> non-default configuration.
>
> On Tue, Apr 8, 2014 at 4:48 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > If you want the machine that hosts the driver to also do work, you can
> > designate it as a worker too, if I'm not mistaken. I don't think the
> > driver should do work, logically, but that's not to say that the
> > machine it's on shouldn't do work.
> >
> > --
> > Sean Owen | Director, Data Science | London
> >
> > On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
> > <nicholas.cham...@gmail.com> wrote:
> >
> > > So I have a cluster in EC2 doing some work, and when I take a look
> > > here
> > >
> > > http://driver-node:4040/executors/
> > >
> > > I see that my driver node is snoozing on the job: no tasks, no
> > > memory used, and no RDD blocks cached.
> > >
> > > I'm assuming that it was a conscious design choice not to have the
> > > driver node partake in the cluster's workload.
> > >
> > > Why is that? It seems like a wasted resource.
> > >
> > > What's more, the slaves may rise up one day and overthrow the driver
> > > out of resentment.
> > >
> > > Nick
> > >
> > > ________________________________
> > > View this message in context: Why doesn't the driver node do any
> > > work?
> > > Sent from the Apache Spark User List mailing list archive at
> > > Nabble.com.
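For anyone following along, the two options discussed in this thread can be sketched roughly as below. This is a sketch against a Spark 0.9 standalone cluster, not a tested recipe: the master URL `spark://master-host:7077`, the jar path, and the main class are placeholders you'd substitute with your own values.

```shell
# Option 1 (what Nan's link describes): submit the driver so it runs
# inside the cluster, on one of the worker nodes, rather than on the
# machine you launch from. Syntax as documented for standalone mode
# in Spark 0.9; all arguments below are placeholders.
./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://master-host:7077 \
    /path/to/my-app.jar \
    com.example.MyApp

# Option 2 (Sean's suggestion): leave the driver where it is, but also
# start a worker daemon on that same machine so it registers with the
# master and contributes executors like any other slave.
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077
```

Note that with option 2 the driver process itself still does no task work; you're just letting the driver's machine host executors alongside it, which is exactly the distinction Sean draws above.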