The driver can also run on one of the slave nodes (you will still need a Spark master for resource allocation, etc.).

Regards,
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Tue, Apr 8, 2014 at 2:46 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

> may be unrelated to the question itself, just FYI
>
> you can run your driver program in a worker node with Spark-0.9
>
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>
> Best,
>
> --
> Nan Zhu
>
>
> On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas wrote:
>
> Alright, so I guess I understand now why spark-ec2 allows you to select
> different instance types for the driver node and worker nodes. If the
> driver node is just driving and not doing any large collect()s or heavy
> processing, it can be much smaller than the worker nodes.
>
> With regards to data locality, that may not be an issue in my usage
> pattern if, in theory, I wanted to make the driver node also do work. I
> launch clusters using spark-ec2 and source data from S3, so I'm missing out
> on that data locality benefit from the get-go. The firewall may be an issue
> if spark-ec2 doesn't punch open the appropriate holes. And it may well not,
> since it doesn't seem to have an option to configure the driver node to
> also do work.
>
> Anyway, I'll definitely leave things the way they are. If I want a beefier
> cluster, it's probably much easier to just launch a cluster with more
> slaves using spark-ec2 than it is to set the driver node to a non-default
> configuration.
>
>
> On Tue, Apr 8, 2014 at 4:48 PM, Sean Owen <so...@cloudera.com> wrote:
>
> If you want the machine that hosts the driver to also do work, you can
> designate it as a worker too, if I'm not mistaken. I don't think the
> driver should do work, logically, but that's not to say that the
> machine it's on shouldn't do work.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > So I have a cluster in EC2 doing some work, and when I take a look here
> >
> > http://driver-node:4040/executors/
> >
> > I see that my driver node is snoozing on the job: no tasks, no memory used,
> > and no RDD blocks cached.
> >
> > I'm assuming that it was a conscious design choice not to have the driver
> > node partake in the cluster's workload.
> >
> > Why is that? It seems like a wasted resource.
> >
> > What's more, the slaves may rise up one day and overthrow the driver out of
> > resentment.
> >
> > Nick
> >
> >
> > ________________________________
> > View this message in context: Why doesn't the driver node do any work?
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
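For anyone landing on this thread later, the two suggestions above can be sketched roughly as follows. This is a hedged illustration, not verbatim from the docs: the master host, application class, and jar name are hypothetical placeholders, and `spark-submit --deploy-mode cluster` is the post-0.9 form of "launching inside the cluster" (the 0.9 docs linked above describe an equivalent mechanism via `org.apache.spark.deploy.Client`).

```shell
# Option 1 (Nan Zhu's suggestion): run the driver program itself on a
# node inside the standalone cluster, rather than on the submitting
# machine. Hypothetical master URL, class, and jar:
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar

# Option 2 (Sean Owen's suggestion): keep the driver where it is, but
# make its machine also do work by starting a worker daemon on it,
# registered with the same standalone master:
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077
```

Either way, the driver process itself still only coordinates; option 2 just stops the driver's machine from sitting idle.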