Also, the driver can run on one of the slave nodes (you will still need a Spark
master, though, for resource allocation etc.).
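
For reference, the standalone docs describe launching the driver inside the cluster via the Client class. Roughly (the cluster URL, jar path, and main class below are placeholders, not from this thread):

```shell
# Launch the driver on one of the standalone cluster's worker nodes
# (Spark 0.9 standalone syntax; all arguments here are placeholders).
./bin/spark-class org.apache.spark.deploy.Client launch \
   spark://master:7077 \
   file:///path/to/app.jar \
   com.example.Main
```

The master at spark://master:7077 then picks a worker to host the driver process.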
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Tue, Apr 8, 2014 at 2:46 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

>  may be unrelated to the question itself, just FYI
>
> you can run your driver program on a worker node with Spark 0.9
>
>
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>
> Best,
>
> --
> Nan Zhu
>
>
> On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas wrote:
>
> Alright, so I guess I understand now why spark-ec2 allows you to select
> different instance types for the driver node and worker nodes. If the
> driver node is just driving and not doing any large collect()s or heavy
> processing, it can be much smaller than the worker nodes.
>
> With regards to data locality, that may not be an issue in my usage
> pattern if, in theory, I wanted to make the driver node also do work. I
> launch clusters using spark-ec2 and source data from S3, so I'm missing out
> on that data locality benefit from the get-go. The firewall may be an issue
> if spark-ec2 doesn't punch open the appropriate holes. And it may well not,
> since it doesn't seem to have an option to configure the driver node to
> also do work.
>
> Anyway, I'll definitely leave things the way they are. If I want a beefier
> cluster, it's probably much easier to just launch a cluster with more
> slaves using spark-ec2 than it is to set the driver node to a non-default
> configuration.
>
>
> On Tue, Apr 8, 2014 at 4:48 PM, Sean Owen <so...@cloudera.com> wrote:
>
> If you want the machine that hosts the driver to also do work, you can
> designate it as a worker too, if I'm not mistaken. I don't think the
> driver should do work, logically, but, that's not to say that the
> machine it's on shouldn't do work.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > So I have a cluster in EC2 doing some work, and when I take a look here
> >
> > http://driver-node:4040/executors/
> >
> > I see that my driver node is snoozing on the job: No tasks, no memory
> > used, and no RDD blocks cached.
> >
> > I'm assuming that it was a conscious design choice not to have the driver
> > node partake in the cluster's workload.
> >
> > Why is that? It seems like a wasted resource.
> >
> > What's more, the slaves may rise up one day and overthrow the driver
> > out of resentment.
> >
> > Nick
> >
> >
> > ________________________________
> > View this message in context: Why doesn't the driver node do any work?
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>
>
