This may be unrelated to the question itself, but FYI: with Spark 0.9 you can run your driver program on a worker node:
http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster

Best,

--
Nan Zhu

On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas wrote:

> Alright, so I guess I understand now why spark-ec2 allows you to select
> different instance types for the driver node and worker nodes. If the
> driver node is just driving and not doing any large collect()s or heavy
> processing, it can be much smaller than the worker nodes.
>
> With regards to data locality, that may not be an issue in my usage
> pattern if, in theory, I wanted to make the driver node also do work. I
> launch clusters using spark-ec2 and source data from S3, so I'm missing
> out on that data locality benefit from the get-go. The firewall may be
> an issue if spark-ec2 doesn't punch open the appropriate holes. And it
> may well not, since it doesn't seem to have an option to configure the
> driver node to also do work.
>
> Anyway, I'll definitely leave things the way they are. If I want a
> beefier cluster, it's probably much easier to just launch a cluster with
> more slaves using spark-ec2 than it is to set the driver node to a
> non-default configuration.
>
> On Tue, Apr 8, 2014 at 4:48 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > If you want the machine that hosts the driver to also do work, you can
> > designate it as a worker too, if I'm not mistaken. I don't think the
> > driver should do work, logically, but that's not to say that the
> > machine it's on shouldn't do work.
> >
> > --
> > Sean Owen | Director, Data Science | London
> >
> > On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
> > <nicholas.cham...@gmail.com> wrote:
> >
> > > So I have a cluster in EC2 doing some work, and when I take a look
> > > here
> > >
> > > http://driver-node:4040/executors/
> > >
> > > I see that my driver node is snoozing on the job: no tasks, no
> > > memory used, and no RDD blocks cached.
> > >
> > > I'm assuming that it was a conscious design choice not to have the
> > > driver node partake in the cluster's workload.
> > >
> > > Why is that? It seems like a wasted resource.
> > >
> > > What's more, the slaves may rise up one day and overthrow the driver
> > > out of resentment.
> > >
> > > Nick
> > >
> > > ________________________________
> > > View this message in context: Why doesn't the driver node do any
> > > work?
> > > Sent from the Apache Spark User List mailing list archive at
> > > Nabble.com.
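For anyone following along, the two options discussed in this thread can be sketched roughly as below. This is a sketch against a Spark 0.9 standalone cluster, not a tested recipe: the master URL `spark://master-host:7077`, the jar path, and the main class are placeholders you'd substitute with your own values.

```shell
# Option 1 (what Nan's link describes): submit the driver so it runs
# inside the cluster, on one of the worker nodes, rather than on the
# machine you launch from. Syntax as documented for standalone mode
# in Spark 0.9; all arguments below are placeholders.
./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://master-host:7077 \
    /path/to/my-app.jar \
    com.example.MyApp

# Option 2 (Sean's suggestion): leave the driver where it is, but also
# start a worker daemon on that same machine so it registers with the
# master and contributes executors like any other slave.
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077
```

Note that with option 2 the driver process itself still does no task work; you're just letting the driver's machine host executors alongside it, which is exactly the distinction Sean draws above.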