Re: Why doesn't the driver node do any work?

2014-07-09 Thread aminn_524
I have one master and two slave nodes, and I did not set an IP for the Spark
driver. My questions are: should I set an IP for the Spark driver, and can I
host the driver inside the cluster, on the master node? If so, how do I do
that? Will the driver be hosted automatically on whichever node I submit the
application from with spark-submit?
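
A minimal sketch of the two placements spark-submit supports on a standalone
cluster; the master URL, class name, and jar path below are hypothetical:

  # Client deploy mode (the default): the driver runs on whichever machine
  # invokes spark-submit, e.g. the master node if you submit from there.
  ./bin/spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --deploy-mode client \
    /path/to/app.jar

  # Cluster deploy mode: the driver is launched on one of the worker nodes
  # instead of the submitting machine.
  ./bin/spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --deploy-mode cluster \
    /path/to/app.jar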





Re: Why doesn't the driver node do any work?

2014-04-09 Thread Mayur Rustagi
Also, the driver can run on one of the slave nodes (you will still need a
Spark master, though, for resource allocation etc.).
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi




Re: Why doesn't the driver node do any work?

2014-04-08 Thread Sean Owen
If you want the machine that hosts the driver to also do work, you can
designate it as a worker too, if I'm not mistaken. I don't think the
driver should do work, logically, but that's not to say that the
machine it's on shouldn't do work.
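
A minimal sketch of that setup on a standalone cluster, assuming a master at
spark://master-host:7077 (hypothetical URL):

  # Run on the machine that hosts the driver so it also receives executors.
  ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077

  # Or register it permanently: add its hostname (hypothetical here) to
  # conf/slaves on the master and restart the workers.
  echo driver-host >> conf/slaves
  ./sbin/start-slaves.sh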
--
Sean Owen | Director, Data Science | London


On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 So I have a cluster in EC2 doing some work, and when I take a look here

 http://driver-node:4040/executors/

 I see that my driver node is snoozing on the job: No tasks, no memory used,
 and no RDD blocks cached.

 I'm assuming that it was a conscious design choice not to have the driver
 node partake in the cluster's workload.

 Why is that? It seems like a wasted resource.

 What's more, the slaves may rise up one day and overthrow the driver out of
 resentment.

 Nick


 


Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nicholas Chammas
Alright, so I guess I understand now why spark-ec2 allows you to select
different instance types for the driver node and worker nodes. If the
driver node is just driving and not doing any large collect()s or heavy
processing, it can be much smaller than the worker nodes.

With regard to data locality, that may not be an issue in my usage pattern
if, in theory, I wanted to make the driver node also do work. I launch
clusters using spark-ec2 and source data from S3, so I'm missing out on
that data locality benefit from the get-go. The firewall may be an issue if
spark-ec2 doesn't punch open the appropriate holes. And it may well not,
since it doesn't seem to have an option to configure the driver node to
also do work.

Anyway, I'll definitely leave things the way they are. If I want a beefier
cluster, it's probably much easier to just launch a cluster with more
slaves using spark-ec2 than it is to set the driver node to a non-default
configuration.
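
Both of those knobs are spark-ec2 options; a minimal sketch, assuming the
spark-ec2 script of that era (the key pair, identity file, and cluster name
are hypothetical):

  # Beefy slaves, small master/driver node; --slaves controls cluster size.
  ./spark-ec2 \
    --key-pair=my-keypair \
    --identity-file=my-keypair.pem \
    --slaves=4 \
    --instance-type=m3.xlarge \
    --master-instance-type=m1.small \
    launch my-spark-cluster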





Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nan Zhu
This may be unrelated to the question itself, but just FYI:

you can run your driver program on a worker node with Spark 0.9:

http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
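
In 0.9's standalone mode that looks roughly like the following; the cluster
URL, jar URL, and main class are hypothetical, and the jar needs to be at a
location the workers can reach (e.g. HDFS or HTTP):

  # Ask the standalone cluster to launch the driver on one of its workers.
  ./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://master-host:7077 \
    hdfs://master-host:9000/jars/app.jar \
    com.example.MyApp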

Best, 

-- 
Nan Zhu


