Re: Why doesn't the driver node do any work?

2014-07-09 Thread aminn_524
I have one master and two slave nodes, and I did not set an IP for the Spark
driver. My questions are: should I set an IP for the Spark driver, and can I
host the driver inside the cluster, on the master node? If so, how do I do
that? Will the driver be hosted automatically on whichever node I submit the
application from with spark-submit?
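
A minimal sketch of the two placements spark-submit supports on a standalone
cluster; the master URL, class name, and jar path below are hypothetical:

  # Client deploy mode (the default): the driver runs on whichever machine
  # invokes spark-submit, e.g. the master node if you submit from there.
  ./bin/spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --deploy-mode client \
    /path/to/app.jar

  # Cluster deploy mode: the driver is launched on one of the worker nodes
  # instead of the submitting machine.
  ./bin/spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --deploy-mode cluster \
    /path/to/app.jar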





Re: Why doesn't the driver node do any work?

2014-04-09 Thread Mayur Rustagi
Also, the driver can run on one of the slave nodes (you will still need a
Spark master, though, for resource allocation etc.).
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi




Re: Why doesn't the driver node do any work?

2014-04-08 Thread Sean Owen
If you want the machine that hosts the driver to also do work, you can
designate it as a worker too, if I'm not mistaken. I don't think the
driver should do work, logically, but that's not to say that the
machine it's on shouldn't do work.
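
A minimal sketch of that setup on a standalone cluster, assuming a master at
spark://master-host:7077 (hypothetical URL):

  # Run on the machine that hosts the driver so it also receives executors.
  ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077

  # Or register it permanently: add its hostname (hypothetical here) to
  # conf/slaves on the master and restart the workers.
  echo driver-host >> conf/slaves
  ./sbin/start-slaves.sh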
--
Sean Owen | Director, Data Science | London


On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 So I have a cluster in EC2 doing some work, and when I take a look here

 http://driver-node:4040/executors/

 I see that my driver node is snoozing on the job: No tasks, no memory used,
 and no RDD blocks cached.

 I'm assuming that it was a conscious design choice not to have the driver
 node partake in the cluster's workload.

 Why is that? It seems like a wasted resource.

 What's more, the slaves may rise up one day and overthrow the driver out of
 resentment.

 Nick


 


Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nicholas Chammas
Alright, so I guess I understand now why spark-ec2 allows you to select
different instance types for the driver node and worker nodes. If the
driver node is just driving and not doing any large collect()s or heavy
processing, it can be much smaller than the worker nodes.

With regard to data locality, that may not be an issue in my usage pattern
if, in theory, I wanted to make the driver node also do work. I launch
clusters using spark-ec2 and source data from S3, so I'm missing out on
that data locality benefit from the get-go. The firewall may be an issue if
spark-ec2 doesn't punch open the appropriate holes. And it may well not,
since it doesn't seem to have an option to configure the driver node to
also do work.

Anyway, I'll definitely leave things the way they are. If I want a beefier
cluster, it's probably much easier to just launch a cluster with more
slaves using spark-ec2 than it is to set the driver node to a non-default
configuration.
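
Both of those knobs are spark-ec2 options; a minimal sketch, assuming the
spark-ec2 script of that era (the key pair, identity file, and cluster name
are hypothetical):

  # Beefy slaves, small master/driver node; --slaves controls cluster size.
  ./spark-ec2 \
    --key-pair=my-keypair \
    --identity-file=my-keypair.pem \
    --slaves=4 \
    --instance-type=m3.xlarge \
    --master-instance-type=m1.small \
    launch my-spark-cluster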





Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nan Zhu
This may be unrelated to the question itself, but just FYI:

you can run your driver program on a worker node with Spark 0.9:

http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
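
In 0.9's standalone mode that looks roughly like the following; the cluster
URL, jar URL, and main class are hypothetical, and the jar needs to be at a
location the workers can reach (e.g. HDFS or HTTP):

  # Ask the standalone cluster to launch the driver on one of its workers.
  ./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://master-host:7077 \
    hdfs://master-host:9000/jars/app.jar \
    com.example.MyApp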

Best, 

-- 
Nan Zhu


