Re: Why doesn't the driver node do any work?
I have one master and two slave nodes, and I did not set an IP for the Spark driver. My questions: should I set an IP for the Spark driver, and can I host the driver inside the cluster, on the master node? If so, how? Or will it be hosted automatically on whichever node we submit the application from with spark-submit? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-doesn-t-the-driver-node-do-any-work-tp3909p9153.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
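For reference, standalone mode can place the driver inside the cluster for you. A minimal sketch, assuming Spark 1.0+ spark-submit; the master URL, jar path, and class name below are placeholders, not values from this thread:

```shell
# Submit so the driver runs on one of the cluster's worker nodes
# (standalone "cluster" deploy mode). All arguments are placeholders.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

In client mode (the default), the driver instead runs on whatever machine invokes spark-submit.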
Re: Why doesn't the driver node do any work?
Also, the driver can run on one of the slave nodes (you will still need a Spark master for resource allocation, etc.).

Regards,
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
Re: Why doesn't the driver node do any work?
If you want the machine that hosts the driver to also do work, you can designate it as a worker too, if I'm not mistaken. I don't think the driver should do work, logically, but that's not to say that the machine it's on shouldn't do work.

--
Sean Owen | Director, Data Science | London

On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
> So I have a cluster in EC2 doing some work, and when I take a look here http://driver-node:4040/executors/ I see that my driver node is snoozing on the job: no tasks, no memory used, and no RDD blocks cached.
>
> I'm assuming that it was a conscious design choice not to have the driver node partake in the cluster's workload. Why is that? It seems like a wasted resource. What's more, the slaves may rise up one day and overthrow the driver out of resentment.
>
> Nick
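Sean's suggestion (register the driver's machine as a worker too) can be sketched for a standalone deployment as follows. A hedged sketch, assuming the standalone scripts under $SPARK_HOME/sbin and a placeholder master URL; script names and paths vary slightly across Spark versions:

```shell
# On the machine that hosts the driver, cap what the co-located worker
# may consume so the driver process itself is not starved. These are
# environment variables read by the standalone launch scripts.
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=8g

# Register this machine with the standalone master as an additional
# worker. The master URL is a placeholder.
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077
```

The trade-off is the one discussed in this thread: executors on that machine now compete with the driver for CPU and memory.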
Re: Why doesn't the driver node do any work?
Alright, so I guess I understand now why spark-ec2 allows you to select different instance types for the driver node and the worker nodes. If the driver node is just driving and not doing any large collect()s or heavy processing, it can be much smaller than the worker nodes.

With regards to data locality, that may not be an issue in my usage pattern if, in theory, I wanted to make the driver node also do work. I launch clusters using spark-ec2 and source data from S3, so I'm missing out on that data-locality benefit from the get-go. The firewall may be an issue if spark-ec2 doesn't punch open the appropriate holes. And it may well not, since it doesn't seem to have an option to configure the driver node to also do work.

Anyway, I'll definitely leave things the way they are. If I want a beefier cluster, it's probably much easier to launch a cluster with more slaves using spark-ec2 than to set the driver node to a non-default configuration.
Re: Why doesn't the driver node do any work?
This may be unrelated to the question itself, but just FYI: you can run your driver program on a worker node with Spark 0.9. See http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster

Best,

--
Nan Zhu
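For completeness, the page Nan links describes launching the driver inside a standalone cluster in the Spark 0.9 era, before spark-submit existed. A hedged sketch of that client-launch command; the cluster URL, jar URL, and class name are placeholders, and the exact form of the command differs across Spark versions:

```shell
# Spark 0.9-era standalone mode: ask the cluster to launch the driver
# on one of the worker nodes. All arguments below are placeholders;
# the application jar must be reachable from the workers.
./bin/spark-class org.apache.spark.deploy.Client launch \
  spark://master-host:7077 \
  http://repo.example.com/jars/my-app.jar \
  com.example.MyApp
```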