Re: question about when do resource matching in YARN
On 21 September 2013 09:19, Sandy Ryza sandy.r...@cloudera.com wrote:
> I don't believe there is any reason scheduling decisions need to be coupled with NodeManager heartbeats. It doesn't sidestep any race conditions because a NodeManager could die immediately after heartbeating.

Historically it's been done for scale: you don't need the JT reaching out to 4K TTs just to give them work to do; instead, let them connect in anyway and get work that way. Once they start reporting completions, they can be given more work. It's very biased towards "worker nodes talk to the master" over "master approaches workers".

-- CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: question about when do resource matching in YARN
Yes, but the heartbeat coupling isn't necessary, I think. One could even use a ZK write/watch approach for faster assignment of regular work?

On Tue, Sep 24, 2013 at 2:24 PM, Steve Loughran ste...@hortonworks.com wrote:
> Historically it's been done for scale: you don't need the JT reaching out to 4K TTs just to give them work to do; instead, let them connect in anyway and get work that way. Once they start reporting completions, they can be given more work. It's very biased towards "worker nodes talk to the master" over "master approaches workers".

-- Harsh J
Re: question about when do resource matching in YARN
How would the ZK approach make things faster? Are you saying the AMs would do the watching? Currently, container assignments aren't actually sent to the NodeManagers on heartbeats. The first time an NM hears about a container is when an AM launches it.

On Tue, Sep 24, 2013 at 4:12 AM, Harsh J ha...@cloudera.com wrote:
> Yes, but the heartbeat coupling isn't necessary, I think. One could even use a ZK write/watch approach for faster assignment of regular work?
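The handoff described in this message can be sketched as a toy model. This is only an illustration under simplifying assumptions, not YARN code: the class and method names (`ToyContainerHandoff`, `ToyNode`, `ToyResourceManager`) are hypothetical, and the toy RM hands out one container per heartbeat regardless of capacity. What it tries to show is that the scheduling decision made on a heartbeat is held by the RM, picked up by the AM on its next allocate call, and only reaches the node when the AM launches the container.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the container handoff; all names here are hypothetical. */
class ToyContainerHandoff {
    /** Stand-in for a NodeManager: it only learns of a container at launch time. */
    static class ToyNode {
        final String name;
        final List<String> knownContainers = new ArrayList<>();

        ToyNode(String name) { this.name = name; }

        /** Called by the AM, not by the RM. */
        void startContainer(String containerId) { knownContainers.add(containerId); }
    }

    /** Stand-in for the RM: holds assignments until the AM next asks. */
    static class ToyResourceManager {
        private final List<String> allocatedForAm = new ArrayList<>();
        private int nextId = 1;

        /** Heartbeat from a node: the RM records a decision but tells the node nothing. */
        void nodeHeartbeat(ToyNode node) {
            // Simplification: always assign exactly one container per heartbeat.
            allocatedForAm.add("container_" + (nextId++) + "@" + node.name);
        }

        /** The AM's periodic allocate call: drains whatever was assigned so far. */
        List<String> allocate() {
            List<String> out = new ArrayList<>(allocatedForAm);
            allocatedForAm.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        ToyNode nm = new ToyNode("nm1");
        ToyResourceManager rm = new ToyResourceManager();

        rm.nodeHeartbeat(nm);                    // decision is made here
        System.out.println(nm.knownContainers);  // node still knows nothing

        for (String c : rm.allocate()) {         // AM picks up the assignment
            nm.startContainer(c);                // AM launches it on the node
        }
        System.out.println(nm.knownContainers);  // node now knows the container
    }
}
```

In this model, anything that speeds up the node side alone would not shorten the wait, because the AM still has to come back and collect the assignment.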
Re: question about when do resource matching in YARN
I don't believe there is any reason scheduling decisions need to be coupled with NodeManager heartbeats. It doesn't sidestep any race conditions because a NodeManager could die immediately after heartbeating.

On Sat, Sep 21, 2013 at 2:11 AM, Omkar Joshi ojo...@hortonworks.com wrote:
> Hi Wei, Yes, there is a clear lag between an AM requesting resources and the request being satisfied, since the request is only processed when NM heartbeats are received. Developers in project Tez ( http://incubator.apache.org/projects/tez.html ) have done some similar stuff. You can check it there. I hope it helps.
Re: question about when do resource matching in YARN
Hey, Wei:

The nodeHeartBeat is used to let the RM know that this NM is still alive. We only assign containers on live NMs. In addition, when the scheduler receives the nodeHeartBeat, it gets the container statuses (such as completed or newly launched) from the NM and can use them to update the node's resources.

You can take a look at the following source code; it may help you understand better:

1. NodeStatusUpdaterImpl::startStatusUpdater(). Sends out the node heartbeat.
2. ResourceTrackerService::nodeHeartbeat(). Receives the heartbeat from the NM and passes it to RMNodeImpl.
3. RMNodeImpl::StatusUpdateWhenHealthyTransition(). Receives the heartbeat and updates local state.
4. CapacityScheduler::nodeUpdate(). Processes the heartbeat info and potentially assigns containers.

Thanks
Xuan

On Fri, Sep 20, 2013 at 7:17 AM, wei yan @ Gmail ywsk...@gmail.com wrote:
> Hi, all, I have a simple question. Currently in YARN, resource matching is triggered by the node manager heartbeat. That is, assignContainers() is only invoked when a new heartbeat comes in. Why don't we use a resource-request-triggered mechanism? That is, when the AM submits an allocateRequest, we do the resource matching and assign containers. Does anybody have any idea about this?
> thanks, Wei
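The heartbeat-triggered matching described above can be illustrated with a small sketch. It is a toy model and not the CapacityScheduler: `ToyHeartbeatScheduler` and its methods are hypothetical names, asks are plain memory sizes in MB, and the queue is strictly first-in, first-out. The point it tries to make is that an ask is only recorded when the AM submits it, and placement happens when a node reports in with fresh state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of heartbeat-triggered matching; not the real YARN classes. */
class ToyHeartbeatScheduler {
    // Pending container asks, one entry per requested container (memory in MB).
    private final Queue<Integer> pendingAsks = new ArrayDeque<>();
    // Free memory per node, as last reported by that node's heartbeat.
    private final Map<String, Integer> freeMb = new HashMap<>();

    /** AM side: record the ask only; no matching happens here. */
    void allocateRequest(int memoryMb) {
        pendingAsks.add(memoryMb);
    }

    /** NM side: refresh the node's state, then place pending asks on this node. */
    List<String> nodeHeartbeat(String node, int reportedFreeMb) {
        freeMb.put(node, reportedFreeMb);
        List<String> assigned = new ArrayList<>();
        // Assign from the head of the queue while the head ask still fits.
        while (!pendingAsks.isEmpty() && pendingAsks.peek() <= freeMb.get(node)) {
            int ask = pendingAsks.poll();
            freeMb.merge(node, -ask, Integer::sum);
            assigned.add(node + ":" + ask);
        }
        return assigned;
    }

    int pendingCount() {
        return pendingAsks.size();
    }

    public static void main(String[] args) {
        ToyHeartbeatScheduler s = new ToyHeartbeatScheduler();
        s.allocateRequest(1024);
        s.allocateRequest(1024);
        System.out.println(s.pendingCount());             // both asks are still waiting
        System.out.println(s.nodeHeartbeat("nm1", 1536)); // room for one ask on nm1
        System.out.println(s.nodeHeartbeat("nm2", 2048)); // the other lands on nm2
    }
}
```

A request-triggered variant would run the same matching loop inside `allocateRequest()`, using whatever node state the last heartbeats left behind.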
Re: question about when do resource matching in YARN
Hi Wei,

Yes, there is a clear lag between an AM requesting resources and the request being satisfied, since the request is only processed when NM heartbeats are received. Developers in project Tez ( http://incubator.apache.org/projects/tez.html ) have done some similar stuff. You can check it there. I hope it helps.

Thanks,
Omkar Joshi
*Hortonworks Inc.*
http://www.hortonworks.com