Re: Node manager or Resource Manager crash

2014-03-05 Thread Krishna Kishore Bonagiri
Vinod, one more observation I can share: every time the NM or RM gets killed, I see messages like the following in the NM's log: 2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true, 2014-03-05 05:33:23,824

Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are available on one of those nodes and the remaining containers from the other node in the cluster. When I run this application continuously in a loop, either the NM or the RM gets killed at a

Re: Node manager or Resource Manager crash

2014-03-04 Thread Vinod Kumar Vavilapalli
I remember you asking this question before. Check whether your OS's OOM killer is killing it. +Vinod On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are
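
One way to act on the suggestion above is to scan the kernel log for OOM-killer records. The following is a minimal sketch, not something posted in the thread: the helper names `find_oom_kills` and `scan_kernel_log` are hypothetical, and the "Killed process <pid> (<name>)" wording is an assumption about the kernel's log format, which varies between kernel versions.

```python
import re
import subprocess

# Assumption: the kernel logs a "Killed process <pid> (<name>)" fragment when
# the OOM killer terminates a process. Adjust the pattern for your kernel.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for each OOM-kill record found."""
    kills = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def scan_kernel_log():
    """Run `dmesg` and return the OOM kills it reports.

    Reading the kernel ring buffer may require root on some systems.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return find_oom_kills(result.stdout.splitlines())
```

If `scan_kernel_log()` returns an entry whose pid matches the NodeManager or ResourceManager JVM around the time of the crash, the OOM killer is the likely cause; an empty result only shows that no matching record is still in the ring buffer.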

Re: Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Yes Vinod, I asked this question some time back, and I have come back to try to resolve the issue again. I checked whether the OOM killer is killing it, but it is not. I have checked the free swap space on my box while my test is running, but it doesn't seem to be the issue. Also, I have verified if OOM score is
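
The swap and OOM-score checks described above can be scripted on Linux by reading `/proc`. This is a minimal sketch under stated assumptions, not the poster's actual procedure: `read_oom_score` and `parse_meminfo` are hypothetical helper names, and the `proc_root` parameter exists only so the functions can be pointed at a test directory.

```python
from pathlib import Path


def read_oom_score(pid, proc_root="/proc"):
    """Return the kernel's current OOM score for a process.

    Reads /proc/<pid>/oom_score; a higher value means the process is a
    more likely target for the OOM killer.
    """
    return int(Path(proc_root, str(pid), "oom_score").read_text().strip())


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries (for example SwapTotal and SwapFree) are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values
```

Sampling `parse_meminfo(Path("/proc/meminfo").read_text())["SwapFree"]` and `read_oom_score(pid)` for the NM and RM pids while the test loop runs gives a record that can be compared against the time of each crash.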