Vinod,
One more observation I can share: every time the NM or RM gets killed,
I see messages like the following in the NM's log:
2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2014-03-05 05:33:23,824
Hi,
I am running an application on a 2-node cluster, which tries to acquire
all the containers that are available on one of those nodes and the remaining
containers from the other node in the cluster. When I run this application
continuously in a loop, either the NM or the RM gets killed at a
I remember you asking this question before. Check whether your OS's OOM
killer is killing it.
+Vinod
On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com
wrote:
Hi,
I am running an application on a 2-node cluster, which tries to acquire all
the containers that are
Yes Vinod, I asked this question some time back, and I have come back to
try to resolve the issue again.
I checked whether the OOM killer is killing it, but it is not. I have checked
the free swap space on my box while my test is running, but it doesn't seem to be
the issue. Also, I have verified if OOM score is