When the nodes stop reporting heartbeats, can you still ssh into them?
Can they reach the JT machine?
What does netstat -a show?
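
If it helps, the reachability part of that check can be scripted and run from an affected node while it is silent. Below is a minimal sketch in Python, assuming a plain TCP connect is a good enough test. The function names, host names and ports are placeholders I made up, not values from your cluster; substitute whatever your own configuration uses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(targets):
    """Print one line per (host, port) pair saying whether it was reachable."""
    for host, port in targets:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")


# Example call (hypothetical host names and ports, replace with your own):
# report([("namenode.example.com", 8020), ("jobtracker.example.com", 8021)])
```

A failed connect only says the port could not be reached from that node at that moment; it does not say why, so the netstat output is still worth looking at.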

Cheers,

Joep
________________________________
From: Rahul Das [rahul.h...@gmail.com]
Sent: Tuesday, August 02, 2011 11:21 PM
To: hdfs-user@hadoop.apache.org
Subject: Dananode not sending the Hearbeat messages to Namenode

Hi,

I found some strange behavior in my cluster. The datanodes randomly stop 
sending any information (no logs are written), so the namenode thinks they are 
down. But after some time (approx. 30 minutes) the datanodes come back up and 
start behaving properly. I tried to find an error log, but the datanodes do not 
write any error messages during this time.

The namenode shows warnings similar to:

2011-07-28 20:59:35,275 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor 
timed out block blk_8370263993564715002_23947922

I checked that this is not caused by a network outage or by some other process 
eating up the CPU.

Please help me with this.
--
Rahul
