Hi all, I've been testing HDFS with a 3-datanode cluster, and I've noticed that if I stop one datanode I can still read all the files, but the "hadoop dfs -copyFromLocal" command fails. In the namenode web interface the stopped datanode still shows as alive, and it is only marked dead after about 10 minutes. After reading the list archives I tried shortening the heartbeat intervals with these options:
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>1</value>
    <description>Determines datanode heartbeat interval in seconds.</description>
  </property>

  <property>
    <name>dfs.heartbeat.recheck.interval</name>
    <value>1</value>
    <description>Determines how often the namenode rechecks datanode liveness.</description>
  </property>

  <property>
    <name>dfs.namenode.decommission.interval</name>
    <value>1</value>
    <description>Determines how often the namenode checks decommission progress.</description>
  </property>

It still takes 10 minutes to detect the dead node. Is there a way to shorten this interval? (I also thought that with replication set to 2 on 3 nodes, i.e. one spare, writes wouldn't fail, but they still do.)
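For what it's worth, if I'm reading the NameNode code correctly, the dead-node timeout isn't either interval directly but a combination of both, and the recheck interval is specified in milliseconds rather than seconds (I may be wrong about the units for the Hadoop version in question). A small sketch of that formula:

```python
def heartbeat_expire_interval_ms(recheck_interval_ms, heartbeat_interval_s):
    # Assumed formula from the HDFS NameNode source: a datanode is
    # considered dead once no heartbeat has arrived for
    # 2 * recheck interval + 10 * heartbeat interval.
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000

# With the defaults (recheck = 300000 ms, heartbeat = 3 s) this gives
# 630000 ms, i.e. 10.5 minutes -- which would explain the ~10 minutes
# I'm seeing.
print(heartbeat_expire_interval_ms(300000, 3))  # 630000
```

If that's right, then setting dfs.heartbeat.recheck.interval to 1 means 1 millisecond, and the timeout would be dominated by the 10 * heartbeat term instead.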