Hi All,

I've been testing HDFS on a cluster with 3 datanodes, and I've noticed that
if I stop 1 datanode I can still read all the files, but the "hadoop dfs
-copyFromLocal" command fails. In the namenode web interface I can see that
the namenode still thinks the stopped datanode is alive, and it only detects
that it's dead after about 10 minutes. After reading the list archives, I
tried modifying the heartbeat intervals with these options:

<property>
  <name>dfs.heartbeat.interval</name>
  <value>1</value>
  <description>Determines datanode heartbeat interval in
seconds.</description>
</property>

<property>
  <name>dfs.heartbeat.recheck.interval</name>
  <value>1</value>
  <description>Determines how often the namenode rechecks for expired
datanode heartbeats.</description>
</property>

<property>
  <name>dfs.namenode.decommission.interval</name>
  <value>1</value>
  <description>Determines how often the namenode checks whether
decommissioning is complete, in seconds.</description>
</property>

The namenode still takes about 10 minutes to detect the dead datanode. Is
there a way to shorten this interval? (I thought that with data replication
set to 2 and 3 nodes, so effectively one spare, writes wouldn't fail, but
they still do.)
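
For what it's worth, here is a rough sketch of where the 10 minutes might
come from. It assumes the namenode declares a datanode dead after
2 x (recheck interval) + 10 x (heartbeat interval), and that the defaults are
a 5 minute recheck interval (held in milliseconds) and a 3 second heartbeat.
Both the formula and the defaults are my assumptions about the namenode, not
something I have confirmed for my version, and the function name is made up
for illustration:

```python
def dead_node_timeout_ms(recheck_interval_ms: int, heartbeat_interval_s: int) -> int:
    """Hypothetical helper: estimated time before a silent datanode is marked dead."""
    # Assumed formula: 2 * recheck interval + 10 * heartbeat interval,
    # with the heartbeat interval converted from seconds to milliseconds.
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


# Assumed defaults: recheck = 5 minutes (in ms), heartbeat = 3 seconds.
default_timeout_ms = dead_node_timeout_ms(5 * 60 * 1000, 3)
print(default_timeout_ms / 60000.0)  # timeout in minutes -> 10.5
```

If that is right, the delay I am seeing matches the defaults, which would
suggest that my recheck interval setting is not being picked up at all.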
