Hi all, I've been testing HDFS with a 3-datanode cluster, and I've noticed that if I stop one datanode I can still read all the files, but the "hadoop dfs -copyFromLocal" command fails. In the namenode web interface the stopped datanode still shows as alive, and it is only marked dead after about 10 minutes. After reading the list archives I tried shortening the heartbeat intervals with these options:
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>1</value>
    <description>Determines datanode heartbeat interval in seconds.</description>
  </property>

  <property>
    <name>dfs.heartbeat.recheck.interval</name>
    <value>1</value>
    <description>Determines how often the namenode rechecks datanode liveness.</description>
  </property>

  <property>
    <name>dfs.namenode.decommission.interval</name>
    <value>1</value>
    <description>Determines how often the namenode checks decommission progress.</description>
  </property>

It still takes 10 minutes to detect the dead node. Is there a way to shorten this interval? (I also thought that with replication set to 2 on 3 nodes, i.e. one spare, writes wouldn't fail, but they still do.)
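For what it's worth, if I'm reading the NameNode code correctly, the dead-node timeout isn't either interval directly but a combination of both, and the recheck interval is specified in milliseconds rather than seconds (I may be wrong about the units for the Hadoop version in question). A small sketch of that formula:

```python
def heartbeat_expire_interval_ms(recheck_interval_ms, heartbeat_interval_s):
    # Assumed formula from the HDFS NameNode source: a datanode is
    # considered dead once no heartbeat has arrived for
    # 2 * recheck interval + 10 * heartbeat interval.
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000

# With the defaults (recheck = 300000 ms, heartbeat = 3 s) this gives
# 630000 ms, i.e. 10.5 minutes -- which would explain the ~10 minutes
# I'm seeing.
print(heartbeat_expire_interval_ms(300000, 3))  # 630000
```

If that's right, then setting dfs.heartbeat.recheck.interval to 1 means 1 millisecond, and the timeout would be dominated by the 10 * heartbeat term instead.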