Namenode returning the same Datanode to client, due to infrequent heartbeat
---------------------------------------------------------------------------

                 Key: HDFS-1235
                 URL: https://issues.apache.org/jira/browse/HDFS-1235
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Thanh Do


This bug has been reported. Basically, since the datanode's heartbeat
messages are infrequent (~ every 10 minutes), the NameNode always gives
the client the same datanode, even if that datanode is dead.

We want to point out that the client waits 6 seconds before retrying.
In this scenario the wait is long and the retries are useless, because
after 6 seconds the namenode still has not declared the datanode dead
and so returns it again.

Overall, this happens when a datanode dies during the first phase of
the pipeline (file setup). If a datanode dies during the second phase
(byte transfer), the DFSClient can still proceed with the surviving
datanodes, which is consistent with what the Hadoop books always say:
the write should proceed as long as at least one good datanode remains.
Unfortunately, this specification does not hold during the first phase
of the pipeline.

Overall, we suggest that the namenode take into consideration the
client's view of unreachable datanodes. That is, if a client says that
it cannot reach DN-X, the namenode could give the client a node other
than DN-X (the namenode does not have to declare DN-X dead).

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html

For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)
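
The suggestion above (let the client tell the namenode which datanodes it could not reach, and have the namenode skip them for that request only) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: the names ToyNameNode, choose_targets, set_up_pipeline and can_connect are hypothetical, target selection is simplified to "first N candidates in order", and the real pipeline setup, where datanodes connect to each other in a chain, is reduced to a per-node reachability check.

```python
class ToyNameNode:
    """Toy stand-in for a namenode's target selection (hypothetical, not Hadoop)."""

    def __init__(self, live_datanodes):
        # Datanodes the namenode still believes are alive, based on heartbeats.
        self.live_datanodes = list(live_datanodes)

    def choose_targets(self, replication, excluded=frozenset()):
        """Return up to `replication` targets, skipping client-reported nodes.

        Excluded nodes are skipped for this request only. They stay in
        live_datanodes: the namenode does not declare them dead.
        """
        candidates = [dn for dn in self.live_datanodes if dn not in excluded]
        return candidates[:replication]


def set_up_pipeline(namenode, can_connect, replication, max_retries=3):
    """Client-side setup loop that reports unreachable nodes on each retry.

    `can_connect` is a hypothetical callback: datanode name -> bool.
    """
    excluded = set()
    for _ in range(max_retries):
        targets = namenode.choose_targets(replication, frozenset(excluded))
        if not targets:
            break  # nothing left to try
        unreachable = [dn for dn in targets if not can_connect(dn)]
        if not unreachable:
            return targets
        # Remember the failures so the next request avoids these nodes.
        excluded.update(unreachable)
    raise IOError("could not set up a write pipeline")


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    dead = {"dn1"}  # dn1 has crashed, but its heartbeat has not yet expired
    print(set_up_pipeline(nn, lambda dn: dn not in dead, replication=3))
```

Without the excluded set, every retry in this model would receive dn1 again until its heartbeat expired, which is the behavior the report describes.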