[ https://issues.apache.org/jira/browse/HDFS-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669982#comment-13669982 ]
Kihwal Lee commented on HDFS-4859:
----------------------------------

Thank you for the clarification. Fortunately, I have some experience with the Linux kernel, storage, and fault tolerance, so I was able to digest what you have described so far. Although you do not recommend NFS + manual failover, I now understand that you are not against an FJM improvement that could benefit existing installations as part of continuing support. I don't believe adding this feature will be interpreted as promotion or encouragement, since QJM+ZKFC is clearly newer and more capable, and users like your customers are happy with it.

> Add timeout in FileJournalManager
> ---------------------------------
>
>                 Key: HDFS-4859
>                 URL: https://issues.apache.org/jira/browse/HDFS-4859
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Kihwal Lee
>
> Due to the absence of an explicit timeout in FileJournalManager, error conditions
> that incur long delays (usually until the driver timeout) can make the namenode
> unresponsive for a long time. This directly affects the NN's failure detection
> latency, which is critical in HA.
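
For readers unfamiliar with the problem, the following is a minimal sketch of what an explicit timeout around a blocking journal write could look like. It is not the HDFS-4859 patch and does not use FileJournalManager's actual API: the class BoundedJournalIo, the method callWithTimeout, and the thread name are hypothetical names chosen for illustration. The idea is to run the blocking call on a worker thread and bound how long the caller waits, so that a hung device surfaces as an IOException instead of stalling the caller until the driver gives up.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class BoundedJournalIo {
    // One daemon worker thread, so a stuck I/O call cannot block JVM exit.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "journal-io");
        t.setDaemon(true);
        return t;
    });

    /** Runs op on the worker thread and waits at most timeoutMs for its result. */
    <T> T callWithTimeout(Callable<T> op, long timeoutMs) throws IOException {
        Future<T> pending = worker.submit(op);
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: a thread blocked inside the kernel may ignore the interrupt.
            pending.cancel(true);
            throw new IOException("journal operation timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for journal operation", e);
        }
    }
}
```

A caller would wrap each potentially blocking step, for example a flush-and-sync of the edit log, in callWithTimeout and treat the resulting IOException like any other journal failure. Note that with a single worker thread, operations submitted after a truly hung call will also time out, which is arguably the desired outcome once the underlying storage is unresponsive.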