[ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767 ]
Aaron T. Myers commented on HDFS-1973:
--------------------------------------

Hi Hari,

bq. Can you please elaborate a little bit on your area of interest with ZOOKEEPER-1080?

As noted in Sanjay's design doc, one proposal for detecting NN failure would be to use an external ZK service. The HDFS proposal doesn't go into great detail on this, but it suggests using ZK with a heartbeat mechanism to see if the NN is still alive. I personally like the ZK leader election recipe better (i.e. using ephemeral + sequence nodes).

Another possible use for ZK in the implementation of NN HA would be as the source of truth from which clients determine the active NN. This would seem to flow naturally from the part of the ZK recipe which says "Applications may consider creating a separate znode to acknowledge that the leader has executed the leader procedure." If NN HA were to use an implementation of the ZK leader election recipe, then perhaps this "leader-procedure-complete znode" could store the IP or hostname of the active NN, which clients could then use.

I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and do that and post my comments there.

I should also mention that we have not yet settled on a strategy for NN failure detection or client failover. As noted in Sanjay's design doc, we're also strongly considering using virtual IPs for client failover.

> HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1973
>                 URL: https://issues.apache.org/jira/browse/HDFS-1973
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
>
> During failover, a client must detect the current active namenode failure and
> switch over to the new active namenode.
> The switch over might make use of IP failover or something more elaborate,
> such as ZooKeeper, to discover the new active.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
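
For readers unfamiliar with the ZK leader election recipe mentioned in the comment, here is a rough sketch of its core rule: the candidate holding the lowest sequence number leads, and each other candidate watches the znode just ahead of it. This is only an in-memory illustration, not HDFS or ZooKeeper code. It makes no calls to a ZooKeeper client, and the `nn-` znode prefix, the hostnames, and the function names are all hypothetical. It assumes the usual zero-padded sequence suffix that ZooKeeper appends to sequential znodes.

```python
from typing import Dict, List, Optional


def sequence_of(znode: str) -> int:
    """Parse the numeric suffix of a sequential znode name, e.g. 'nn-0000000007' -> 7."""
    return int(znode.rsplit("-", 1)[1])


def elect(children: List[str]) -> Optional[str]:
    """Return the znode with the lowest sequence number (the leader), or None if empty."""
    return min(children, key=sequence_of) if children else None


def watch_target(children: List[str], me: str) -> Optional[str]:
    """Return the znode immediately ahead of `me`, or None if `me` is the leader."""
    ordered = sorted(children, key=sequence_of)
    idx = ordered.index(me)
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical candidates: ephemeral sequential znode name -> namenode address.
candidates: Dict[str, str] = {
    "nn-0000000009": "nn2.example.com",
    "nn-0000000007": "nn1.example.com",
}

leader = elect(list(candidates))
# What a "leader-procedure-complete" znode might hold for clients to read.
active_nn = candidates[leader]
standby_watches = watch_target(list(candidates), "nn-0000000009")

# Simulate failure of the active NN: its ephemeral znode disappears with its session.
after_failure = {k: v for k, v in candidates.items() if k != leader}
new_leader = elect(list(after_failure))
new_active_nn = after_failure[new_leader]
```

In a real deployment the candidate list would come from the children of an election znode, and the standby would learn of the failure through a watch rather than by polling. Whether clients would read the active NN from such a znode or rely on virtual IPs is, per the comment, still undecided.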