[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767
 ] 

Aaron T. Myers commented on HDFS-1973:
--------------------------------------

Hi Hari,

bq. Can you please elaborate a little bit on your area of interest with 
ZOOKEEPER-1080?

As noted in Sanjay's design doc, one proposal for detecting NN failure would be 
to use an external ZK service. The HDFS proposal doesn't go into great detail 
on this, but it suggests using ZK with a heartbeat mechanism to see if the NN 
is still alive. I personally like the ZK recipe better (i.e. using ephemeral + 
sequence nodes).
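
For anyone unfamiliar with the recipe, here is a rough sketch of the election rule it describes. This is an in-memory illustration only, not ZK client code: the znode names and the "nn1"/"nn2" prefixes are made up, and the only thing it assumes about ZK is that sequential znodes get a zero-padded counter appended to their name.

```python
def sequence_of(znode_name):
    """Return the numeric suffix of a sequential znode name (assumed 'prefix_NNNNNNNNNN')."""
    return int(znode_name.rsplit("_", 1)[1])


def elect(children):
    """Apply the recipe's rule to a list of candidate znode names.

    The candidate with the lowest sequence number is the leader. Every other
    candidate watches the candidate just ahead of it, so a failure wakes up
    only one waiter instead of all of them.
    """
    ordered = sorted(children, key=sequence_of)
    leader = ordered[0]
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return leader, watches


# Hypothetical candidates: two NNs that each created an ephemeral + sequence znode.
candidates = ["nn2-n_0000000008", "nn1-n_0000000007"]
leader, watches = elect(candidates)

# If nn1's session expires, its ephemeral znode disappears and nn2 re-runs the rule.
survivors = [c for c in candidates if c != leader]
new_leader, new_watches = elect(survivors)
```

In a real deployment the list of children would come from the ZK ensemble and the watch would be a ZK watch on the predecessor znode; the sketch only shows the ordering rule.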

Another possible use for ZK in the implementation of NN HA would be to use ZK 
as the source of truth for clients to determine the active NN. This would seem 
to flow naturally from the part of the ZK recipe which says "Applications may 
consider creating a separate znode to acknowledge that the leader has 
executed the leader procedure." If NN HA were to utilize an implementation of 
the ZK leader election recipe, then perhaps this "leader-procedure-complete 
znode" could store the IP or hostname of the active NN which clients could use.
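
To make the idea concrete, here is a minimal sketch of how clients might use such a znode. It is purely illustrative: a plain dict stands in for the ZK namespace, and the path and the host:port value are hypothetical, not something taken from the design doc.

```python
ACTIVE_PATH = "/hdfs-ha/active-nn"  # hypothetical znode path, for illustration only


def publish_active(znodes, address):
    """Called by the new leader once it has finished the leader procedure."""
    znodes[ACTIVE_PATH] = address


def clear_active(znodes):
    """Models the znode going away, e.g. when the active NN's session expires."""
    znodes.pop(ACTIVE_PATH, None)


def resolve_active(znodes):
    """Called by a client; returns None while no NN has claimed the active role."""
    return znodes.get(ACTIVE_PATH)


znodes = {}                                   # stand-in for the ZK namespace
before_election = resolve_active(znodes)      # nobody is active yet
publish_active(znodes, "nn1.example.com:8020")
during_normal_operation = resolve_active(znodes)
clear_active(znodes)                          # active NN fails
publish_active(znodes, "nn2.example.com:8020")
after_failover = resolve_active(znodes)
```

A client that gets None back would presumably retry until a new active NN publishes its address.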

I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and 
do that and post my comments there.

I should also mention that we have not settled upon what strategy we'll take to 
do NN failure detection or client failover. As noted in Sanjay's design doc, 
we're also strongly considering using virtual IPs for client failover.

> HA: HDFS clients must handle namenode failover and switch over to the new 
> active namenode.
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1973
>                 URL: https://issues.apache.org/jira/browse/HDFS-1973
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
>
> During failover, a client must detect the current active namenode failure and 
> switch over to the new active namenode. The switch over might make use of IP 
> failover or something more elaborate such as ZooKeeper to discover the new 
> active.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
