[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HDFS-7858:
------------------------------
    Description: 
In an HA deployment, clients are configured with the hostnames of both the 
Active and Standby Namenodes. Clients will first try one of the NNs 
(non-deterministically), and if it is a standby NN, it will respond to the 
client to retry the request on the other Namenode.
If the client happens to talk to the Standby first, and the Standby is 
undergoing a GC pause or is otherwise busy, the client might not get a 
response soon enough to try the other NN.

Proposed approach to solve this:
1) Use hedged RPCs to simultaneously call multiple configured NNs to decide 
which is the active Namenode.
2) Subsequent calls will invoke the previously successful NN.
3) On failover of the currently active NN, the remaining NNs will be invoked to 
decide which is the new active Namenode.
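
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: NameNodeStub and HedgedNameNodeClient are hypothetical names, a standby or unreachable NN is modelled as a call that throws, and the hedging relies on the standard ExecutorService.invokeAny, which returns the first result that completes without an exception.

```java
import java.io.IOException;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for an RPC proxy to one NameNode (not a Hadoop API). */
interface NameNodeStub {
    /** Returns the reply, or throws if this NN is standby or unreachable. */
    String call(String request) throws Exception;
}

/** Illustrative client; class and method names are assumptions. */
final class HedgedNameNodeClient {
    private final List<NameNodeStub> nameNodes;
    private volatile NameNodeStub lastActive; // NN that last answered successfully

    HedgedNameNodeClient(List<NameNodeStub> nameNodes) {
        this.nameNodes = new ArrayList<>(nameNodes);
    }

    String invoke(String request) throws Exception {
        NameNodeStub cached = lastActive;
        if (cached != null) {
            try {
                return cached.call(request); // step 2: reuse the previously successful NN
            } catch (Exception e) {
                lastActive = null;           // step 3: treat the failure as a failover
            }
        }
        return hedge(request, cached);
    }

    /** Step 1: call every candidate NN in parallel and keep the first success. */
    private String hedge(String request, NameNodeStub exclude) throws Exception {
        List<Callable<Map.Entry<NameNodeStub, String>>> calls = new ArrayList<>();
        for (NameNodeStub nn : nameNodes) {
            if (nn != exclude) { // after a failover, only the remaining NNs are tried
                calls.add(() -> new SimpleImmutableEntry<NameNodeStub, String>(nn, nn.call(request)));
            }
        }
        if (calls.isEmpty()) {
            throw new IOException("no NameNode left to try");
        }
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        try {
            // invokeAny returns the first result that did not throw and cancels the rest.
            Map.Entry<NameNodeStub, String> winner = pool.invokeAny(calls);
            lastActive = winner.getKey();
            return winner.getValue();
        } finally {
            pool.shutdownNow(); // interrupt any call still stuck on a slow standby
        }
    }
}
```

In this sketch any exception from the cached NN triggers a new hedged round; a real client would presumably need to tell a standby response apart from an ordinary application error before doing so.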

  was:
In an HA deployment, clients are configured with the hostnames of both the 
Active and Standby Namenodes. Clients will first try one of the NNs 
(non-deterministically), and if it is a standby NN, it will respond to the 
client to retry the request on the other Namenode.
If the client happens to talk to the Standby first, and the Standby is 
undergoing a GC pause or is otherwise busy, the client might not get a 
response soon enough to try the other NN.

Proposed approaches to solve this:
1) Use hedged RPCs to simultaneously call multiple configured NNs to decide 
which is the active one.
2) Since Zookeeper is already used as the failover controller, the clients 
could talk to ZK and find out which is the active Namenode before contacting it.
3) Long-lived DFSClients would have a ZK watch configured which fires when 
there is a failover, so they do not have to query ZK every time to find out the 
active NN.
4) Clients can also cache the last active NN in the user's home directory 
(~/.lastNN) so that short-lived clients can try that Namenode first before 
querying ZK.


> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>              Labels: BB2015-05-TBR
>             Fix For: 2.8.0
>
>         Attachments: HDFS-7858.1.patch, HDFS-7858.10.patch, 
> HDFS-7858.10.patch, HDFS-7858.11.patch, HDFS-7858.12.patch, 
> HDFS-7858.13.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch, 
> HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, HDFS-7858.7.patch, 
> HDFS-7858.8.patch, HDFS-7858.9.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby Namenodes. Clients will first try one of the NNs 
> (non-deterministically), and if it is a standby NN, it will respond to the 
> client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the Standby is 
> undergoing a GC pause or is otherwise busy, the client might not get a 
> response soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Use hedged RPCs to simultaneously call multiple configured NNs to decide 
> which is the active Namenode.
> 2) Subsequent calls will invoke the previously successful NN.
> 3) On failover of the currently active NN, the remaining NNs will be invoked 
> to decide which is the new active Namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
