[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640152#comment-14640152
 ] 

Arun Suresh commented on HDFS-7858:
-----------------------------------

bq. in case of failover of HA, only one request will be invoked (SNN) in hedged 
invocations. Am I right?
Yup, although with more than 2 NNs, the subsequent request will be hedged to 
ALL remaining NNs except the current failed-over NN. 

bq. This way I feel both ConfiguredFailoverProxyProvider and 
RequestHedgingProxyProvider work the same way, except at the very first time. ..
Yup.. as well as the above mentioned condition. 

bq.  ..if the no. of proxies to try is more than 2 then 
RequestHedgingProxyProvider will be best.
Yup, now that HDFS-6440 is resolved, I am hoping ReqHedging would become the 
default. It is also useful where there are a large number of ad hoc clients (MR 
jobs) and many of the calls are one-time calls. 
RequestHedgingProxyProvider will ensure that these tasks don't have to wait for 
a timed-out request / Exception from a failed NN before failing over to the 
SNN.
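
To make the hedging idea concrete, here is a minimal, self-contained sketch. It 
is NOT the actual RequestHedgingProxyProvider code: the NameNode interface, the 
fake() helper and the HedgingSketch class are hypothetical stand-ins, and the 
delays are made up. It only illustrates sending the same call to every 
candidate NN and taking the first successful answer, so a slow or standby NN 
does not hold up the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class HedgingSketch {

    /** Hypothetical stand-in for a NameNode proxy; not a Hadoop type. */
    interface NameNode {
        String call() throws Exception;
    }

    /** Fake NN: answers after delayMillis if active, otherwise throws like a standby. */
    static NameNode fake(String name, boolean active, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            if (!active) {
                throw new IllegalStateException(name + " is standby");
            }
            return "ok from " + name;
        };
    }

    /** Sends the same request to every candidate and returns the first success. */
    static String hedge(List<NameNode> candidates)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (NameNode nn : candidates) {
                calls.add(nn::call);
            }
            // invokeAny returns the result of the first task that completes
            // without throwing; tasks that have not completed are cancelled.
            return pool.invokeAny(calls);
        } finally {
            pool.shutdownNow(); // interrupt any straggler (e.g. a GC-ing standby)
        }
    }

    public static void main(String[] args) throws Exception {
        // First call: hedge to all NNs; nn1 is a slow standby, nn2 is active.
        List<NameNode> all = Arrays.asList(
                fake("nn1", false, 2000),
                fake("nn2", true, 50),
                fake("nn3", false, 10));
        System.out.println(hedge(all)); // prints "ok from nn2"

        // After nn2 fails over: hedge only to the remaining NNs (nn1 now active).
        List<NameNode> remaining = Arrays.asList(
                fake("nn1", true, 30),
                fake("nn3", false, 10));
        System.out.println(hedge(remaining)); // prints "ok from nn1"
    }
}
```

With only two NNs the second step degenerates to a single request to the other 
NN, which is why the two providers behave the same after the first call in 
that case. If every candidate fails, invokeAny throws an ExecutionException, 
which a real client would have to map onto its retry policy.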

> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7858.1.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, 
> HDFS-7858.3.patch, HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, 
> HDFS-7858.7.patch, HDFS-7858.8.patch, HDFS-7858.9.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby Namenodes. Clients will first try one of the NNs 
> (non-deterministically), and if it's a standby NN, it will tell the 
> client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the standby is 
> undergoing some GC / is busy, then that client might not get a response 
> soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Since Zookeeper is already used as the failover controller, the clients 
> could talk to ZK and find out which is the active namenode before contacting 
> it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when 
> there is a failover, so they do not have to query ZK every time to find out 
> the active NN.
> 3) Clients can also cache the last active NN in the user's home directory 
> (~/.lastNN) so that short-lived clients can try that Namenode first before 
> querying ZK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
