[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362020#comment-14362020
 ] 

Arun Suresh commented on HDFS-7858:
-----------------------------------

[~kasha],

bq. one is not required to configure a fencing mechanism when using QJM ?
Yup, QJM ensures only one Namenode can write, but fencing is still recommended 
since there is still a possibility of stale reads from the old Active NN 
before it goes down (I am hoping this will not be too much of an issue).
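
As a rough sketch (not part of the patch), the stock fencers can be wired up 
like this; the {{dfs.ha.fencing.*}} keys are the standard Hadoop properties, 
but the key-file path is purely illustrative:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class FencingConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // sshfence logs into the old Active and kills the NN process;
    // shell(/bin/true) is a no-op fallback so failover is never blocked
    // outright if ssh fencing cannot complete.
    conf.set("dfs.ha.fencing.methods", "sshfence\nshell(/bin/true)");
    // Illustrative path for the private key used by sshfence.
    conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
    System.out.println("fencing methods: " + conf.get("dfs.ha.fencing.methods"));
  }
}
{code}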

bq.  it would be nice to make the solution here accessible to YARN as well. 
The current patch extends the {{ConfiguredFailoverProxyProvider}} in the HDFS 
codebase. The {{ConfiguredRMFailoverProxyProvider}} looks like it belongs to 
the same class hierarchy, so it shouldn't be too hard. But like you mentioned, 
if YARN is not deployed with the {{ZKRMStateStore}}, there is a possibility of 
split-brain... which leads me to think: wouldn't it be nice to incorporate QJM 
and JNs into YARN deployments? Thoughts?
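
To make the idea concrete, here is a minimal sketch of the kind of ZK lookup 
such a provider could share between HDFS and YARN. The znode path and the 
plain-text payload are illustrative assumptions (the real elector stores a 
serialized protobuf), not what the patch actually does:
{code:java}
import org.apache.zookeeper.ZooKeeper;

public class ActiveNNLookupSketch {
  // Illustrative znode path; the real elector's znode layout and payload
  // format differ from what this sketch assumes.
  private static final String ACTIVE_ZNODE =
      "/hadoop-ha/mycluster/ActiveBreadCrumb";

  /** Ask ZK which NN is active; assumes the payload is a plain host:port. */
  public static String resolveActive(String zkQuorum) throws Exception {
    ZooKeeper zk = new ZooKeeper(zkQuorum, 5000, event -> { });
    try {
      byte[] data = zk.getData(ACTIVE_ZNODE, false, null);
      return new String(data, "UTF-8");
    } finally {
      zk.close();
    }
  }
}
{code}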

> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HDFS-7858.1.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, 
> HDFS-7858.3.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby Namenodes. Clients will first try one of the NNs 
> (non-deterministically), and if it is the Standby NN, it will tell the 
> client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the Standby is 
> undergoing a GC pause or is otherwise busy, those clients might not get a 
> response soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Since ZooKeeper is already used by the failover controller, the clients 
> could talk to ZK and find out which Namenode is active before contacting 
> it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when 
> there is a failover, so they do not have to query ZK every time to find out 
> the active NN.
> 3) Clients can also cache the last active NN in the user's home directory 
> (~/.lastNN) so that short-lived clients can try that Namenode first before 
> querying ZK (a rough sketch of (2) and (3) follows below).
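
To illustrate (2) and (3) above, a minimal sketch (not the attached patch; 
the znode path and the plain-text cache format are assumptions):
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class LastActiveNNCacheSketch {
  // Per the proposal: cache file in the user's home directory.
  private static final Path CACHE =
      Paths.get(System.getProperty("user.home"), ".lastNN");
  // Illustrative znode; the real failover-controller layout may differ.
  private static final String ACTIVE_ZNODE =
      "/hadoop-ha/mycluster/ActiveBreadCrumb";

  /** Short-lived clients: try the cached NN first, fall back to ZK. */
  public static String lastActiveOrNull() {
    try {
      return new String(Files.readAllBytes(CACHE), StandardCharsets.UTF_8).trim();
    } catch (IOException e) {
      return null; // no cache yet; caller should query ZK instead
    }
  }

  /** Long-lived clients: watch the znode so a failover refreshes the cache. */
  public static void watchActive(final ZooKeeper zk) throws Exception {
    Watcher w = new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        try {
          // Re-arm the watch and persist the new active NN for next time.
          byte[] data = zk.getData(ACTIVE_ZNODE, this, null);
          Files.write(CACHE, data);
        } catch (Exception e) {
          // On session loss the client falls back to trying both NNs.
        }
      }
    };
    zk.getData(ACTIVE_ZNODE, w, null); // set the initial watch
  }
}
{code}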



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
