[ 
https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868968#comment-16868968
 ] 

Erik Krogen commented on HDFS-14588:
------------------------------------

Seems like bad behavior. Am I correct in saying that your proposed fix is to 
have WebHDFS throw a {{StandbyException}} when the FSImage is in a loading 
state?
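
If I am reading the proposal right, it amounts to something like the sketch below. This is only an illustration to check my understanding: the class, enum and method names are made up for this comment, and the two exception classes are local stand-ins rather than Hadoop's own types.

```java
import java.io.IOException;

// Sketch only: every name in this file is hypothetical, not Hadoop's actual code.
class WebHdfsStartupSketch {

    // Local stand-in for a standby exception: tells an HA-aware client
    // to go and try the other NameNode.
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    // Local stand-in for a retriable exception: tells the client to
    // retry against the same NameNode.
    static class RetriableException extends IOException {
        RetriableException(String msg) { super(msg); }
    }

    // Simplified view of the NameNode lifecycle described in the issue.
    enum NnState { LOADING_FSIMAGE, STANDBY, ACTIVE }

    // Check an HTTP handler could run before serving a WebHDFS request.
    static void checkCanServe(NnState state, boolean haEnabled) throws IOException {
        switch (state) {
            case ACTIVE:
                return; // safe to serve
            case STANDBY:
                throw new StandbyException("NameNode is in standby state");
            case LOADING_FSIMAGE:
                if (haEnabled) {
                    // Proposed change: while the FSImage is loading, answer
                    // like a standby so clients move on to the other NN.
                    throw new StandbyException("NameNode is still starting up");
                }
                // Without HA there is no other NN, so retrying is all a client can do.
                throw new RetriableException("NameNode is still starting up");
            default:
                throw new IllegalStateException("Unknown state: " + state);
        }
    }

    // Small helper for trying the check: names the exception thrown, or "OK".
    static String outcome(NnState state, boolean haEnabled) {
        try {
            checkCanServe(state, haEnabled);
            return "OK";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

With that check in place, outcome(NnState.LOADING_FSIMAGE, true) would name StandbyException instead of RetriableException, which is what I understand lets the client move on to the Active NN.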

> Client retries Standby NN continuously even if Active NN is available 
> (WebHDFS)
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-14588
>                 URL: https://issues.apache.org/jira/browse/HDFS-14588
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: CR Hota
>            Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
>  # Active NN is up and serving traffic.
>  # Standby NN is restarted for maintenance.
>  # After step 2, all new clients (WebHDFS only) that connect to the Standby 
> NN keep seeing Retriable Exception, because the Standby NN has not finished 
> starting (the RPC server is yet to come up as the FSImage is loading) but 
> the HTTP server is already started and ready to accept traffic. This keeps 
> happening until the RPC server is up and the SNN knows that it is truly 
> standby. The behavior can last for the whole start-up time, which is high 
> (many minutes) for big clusters.
> This behavior causes low availability of HDFS when HDFS is actually still 
> available.
> Ideally WebHDFS should throw a standby exception (if HA is enabled) and let 
> clients connect to the active NN after that. If the active is also not 
> available, clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

