[ https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868970#comment-16868970 ]
CR Hota commented on HDFS-14588:
--------------------------------

[~xkrogen] Thanks for the review. Yes, the plan is to throw StandbyException, but ONLY if HA is enabled.

> Client retries Standby NN continuously even if Active NN is available (WebHDFS)
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-14588
>                 URL: https://issues.apache.org/jira/browse/HDFS-14588
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: CR Hota
>            Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
> # Active NN is up and serving traffic.
> # Standby NN is restarted for maintenance.
> # After step 2, all new clients (WebHDFS only) that connect to the Standby NN
> keep seeing Retriable Exception, because the Standby NN has not finished
> starting (the RPC server is not up yet while the FS image is loading) but the
> HTTP server is already started and ready to accept traffic. This keeps
> happening until the RPC server is up and the SNN knows that it is truly the
> standby. On big clusters, where start-up time is high (many minutes), this
> behavior can continue for a long time.
> The above behavior causes low availability of HDFS when HDFS is actually
> still available.
> Ideally, WebHDFS should throw StandbyException (if HA is enabled) and let
> clients connect to the active NN after that. If the active is also not
> available, clients will bounce between the NameNodes and automatically
> connect to the right active.
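
For illustration only, the proposed behavior could be sketched roughly as below. This is not the actual patch: the class WebHdfsStartupSketch, the method checkRpcServerUp and its two boolean parameters are hypothetical names, and the two exception classes are local stand-ins for Hadoop's StandbyException and RetriableException so that the snippet compiles without Hadoop on the classpath.

```java
import java.io.IOException;

public class WebHdfsStartupSketch {

    // Local stand-ins (assumption): the real classes live in Hadoop itself.
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    static class RetriableException extends IOException {
        RetriableException(String msg) { super(msg); }
    }

    // Hypothetical guard run before a WebHDFS request is served.
    // rpcServerUp: whether the NameNode RPC server has started.
    // haEnabled:   whether the cluster is configured for HA.
    static void checkRpcServerUp(boolean rpcServerUp, boolean haEnabled)
            throws IOException {
        if (rpcServerUp) {
            return; // normal path, request can be served
        }
        if (haEnabled) {
            // Tell the client to fail over instead of retrying this NN.
            throw new StandbyException("NameNode is starting up; try the other NameNode");
        }
        // Without HA there is no other NN to go to, so retrying is all that is left.
        throw new RetriableException("NameNode is starting up; retry later");
    }

    // Helper for the demo: name of the exception a client would see, or "OK".
    static String outcome(boolean rpcServerUp, boolean haEnabled) {
        try {
            checkRpcServerUp(rpcServerUp, haEnabled);
            return "OK";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false, true));   // HA, RPC server not up
        System.out.println(outcome(false, false));  // non-HA, RPC server not up
        System.out.println(outcome(true, true));    // RPC server up
    }
}
```

The point of the HA check is that a non-HA cluster has no second NameNode to fail over to, so a retriable error remains the only sensible answer there.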