Janus Chow created HDFS-15738:
---------------------------------

             Summary: Forbid the transition to Observer state when NameNode is 
in StartupSafeMode
                 Key: HDFS-15738
                 URL: https://issues.apache.org/jira/browse/HDFS-15738
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Janus Chow


Currently, when a _getBlockLocation_ request reaches an Observer NameNode that 
is in safe mode, the NameNode checks whether the result is empty. If it is, the 
NameNode replies with a _RetriableException_, telling the client to retry the 
request later.
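
The check described above can be sketched roughly as follows. This is a 
simplified model, not the actual NameNode code: ObserverModel and its fields 
are hypothetical names, and RetriableException is a local stand-in for the 
Hadoop class so that the example compiles on its own.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class ObserverSafeModeSketch {

    /** Local stand-in for Hadoop's RetriableException (assumption: defined here only for the sketch). */
    static class RetriableException extends IOException {
        RetriableException(String message) {
            super(message);
        }
    }

    /** Hypothetical, heavily simplified model of an Observer NameNode. */
    static class ObserverModel {
        private final boolean inSafeMode;

        ObserverModel(boolean inSafeMode) {
            this.inSafeMode = inSafeMode;
        }

        /**
         * Returns the given locations, unless the node is in safe mode and
         * has no locations to report, in which case the client is told to retry.
         */
        List<String> getBlockLocations(List<String> knownLocations)
                throws RetriableException {
            if (inSafeMode && knownLocations.isEmpty()) {
                throw new RetriableException(
                        "Zero block locations while in safe mode, retry later");
            }
            return knownLocations;
        }
    }

    public static void main(String[] args) {
        // An Observer that has not yet received block reports knows no locations.
        ObserverModel starting = new ObserverModel(true);
        try {
            starting.getBlockLocations(Collections.<String>emptyList());
        } catch (RetriableException e) {
            System.out.println("client must retry: " + e.getMessage());
        }
    }
}
```

In this model the client keeps receiving the retriable error for as long as 
the node stays in safe mode without block locations.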

If the Observer NameNode is in startup safe mode, the client has to 
wait until the Observer NameNode leaves safe mode. On a big cluster, this 
can take a long time. We hit this problem in our cluster, where clients had to 
wait about 30 minutes before service returned to normal.

The cause is that the NameNode was transitioned to the Observer state 
while it was still in safe mode, receiving block reports from the DataNodes. 
There are two possible solutions:
 # Throw _ObserverRetryOnActiveException_ when the Observer NameNode is in 
startup safe mode, redirecting the user's requests to the active NN.
 # Forbid the transition to Observer state while the NameNode is in startup 
safe mode, so that a cluster maintainer's transition operation is rejected.

We chose the second solution because the first would condone the bad 
practice of transitioning a NN to Observer before it is ready to serve requests.
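
A minimal sketch of the second solution, assuming a guard at the point where 
the transition is requested. All names here (NameNodeModel, 
TransitionRefusedException, leaveSafeMode) are hypothetical and only 
illustrate the idea; they are not the actual Hadoop classes or the final patch.

```java
public class ObserverTransitionGuardSketch {

    /** HA states used by this sketch. */
    enum HAState { STANDBY, OBSERVER }

    /** Hypothetical exception reported back to the cluster maintainer. */
    static class TransitionRefusedException extends Exception {
        TransitionRefusedException(String message) {
            super(message);
        }
    }

    /** Hypothetical, heavily simplified model of a NameNode. */
    static class NameNodeModel {
        private HAState state = HAState.STANDBY;
        private boolean inStartupSafeMode;

        NameNodeModel(boolean inStartupSafeMode) {
            this.inStartupSafeMode = inStartupSafeMode;
        }

        /** Models the point at which enough block reports have arrived. */
        void leaveSafeMode() {
            inStartupSafeMode = false;
        }

        HAState getState() {
            return state;
        }

        /** Rejects the transition while the node is still in startup safe mode. */
        void transitionToObserver() throws TransitionRefusedException {
            if (inStartupSafeMode) {
                throw new TransitionRefusedException(
                        "NameNode is in startup safe mode, not ready to be an Observer");
            }
            state = HAState.OBSERVER;
        }
    }

    public static void main(String[] args) throws Exception {
        NameNodeModel nn = new NameNodeModel(true);
        try {
            nn.transitionToObserver();
        } catch (TransitionRefusedException e) {
            System.out.println("refused: " + e.getMessage());
        }
        nn.leaveSafeMode();
        nn.transitionToObserver();
        System.out.println("state after safe mode: " + nn.getState());
    }
}
```

With a guard like this the node stays in its previous state until it has left 
startup safe mode, so clients are never routed to an Observer that cannot yet 
answer block location requests.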



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

