[ 
https://issues.apache.org/jira/browse/HDFS-15738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252478#comment-17252478
 ] 

Ayush Saxena commented on HDFS-15738:
-------------------------------------

Well, in general I would prefer not to make an incompatible change that could 
break scripts. Moreover, validations and checks are generally meant for the 
end user; the admin/operator is supposedly someone who knows the system and 
should be able to make the call.

Moreover, this problem can also come up if the Observer goes into safemode for 
some other reason. In that case too, retrying on the same node when other 
nodes are available doesn't seem to be a very apt choice.
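
To illustrate the point about not retrying on the same node, here is a minimal sketch of a client that moves on to the next Observer, and finally to the active node, when a node asks it to retry later. All names here (Node, RetryLater, readWithFailover) are hypothetical and only model the idea; this is not the actual HDFS client or proxy provider code.

```java
import java.util.Arrays;
import java.util.List;

public class ObserverFailoverSketch {
    /** Hypothetical stand-in for a "retry later" reply from a node in safe mode. */
    static class RetryLater extends RuntimeException {
        RetryLater(String msg) { super(msg); }
    }

    /** Hypothetical model of a NameNode that either serves a read or asks for a retry. */
    static final class Node {
        final String name;
        final boolean inSafeMode;

        Node(String name, boolean inSafeMode) {
            this.name = name;
            this.inSafeMode = inSafeMode;
        }

        String read(String path) {
            if (inSafeMode) {
                throw new RetryLater(name + " is in safe mode");
            }
            return name + ":" + path;
        }
    }

    /** Try each observer once; fall back to the active node if none can serve the read. */
    static String readWithFailover(List<Node> observers, Node active, String path) {
        for (Node observer : observers) {
            try {
                return observer.read(path);
            } catch (RetryLater e) {
                // Do not retry on the same node; move on to the next candidate.
            }
        }
        return active.read(path);
    }

    public static void main(String[] args) {
        Node o1 = new Node("observer1", true);
        Node o2 = new Node("observer2", false);
        Node active = new Node("active", false);
        System.out.println(readWithFailover(Arrays.asList(o1, o2), active, "/f"));
    }
}
```

With this shape, a single Observer stuck in safe mode costs the client one failed attempt rather than a long wait.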

 

Anyway, let's wait for more opinions here, and we can take a call then.

> Forbid the transition to Observer state when NameNode is in StartupSafeMode
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-15738
>                 URL: https://issues.apache.org/jira/browse/HDFS-15738
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>         Attachments: HDFS-15738.001.patch
>
>
> Currently, when a _getBlockLocation_ request reaches an Observer NameNode 
> that is in safe mode, the NameNode checks whether the result is empty and, if 
> so, replies to the client with a _RetriableException_, telling the client to 
> retry the request later.
> If the Observer NameNode is in startup safe mode, the client has to wait 
> until the Observer NameNode leaves safe mode. On a big cluster this can mean 
> a long wait for the client. We hit this problem in our cluster, where the 
> client had to wait about 30 minutes before the service returned to normal.
> The cause is that the NameNode transitions to the Observer state while it 
> is still in safe mode, receiving block reports from the DataNodes. There are 
> two solutions for this issue:
>  # Throw _ObserverRetryOnActiveException_ when the Observer NameNode is in 
> startup safe mode, redirecting the user's requests to the active NN.
>  # Forbid the transition to Observer state when the cluster maintainer 
> tries to perform the transition while the NameNode is in startup safe mode.
> We chose the second solution because the first one would abet the bad 
> practice of transitioning an NN to Observer while it is not ready for real service.
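
The two options in the description can be contrasted with a small model. This is only a sketch under stated assumptions: NameNodeModel, RetryOnActive, and the forbidTransition flag are hypothetical names made up for illustration, and none of this is the code from HDFS-15738.001.patch or the real NameNode classes.

```java
public class ObserverSafeModeSketch {
    enum HAState { STANDBY, OBSERVER }

    /** Hypothetical signal telling the client to send the request to the active NameNode. */
    static class RetryOnActive extends RuntimeException {
        RetryOnActive(String msg) { super(msg); }
    }

    /** Hypothetical model of a NameNode; only the two checks under discussion are modeled. */
    static final class NameNodeModel {
        private HAState state = HAState.STANDBY;
        private final boolean inStartupSafeMode;
        private final boolean forbidTransition; // true models option 2, false models option 1

        NameNodeModel(boolean inStartupSafeMode, boolean forbidTransition) {
            this.inStartupSafeMode = inStartupSafeMode;
            this.forbidTransition = forbidTransition;
        }

        HAState state() { return state; }

        /** Option 2: refuse the admin's transition while still in startup safe mode. */
        void transitionToObserver() {
            if (forbidTransition && inStartupSafeMode) {
                throw new IllegalStateException("NameNode is in startup safe mode");
            }
            state = HAState.OBSERVER;
        }

        /** Option 1: an Observer in startup safe mode redirects reads to the active node. */
        String getBlockLocations(String path) {
            if (state == HAState.OBSERVER && inStartupSafeMode) {
                throw new RetryOnActive("retry " + path + " on the active NameNode");
            }
            return "locations-of:" + path;
        }
    }

    public static void main(String[] args) {
        NameNodeModel option2 = new NameNodeModel(true, true);
        try {
            option2.transitionToObserver();
        } catch (IllegalStateException e) {
            System.out.println("transition rejected: " + e.getMessage());
        }
    }
}
```

Under option 2 the node never serves reads as an Observer until it is ready, at the cost of rejecting an operator command that succeeds today, which is the compatibility concern raised in the comment above.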



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

