[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535052#comment-14535052
 ] 

Arpit Agarwal commented on HDFS-8277:
-------------------------------------

I need to think about this for a bit. I am not sure what the contract around 
safe mode and HA is. Letting the command succeed when only one NN has 
transitioned may lead to an unpredictable state if the other NN comes back and 
a failover occurs for any reason. It also looks like safe mode is not logged in 
the edit log.
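
To make the concern concrete, here is a minimal, self-contained sketch of the
"succeed if at least one NN transitions" behaviour. It is not the attached
patch and not Hadoop code: NameNodeAdmin, Result and enterSafeModeOnAll are
hypothetical names invented for illustration, and the NameNodes are simulated
in memory.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeModeSketch {
    /** Hypothetical stand-in for one NameNode's admin endpoint; not a Hadoop API. */
    interface NameNodeAdmin {
        void enterSafeMode() throws IOException;
    }

    /** Records which NameNodes entered safe mode and which could not be reached. */
    static final class Result {
        final List<String> entered = new ArrayList<>();
        final List<String> unreachable = new ArrayList<>();

        /** True when the command "succeeded" but left at least one NameNode untouched. */
        boolean partial() {
            return !entered.isEmpty() && !unreachable.isEmpty();
        }
    }

    /** Tries every NameNode, tolerating connection failures as long as one call succeeds. */
    static Result enterSafeModeOnAll(Map<String, NameNodeAdmin> nameNodes) throws IOException {
        Result result = new Result();
        for (Map.Entry<String, NameNodeAdmin> entry : nameNodes.entrySet()) {
            try {
                entry.getValue().enterSafeMode();
                result.entered.add(entry.getKey());
            } catch (ConnectException e) {
                // This NameNode is down: remember it instead of aborting the command.
                result.unreachable.add(entry.getKey());
            }
        }
        if (result.entered.isEmpty()) {
            throw new IOException("No NameNode could be reached");
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, NameNodeAdmin> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", () -> { throw new ConnectException("Connection refused"); });
        nameNodes.put("nn2", () -> { });
        Result r = enterSafeModeOnAll(nameNodes);
        System.out.println("entered=" + r.entered
            + " unreachable=" + r.unreachable + " partial=" + r.partial());
    }
}
```

In the partial case the caller sees success, yet the unreachable NN never
learns about the transition. If it later comes back and becomes active, it
would presumably not be in safe mode, which is the unpredictable state
described above.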

Perhaps someone with more knowledge of the expected contract can chime in.

In any case, thanks to [~harisekhon] for reporting this and to 
[~surendrasingh] for the proposed fix.



> Safemode enter fails when Standby NameNode is down
> --------------------------------------------------
>
>                 Key: HDFS-8277
>                 URL: https://issues.apache.org/jira/browse/HDFS-8277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, HDFS, namenode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2.0
>            Reporter: Hari Sekhon
>            Assignee: surendra singh lilhore
>            Priority: Minor
>              Labels: BB2015-05-RFC
>         Attachments: HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, 
> HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug: the command does not try both NameNodes the way the 
> standard hdfs client code does, and instead stops after getting a connection 
> refused from nn1, which is down. I verified that normal hadoop fs reads and 
> writes via the CLI worked at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2, which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it, the command worked as 
> expected again.
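
The failover behaviour that the description expects from the command can be
sketched as follows. This is a simplified illustration under stated
assumptions, not the HDFS client's actual retry logic: NameNodeCall and
callWithFailover are hypothetical names, the NameNodes are simulated, and the
nn2:8020 address is assumed by analogy with nn1:8020 from the error message.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

public class FailoverSketch {
    /** Hypothetical call against a single NameNode address; not a Hadoop API. */
    interface NameNodeCall<T> {
        T invoke(String address) throws IOException;
    }

    /**
     * Tries each address in order and returns the first successful answer.
     * Only connection failures trigger a move to the next address; any other
     * IOException is propagated immediately.
     */
    static <T> T callWithFailover(List<String> addresses, NameNodeCall<T> call)
            throws IOException {
        ConnectException last = null;
        for (String address : addresses) {
            try {
                return call.invoke(address);
            } catch (ConnectException e) {
                last = e; // this NameNode is down; fall through to the next one
            }
        }
        if (last == null) {
            throw new IOException("No NameNode addresses configured");
        }
        throw last; // every address refused the connection
    }

    public static void main(String[] args) throws IOException {
        List<String> addresses = Arrays.asList("nn1:8020", "nn2:8020");
        String answer = callWithFailover(addresses, address -> {
            if (address.startsWith("nn1")) {
                throw new ConnectException("Connection refused"); // simulated outage
            }
            return "served by " + address;
        });
        System.out.println(answer); // prints "served by nn2:8020"
    }
}
```

A wrapper like this reaches the surviving NN, but it only ever applies the
operation to one NameNode, so it does not by itself settle what should happen
to the NN that was down.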



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
