[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

Arpit Agarwal (JIRA) Fri, 08 May 2015 12:08:30 -0700

    [ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535283#comment-14535283 ]


Arpit Agarwal commented on HDFS-8277:
-------------------------------------

bq. Same problem will come if both NN in safe mode and one namenode got 
restarted.
Agreed, and this is because the manual safe mode transition is not logged in 
the edit log. That choice could have been intentional, but it looks like the 
wrong choice in an HA setup. If it had been logged, we would only have to 
contact the active NN. I agree your patch did not introduce this problem; 
however, I'd like to at least understand the reason for the original choice 
before changing any more behavior.
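To make the edit-log argument concrete, here is a toy Python model. All class and function names are hypothetical; this is not Hadoop's DFSAdmin or FSNamesystem code. It assumes safe mode is a purely local flag on each NameNode. Under that assumption the command has to reach every NameNode, and one that is down aborts it. In the logged alternative, only the active NameNode is contacted. The standby picks up the transition by replaying the edit log, which in an HA deployment it tails from shared storage such as the JournalNodes.

```python
class NameNodeStub:
    """Hypothetical stand-in for one NameNode of an HA pair; not a Hadoop API."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        # Modelled as a local, in-memory flag: nothing replicates it to the peer.
        self.in_safe_mode = False

    def enter_safe_mode(self):
        if not self.reachable:
            raise ConnectionRefusedError(
                f"Call to {self.name}:8020 failed: Connection refused")
        self.in_safe_mode = True

    def replay(self, edit_log):
        # A standby applies whatever the active NameNode has logged.
        for op in edit_log:
            if op == "ENTER_SAFE_MODE":
                self.in_safe_mode = True


def enter_safe_mode_unlogged(namenodes):
    """Unlogged transition (as modelled here): every NameNode must be told
    directly, so the first unreachable one aborts the whole command."""
    for nn in namenodes:
        nn.enter_safe_mode()  # raises ConnectionRefusedError if nn is down


def enter_safe_mode_logged(active, edit_log):
    """Hypothetical logged transition: contact only the active NameNode and
    record the change in the shared edit log for standbys to replay later."""
    active.enter_safe_mode()
    edit_log.append("ENTER_SAFE_MODE")
```

The reported scenario has nn1 down and nn2 active. With the NameNodes listed as `[nn1, nn2]`, `enter_safe_mode_unlogged` raises before nn2 is ever reached. `enter_safe_mode_logged(nn2, log)` succeeds, and nn1 catches up through `replay(log)` once it is back.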

Surendra, feel free to kick off a thread on the dev mailing list. Also, I'll ask 
someone with more background offline and post my findings here.

> Safemode enter fails when Standby NameNode is down
> --------------------------------------------------
>
>                 Key: HDFS-8277
>                 URL: https://issues.apache.org/jira/browse/HDFS-8277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, HDFS, namenode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2.0
>            Reporter: Hari Sekhon
>            Assignee: surendra singh lilhore
>            Priority: Minor
>              Labels: BB2015-05-RFC
>         Attachments: HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, 
> HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug: the command does not try both NameNodes the way the 
> standard hdfs client code does, and instead stops after getting a 
> connection refused from nn1, which is down. I verified that normal hadoop fs 
> writes and reads via the CLI did work at this time, using nn2. I happened to 
> run this command as the hdfs user on nn2, which was the surviving Active 
> NameNode.
> After I re-bootstrapped the Standby NN to fix it, the command worked as 
> expected again.
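The report contrasts dfsadmin with the standard client, which moves on to the other NameNode when one fails. Below is a minimal Python model of that failover loop. The names `NameNodeModel`, `StandbyException`, and `call_with_failover` are hypothetical illustrations, not Hadoop classes or APIs. The model assumes a request fails over on either a refused connection or a standby rejection.

```python
class StandbyException(Exception):
    """Hypothetical model of the error a standby returns for client operations."""


class NameNodeModel:
    """Hypothetical NameNode with a fixed HA state; not a Hadoop API."""

    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby", or "down"

    def get_file_info(self, path):
        if self.state == "down":
            raise ConnectionRefusedError(
                f"Call to {self.name}:8020 failed: Connection refused")
        if self.state == "standby":
            raise StandbyException(f"{self.name} is in standby state")
        return f"{path} (served by {self.name})"


def call_with_failover(namenodes, op):
    """Try each configured NameNode in order until one serves the request."""
    last_error = None
    for nn in namenodes:
        try:
            return op(nn)
        except (ConnectionRefusedError, StandbyException) as err:
            last_error = err  # remember why this NameNode failed, then move on
    raise ConnectionError("no NameNode could serve the request") from last_error
```

With nn1 down and nn2 active, `call_with_failover([nn1, nn2], lambda nn: nn.get_file_info("/tmp"))` returns the answer from nn2. Stopping at the first success is enough for ordinary reads and writes. It is not enough for a safe mode change that, as discussed above, has to land on every NameNode.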



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
