[ https://issues.apache.org/jira/browse/HDFS-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627171#comment-15627171 ]
John Zhuge commented on HDFS-4957:
----------------------------------

To fix this issue, a far more complex change is needed:
* Failover needs to succeed with a partial set of JNs as long as at least 3 JNs remain. Since this reduces the level of failure tolerance, the behavior should probably be enabled only by a flag.
* When fewer than 3 JNs survive, can failover still succeed, with the NN entering a special mode?
* Optionally, add a JN back once it recovers by triggering a log roll.

Not trivial. See the discussions in HDFS-3867.

[~tlipcon], [~cmccabe], thoughts?

> NameNode failover should not fail because a DNS entry for a quorum node
> cannot be resolved
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4957
>                 URL: https://issues.apache.org/jira/browse/HDFS-4957
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: qjm
>    Affects Versions: 2.3.0, 2.6.0
>            Reporter: Colin P. McCabe
>            Assignee: John Zhuge
>
> When a StandbyNameNode is becoming active, we should not bail out because a
> DNS entry for a quorum node cannot be resolved. Currently it does fail in
> this scenario, with a message like this:
> {code}
> 2013-07-03 21:28:40,576 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
> 2013-07-03 21:28:40,579 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.lang.IllegalArgumentException: Unable to construct journal, qjournal://hadoop-mm:8485;hadoop-nn-0:8485;hadoop-nn-1:8485/hadoop
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1254)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:722)
> <etc>
> {code}
> reported by Matt Bookman
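
For illustration only, the first bullet of the comment (let failover proceed with a partial set of JNs, gated by a threshold) could be sketched roughly as below. This is not the HDFS implementation: the class TolerantJournalResolver, its methods, and the minRequired parameter are all hypothetical names, and the host lookup is passed in as a predicate so the example runs without real DNS. The demo uses a threshold of 2 purely so that the 3-node URI from the log above survives one unresolvable host; the comment itself proposes requiring at least 3.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not part of Hadoop.
class TolerantJournalResolver {

    /** Splits the authority of a qjournal:// URI into "host:port" entries. */
    static List<String> parseJournalNodes(URI uri) {
        List<String> nodes = new ArrayList<>();
        for (String part : uri.getAuthority().split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                nodes.add(trimmed);
            }
        }
        return nodes;
    }

    /** Real DNS check: the constructor attempts resolution of the host name. */
    static boolean dnsResolves(String host) {
        return !new InetSocketAddress(host, 0).isUnresolved();
    }

    /**
     * Keeps only the nodes whose host passes the 'resolves' check and fails
     * only when fewer than minRequired (an assumed, configurable threshold)
     * remain, instead of failing on the first unresolvable host.
     */
    static List<String> usableNodes(List<String> nodes,
                                    Predicate<String> resolves,
                                    int minRequired) {
        List<String> usable = new ArrayList<>();
        for (String node : nodes) {
            String host = node.substring(0, node.lastIndexOf(':'));
            if (resolves.test(host)) {
                usable.add(node);
            }
        }
        if (usable.size() < minRequired) {
            throw new IllegalStateException("Only " + usable.size()
                + " of " + nodes.size() + " journal nodes resolvable; need "
                + minRequired);
        }
        return usable;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "qjournal://hadoop-mm:8485;hadoop-nn-0:8485;hadoop-nn-1:8485/hadoop");
        List<String> nodes = parseJournalNodes(uri);
        // Stubbed lookup: pretend hadoop-mm has no DNS entry.
        Predicate<String> stub = host -> !host.equals("hadoop-mm");
        System.out.println(usableNodes(nodes, stub, 2));
    }
}
```

Passing TolerantJournalResolver::dnsResolves instead of the stub would perform real lookups. Whether the surviving nodes still form a safe write quorum, and how a recovered JN rejoins, are exactly the open questions raised in the comment and are not addressed by this sketch.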