[ https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500475#comment-13500475 ]
Colin Patrick McCabe commented on HDFS-4210:
--------------------------------------------

It should definitely throw a more helpful exception than {{NullPointerException}}.

However, I think the general idea that the quorum format should fail if some {{JournalNodes}} could not be formatted makes some sense. If some {{JournalNodes}} could not be formatted, the system is running at reduced redundancy. This could cause major problems down the road if we silently return success here.

Can you write a script to wait until all nodes are accessible (keep pinging every second until you get through, or something like that)?

Alternately, perhaps we could add a switch like {{\-partial}} that would return success from a partial format as long as a quorum of JNs got formatted. But I don't think it should be the default...

> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4210
>                 URL: https://issues.apache.org/jira/browse/HDFS-4210
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, journal-node, name-node
>    Affects Versions: 2.0.0-alpha
>         Environment: CDH4.1.2
>            Reporter: Damien Hardy
>            Priority: Trivial
>
> Setting:
>   qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   cdh4master01 and cdh4master02 JournalNodes up and running,
>   cdh4worker03 not yet provisioned (no DNS entry)
> With this setting, `hadoop namenode -format` fails with:
>   12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>         ... 5 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>         ... 10 more
>
> I suggest that if a quorum is up, format should not fail.
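
The "wait until all nodes are accessible" script suggested in the comment could look roughly like the sketch below. It is not part of Hadoop: the function names (parse_qjournal_uri, is_reachable, wait_for_all) are hypothetical, the fallback port 8485 is assumed to be the JournalNode RPC port, and a successful TCP connect is used as a stand-in for "accessible". It only checks that each host resolves and accepts a connection, not that the JournalNode is healthy.

```python
import socket
import time

DEFAULT_JN_PORT = 8485  # assumption: JournalNode RPC port when none is given


def parse_qjournal_uri(uri):
    """Return [(host, port), ...] from qjournal://h1:p1;h2:p2/journalId."""
    prefix = "qjournal://"
    if not uri.startswith(prefix):
        raise ValueError("not a qjournal URI: %r" % uri)
    # Everything between the scheme and the first "/" is the node list.
    authority = uri[len(prefix):].split("/", 1)[0]
    nodes = []
    for entry in authority.split(";"):
        host, _, port = entry.partition(":")
        nodes.append((host, int(port) if port else DEFAULT_JN_PORT))
    return nodes


def is_reachable(host, port, timeout=2.0):
    """True if the host resolves and accepts a TCP connection on the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refusals, and timeouts.
        return False


def wait_for_all(nodes, interval=1.0, max_wait=None,
                 probe=is_reachable, sleep=time.sleep):
    """Probe every node once per interval until all have answered.

    Returns True once every node has been reached at least once, or
    False if max_wait seconds of sleeping elapse first.
    """
    pending = list(nodes)
    waited = 0.0
    while True:
        # Keep only the nodes that still fail the probe.
        pending = [node for node in pending if not probe(*node)]
        if not pending:
            return True
        if max_wait is not None and waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

A wrapper could call wait_for_all(parse_qjournal_uri(uri), max_wait=300) and run `hadoop namenode -format` only when it returns True. A node is dropped from the pending list after its first successful probe, so this does not detect a node that goes down again before the format starts.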