[ 
https://issues.apache.org/jira/browse/HDFS-10719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377862#comment-17377862
 ] 

Denis Serduik commented on HDFS-10719:
--------------------------------------

[~kpalanisamy] I've been bitten by this fix/issue. Consider the scenario where 
the active NN VM goes down (temporarily due to a network issue, or permanently 
due to hardware failure). The standby NN can't become active for exactly this 
reason: the failed host is part of the qjournal connection string and can no 
longer be resolved. This causes the whole (what?) NN to go down. See the logs 
below:


{noformat}
2021-07-09 06:34:53,932 INFO namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true' to transaction ID 449
2021-07-09 06:34:53,932 INFO namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true' to transaction ID 449
2021-07-09 06:34:53,970 INFO namenode.FSImage: Edits file http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true, http://YYY-YYYYY-YYYY.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true of size 42 edits # 2 loaded in 0 seconds
2021-07-09 06:36:22,127 INFO namenode.FSNamesystem: Stopping services started for standby state
2021-07-09 06:36:22,128 WARN ha.EditLogTailer: Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:469)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:399)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:416)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:412)
2021-07-09 06:36:22,130 INFO namenode.FSNamesystem: Starting services required for active state
2021-07-09 06:36:22,131 ERROR namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalArgumentException: Unable to construct journal, qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1824)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:259)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1223)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1890)
        at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1749)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1742)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
        at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1811)
        ... 19 more
Caused by: java.net.UnknownHostException: XX-XXXXX-XXX.internal.cloudapp.net:8485
        at org.apache.hadoop.hdfs.server.common.Util.getAddressesList(Util.java:378)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:388)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:170)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:126)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
        ... 24 more
2021-07-09 06:36:22,132 INFO util.ExitUtil: Exiting with status 1: java.lang.IllegalArgumentException: Unable to construct journal, qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
2021-07-09 06:36:22,133 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at YYY-YYYYY-YYYY/10.9.0.5
************************************************************/
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
2021-07-09 06:36:22,837 INFO namenode.NameNode: STARTUP_MSG:
{noformat}
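
The innermost cause in the trace above is a java.net.UnknownHostException for a single journal host, raised while the qjournal:// URI is turned into socket addresses. As a rough aid for checking this outside the NameNode, here is a minimal sketch that splits such a URI into its host:port entries and reports which ones currently resolve. The class and method names are hypothetical and this is not Hadoop code; the split on ';' simply follows the URI shape shown in the log.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for illustration; not part of Hadoop.
final class QjournalHostCheck {

    /** Splits the authority of a qjournal URI into its "host:port" entries. */
    static List<String> journalAddresses(String qjournalUri) {
        String authority = URI.create(qjournalUri).getAuthority();
        if (authority == null || authority.isEmpty()) {
            throw new IllegalArgumentException("No journal hosts in " + qjournalUri);
        }
        List<String> entries = new ArrayList<>();
        for (String part : authority.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /** Parses "host:port"; the InetSocketAddress constructor attempts DNS resolution. */
    static InetSocketAddress toSocketAddress(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port but got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        // Placeholder URI; pass the real dfs.namenode.shared.edits.dir value as args[0].
        String uri = args.length > 0
                ? args[0]
                : "qjournal://localhost:8485;jn2.invalid:8485;jn3.invalid:8485/conduit";
        for (String hostPort : journalAddresses(uri)) {
            InetSocketAddress addr = toSocketAddress(hostPort);
            String status = addr.isUnresolved()
                    ? "UNRESOLVED"
                    : addr.getAddress().getHostAddress();
            System.out.println(hostPort + " -> " + status);
        }
    }
}
```

Running this on the standby NN host with the configured shared edits URI should show whether the lost VM's hostname is the only entry that fails to resolve.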

> In HA, Namenode is failed to start If any of the Quorum hostname is 
> unresolved.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-10719
>                 URL: https://issues.apache.org/jira/browse/HDFS-10719
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: journal-node, namenode
>    Affects Versions: 2.7.1
>         Environment: HDP-2.4
>            Reporter: Karthik Palanisamy
>            Assignee: Karthik Palanisamy
>            Priority: Major
>              Labels: patch
>         Attachments: HDFS-10719-1.patch, HDFS-10719-2.patch, 
> HDFS-10719-3.patch, HDFS-10719-4.patch
>
>
> 2016-08-03 02:53:53,760 ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode.
> java.lang.IllegalArgumentException: Unable to construct journal, qjournal://xxxx1:8485;xxxx2:8485;xxxx3:8485/shva
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1637)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:282)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:260)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:789)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:634)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1635)
>         ... 13 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:178)
>         at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:156)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
>         ... 18 more
> 2016-08-03 02:53:53,765 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2016-08-03 02:53:53,768 INFO  namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
> *and the failover is not successful*
> I have attached the patch. It allows the NameNode to start if the majority of 
> the quorum hosts are resolvable.
> It logs a warning if a quorum host is unresolvable.
> It throws an UnknownHostException if the majority of the journals are 
> unresolvable.
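
The behaviour described in the quoted patch summary (tolerate unresolvable journal hosts while a majority still resolves, warn for each unresolvable one, fail otherwise) can be sketched roughly as below. This is not the code from the attached patches: the class name, method names, and the injectable resolver function are assumptions made for illustration, and the warning goes to stderr rather than through Hadoop's logger.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a majority-based resolution rule; not the HDFS-10719 patch.
final class MajorityResolve {

    /**
     * Resolves each "host:port" entry with the given resolver and returns the
     * resolved addresses if a strict majority resolved.
     *
     * @throws UnknownHostException if fewer than a majority of entries resolve
     */
    static List<InetSocketAddress> resolveMajority(
            List<String> hostPorts,
            Function<String, InetSocketAddress> resolver) throws UnknownHostException {
        List<InetSocketAddress> resolved = new ArrayList<>();
        List<String> unresolved = new ArrayList<>();
        for (String hostPort : hostPorts) {
            InetSocketAddress addr = resolver.apply(hostPort);
            if (addr == null || addr.isUnresolved()) {
                unresolved.add(hostPort);
                System.err.println("WARN: journal host is unresolvable: " + hostPort);
            } else {
                resolved.add(addr);
            }
        }
        // Strict majority: 2 of 3, 3 of 5, and so on.
        int majority = hostPorts.size() / 2 + 1;
        if (resolved.size() < majority) {
            throw new UnknownHostException("Only " + resolved.size() + " of "
                    + hostPorts.size() + " journal hosts resolved; unresolved: " + unresolved);
        }
        return resolved;
    }

    /** Default resolver: parses "host:port" and lets InetSocketAddress attempt DNS. */
    static InetSocketAddress dnsResolver(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Placeholder host names; substitute the real journal node entries.
        List<String> journals = List.of("localhost:8485", "127.0.0.1:8485", "jn3.invalid:8485");
        System.out.println(resolveMajority(journals, MajorityResolve::dnsResolver));
    }
}
```

Passing the resolver in as a function keeps the majority rule testable without real DNS. Whether a NameNode should continue with fewer loggers than configured is a design question for the issue itself; the sketch only shows the counting rule.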


