[ https://issues.apache.org/jira/browse/HDFS-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007576#comment-16007576 ]
Doris Gu commented on HDFS-11753:
---------------------------------

I ran more tests over the past few days. My conclusions about multiple JN daemons are:

*1. Branch-2 (e.g. 2.8.1, 2.7.3, 2.6.0) does have the bug*

{code:title=A. First, a normal environment.|borderStyle=solid}
hdfs@localhost:~> jps
46453 DFSZKFailoverController
5119 Jps
4311 JournalNode
46859 NameNode
46888 DataNode
{code}

{code:title=B. Start a JN once more; it hangs instead of exiting, whereas the NN and DN do not have this problem.|borderStyle=solid}
hdfs@localhost:~> hdfs journalnode
2017-05-12 10:32:17,291 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JournalNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
......
2017-05-12 10:32:18,571 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:8480
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:69)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:163)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:137)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:310)
Caused by: java.net.BindException: address in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
	... 7 more
Exception in thread "main" java.net.BindException: Port in use: 0.0.0.0:8480
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:69)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:163)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:137)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:310)
Caused by: java.net.BindException: address in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
	... 7 more
{code}

{code:title=C. We end up with multiple JN daemons.|borderStyle=solid}
hdfs@localhost:~> jps
45930 JournalNode
46453 DFSZKFailoverController
4311 JournalNode
46305 Jps
46859 NameNode
46888 DataNode
{code}

{code:title=Appendix. Abnormal JN thread dump.|borderStyle=solid}
hdfs@localhost:~> jstack 45930
2017-05-12 10:42:52
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f87e478b800 nid=0x110a3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"DestroyJavaVM" prio=10 tid=0x00007f87e400f000 nid=0xb392 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" prio=10 tid=0x00007f87e4a2e000 nid=0xb3ae waiting on condition [0x00007f87d9a92000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000ef60b7a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"Timer for 'JournalNode' metrics system" daemon prio=10 tid=0x00007f87e4896000 nid=0xb3ac in Object.wait() [0x00007f87d9de4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000ed889a90> (a java.util.TaskQueue)
	at java.util.TimerThread.mainLoop(Timer.java:552)
	- locked <0x00000000ed889a90> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:505)

"Service Thread" daemon prio=10 tid=0x00007f87e40a8000 nid=0xb3a2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f87e40a5800 nid=0xb3a1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f87e40a2800 nid=0xb3a0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f87e4098800 nid=0xb39f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

......
{code}

These exceptions should be caught so that the later JN daemon exits instead of lingering. With my patch applied, starting a JN once more makes it exit as expected.

*2. Trunk's shell script rewrite already prevents multiple JN daemons from being started*

{code:title=The shell script provides protection|borderStyle=solid}
root:~/version/hadoop-3.0.0-alpha2/bin$ ./hdfs journalnode
journalnode is running as process 30273. Stop it first.
{code}

Even so, I still think the JournalNode itself should catch these exceptions, as the NameNode and DataNode do; that would be more robust.

Given all of the above, I have split the usage (-h) improvement into HDFS-11806 and modified this issue to focus on preventing multiple JournalNode daemons.

> Make Some Enhancements about JournalNode Daemon
> ------------------------------------------------
>
>                 Key: HDFS-11753
>                 URL: https://issues.apache.org/jira/browse/HDFS-11753
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: journal-node
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Doris Gu
>         Attachments: HDFS-11753.001.patch
>
>
> 1. Add support for -h. Right now, if I run *hdfs journalnode -h*, it directly
> starts the JournalNode daemon, whereas generally I just want to see the usage.
> 2. Add exception catching and termination. If I start the JournalNode with
> different directories for storing pids, I get several JournalNode daemons that
> don't work, because I configured the same port.
> {quote}[hdfs@localhost ~]$ jps
> *10107 JournalNode*
> *46023 JournalNode*
> 57944 NameNode
> 46539 Jps
> 57651 DFSZKFailoverController
> 57909 DataNode
> *57739 JournalNode*
> *45721 JournalNode*{quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
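The comment argues that the JournalNode should catch its startup exceptions and exit, as the NameNode and DataNode do. The appended thread dump suggests why the second JN lingers instead: "DestroyJavaVM" is present, which usually means main() has already ended, yet "pool-1-thread-1" (listed without the daemon flag) is a non-daemon pool thread that keeps the JVM alive. The following is a minimal, self-contained sketch of the catch-and-exit idea in plain Java. It is not the attached HDFS-11753.001.patch; the class and method names (JournalNodeStartSketch, run) are hypothetical, and it uses System.exit where Hadoop daemons typically go through org.apache.hadoop.util.ExitUtil.terminate.

{code:java}
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical daemon that mirrors the failure mode described above.
 * Sketch only: this is not the JournalNode code or the attached patch.
 */
public class JournalNodeStartSketch {

    /**
     * Runs the daemon lifecycle and returns a process exit status instead of
     * letting a startup exception escape main().
     *
     * @param port      HTTP port to bind (the JN in the log above uses 8480)
     * @param scheduler non-daemon background workers started before the bind
     */
    public static int run(int port, ScheduledExecutorService scheduler) {
        try (ServerSocket http = new ServerSocket()) {
            // Throws java.net.BindException if another process already holds the port.
            http.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            // A real daemon would serve requests here until it is shut down.
            return 0;
        } catch (IOException e) {
            System.err.println("Failed to start daemon: " + e);
            return 1;
        } finally {
            // Non-daemon worker threads would otherwise keep the JVM alive.
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8480;
        // Like "pool-1-thread-1" in the thread dump: a non-daemon pool thread.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        scheduler.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.SECONDS);
        // Exit explicitly with the returned status so a failed start never lingers.
        System.exit(run(port, scheduler));
    }
}
{code}

Started against the port of an already-running JN (e.g. java JournalNodeStartSketch 8480), this sketch should print the bind error and exit with status 1 rather than hang, because the worker pool is shut down and the exit is explicit.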
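The trunk message shown in the comment ("journalnode is running as process 30273. Stop it first.") reads like a pid-file check: the launcher looks up a recorded pid and refuses to start if that process is alive. Assuming that is how it works, a start that uses a different pid directory, as in the issue description, would not see the running JN at all. The Java sketch below models such a check only to illustrate that limitation; the real check lives in the Hadoop shell scripts, and PidFileGuard, runningPid and the file name journalnode.pid are hypothetical. It needs Java 11+ (ProcessHandle, Files.readString).

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

/**
 * Hypothetical pid-file "already running" guard (Java 11+).
 * Not the Hadoop shell implementation; illustration only.
 */
public class PidFileGuard {

    /** Returns the pid stored in pidFile if that process is still alive. */
    public static OptionalLong runningPid(Path pidFile) throws IOException {
        if (!Files.isRegularFile(pidFile)) {
            return OptionalLong.empty(); // no pid file: nothing known to be running
        }
        long pid;
        try {
            pid = Long.parseLong(Files.readString(pidFile).trim());
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // corrupt pid file: treat as not running
        }
        boolean alive = ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        return alive ? OptionalLong.of(pid) : OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Path.of(args.length > 0 ? args[0] : "journalnode.pid");
        OptionalLong pid = runningPid(pidFile);
        if (pid.isPresent()) {
            System.out.println("journalnode is running as process " + pid.getAsLong() + ". Stop it first.");
            System.exit(1);
        }
        // Otherwise record this process's pid, as a daemon launcher would before running.
        Files.writeString(pidFile, Long.toString(ProcessHandle.current().pid()));
        System.out.println("pid written to " + pidFile);
    }
}
{code}

A guard like this only protects launches that agree on where the pid file lives, which is why the comment still asks for the JournalNode itself to exit on a bind failure.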