[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215697#comment-17215697 ]
Ahmed Hussein commented on HDFS-15618:
--------------------------------------

I checked the failing JUnit tests. They are unrelated to the patch. I will file new Jiras for the tests that appear to have been broken for some time.

On multiple occasions I have noticed a "domino effect" in HDFS tests: a test that fails or times out causes other tests to fail because they cannot bind to a port or cannot acquire enough resources. One example is testRead() in TestBlockTokenWithDFSStriped and TestBlockTokenWithDFS, where port 19870 is used by both test cases.

The following is the stack trace of the TestDeadNodeDetection failure reported by hadoopQA.

{code:bash}
java.net.BindException: Problem binding to [localhost:44881] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:908)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
	at org.apache.hadoop.ipc.Server.bind(Server.java:640)
	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1210)
	at org.apache.hadoop.ipc.Server.<init>(Server.java:3103)
	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:430)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:350)
	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:1031)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1452)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:513)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2868)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2774)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2818)
	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2494)
	at org.apache.hadoop.hdfs.TestDeadNodeDetection.testDeadNodeDetectionDeadNodeRecovery(TestDeadNodeDetection.java:226)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
	at org.apache.hadoop.ipc.Server.bind(Server.java:623)
	... 41 more
{code}

> Improve datanode shutdown latency
> ---------------------------------
>
>                 Key: HDFS-15618
>                 URL: https://issues.apache.org/jira/browse/HDFS-15618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Shutting down a Datanode incurs very long latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter block content, it is safe to skip this wait when shutting down the Datanode.
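As an aside on the port-collision failures above: a minimal sketch (not part of the attached patch, and the class name below is hypothetical) of how a test can avoid contending for a hard-coded port such as 19870 is to bind to port 0, which makes the OS assign a free ephemeral port.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration: binding to port 0 asks the OS for a free
// ephemeral port, so concurrently running tests never collide on a
// hard-coded port (e.g. 19870 or 44881 in the failures above).
public class EphemeralPortExample {
    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            // getLocalPort() reports the port the OS actually assigned;
            // a test would hand this value to the component under test
            // instead of hard-coding one.
            int assignedPort = socket.getLocalPort();
            System.out.println("bound to ephemeral port " + assignedPort);
        }
    }
}
{code}

The trade-off is that the test must read back the assigned port rather than assume a fixed one, but it removes the "domino effect" where one hung test keeps a well-known port occupied for every test that follows.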
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org