[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

Ahmed Hussein (Jira) Fri, 16 Oct 2020 17:05:33 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215697#comment-17215697
 ]


Ahmed Hussein commented on HDFS-15618:
--------------------------------------

I checked the failing Junit tests. They are unrelated to the patch.
I will file new Jiras for those tests that seem to be broken for sometime. On 
multiple occasions, I started to notice "Dominos effect" in HDFS tests. A test 
fails or times out causes other tests to fail because they could not bind to a 
port or they could not get enough resources.
An example was that in the testRead() between TestBlockTokenWithDFSStriped and 
TestBlockTokenWithDFS where port 19870 is used by the two test cases.

The following is the stack trace of TestDeadNodeDetection failure reported by 
hadoopQA.

{code:bash}
java.net.BindException: Problem binding to [localhost:44881] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:908)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
        at org.apache.hadoop.ipc.Server.bind(Server.java:640)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1210)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:3103)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:430)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:350)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:1031)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1452)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:513)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2868)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2774)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2818)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2494)
        at 
org.apache.hadoop.hdfs.TestDeadNodeDetection.testDeadNodeDetectionDeadNodeRecovery(TestDeadNodeDetection.java:226)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
        at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
        at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
        at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
        at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
        at org.apache.hadoop.ipc.Server.bind(Server.java:623)
        ... 41 more
{code}


> Improve datanode shutdown latency
> ---------------------------------
>
>                 Key: HDFS-15618
>                 URL: https://issues.apache.org/jira/browse/HDFS-15618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of Datanode is a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

Reply via email to