[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Badger updated HDFS-10755:
-------------------------------
    Status: Patch Available  (was: Open)

> TestDecommissioningStatus BindException Failure
> -----------------------------------------------
>
>                 Key: HDFS-10755
>                 URL: https://issues.apache.org/jira/browse/HDFS-10755
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HDFS-10755.001.patch
>
>
> Tests in TestDecommissioningStatus call MiniDFSCluster.restartDataNode(). The restarted datanodes are required to come back up on the same (initially ephemeral) port that they were on before being shut down. Because of this, there is an inherent race condition: another process can bind to the port while the datanode is down, in which case the restart fails with a BindException. However, all of the tests in TestDecommissioningStatus depend on the cluster being up and running, so if one test blows up the cluster, the subsequent tests also fail. Below I show the BindException failure as well as the subsequent test failure that occurred.
> {noformat}
> java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
> 	at sun.nio.ch.Net.bind0(Native Method)
> 	at sun.nio.ch.Net.bind(Net.java:436)
> 	at sun.nio.ch.Net.bind(Net.java:428)
> 	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> 	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> 	at org.apache.hadoop.ipc.Server.bind(Server.java:430)
> 	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
> 	at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
> 	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
> 	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
> 	at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
> {noformat}
> {noformat}
> java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
> {noformat}
> I don't think there's any way to avoid the inherent race condition in reacquiring the same ephemeral port, but we can definitely fix the tests so that one test blowing up the cluster doesn't cause the subsequent tests to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
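The ephemeral-port race described in the issue can be reproduced with plain JDK sockets, independent of MiniDFSCluster. This is only an illustration of the failure mode, not the HDFS-10755 patch; the class and method names below are hypothetical. A "datanode" binds an ephemeral port and shuts down, another process grabs the freed port, and the restart's mandatory re-bind to the same port then fails with the same BindException seen in the stack trace:

```java
import java.net.BindException;
import java.net.ServerSocket;

public class EphemeralPortRace {

    // Returns true if re-binding the old port fails with a BindException
    // after another socket has claimed it.
    static boolean raceTriggersBindException() throws Exception {
        // 1. "Datanode" starts on an ephemeral port (port 0 = OS picks one).
        ServerSocket datanode = new ServerSocket(0);
        int port = datanode.getLocalPort();

        // 2. Datanode is shut down for the restart; the port is now free.
        datanode.close();

        // 3. Some other process wins the race and binds the same port.
        ServerSocket intruder = new ServerSocket(port);

        // 4. The restarted datanode must come back on the *same* port,
        //    so the bind fails while the intruder holds it.
        try (ServerSocket restarted = new ServerSocket(port)) {
            return false; // restart got the port back; race not triggered
        } catch (BindException e) {
            return true;  // "Address already in use", as in the stack trace
        } finally {
            intruder.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("race triggers BindException: " + raceTriggersBindException());
    }
}
```

Because step 3 is outside the test's control in the real cluster, the race itself cannot be eliminated; the fix the issue proposes is to keep one test's cluster failure from cascading into the remaining tests.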