[ https://issues.apache.org/jira/browse/HDFS-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105447#comment-16105447 ]
Chen Liang commented on HDFS-12216:
-----------------------------------

Thanks [~msingh] for filing this! I was just about to file this myself :).

I spent some time looking at why {{TestKeys}} fails, so just to add some info here: it looks like {{testPutAndGetKeyWithDnRestart}} is the test that fails every time. After {{restartDatanode(cluster, 0, helper.client);}} gets called, the call to {{xceiverClientManager.acquireClient(pipeline);}} fails. The exact place where the exception is thrown is {{XceiverClientManager#getClient()}}. It looks like XceiverClient tries to connect to the same server port it used before the restart, but the other end is no longer listening (Connection refused).

My current thinking is that either the datanode restart is not functioning properly somewhere, or XceiverClient should be connecting to a different port after the restart. I haven't looked deeper yet, though. Hope this helps.

> Ozone: TestKeys and TestKeysRatis are failing consistently
> ----------------------------------------------------------
>
>                 Key: HDFS-12216
>                 URL: https://issues.apache.org/jira/browse/HDFS-12216
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: HDFS-7240
>
>
> TestKeys and TestKeysRatis are failing consistently, as noted in the test logs for HDFS-12183.
> TestKeysRatis is failing because of the following error:
> {code}
> 2017-07-28 23:11:28,783 [StateMachineUpdater-127.0.0.1:55793] ERROR impl.StateMachineUpdater (ExitUtils.java:terminate(80)) - Terminating with exit status 2: StateMachineUpdater-127.0.0.1:55793: the StateMachineUpdater hits Throwable
> org.iq80.leveldb.DBException: Closed
>         at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
>         at org.apache.hadoop.utils.LevelDBStore.put(LevelDBStore.java:98)
>         at org.apache.hadoop.ozone.container.common.impl.KeyManagerImpl.putKey(KeyManagerImpl.java:90)
>         at org.apache.hadoop.ozone.container.common.impl.Dispatcher.handlePutKey(Dispatcher.java:547)
>         at org.apache.hadoop.ozone.container.common.impl.Dispatcher.keyProcessHandler(Dispatcher.java:206)
>         at org.apache.hadoop.ozone.container.common.impl.Dispatcher.dispatch(Dispatcher.java:110)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatch(ContainerStateMachine.java:94)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:81)
>         at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:913)
>         at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:142)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> whereas TestKeys is failing because of:
> {code}
> 2017-07-28 23:14:20,889 [Thread-486] INFO scm.XceiverClientManager (XceiverClientManager.java:getClient(158)) - exception java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /127.0.0.1:55914
> {code}
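
The failure mode suspected in the comment above (a client reconnecting to a cached port that nothing listens on after a restart) can be reproduced in isolation. The following is only a minimal sketch using plain {{java.net}} sockets; it does not use XceiverClient, XceiverClientManager, or the Ozone mini cluster, and the names {{StalePortDemo}}, {{startServer}}, {{stopServer}} and {{canConnect}} are hypothetical. It assumes the restarted server binds to a new ephemeral port, which is one of the two possibilities raised in the comment and has not been confirmed for the datanode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical stand-alone demo; not part of Ozone or its tests.
class StalePortDemo {

    /** Starts a listener on an ephemeral loopback port (port 0 lets the OS choose). */
    static ServerSocket startServer() {
        try {
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops a listener, standing in for "the datanode went down". */
    static void stopServer(ServerSocket server) {
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a TCP connection to the loopback address on the given port succeeds. */
    static boolean canConnect(int port) {
        try (Socket socket = new Socket()) {
            socket.connect(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 2000);
            return true;
        } catch (ConnectException e) {
            // Same symptom as in the TestKeys log: nobody is listening on that port.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ServerSocket oldServer = startServer();
        int cachedPort = oldServer.getLocalPort(); // what a client would remember

        System.out.println("before restart, cached port reachable: "
            + canConnect(cachedPort));

        // The replacement is opened before the old listener is closed only so the
        // OS cannot hand back the same port, which keeps the demo deterministic.
        ServerSocket newServer = startServer();
        stopServer(oldServer);

        System.out.println("after restart, cached port reachable: "
            + canConnect(cachedPort));
        System.out.println("after restart, new port reachable: "
            + canConnect(newServer.getLocalPort()));

        stopServer(newServer);
    }
}
```

If the datanode behaves like this sketch, any fix would need the client side to drop its cached connection and look up the current port after a restart; if instead the datanode is expected to come back on the same port, the restart path itself would be the place to look.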