[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985319#comment-13985319 ]
Binglin Chang commented on HDFS-6308:
-------------------------------------

Related error log:
{code}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <- localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused}
2014-04-28 05:18:19,701 INFO ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644)
    at org.apache.hadoop.ipc.Server.access$2800(Server.java:133)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598)
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout}
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <- localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see:
{code}

The socket read/write timeout is set to 1500 ms, and the timeout error is global (per connection): when a timeout occurs, every call in flight on that connection is marked as timed out. The expected behavior is that the first call times out and the second call completes normally. There is a simple fix: invoke the second call only after the connection is definitely closed. We can consider improving ipc.Client to handle this kind of corner case later.
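To illustrate the failure mode, here is a toy Java model (not Hadoop code): one connection shared by two in-flight calls, where a read timeout fails every pending call, compared with issuing the second call only after the first connection has closed. The names `SharedConnectionTimeout`, `Connection`, `racy()` and `sequenced()` are hypothetical, and the sketch assumes the client opens a fresh connection for a call issued after the old one is closed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedConnectionTimeout {

    /** Outcome of a single RPC call in this toy model. */
    enum Outcome { PENDING, OK, TIMEOUT }

    /**
     * Hypothetical model of one client connection shared by several calls.
     * This is NOT Hadoop's ipc.Client; it only mimics a per-connection timeout.
     */
    static final class Connection {
        final Map<String, Outcome> calls = new LinkedHashMap<>();
        boolean closed;

        void send(String call) {
            if (closed) throw new IllegalStateException("connection already closed");
            calls.put(call, Outcome.PENDING);
        }

        void respond(String call) {
            calls.put(call, Outcome.OK);
        }

        /** The read timeout belongs to the connection, so every pending call fails. */
        void readTimeout() {
            calls.replaceAll((c, o) -> o == Outcome.PENDING ? Outcome.TIMEOUT : o);
            closed = true;
        }
    }

    /** Racy pattern: both calls share the connection when the timeout fires. */
    static Map<String, Outcome> racy() {
        Connection conn = new Connection();
        conn.send("call1"); // meant to time out
        conn.send("call2"); // meant to succeed, but is in flight on the same connection
        conn.readTimeout(); // fails call1 AND call2
        return conn.calls;
    }

    /** Sequenced pattern: the second call is issued only after the first connection closed. */
    static Map<String, Outcome> sequenced() {
        Map<String, Outcome> results = new LinkedHashMap<>();
        Connection first = new Connection();
        first.send("call1");
        first.readTimeout(); // first connection is now closed
        results.putAll(first.calls);

        Connection second = new Connection(); // assumed fresh connection for the next call
        second.send("call2");
        second.respond("call2");
        results.putAll(second.calls);
        return results;
    }

    public static void main(String[] args) {
        System.out.println("racy:      " + racy());
        System.out.println("sequenced: " + sequenced());
    }
}
```

Running `main()` prints `{call1=TIMEOUT, call2=TIMEOUT}` for the racy pattern and `{call1=TIMEOUT, call2=OK}` for the sequenced one, which matches the "first call timeout, second call normal" expectation only in the second case.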
> TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
> ------------------------------------------------------------------------
>
>                 Key: HDFS-6308
>                 URL: https://issues.apache.org/jira/browse/HDFS-6308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>
> Found this on pre-commit build of HDFS-6261
> {code}
> java.lang.AssertionError: Expected one valid and one invalid volume
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837)
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)