[ 
https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985319#comment-13985319
 ] 

Binglin Chang commented on HDFS-6308:
-------------------------------------

Related error log:

{code}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine 
(ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789: 
getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: 
"" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: 
"BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 
1073741826}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine 
(ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933: 
getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: 
"" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: 
"BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 
1073741826}
2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine 
(ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <- 
localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: 
Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused}
2014-04-28 05:18:19,701 INFO  ipc.Server (Server.java:doRead(762)) - Socket 
Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception 
[java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644)
        at org.apache.hadoop.ipc.Server.access$2800(Server.java:133)
        at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753)
        at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598)
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine 
(ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933: 
getHdfsBlockLocations {java.net.SocketTimeoutException: Call From 
asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket 
timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 
remote=/127.0.0.1:45933]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout}
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine 
(ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <- 
localhost/127.0.0.1:45933: getHdfsBlockLocations 
{java.net.SocketTimeoutException: Call From 
asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket 
timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 
remote=/127.0.0.1:45933]; For more details see:  
{code}

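The socket read/write timeout is set to 1500 ms. The timeout error is global 
(per connection), so when a timeout occurs, every call pending on that 
connection is marked as timed out. In the log above, calls 1415 and 1419 share 
the connection to 127.0.0.1:45933 (local port 56102) and both fail with the 
same SocketTimeoutException. The expected behavior is that the first call 
times out and the second call succeeds.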

There is a simple fix for the test: invoke the second call only after the 
first call's connection has definitely been closed.
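
To make the failure mode concrete, here is a minimal sketch of a connection 
whose read timeout fails every pending call. It is a toy model, not the real 
ipc.Client: the class and method names (SharedConnectionSketch, Connection, 
Call, send, onReadTimeout) are hypothetical, and only the call ids and the 
1500 ms value come from the log above.

```java
import java.net.SocketTimeoutException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a shared RPC connection; hypothetical, not Hadoop's ipc.Client. */
public class SharedConnectionSketch {

    /** One in-flight call; error stays null until the call fails. */
    static final class Call {
        final int id;
        Exception error;
        Call(int id) { this.id = id; }
    }

    /** A connection shared by all calls to the same remote address. */
    static final class Connection {
        private final Map<Integer, Call> pending = new LinkedHashMap<>();
        private boolean closed;

        /** Registers a call as pending on this connection. */
        Call send(int id) {
            if (closed) {
                throw new IllegalStateException("connection closed");
            }
            Call call = new Call(id);
            pending.put(id, call);
            return call;
        }

        /** The timeout is a connection-level event, so every pending call fails. */
        void onReadTimeout(int timeoutMillis) {
            SocketTimeoutException e =
                new SocketTimeoutException(timeoutMillis + " millis timeout");
            for (Call call : pending.values()) {
                call.error = e;
            }
            pending.clear();
            closed = true;
        }

        boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        // Flaky ordering: the second call joins a connection that is about to time out.
        Connection shared = new Connection();
        Call first = shared.send(1415);
        Call second = shared.send(1419);
        shared.onReadTimeout(1500);
        System.out.println("shared: first failed=" + (first.error != null)
            + ", second failed=" + (second.error != null));

        // Fixed ordering: wait until the old connection is closed, then use a new one.
        Connection old = new Connection();
        Call a = old.send(1415);
        old.onReadTimeout(1500);
        Connection fresh = new Connection();
        Call b = fresh.send(1419);
        System.out.println("sequential: first failed=" + (a.error != null)
            + ", second failed=" + (b.error != null));
    }
}
```

In the first scenario both calls end up carrying the same exception object, 
which matches the two identical timeout messages in the log. In the second 
scenario only the first call is affected, which is the ordering the test fix 
is meant to guarantee.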

We can consider improving ipc.Client to prevent this kind of corner case later.




> TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
> ------------------------------------------------------------------------
>
>                 Key: HDFS-6308
>                 URL: https://issues.apache.org/jira/browse/HDFS-6308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>
> Found this on pre-commit build of HDFS-6261
> {code}
> java.lang.AssertionError: Expected one valid and one invalid volume
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)