[ https://issues.apache.org/jira/browse/HDFS-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923203#action_12923203 ]
Alex Rovner commented on HDFS-1075:
-----------------------------------

We are constantly experiencing this issue. When is the planned resolution date? For the short term, should I lower dfs.datanode.socket.write.timeout? If so, to what value?

Excerpt from the log:

2010-10-19 19:51:27,499 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.5.8:50010, storageID=DS-686623457-192.168.5.8-50010-1249666572430, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.5.8:50010 remote=/192.168.5.8:32828]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

> Separately configure connect timeouts from read timeouts in data path
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1075
>                 URL: https://issues.apache.org/jira/browse/HDFS-1075
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>            Reporter: Todd Lipcon
>
> The timeout configurations in the write pipeline overload the read timeout to
> also include a connect timeout.
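For reference, the dfs.datanode.socket.write.timeout property the comment asks about is set in hdfs-site.xml and takes a value in milliseconds; the 480000 in the log above corresponds to the 8-minute default. A sketch of lowering it (the 120000 value here is only an illustrative choice, not a recommendation from the issue):

```xml
<!-- hdfs-site.xml: DataNode write-side socket timeout, in milliseconds.
     Default is 480000 (8 minutes); 120000 (2 minutes) is a hypothetical
     lower setting, not a value endorsed in this issue. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>120000</value>
</property>
```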
> In my experience, if a node is down it can take many seconds to get back a
> connect exception, whereas if it is up it will accept almost immediately, even
> if heavily loaded (the kernel listen backlog picks it up very fast). So in the
> interest of faster dead-node detection from the writer's perspective, the
> connect timeout should be configured separately, usually to a much lower value
> than the read timeout.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
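The separation the reporter asks for can be illustrated with the standard java.net API, where the connect timeout and the read timeout are independent knobs. This is only a sketch of the idea, not HDFS code; the class and method names here (TimeoutSketch, open) are hypothetical:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: a connection helper with separate connect and read
// timeouts, illustrating the configuration split proposed in HDFS-1075.
public class TimeoutSketch {
    public static Socket open(String host, int port,
                              int connectTimeoutMs, int readTimeoutMs)
            throws Exception {
        Socket s = new Socket();
        // A dead node only surfaces as a slow connect failure, so a short
        // connect timeout gives fast dead-node detection; a live node
        // accepts almost immediately via the kernel listen backlog.
        s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        // Reads from a live but heavily loaded node may legitimately take
        // much longer, so the read timeout stays independently higher.
        s.setSoTimeout(readTimeoutMs);
        return s;
    }
}
```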