[ https://issues.apache.org/jira/browse/HDFS-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923203#action_12923203 ]

Alex Rovner commented on HDFS-1075:
-----------------------------------

We are constantly experiencing this issue. When is the planned resolution date?

For the short term, should I lower dfs.datanode.socket.write.timeout?

If so, to what value?
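
For reference, dfs.datanode.socket.write.timeout is set in hdfs-site.xml and is specified in milliseconds; the 480000 millis in the log below corresponds to the 8-minute default. The value shown here is only a hypothetical starting point, not a recommendation:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- hypothetical example: 2 minutes instead of the default 480000 ms (8 minutes) -->
  <value>120000</value>
</property>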

Excerpt from the log:
2010-10-19 19:51:27,499 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.5.8:50010, storageID=DS-686623457-192.168.5.8-50010-1249666572430, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.5.8:50010 remote=/192.168.5.8:32828]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

> Separately configure connect timeouts from read timeouts in data path
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1075
>                 URL: https://issues.apache.org/jira/browse/HDFS-1075
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>            Reporter: Todd Lipcon
>
> The timeout configurations in the write pipeline overload the read timeout to 
> also include a connect timeout. In my experience, if a node is down it can 
> take many seconds to get back an exception on connect, whereas if it is up it 
> will accept the connection almost immediately, even if heavily loaded (the 
> kernel listen backlog picks it up very fast). So, in the interest of faster 
> dead node detection from the writer's perspective, the connect timeout should 
> be configured separately, usually to a much lower value than the read timeout.
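
As an illustration of the idea only (this is not the HDFS patch), a minimal sketch with plain java.net sockets; the host, port, and timeout values below are hypothetical:

import java.net.InetSocketAddress;
import java.net.Socket;

public class SeparateTimeoutsSketch {
    public static Socket openWithSeparateTimeouts(String host, int port,
                                                  int connectTimeoutMs,
                                                  int readTimeoutMs) throws Exception {
        Socket socket = new Socket();
        // A live node accepts the connection almost immediately, so a short
        // connect timeout lets the writer detect a dead node quickly.
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        // Reads during the actual transfer keep the longer timeout.
        socket.setSoTimeout(readTimeoutMs);
        return socket;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical values: 5 seconds to connect, 60 seconds for reads.
        Socket s = openWithSeparateTimeouts("192.168.5.8", 50010, 5000, 60000);
        s.close();
    }
}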

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
