[ https://issues.apache.org/jira/browse/HDFS-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894627#action_12894627 ]
Cody Saunders commented on HDFS-693:
------------------------------------

I just wanted to reiterate the earlier point about the portions of code where the WRITE_TIMEOUT_EXTENSION constant * number of target nodes is added to the configured value. Whether using '0' to get around write-timeout problems is bad practice is probably the first question; if it is, is that documented anywhere? If it is not, then logic like what I've pointed out above defeats the idea of an "infinite" wait (a sketch of the arithmetic follows below).

This time I ran into the timeout starting with this exception:

    50010-1267539292546, infoPort=50075, ipcPort=50020):Exception writing block blk_3120944928137673159_2109400 to mirror 192.168.130.94:50010
    java.net.SocketException: Broken pipe
            at java.net.SocketOutputStream.socketWrite0(Native Method)
            at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
            at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
            at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
            at java.io.DataOutputStream.write(DataOutputStream.java:90)
            at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:401)
            at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
            at java.lang.Thread.run(Thread.java:619)

When I look at BlockReceiver.java:401, it traces back to mirrorOut, which is defined in DataXceiver before the call to receiveBlock as (line 285):

    mirrorOut = new DataOutputStream(
        new BufferedOutputStream(
            NetUtils.getOutputStream(mirrorSock, writeTimeout),
            SMALL_BUFFER_SIZE));

with writeTimeout from line 280:

    int writeTimeout = datanode.socketWriteTimeout +
        (HdfsConstants.WRITE_TIMEOUT_EXTENSION * numTargets);

In my very slow VM environment I have avoided almost every timeout, except the occasional read-side one, by setting dfs.datanode.socket.write.timeout to something like 1000000. I had not yet tried that in production, where it is still zero, and that is where this timeout complaint appeared. At the time I was writing about 1M records per hour from roughly 8 different HBase clients.

> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write exceptions were cast when trying to read file
> via StreamFile.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-693
>                 URL: https://issues.apache.org/jira/browse/HDFS-693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1
>            Reporter: Yajun Dong
>         Attachments: HDFS-693.log
>
>
> To exclude the case of a network problem, I found the count of dataXceiver is
> about 30. Also, I could see that the output of netstat -a | grep 50075 showed many
> connections in TIME_WAIT status when this happened.
> Partial log in attachment.
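A minimal, self-contained sketch (not the HDFS source) of the arithmetic described in the comment above. The class and method names are invented for illustration, and the constant values assume the 0.20-era HdfsConstants (WRITE_TIMEOUT = 480000 ms, as seen in the issue title, and WRITE_TIMEOUT_EXTENSION = 5000 ms per downstream target):

    // Sketch only -- not HDFS code.  Constant values assume 0.20-era HdfsConstants.
    public class WriteTimeoutSketch {

        static final int WRITE_TIMEOUT = 8 * 60 * 1000;       // 480000 ms, the value in the exception above
        static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000;  // extension added per downstream target

        // Mirrors the DataXceiver calculation quoted at line 280 above.
        static int effectiveWriteTimeout(int configuredSocketWriteTimeout, int numTargets) {
            return configuredSocketWriteTimeout + (WRITE_TIMEOUT_EXTENSION * numTargets);
        }

        public static void main(String[] args) {
            // Setting dfs.datanode.socket.write.timeout to 0 is the usual way to ask
            // NetUtils.getOutputStream for an untimed stream ("wait forever").
            System.out.println(effectiveWriteTimeout(0, 2));              // 10000 -- the "infinite" intent is lost
            // With the default value the pipeline gets the familiar 480000 plus the extension.
            System.out.println(effectiveWriteTimeout(WRITE_TIMEOUT, 2));  // 490000
        }
    }

Once the extension is added, the special-case value 0 never reaches NetUtils.getOutputStream, which is why a configured "no timeout" still produces a finite write timeout in the pipeline.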
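And a hypothetical client-side illustration of the workaround mentioned in the comment (a large finite timeout instead of 0), assuming the client reads the same key for its pipeline write timeout as the 0.20 line does. The namenode URI is a placeholder; on the datanodes this key would normally be raised in hdfs-site.xml rather than in code:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class RaiseWriteTimeout {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Large but finite write timeout (~16.7 minutes) instead of the special value 0.
            conf.setInt("dfs.datanode.socket.write.timeout", 1000000);
            // "hdfs://namenode.example.com:9000" is a made-up cluster address.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:9000"), conf);
            // ... client writes through fs now use the raised write timeout ...
            fs.close();
        }
    }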