[ https://issues.apache.org/jira/browse/HDFS-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894627#action_12894627 ]

Cody Saunders commented on HDFS-693:
------------------------------------

I just wanted to reiterate the statement above about the portions of code 
where the configured value has the WRITE_TIMEOUT_EXTENSION constant * number 
of nodes added to it...

Whether using '0' to get around write timeout problems is bad practice is 
probably the first question. If it is, is that documented somewhere? If it is 
not, then logic like the code I've pointed out above breaks the idea of using 
an "infinite" wait.
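To make that concrete, here is a minimal, compilable sketch (not the actual 
DataXceiver code) of the arithmetic I am worried about, plus one possible 
guard. The class name, method names, and the 5-second extension value are my 
own assumptions for illustration:

public class WriteTimeoutSketch {

    // Assumed value for illustration; HdfsConstants defines
    // WRITE_TIMEOUT_EXTENSION as a small number of seconds.
    static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000;

    // What the extension logic effectively computes today: a base of 0
    // ("never time out") still becomes a finite timeout.
    static int currentBehavior(int socketWriteTimeout, int numTargets) {
        return socketWriteTimeout + WRITE_TIMEOUT_EXTENSION * numTargets;
    }

    // One possible guard: only extend a non-zero base timeout, so that
    // 0 keeps meaning "infinite" wait.
    static int zeroMeansInfinite(int socketWriteTimeout, int numTargets) {
        return socketWriteTimeout > 0
            ? socketWriteTimeout + WRITE_TIMEOUT_EXTENSION * numTargets
            : 0;
    }

    public static void main(String[] args) {
        // Base timeout of 0 with two downstream targets in the pipeline:
        System.out.println(currentBehavior(0, 2));    // 10000 ms, not infinite
        System.out.println(zeroMeansInfinite(0, 2));  // 0, i.e. no timeout
    }
}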

I ran into timeout conditions this time, starting with this exception:
50010-1267539292546, infoPort=50075, ipcPort=50020):Exception writing block 
blk_3120944928137673159_2109400 to mirror 192.168.130.94:50010
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:401)
    at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
    at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
    at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
    at java.lang.Thread.run(Thread.java:619)


When I look at BlockReceiver.java:401, I see it goes back to mirrorOut, which 
is defined in DataXceiver before the call to receiveBlock, as:

(line 285)    mirrorOut = new DataOutputStream(
                  new BufferedOutputStream(
                      NetUtils.getOutputStream(mirrorSock, writeTimeout),
                      SMALL_BUFFER_SIZE));

with writeTimeout from line 280:

    int writeTimeout = datanode.socketWriteTimeout +
                       (HdfsConstants.WRITE_TIMEOUT_EXTENSION * numTargets);
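For reference, the two fragments combine into something like the compilable 
sketch below. The NetUtils.getOutputStream call is the one quoted above; the 
values I give SMALL_BUFFER_SIZE and WRITE_TIMEOUT_EXTENSION are assumptions 
for illustration. The point is that with socketWriteTimeout at 0 and any 
downstream targets, writeTimeout still comes out finite, so the mirror stream 
is not the "infinite" one the 0 setting was meant to give.

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

import org.apache.hadoop.net.NetUtils;

public class MirrorStreamSketch {

    // Assumed values for illustration only.
    private static final int SMALL_BUFFER_SIZE = 1024;
    private static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000;

    // Same shape as the DataXceiver code quoted above: compute the write
    // timeout for the mirror connection and wrap the socket stream with it.
    static DataOutputStream openMirrorStream(Socket mirrorSock,
                                             int socketWriteTimeout,
                                             int numTargets) throws IOException {
        int writeTimeout = socketWriteTimeout +
                           (WRITE_TIMEOUT_EXTENSION * numTargets);
        // Even with socketWriteTimeout == 0, writeTimeout is finite whenever
        // numTargets > 0, so the stream below can still time out.
        return new DataOutputStream(
            new BufferedOutputStream(
                NetUtils.getOutputStream(mirrorSock, writeTimeout),
                SMALL_BUFFER_SIZE));
    }
}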


In my very slow VM environment I have avoided almost every timeout, apart from 
occasional read-side timeouts, by setting dfs.datanode.socket.write.timeout to 
something like 1000000. I had not yet tried this in production, where it is 
still zero, and that is where I received the timeout complaint.
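For what it's worth, the workaround is nothing more than raising that property 
well above the 480000 ms default mentioned in this issue's title. A minimal 
sketch of the programmatic form is below; in a real cluster the value would 
normally go into hdfs-site.xml on the datanodes, and the class name here is 
just for illustration.

import org.apache.hadoop.conf.Configuration;

public class WriteTimeoutConfigSketch {

    // Returns a Configuration with the datanode socket write timeout raised
    // from the 480000 ms default to roughly 16.7 minutes. A value of 0 is
    // supposed to mean "no timeout", which is the behavior under discussion.
    public static Configuration withLongWriteTimeout() {
        Configuration conf = new Configuration();
        conf.setInt("dfs.datanode.socket.write.timeout", 1000000);
        return conf;
    }
}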

At the time I was writing 1M records per hour from about 8 different HBase 
clients.

> java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
> channel to be ready for write exceptions were cast when trying to read file 
> via StreamFile.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-693
>                 URL: https://issues.apache.org/jira/browse/HDFS-693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1
>            Reporter: Yajun Dong
>         Attachments: HDFS-693.log
>
>
> To exclude the case of a network problem: I found the count of dataXceiver 
> threads is about 30. Also, I could see that the output of netstat -a | grep 
> 50075 showed many connections in TIME_WAIT status when this happened.
> Partial log in attachment. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
