[ 
http://issues.apache.org/jira/browse/HADOOP-141?page=comments#action_12375401 ] 

Doug Cutting commented on HADOOP-141:
-------------------------------------

Some timeouts during the copy phase may not be bad.  If too many nodes are 
transferring from a given node, then it may time out additional requests.  And 
if one node is already transferring from another node for one task, then 
attempts by a second task to transfer may time out (due to the shared 
connection pool).  These should not affect overall performance too much, 
especially if the timeout is relatively short.
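
To illustrate why a short timeout is cheap, here is a minimal sketch, assuming
a timed-out transfer is simply retried after a brief pause.  The class and
method names (CopyRetrySketch, fetchWithRetry) and the retry policy are
hypothetical and are not taken from the Hadoop source.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries an attempt that timed out.
final class CopyRetrySketch {

    // Runs 'attempt' up to maxAttempts times. Only timeouts are retried;
    // any other exception propagates immediately.
    static <T> T fetchWithRetry(Callable<T> attempt, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (SocketTimeoutException e) {
                // The remote node is presumably busy serving other transfers.
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

With a short timeout, each failed attempt costs only the timeout plus the
pause, so a busy node delays a task rather than stalling it.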

> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>
>          Key: HADOOP-141
>          URL: http://issues.apache.org/jira/browse/HADOOP-141
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>  Environment: linux
>     Reporter: paul sutter

>
> MapOutputProtocol connections cause timeouts because of system thrashing and 
> transferring the same file over and over again, ultimately leading to no 
> forward progress (medium-sized job, 500GB input file, map output about as 
> large as the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved 
> matters considerably.
> (1) 
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for 
> both reads and writes). By changing this buffer size to 256KB, the number of 
> disk seeks was reduced and the problem went away. 
> Ideally there would be a buffer size parameter for this that is separate from 
> the DFS io buffer size.
> (2)
> I also added the following code to the socket configuration in both 
> Server.java and Client.java. Disabling linger is a minor good idea in an 
> environment with some packet loss (and you will have that when all the nodes 
> get busy at once). 256KB buffers are probably excessive, especially on a LAN, 
> but it takes me two hours to test changes, so I haven't experimented.
> socket.setSendBufferSize(256*1024);
> socket.setReceiveBufferSize(256*1024);
> socket.setSoLinger(false, 0);
> socket.setKeepAlive(true);
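
The two changes in the report above can be sketched together as follows.  This
is an illustration only, assuming a buffer size passed in by the caller rather
than hardcoded; the class MapOutputIoSketch and its methods are hypothetical
and do not reflect the actual MapOutputFile, Server.java or Client.java code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not part of Hadoop.
final class MapOutputIoSketch {

    // 256KB, the value tried in the report; possibly larger than needed.
    static final int DEFAULT_BUFFER_SIZE = 256 * 1024;

    // Change (1): copy with a caller-supplied buffer size instead of a
    // hardcoded 8192 bytes. Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Change (2): the socket options quoted in the report, with the buffer
    // size as a parameter.
    static void configure(Socket socket, int bufferSize) throws SocketException {
        socket.setSendBufferSize(bufferSize);
        socket.setReceiveBufferSize(bufferSize);
        socket.setSoLinger(false, 0); // disable linger
        socket.setKeepAlive(true);
    }
}
```

Taking the size as a parameter would let the map output buffer be tuned
separately from the DFS io buffer size, as the report suggests.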

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
