[jira] Commented: (HADOOP-141) Disk thrashing / task timeouts during map output copy phase

p sutter (JIRA) Fri, 13 Oct 2006 19:51:23 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-141?page=comments#action_12442193 ]
            
p sutter commented on HADOOP-141:
---------------------------------



   [[ Old comment, sent by email on Wed, 2 Aug 2006 13:47:05 -0700 ]]

Close it out! The new shuffle path is really great.




> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>
>                 Key: HADOOP-141
>                 URL: http://issues.apache.org/jira/browse/HADOOP-141
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: linux
>            Reporter: p sutter
>
> MapOutputProtocol connections time out because of system thrashing and 
> repeated transfers of the same file, ultimately leading to no forward 
> progress (medium-sized job, 500GB input file, map output about as large as 
> the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved 
> matters considerably.
> (1) 
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for 
> both reads and writes). Changing this buffer size to 256KB reduced the number 
> of disk seeks, and the problem went away. 
> Ideally there would be a buffer size parameter for this that is separate from 
> the DFS io buffer size.
> (2)
> I also added the following code to the socket configuration in both 
> Server.java and Client.java. Disabling linger is a minor good idea in an 
> environment with some packet loss (and you will have that when all the nodes 
> get busy at once). 256KB socket buffers are probably excessive, especially on 
> a LAN, but it takes me two hours to test changes, so I haven't experimented.
> socket.setSendBufferSize(256*1024);
> socket.setReceiveBufferSize(256*1024);
> socket.setSoLinger(false, 0);
> socket.setKeepAlive(true);
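
The two changes quoted above can be sketched roughly as follows. This is a standalone illustration, not the actual MapOutputFile, Server.java, or Client.java code: the class name ShuffleTuning, the methods copy and tune, and the constant DEFAULT_BUFFER_BYTES are hypothetical names chosen for this example. The only assumption carried over from the issue is the 256KB figure and the four socket calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper class; none of these names come from the Hadoop source.
final class ShuffleTuning {
    // 256KB, the value the issue reports using in place of the hardcoded 8192 bytes.
    static final int DEFAULT_BUFFER_BYTES = 256 * 1024;

    // Change (1): copy with a caller-supplied buffer size instead of a
    // hardcoded one. Larger chunks mean fewer read/write calls per file.
    // Returns the number of bytes copied; the caller closes both streams.
    static long copy(InputStream in, OutputStream out, int bufferBytes) throws IOException {
        if (bufferBytes <= 0) {
            throw new IllegalArgumentException("bufferBytes must be positive");
        }
        byte[] chunk = new byte[bufferBytes];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Change (2): the socket options listed in the issue, with the buffer
    // size passed in so it can be experimented with.
    static void tune(Socket socket, int bufferBytes) throws SocketException {
        socket.setSendBufferSize(bufferBytes);    // a hint; the OS may adjust it
        socket.setReceiveBufferSize(bufferBytes); // likewise a hint
        socket.setSoLinger(false, 0);             // disable SO_LINGER
        socket.setKeepAlive(true);                // enable SO_KEEPALIVE
    }
}
```

Passing the size as a parameter is one way to get the separate buffer size setting the issue asks for, and it makes values smaller than 256KB easy to try. The send and receive buffer sizes are only requests to the operating system, so the values actually in effect may differ from what was set.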

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
