[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378311 ]
eric baldeschwieler commented on HADOOP-195:
--------------------------------------------

Good list, Paul! As you observe, some very simple changes should have a big impact on sort behavior. We'll start working on that once it becomes the bottleneck.

One simple way to increase the file sizes is to significantly reduce the number of reduces and increase the DFS block size to 64 or 128 MB. We'll play with these (if I can convince Owen).

I think we should bump the Hadoop default block size to 128 MB. This is still small enough to replicate quickly, but will significantly reduce the number of map jobs when you just want to scan data. Reduce the number of reduces as well and we'll have significantly larger transactions.

All that said, I think we are probably uncovering problems in the RPC layer (and server threading) more than basic network issues, since we're running on a decent network and not even beginning to approach saturating it. But we'll certainly play with "setTcpNoDelay." It will be interesting to see if that moves things along.

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>
> The data transfer of the map output should be transferred via http instead
> of rpc, because rpc is very slow for this application and the timeout
> behavior is suboptimal. (The server sends data and the client ignores it
> because it took more than 10 seconds to be received.)
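
To put rough numbers on the block size point in the comment above, here is a minimal sketch of the arithmetic. It assumes one map per DFS block, which is a simplification, and a hypothetical 1 TiB input; the class and method names are made up for illustration and are not Hadoop APIs. The 32 MB row is only a hypothetical baseline for comparison.

```java
public class BlockCount {
    /**
     * Number of fixed-size blocks needed to hold inputBytes, rounding up.
     * Under the one-map-per-block assumption this is also the map count.
     */
    public static long blocks(long inputBytes, long blockBytes) {
        if (inputBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        // Integer ceiling division.
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final long input = 1024L * 1024L * mb; // hypothetical 1 TiB data set

        // 32 MB is a hypothetical baseline; 64 and 128 MB are the sizes
        // proposed in the comment.
        for (long sizeMb : new long[] {32, 64, 128}) {
            System.out.println(sizeMb + " MB blocks -> "
                    + blocks(input, sizeMb * mb) + " maps");
        }
    }
}
```

Each doubling of the block size halves the number of maps for the same input, so each map reads twice as much data.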
