[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378311 ]

eric baldeschwieler commented on HADOOP-195:
--------------------------------------------

Good list, Paul!

Some very simple changes should have a big impact on sort behavior, as you
observe. We'll start working on that once it becomes the bottleneck.

One simple way to increase the file sizes is to significantly reduce the number
of reduces and increase the DFS block size to 64 or 128 MB.
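To make the effect of these two knobs concrete, here is a back-of-the-envelope
Java sketch (not Hadoop code). It assumes each map reads one DFS block, emits
about as many bytes as it reads (as an identity sort would), and partitions its
output evenly across the reduces; the class name, block sizes, and reduce counts
are hypothetical.

```java
/**
 * Back-of-the-envelope sketch (not Hadoop code): estimates the size of each
 * map-output segment a reduce fetches. Assumes one map per DFS block, map
 * output roughly proportional to map input, and an even partition across reduces.
 */
public class SegmentSize {
    static final long MB = 1024L * 1024L;

    /** Average bytes per (map, reduce) segment under the even-partition assumption. */
    static long segmentBytes(long blockBytes, double outputToInputRatio, int numReduces) {
        long mapOutput = (long) (blockBytes * outputToInputRatio);
        return mapOutput / numReduces;
    }

    public static void main(String[] args) {
        // Hypothetical settings, chosen only to show the trend.
        long[] blockSizes = {64 * MB, 128 * MB};
        int[] reduceCounts = {1000, 100};
        for (long block : blockSizes) {
            for (int reduces : reduceCounts) {
                long seg = segmentBytes(block, 1.0, reduces);
                System.out.printf("block=%3d MB reduces=%4d -> ~%d KB per segment%n",
                        block / MB, reduces, seg / 1024);
            }
        }
    }
}
```

Under these assumptions each segment grows linearly with the block size and
shrinks linearly with the number of reduces, so both changes push toward fewer,
larger transfers.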

We'll play with these (if I can convince Owen).  I think we should bump the
Hadoop default block size to 128 MB; this is still small enough to replicate
quickly, but will significantly reduce the number of maps when you just want
to scan data.  Reduce the number of reduces as well and we'll have
significantly larger transactions.
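A rough sketch of the scan-side arithmetic, assuming one map per DFS block and
one map-output fetch per (map, reduce) pair. The class name, the 100 GB input,
and the 100 reduces are hypothetical figures, not measurements.

```java
/**
 * Hypothetical arithmetic sketch: how the DFS block size drives the number of
 * maps needed to scan an input, and how maps x reduces sets the number of
 * map-output fetches. Assumes one map per block.
 */
public class MapTaskCount {
    static final long MB = 1024L * 1024L;

    /** One map per block: ceil(inputBytes / blockBytes). */
    static long numMaps(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    /** Each reduce fetches one segment from every map. */
    static long numFetches(long maps, long reduces) {
        return maps * reduces;
    }

    public static void main(String[] args) {
        long input = 100L * 1024 * MB; // hypothetical 100 GB input
        long reduces = 100;            // hypothetical reduce count
        for (long blockMb : new long[] {32, 64, 128}) {
            long maps = numMaps(input, blockMb * MB);
            System.out.printf("block=%3d MB -> %5d maps, %7d map-output fetches%n",
                    blockMb, maps, numFetches(maps, reduces));
        }
    }
}
```

Doubling the block size halves the map count and, for a fixed number of
reduces, halves the number of map-output fetches as well.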

All that said, I think we are probably uncovering problems in the RPC layer
(and server threading) rather than basic network issues, since we're running
on a decent network and are not even beginning to approach saturating it.
But we'll certainly play with setTcpNoDelay. It will be interesting to see
whether that moves things along.
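For reference, TCP_NODELAY is set per socket through
java.net.Socket#setTcpNoDelay, which disables Nagle's algorithm so small writes
are sent immediately instead of being coalesced. The sketch below is a
standalone loopback demo with a hypothetical class name, not a change to
Hadoop's IPC code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Standalone sketch: enables TCP_NODELAY on both ends of a loopback TCP
 * connection. Not Hadoop's IPC code.
 */
public class NoDelayDemo {
    /** Opens a loopback connection, sets TCP_NODELAY on both sockets, and reports the result. */
    static boolean connectWithNoDelay() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // Disable Nagle's algorithm so small RPC-sized writes are not delayed.
            client.setTcpNoDelay(true);
            accepted.setTcpNoDelay(true);
            return client.getTcpNoDelay() && accepted.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay());
    }
}
```

The option only matters for chatty request/response traffic made of small
writes; for bulk transfers that already fill the send buffer it should make
little difference.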

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>
> The map output should be transferred via HTTP instead of RPC, because RPC
> is very slow for this application and its timeout behavior is suboptimal
> (the server sends the data and the client ignores it because it took more
> than 10 seconds to be received).
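One way the HTTP transfer proposed above could look, as a minimal sketch using
only the JDK (com.sun.net.httpserver.HttpServer on the serving side,
HttpURLConnection on the fetching side). The class name, the /mapOutput path,
its map/reduce query parameters, and the in-memory partition are hypothetical.
The relevant contrast with the RPC behavior described in the issue is that
setReadTimeout bounds each blocking read, so a large transfer that keeps making
progress is not thrown away just because it takes more than 10 seconds overall.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: one side serves a map-output partition over HTTP and a
 * reduce fetches it with a streaming GET. Path, parameters, and payload are
 * illustrative assumptions only.
 */
public class MapOutputHttpSketch {

    /** Starts a server on an ephemeral loopback port that returns a fake partition. */
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/mapOutput", exchange -> {
            // A real server would stream the partition from the map's local output file.
            byte[] body = ("partition for " + exchange.getRequestURI().getQuery())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Fetches one partition; the read timeout applies to each blocking read, not the whole transfer. */
    static byte[] fetch(int port, int map, int reduce) throws IOException {
        String spec = "http://127.0.0.1:" + port + "/mapOutput?map=" + map + "&reduce=" + reduce;
        HttpURLConnection conn = (HttpURLConnection) URI.create(spec).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[64 * 1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n); // a real reduce would spill large segments to disk
            }
            return buf.toByteArray();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            byte[] data = fetch(server.getAddress().getPort(), 3, 7);
            System.out.println(new String(data, StandardCharsets.UTF_8));
        } finally {
            server.stop(0);
        }
    }
}
```

Streaming the response in chunks also keeps memory use bounded by the chunk
size on the serving side, rather than requiring a whole partition to be
materialized as a single RPC return value.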

-- 
This message is automatically generated by JIRA.
