[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378655 ]

Owen O'Malley commented on HADOOP-195:
--------------------------------------

Since there is obviously interest in my benchmark, here is an update:

I reran my sort test yesterday with:
   1. fewer reduces (2/node) (HADOOP-202)
   2. the map ids replaced with integers (HADOOP-200)
   3. the number of server threads for map output serving set to 20

I sorted 1760 GB of data on 179 nodes in 18.6 hours, which is much better than 
before.

I had 20 reduce tasks fail and get re-executed (the last original reduce 
finished after ~16.5 hours).

Two of those tasks were assigned to the same node and were the only two tasks 
running for the last hour, which clearly shows that we need speculative reduces.
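
For a rough sense of scale, the figures above can be turned into throughput and 
tail numbers. This is only a back-of-the-envelope sketch: the function name 
sort_run_summary is hypothetical, and it assumes that 1 gig means 1024 MB and 
that the run's tail is simply the total time minus the time at which the last 
original reduce finished.

```python
def sort_run_summary(total_gb, nodes, total_hours, last_original_reduce_hours):
    """Derive rough throughput and tail figures from the reported run."""
    total_mb = total_gb * 1024           # assumption: 1 gig = 1024 MB
    seconds = total_hours * 3600         # wall-clock time of the whole sort
    aggregate = total_mb / seconds       # cluster-wide sort rate in MB/s
    # Time spent only on re-executed reduces after the last original one ended.
    tail = total_hours - last_original_reduce_hours
    return {
        "aggregate_mb_per_s": aggregate,
        "per_node_mb_per_s": aggregate / nodes,
        "gb_per_node": total_gb / nodes,
        "tail_hours": tail,
        "tail_fraction": tail / total_hours,
    }


# Figures reported in this comment: 1760 gig, 179 nodes, 18.6 h, ~16.5 h.
summary = sort_run_summary(1760, 179, 18.6, 16.5)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under those assumptions the run works out to roughly 27 MB/s across the 
cluster, and the re-executed reduces account for about 2.1 hours (around 11%) 
of the total time.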

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
>
> The map output should be transferred via http instead of rpc, because rpc is 
> very slow for this application and the timeout behavior is suboptimal (the 
> server sends data and the client ignores it because it took more than 
> 10 seconds to be received).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
