[ 
http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378715 ] 

Doug Cutting commented on HADOOP-195:
-------------------------------------

> Are you still using 64,000 mappers? If so, wouldn't your average map output 
> file size be around 80KB?

Yes.  But this comes from having a map task per input file block, which permits 
map tasks to be placed on nodes where their data is local.  Simply reducing the 
number of map tasks will defeat that important optimization.  Better to instead 
increase the dfs block size to 128m, as Eric suggested.  This would increase 
the map outputs to 340k (with 376 reducers).  But, once we move to a larger 
cluster, with 1000 or more reducers, then the map outputs will again become 
small.  So optimizing for small map outputs will remain important, even as we 
increase the dfs block size.

> transfer map output with http instead of rpc
> ---------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
>
> The map output should be transferred via http instead of rpc, because rpc 
> is very slow for this application and its timeout behavior is suboptimal 
> (the server sends data and the client ignores it because it took more than 
> 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira