[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378715 ]
Doug Cutting commented on HADOOP-195:
-------------------------------------

> Are you still using 64,000 mappers? If so, wouldn't your average map output
> file size be around 80KB?

Yes. But this comes from having a map task per input file block, which permits map tasks to be placed on nodes where their data is local. Simply reducing the number of map tasks will defeat that important optimization.

Better to instead increase the dfs block size to 128m, as Eric suggested. This would increase the map outputs to 340k (with 376 reducers). But, once we move to a larger cluster with 1000 or more reducers, the map outputs will again become small. So optimizing for small map outputs will remain important, even as we increase the dfs block size.

> transfer map output with http instead of rpc
> ---------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
>
> The map output should be transferred via http instead of rpc, because rpc is
> very slow for this application and its timeout behavior is suboptimal (the
> server sends data and the client ignores it because it took more than 10
> seconds to be received).

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
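
[Editorial note: the per-reducer sizes in the comment above follow from dividing the per-map input (one DFS block) by the reducer count. A minimal back-of-envelope sketch, assuming one map task per block and a uniform partition of map output across reducers; the function name is illustrative, not Hadoop code.]

```python
def map_output_per_reducer(block_size_bytes, num_reducers):
    """Average bytes each map task sends to each reducer,
    assuming map output size ~= input block size and a
    uniform hash partition across reducers."""
    return block_size_bytes // num_reducers

# 128 MB blocks with 376 reducers -> roughly the 340k cited above.
print(map_output_per_reducer(128 * 1024 * 1024, 376))   # 356962 bytes (~349 KB)

# With 1000+ reducers the per-reducer transfer shrinks again,
# so the many-small-transfers problem returns.
print(map_output_per_reducer(128 * 1024 * 1024, 1000))  # 134217 bytes (~131 KB)
```

This is why raising the block size only postpones the issue: the transfer size scales as block_size / num_reducers, and growing the cluster grows the denominator.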
