[ http://issues.apache.org/jira/browse/HADOOP-717?page=all ]
Owen O'Malley resolved HADOOP-717.
----------------------------------

    Fix Version/s: 0.10.0
       Resolution: Fixed

This was fixed by HADOOP-331.

> When there are few reducers, sorting should be done by mappers
> --------------------------------------------------------------
>
>                 Key: HADOOP-717
>                 URL: http://issues.apache.org/jira/browse/HADOOP-717
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: Owen O'Malley
>             Fix For: 0.10.0
>
>
> If I understand correctly, sorting currently happens on the reducer side.
> So if a few hundred mappers produce a few (or many) GB of data and there is
> just ONE reducer to consume it, copying and sorting take forever.
> It may make sense to have a special-case optimization for a single reducer
> (e.g. "when there is only one reducer and many mappers, sorting is done by
> the mappers, and the reducer does only a merge").
> Or to have a smarter policy that makes sure sorting uses as many CPUs as
> makes sense. If the map step has produced data on all the nodes of the
> cluster, it makes sense to use all the nodes for sorting.
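The single-reducer optimization proposed in the issue ("sort is done by the mappers, and the reducer does only a merge") can be sketched in plain Java, assuming small in-memory String keys. Each mapper sorts its own output, and the reducer performs a k-way merge with a priority queue instead of re-sorting everything. The class and method names (`MapSideSortSketch`, `mapSideSort`, `reduceSideMerge`) are hypothetical and do not reflect Hadoop's API or how HADOOP-331 actually implemented the fix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not Hadoop code.
public class MapSideSortSketch {

    // One cursor per sorted map output: the segment plus the index of its next unread key.
    private static final class Cursor {
        final List<String> segment;
        int pos;

        Cursor(List<String> segment) {
            this.segment = segment;
        }

        String head() {
            return segment.get(pos);
        }
    }

    // Map side: each (hypothetical) mapper sorts its own output before it is copied.
    public static List<String> mapSideSort(List<String> mapOutput) {
        List<String> sorted = new ArrayList<>(mapOutput);
        Collections.sort(sorted);
        return sorted;
    }

    // Reduce side: k-way merge of already-sorted segments; no full re-sort is needed.
    public static List<String> reduceSideMerge(List<List<String>> sortedSegments) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> seg : sortedSegments) {
            if (!seg.isEmpty()) {
                heap.add(new Cursor(seg));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();      // segment with the smallest current key
            merged.add(c.head());
            c.pos++;
            if (c.pos < c.segment.size()) {
                heap.add(c);             // re-insert with its new head key
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(mapSideSort(List.of("pear", "apple", "fig")));
        segments.add(mapSideSort(List.of("kiwi", "banana")));
        segments.add(mapSideSort(List.of("date", "cherry", "apple")));
        // prints [apple, apple, banana, cherry, date, fig, kiwi, pear]
        System.out.println(reduceSideMerge(segments));
    }
}
```

With k sorted map outputs holding n keys in total, the reducer's merge costs O(n log k) comparisons, while the O(m log m) sorting of each map output is spread across the mapper nodes. This is the "use all the nodes for sorting" effect the issue asks for.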