[ http://issues.apache.org/jira/browse/HADOOP-717?page=all ]
Owen O'Malley resolved HADOOP-717.
----------------------------------
Fix Version/s: 0.10.0
Resolution: Fixed
This was fixed by HADOOP-331.
> When there are few reducers, sorting should be done by mappers
> --------------------------------------------------------------
>
> Key: HADOOP-717
> URL: http://issues.apache.org/jira/browse/HADOOP-717
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: arkady borkovsky
> Assigned To: Owen O'Malley
> Fix For: 0.10.0
>
>
> If I understand correctly, sorting currently happens on the reducer side.
> So if a few hundred mappers produce a few (or many) gigabytes of data, and
> there is just ONE reducer to consume it, copying and sorting take forever.
> It may make sense to have a special-case optimization for a single reducer
> (e.g. "when there is only one reducer and many mappers, the sort is done by
> the mappers, and the reducer does only a merge").
> Or to have some smarter policy that makes sure that sorting uses as many CPUs
> as makes sense. If the map step has produced data on all the nodes of
> the cluster, it makes sense to use all the nodes for sorting.
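
The proposal quoted above (mappers sort, the single reducer only merges) can be illustrated with a minimal sketch. This is not Hadoop code and not necessarily how HADOOP-331 implements it; the function names, the record type, and the sample data are all hypothetical, and the mappers are simulated as plain in-memory lists.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type: (key, value) pairs emitted by a mapper.
Record = Tuple[str, int]


def map_side_sort(records: Iterable[Record]) -> List[Record]:
    """Each mapper sorts its own output by key before handing it off."""
    return sorted(records, key=lambda kv: kv[0])


def reduce_side_merge(sorted_runs: List[List[Record]]) -> Iterator[Record]:
    """The single reducer only does a k-way merge of the pre-sorted runs."""
    return heapq.merge(*sorted_runs, key=lambda kv: kv[0])


# Simulated output of three mappers (made-up sample data).
mapper_outputs = [
    [("pear", 1), ("apple", 2)],
    [("fig", 3), ("apple", 4)],
    [("kiwi", 5)],
]

# The sorting work is spread across the mappers...
runs = [map_side_sort(out) for out in mapper_outputs]

# ...and the reducer sees one globally key-ordered stream.
merged = list(reduce_side_merge(runs))
for key, value in merged:
    print(key, value)
```

With M mappers each holding n records, the sorting cost of roughly n log n per mapper is paid in parallel, and the reducer is left with a merge that costs about log M comparisons per record instead of a full sort of all M * n records.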