[ http://issues.apache.org/jira/browse/HADOOP-717?page=all ]

Owen O'Malley resolved HADOOP-717.
----------------------------------

    Fix Version/s: 0.10.0
       Resolution: Fixed

This was fixed by HADOOP-331.

> When there are few reducers, sorting should be done by mappers
> --------------------------------------------------------------
>
>                 Key: HADOOP-717
>                 URL: http://issues.apache.org/jira/browse/HADOOP-717
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: Owen O'Malley
>             Fix For: 0.10.0
>
>
> If I understand correctly, sorting currently happens on the reducer side.
> So if a few hundred mappers produce a few (or many) gigabytes of data, and
> there is just ONE reducer to consume it, copying and sorting take forever.
> It may make sense to have a special-case optimization for a single reducer
> (e.g. "when there is only one reducer and many mappers, the sort is done by
> the mappers, and the reducer does only a merge"),
> or to have some smarter policy that makes sure that sorting uses as many CPUs
> as makes sense. If the map step has produced data on all the nodes of
> the cluster, it makes sense to use all the nodes for sorting.
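
The idea quoted above (mappers sort, the single reducer only merges) can be
illustrated with a minimal sketch. This is not Hadoop code and does not claim
to show how HADOOP-331 implemented the fix; the function names and the sample
records are hypothetical, and in-memory Python lists stand in for map output
files.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value); hypothetical record shape


def map_side_sort(records: Iterable[Record]) -> List[Record]:
    """Each mapper sorts its own output by key, so the sort work is spread
    across all the nodes that ran map tasks."""
    return sorted(records, key=lambda kv: kv[0])


def reduce_side_merge(sorted_runs: List[List[Record]]) -> Iterator[Record]:
    """The single reducer performs only a k-way merge of the runs, which
    must already be sorted by key."""
    return heapq.merge(*sorted_runs, key=lambda kv: kv[0])


# Hypothetical output of three mappers, unsorted as produced.
mapper_outputs = [
    [("pear", 1), ("apple", 2)],
    [("fig", 3), ("apple", 4)],
    [("kiwi", 5)],
]

sorted_runs = [map_side_sort(out) for out in mapper_outputs]
merged = list(reduce_side_merge(sorted_runs))
```

With m mappers each holding about n records, every mapper pays roughly
O(n log n) for its own sort in parallel, while the reducer's merge costs about
O(m n log m) comparisons instead of sorting all m n records from scratch.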

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
