Hi Jiwei,

In trunk (i.e. MR2), the logic that selects and schedules map
completion events lives in the getMapCompletionEvents() method of the
EventFetcher class, which you can view at:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java?view=markup

The Shuffle class (in the reduce package) runs this EventFetcher
thread to keep pulling completion events while the shuffle is in
progress. Shuffle is in turn used by the ReduceTask class (look in the
mapred package of the same Maven module).

I guess you can start there to see whether a better selection and
scheduling logic would yield better results.
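
To make "a better selection+scheduling logic" a bit more concrete,
here is a rough, standalone sketch of one possible policy: order the
completed map outputs so that the ones closest to the reducer are
considered first. Nothing in it is Hadoop code. MapOutput,
distance() and orderByLocality() are names I made up for
illustration, and the three-level host/rack/off-rack distance is an
assumption. Whether such an ordering actually helps is exactly what
you would have to measure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Hadoop.
public class LocalityAwareOrdering {

    /** Made-up stand-in for "where a completed map's output lives". */
    public static final class MapOutput {
        public final String mapId;
        public final String host;
        public final String rack;

        public MapOutput(String mapId, String host, String rack) {
            this.mapId = mapId;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Assumed distance: 0 = same host, 1 = same rack, 2 = off-rack. */
    public static int distance(MapOutput m, String reducerHost, String reducerRack) {
        if (m.host.equals(reducerHost)) {
            return 0;
        }
        if (m.rack.equals(reducerRack)) {
            return 1;
        }
        return 2;
    }

    /**
     * Returns a copy of the completed outputs, nearest first.
     * List.sort is stable, so outputs at the same distance keep
     * their original (arrival) order.
     */
    public static List<MapOutput> orderByLocality(
            List<MapOutput> completed, String reducerHost, String reducerRack) {
        List<MapOutput> sorted = new ArrayList<>(completed);
        sorted.sort(Comparator.comparingInt(
                m -> distance(m, reducerHost, reducerRack)));
        return sorted;
    }
}
```

In the real code you would have to derive host and rack information
from whatever the completion events and the cluster topology give
you; the sketch just assumes they are already available.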

On Wed, Nov 7, 2012 at 12:26 PM, Jiwei Li <cxm...@gmail.com> wrote:
> Dear all,
>
> For jobs like Sort, massive amounts of network traffic happen during
> shuffle phase. The simple mechanism in Hadoop 1.0.4 to choose reduce nodes
> does not help reduce network traffic. If JobTracker is fully aware of
> locations of every map output, why not take advantage of this topology
> knowledge?
>
> So, is there anyone who knows where to develop such codes upon? Many thanks.
>
> Regards.
> --
> Jiwei



-- 
Harsh J
