Hi,

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
<preethiganesha...@gmail.com> wrote:
> Hey all,
> I am working on project that schedules data local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I also hope you are
working on trunk and not the maintenance branch branch-1, which is far
behind where MR is today.

> However , i wanted to know if there is a way using MapTask.java to keep
> track of the inputs and size of the input to every reducer. In other
> words what code do i add to get the size of the intermediate output that
> is fed to a reduce task before a reduce task begins.

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after the map completes (the
map task JVM may terminate for all it cares). Upon a map's completion, its
counters are available centrally (i.e. at the ApplicationMaster), which the
reduce task can poll for sizes (it may already be doing this).

-- 
Harsh J
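
P.S. To make the pull model above concrete, here is a toy sketch in plain
Java. It is not Hadoop code and does not use any Hadoop API: the class
MapOutputRegistry and its methods are hypothetical names that stand in for
the central bookkeeping (the ApplicationMaster's role), and the per-partition
byte sizes are made-up numbers. It only illustrates the ordering: a map
reports its sizes once when it completes, and a reducer later asks for the
total available for its partition.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a pull-based shuffle. Hypothetical names throughout;
 * this is an illustration, not how Hadoop implements it.
 */
class MapOutputRegistry {
    // map task id -> bytes that map produced for each reduce partition
    private final Map<String, long[]> completed = new HashMap<>();
    private final int numReducers;

    MapOutputRegistry(int numReducers) {
        this.numReducers = numReducers;
    }

    /** Called once when a map task finishes; the map may exit afterwards. */
    void reportMapCompletion(String mapTaskId, long[] bytesPerPartition) {
        if (bytesPerPartition.length != numReducers) {
            throw new IllegalArgumentException("expected one size per reducer");
        }
        // Copy so later changes by the caller do not affect the registry.
        completed.put(mapTaskId, bytesPerPartition.clone());
    }

    /** Polled by a reducer: bytes available so far for its partition. */
    long bytesAvailableFor(int partition) {
        long total = 0;
        for (long[] sizes : completed.values()) {
            total += sizes[partition];
        }
        return total;
    }

    /** Number of map tasks that have reported completion. */
    int completedMaps() {
        return completed.size();
    }

    public static void main(String[] args) {
        MapOutputRegistry registry = new MapOutputRegistry(2);
        registry.reportMapCompletion("m_0", new long[] {100L, 250L});
        registry.reportMapCompletion("m_1", new long[] {40L, 10L});
        for (int p = 0; p < 2; p++) {
            System.out.println("partition " + p + ": "
                + registry.bytesAvailableFor(p) + " bytes from "
                + registry.completedMaps() + " completed maps");
        }
    }
}
```

The point of the sketch is that nothing is pushed to the reducer: the sizes
only exist centrally after a map has completed, so a data-local reduce
scheduler would have to read them from there rather than from inside
MapTask.java.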