A reducer's work begins with pulling map output files from all the other tasktrackers. Assigning several reduce tasks in one go would therefore tax the node in terms of network connections, since each task would start connecting and fetching at about the same time. For this reason only one reduce task is assigned per heartbeat, which gives each reduce task some breathing room to finish a round of connections before the next one comes in to do the same.
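To illustrate the idea, here is a rough sketch in Java. It is not the actual JobTracker or scheduler code; the class name, method name, and the `MAX_REDUCES_PER_HEARTBEAT` constant are made up for this example, and it only models the behaviour described above: map tasks fill whatever slots are free, while reduce tasks are capped at one per heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatAssignmentSketch {

    // Hypothetical constant: models the "one reduce task per heartbeat" rule.
    static final int MAX_REDUCES_PER_HEARTBEAT = 1;

    /**
     * Returns the tasks handed to one tasktracker in reply to a single heartbeat.
     * Maps are limited only by free slots and pending work; reduces are also
     * limited by the per-heartbeat cap.
     */
    static List<String> assignTasks(int freeMapSlots, int freeReduceSlots,
                                    int pendingMaps, int pendingReduces) {
        List<String> assigned = new ArrayList<>();

        // Fill every free map slot for which there is pending map work.
        int maps = Math.min(freeMapSlots, pendingMaps);
        for (int i = 0; i < maps; i++) {
            assigned.add("map");
        }

        // Hand out at most one reduce, even if more slots are free.
        int reduces = Math.min(MAX_REDUCES_PER_HEARTBEAT,
                               Math.min(freeReduceSlots, pendingReduces));
        for (int i = 0; i < reduces; i++) {
            assigned.add("reduce");
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Three pending reduces and three free reduce slots on one tasktracker:
        // the reduces are spread over successive heartbeats instead of all
        // starting their fetches at the same moment.
        int pendingReduces = 3;
        int freeReduceSlots = 3;
        for (int heartbeat = 1; pendingReduces > 0; heartbeat++) {
            List<String> tasks = assignTasks(0, freeReduceSlots, 0, pendingReduces);
            pendingReduces -= tasks.size();
            freeReduceSlots -= tasks.size();
            System.out.println("heartbeat " + heartbeat + ": " + tasks);
        }
    }
}
```

With the cap in place, each reduce task gets roughly one heartbeat interval to open its fetch connections before the next reduce task on the same node starts.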

On Wed, Aug 24, 2011 at 4:18 PM, Sudharsan Sampath <sudha...@gmail.com> wrote:
> Hi,
>
> I see in the code that while we assign a number of map tasks, we assign only
> one reduce task per tasktracker during the heartbeat.
>
> Is there a brief somewhere on why this design decision is made ?
>
> Thanks
> Sudhan S

--
Harsh J