Also make sure you have enough input files for the next-stage mappers to work
with. Read through the input splits section of this tutorial:
http://wiki.apache.org/hadoop/HadoopMapReduce
If the last stage ran only 4 reducers, they generated 4 output files. That
limits the number of mappers started in the next stage to 4, unless you tune
your input split parameters or write a custom input split.
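A minimal sketch of both knobs, assuming the new org.apache.hadoop.mapreduce
API; the class name, paths, reducer count, and split size here are
hypothetical, so size them to your own data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical two-stage driver; mapper/reducer classes are omitted so the
// sketch shows only the tuning calls.
public class StageTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Stage 1: run more reducers so the next stage has more input files.
        Job stage1 = new Job(conf, "stage1");
        stage1.setJarByClass(StageTuning.class);
        stage1.setNumReduceTasks(16);   // 4 reducers would leave only 4 files
        FileInputFormat.addInputPath(stage1, new Path(args[0]));
        FileOutputFormat.setOutputPath(stage1, new Path("/tmp/stage1-out"));
        stage1.waitForCompletion(true);

        // Stage 2: cap the split size so each input file yields several
        // splits, and hence several mappers, even with few files.
        Job stage2 = new Job(conf, "stage2");
        stage2.setJarByClass(StageTuning.class);
        FileInputFormat.setMaxInputSplitSize(stage2, 64L * 1024 * 1024); // 64 MB
        FileInputFormat.addInputPath(stage2, new Path("/tmp/stage1-out"));
        FileOutputFormat.setOutputPath(stage2, new Path(args[1]));
        stage2.waitForCompletion(true);
    }
}

Note that a smaller max split size only helps if the stage-1 output is
splittable; gzip-compressed output, for example, always comes as one split
per file.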
Hope this helps; there is a lot more literature on this on the web and in the
Hadoop books released to date.

-Rahul

On Fri, Jan 7, 2011 at 1:19 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Set higher values for mapred.tasktracker.map.tasks.maximum (and
> mapred.tasktracker.reduce.tasks.maximum) in mapred-site.xml
>
> On Fri, Jan 7, 2011 at 12:58 PM, Tali K <ncherr...@hotmail.com> wrote:
> >
> > We have a job which runs in several map/reduce stages. In the first
> > job, a large number of map tasks - 82 - are initiated, as expected,
> > and that causes all nodes to be used.
> > In a later job, where we are still dealing with large amounts of data,
> > only 4 map tasks are initiated, so only 4 nodes are used.
> > This later job is actually the workhorse of the pipeline and requires
> > much more processing power than the initial stage.
> > We are trying to understand why only a few map tasks are being
> > initiated, as we are not getting the full advantage of our cluster.
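For completeness, Ted's settings above live in mapred-site.xml on each
TaskTracker node (the TaskTracker must be restarted to pick them up). A
minimal sketch; the slot counts are illustrative and should match each
node's cores and memory:

<!-- mapred-site.xml on each TaskTracker node; values are illustrative -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>

Keep in mind these only cap how many tasks run concurrently per node; they
do not change how many map tasks a job creates, which is governed by the
input splits as described above.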