It would depend on your input format. If the job uses an InputFormat that does not allow splitting files, you get exactly one mapper per file, i.e. mappers == no. of files. With splittable input files, you can get mappers > no. of files. A little more information on what the input format is would help track down the problem.
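For a splittable format, the number of map tasks is roughly the total input size divided by the split size (which typically defaults to the HDFS block size, 64 MB in this era of Hadoop). A quick back-of-the-envelope sketch, where the file sizes and the 64 MB split size are illustrative assumptions, not values from your cluster:

```python
import math

def estimate_map_tasks(file_sizes, split_size):
    """Estimate map tasks for a splittable InputFormat:
    each file contributes ceil(size / split_size) splits."""
    return sum(math.ceil(size / split_size) for size in file_sizes)

MB = 1024 * 1024
# Four 100 MB input files with a 64 MB split size:
print(estimate_map_tasks([100 * MB] * 4, 64 * MB))  # -> 8 map tasks
# The same data with a non-splittable format (e.g. gzipped files)
# yields only one mapper per file, i.e. 4 -- which may explain why
# a later stage with fewer, larger files runs on fewer nodes.
```

If the later job's input is a handful of large non-splittable files (gzip output of the previous stage is a common culprit), the mapper count is capped at the file count regardless of cluster size.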
On Sat, Jan 8, 2011 at 3:10 AM, Tali K <ncherr...@hotmail.com> wrote:
>
> According to the documentation, that parameter is for the number of
> tasks *per TaskTracker*. I am asking about the number of tasks
> for the entire job and entire cluster. That parameter is already
> set to 3, which is one less than the number of cores on each node's
> CPU, as recommended. In my question I stated that
> 82 tasks were run for the first job, yet only 4 for the second -
> both numbers being cluster-wide.
>
>
>> Date: Fri, 7 Jan 2011 13:19:42 -0800
>> Subject: Re: Help: How to increase amont maptasks per job ?
>> From: yuzhih...@gmail.com
>> To: common-user@hadoop.apache.org
>>
>> Set higher values for mapred.tasktracker.map.tasks.maximum (and
>> mapred.tasktracker.reduce.tasks.maximum) in mapred-site.xml
>>
>> On Fri, Jan 7, 2011 at 12:58 PM, Tali K <ncherr...@hotmail.com> wrote:
>>
>> > We have a job which runs in several map/reduce stages. In the first
>> > job, a large number of map tasks - 82 - are initiated, as expected,
>> > and that causes all nodes to be used.
>> > In a later job, where we are still dealing with large amounts of
>> > data, only 4 map tasks are initiated, so only 4 nodes are used.
>> > This stage is actually the workhorse of the job, and requires much
>> > more processing power than the initial stage.
>> > We are trying to understand why only a few map tasks are
>> > being used, as we are not getting the full advantage of our cluster.

-- 
Harsh J
www.harshj.com