I have found that this parameter tends to be the limiting factor:

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>3</value>
  <description>The maximum number of tasks that will be run simultaneously by a task tracker.</description>
</property>

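As a rough illustration of why this cap matters, here is a minimal sketch of the arithmetic, assuming the per-tracker maximum is the only constraint in play. The cluster size of 4 task trackers is a hypothetical figure chosen for illustration (the original message does not state it), and the function names are made up for this example. A cap of 2 matches the per-machine count reported in the question; 3 is the value shown above.

```python
import math


def max_concurrent_tasks(num_task_trackers, tasks_maximum):
    """Upper bound on tasks running at once across the cluster,
    if the per-tracker cap were the only limit."""
    return num_task_trackers * tasks_maximum


def map_waves(total_map_tasks, num_task_trackers, tasks_maximum):
    """Number of rounds needed to run every map task when at most
    max_concurrent_tasks() of them can run simultaneously."""
    slots = max_concurrent_tasks(num_task_trackers, tasks_maximum)
    return math.ceil(total_map_tasks / slots)


if __name__ == "__main__":
    total_maps = 171   # from the original message
    trackers = 4       # ASSUMPTION: hypothetical cluster size
    for cap in (2, 3):
        slots = max_concurrent_tasks(trackers, cap)
        waves = map_waves(total_maps, trackers, cap)
        print(f"cap={cap}: {slots} concurrent tasks, {waves} waves")
```

Under these assumptions, raising the cap from 2 to 3 takes the cluster from 8 to 12 concurrent tasks and from 22 to 15 waves for the 171 mappers. Real scheduling depends on other constraints as well, so treat this only as an upper bound.
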
Several competing constraints are at work, which makes it hard to determine just how many map tasks will be run.

On 9/17/07 5:01 AM, "Toby DiPasquale" <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> No matter what I try, the number of mapper tasks on a given machine is
> always 2. JobConf.setNumMapTasks(X) has no effect, nor does setting
> mapred.map.tasks in the mapred-default.xml configuration. Why are
> these settings ignored? How can I truly increase the number of map
> tasks on a given machine?
>
> I ran a job last night (using 0.14.1) that took 31.5 minutes to map
> 7.5 GB (on HDFS, not s3fs) and then 78 seconds to reduce the results
> of that map (starting from 15% complete when the map phase hit 100%).
> The map took so long because only 6 - 8 out of the 171 mappers were
> running at any one time. I'd really like to know how to move the
> needle on this one so if anyone has any insight, I'd really appreciate
> it. Thanks.