Is that setting in the hadoop-site.xml file on every node? Each tasktracker reads in that file once and sets its max map tasks from that. There's no way to control this setting on a per-job basis or from the client (submitting) system. If you've changed hadoop-site.xml after starting the tasktracker, you need to restart the tasktracker daemon on each node.
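For reference, the per-node setting would look something like this in hadoop-site.xml (property name and the value 32 are from this thread; treat the snippet as a sketch of the standard Hadoop property format):

```xml
<!-- hadoop-site.xml on each tasktracker node; tasktracker must be restarted to pick it up -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>32</value>
</property>
```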
Note that 32 maps/node is considered a *lot*. This will likely not provide you with optimal throughput, since they'll be competing for cores, RAM, I/O, etc. ...Unless you've got some really super-charged machines in your datacenter :grin:

Also, in terms of optimizing your job -- do you really have 6,000 big files worth reading? Or are you running a job over 6,000 small files (where small means less than 100 MB or so)? If the latter, consider using MultiFileInputFormat to allow each task to operate on multiple files. See http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/ for some more detail.

Even after all 6,000 map tasks run, you'll have to deal with reassembling 6,000 intermediate data shards into 6 or 12 reduce tasks. This will also be slow, unless you bunch up multiple files into a single task.

Cheers,
- Aaron

On Wed, Aug 5, 2009 at 5:06 PM, Zeev Milin <zeevm...@gmail.com> wrote:
> I now see that the mapred.tasktracker.map.tasks.maximum=32 on the job level
> and still only 6 maps running and 5000+ pending..
>
> Not sure how to force the cluster to run more maps.
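To make the bunching idea concrete, here's a rough sketch (in Python, not the actual Hadoop API -- `bunch_files` and its parameters are hypothetical) of what grouping many small files into fewer splits buys you, along the lines of what MultiFileInputFormat does:

```python
def bunch_files(file_sizes, target_split_bytes):
    """Greedily group small files into splits of roughly target_split_bytes.

    Mimics the idea behind MultiFileInputFormat: one map task per split
    instead of one map task per file. Hypothetical helper, not Hadoop code.
    """
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        # Start a new split once adding this file would exceed the target.
        if current and current_size + size > target_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits

# 6,000 files of ~10 MB each, bunched toward 128 MB splits,
# drops you from 6,000 map tasks to roughly 500.
files = [("part-%04d" % i, 10 * 1024 * 1024) for i in range(6000)]
splits = bunch_files(files, 128 * 1024 * 1024)
print(len(splits))  # 500
```

The point is only the arithmetic: each map task has fixed startup overhead, so 500 tasks reading 120 MB each beats 6,000 tasks reading 10 MB each, and the reducers then pull far fewer intermediate shards.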