The mapred.map.tasks parameter is used as a hint more than anything else. The
actual number of map tasks is determined by the number of input splits: if
there are more input files than the hint, the job gets more map tasks, at
least one per file. So if you've got four input files, that's going to be
four map tasks.
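
If you want that hint to come from the job itself rather than from the XML,
you can set it in code. A minimal sketch using the classic
org.apache.hadoop.mapred API (the class name and input path here are made up
for the example):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MapHintDemo {
      public static void main(String[] args) {
        JobConf conf = new JobConf(MapHintDemo.class);
        // Hypothetical input directory for the example.
        FileInputFormat.setInputPaths(conf, new Path("/user/me/input"));

        // Only a hint: InputFormat.getSplits() decides the real count,
        // so four input files still yield at least four map tasks.
        conf.setNumMapTasks(2);
      }
    }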

The value of mapred.reduce.tasks will be taken from the hadoop-site.xml file
on the machine that submits the job -- not the JobTracker. If those two
machines are separate, the client's hadoop-site.xml will win. That's why your
job.xml shows mapred.reduce.tasks = 1: the submitting machine's config
evidently doesn't set it, so the default of 1 applies.
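
If you need a particular number of reducers no matter which machine's config
gets read, the safest thing is to set it explicitly on the JobConf;
setNumReduceTasks() is the programmatic equivalent of mapred.reduce.tasks
(fragment continuing the sketch above):

    // Overrides mapred.reduce.tasks from both the client's and the
    // JobTracker's hadoop-site.xml.
    conf.setNumReduceTasks(2);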

- Aaron

On Tue, May 19, 2009 at 12:52 PM, Foss User <foss...@gmail.com> wrote:

> I ran a job. In the jobtracker web interface, I found 4 maps and 1
> reduce running. This is not what I set in my configuration files
> (hadoop-site.xml).
>
> My configuration file, conf/hadoop-site.xml is set as follows:
>
> mapred.map.tasks = 2
> mapred.reduce.tasks = 2
>
> However, the description of these properties mentions that these
> settings would be ignored if mapred.job.tracker is set to 'local'.
> Mine is set properly with an IP address and port number. Please note
> that the above configuration is from the 'conf/hadoop-site.xml' file
> of the job tracker node.
>
> I have also not overridden these settings in my Job class (Java code).
>
> So, can anyone please explain why it was executing 4 maps but only 1
> reduce? I have included some important entries from the job.xml of
> this job below:
>
> name                                        value
> mapred.skip.reduce.max.skip.groups          0
> mapred.reduce.max.attempts                  4
> mapred.reduce.tasks                         1
> mapred.reduce.tasks.speculative.execution   true
> mapred.tasktracker.reduce.tasks.maximum     2
> dfs.replication                             2
> mapred.reduce.copy.backoff                  300
>
> mapred.task.cache.levels                    2
> mapred.max.tracker.failures                 4
> mapred.map.tasks                            4
> mapred.map.tasks.speculative.execution      true
> mapred.tasktracker.map.tasks.maximum        2
>
> Please help.
>
