The number of maps to use is calculated on the client, since that is
where the input splits are computed, so changing the value of
mapred.map.tasks only on the jobtracker will not have any effect.
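
A quick way to see which value actually applies is to print it from
the client side (just a throwaway sketch; CheckClientConf is a made-up
class name):

  import org.apache.hadoop.mapred.JobConf;

  public class CheckClientConf {
    public static void main(String[] args) {
      // The JobConf is populated from the hadoop-default.xml and
      // hadoop-site.xml found on the *client's* classpath, so this is
      // the value that job submission will actually see.
      JobConf conf = new JobConf(CheckClientConf.class);
      System.out.println("mapred.map.tasks = " + conf.getNumMapTasks());
    }
  }

Whatever the jobtracker's copy of hadoop-site.xml says never enters
into this.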

Note that the number of map tasks that you set is only a hint; the
actual number depends on the number of splits that get created. In
your case it looks like 4 splits were created. As a rule, you
shouldn't set the number of map tasks at all, since by default one map
task is created for each HDFS block, which works well for most
applications. This is explained further in the javadoc:
http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
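
To make that concrete, even an explicit call in the driver is only a
hint (again a sketch, with a made-up class name):

  import org.apache.hadoop.mapred.JobConf;

  public class MapTaskHint {
    public static void main(String[] args) {
      JobConf conf = new JobConf(MapTaskHint.class);
      // setNumMapTasks() just sets the mapred.map.tasks property, which
      // is passed to InputFormat.getSplits() as a hint.  One map task is
      // run per split returned, so with FileInputFormat you still get
      // roughly one map per HDFS block -- four in your case.
      conf.setNumMapTasks(2);
    }
  }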

The number of reduces to use is determined by the JobConf that is
created on the client, so it uses the client's hadoop-site.xml, not
the jobtracker's. This is why it is set to 1, even though you set it
to 2 on the jobtracker.
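
If you did want to fix the reducer count in the driver itself, the
call looks like this (sketch; MyJob is a placeholder for your driver
class):

  import org.apache.hadoop.mapred.JobConf;

  public class MyJob {
    public static void main(String[] args) {
      JobConf conf = new JobConf(MyJob.class);
      // Takes precedence over mapred.reduce.tasks from any
      // hadoop-site.xml, and unlike the map count it is honoured
      // exactly (except in local mode, which only ever runs one reducer).
      conf.setNumReduceTasks(2);
    }
  }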

If you don't want to set configuration properties in code (and I agree
it's often a good idea not to hardcode things like the number of maps
or reduces), then you can make your driver use Tool and ToolRunner, as
Chuck explained.
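
Something along these lines (only a sketch: MyDriver and the jar name
below are made up, and you would still set your own mapper, reducer
and key/value classes):

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
      // getConf() already contains any -D options parsed by ToolRunner.
      JobConf conf = new JobConf(getConf(), MyDriver.class);
      conf.setJobName("my-job");
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      // Set your mapper, reducer and output key/value classes here.
      // Note there is no setNumReduceTasks() call -- the reducer count
      // comes from the configuration (hadoop-site.xml or a -D option).
      JobClient.runJob(conf);
      return 0;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new MyDriver(), args));
    }
  }

You could then submit it with something like

  hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=2 <input> <output>

and ToolRunner will parse the -D option and put it into the
configuration returned by getConf().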

Finally, in general you should try to keep hadoop-site.xml the same
across your clients and cluster nodes to avoid surprises about which
value has been set.

Hope this helps,

Tom

On Wed, May 20, 2009 at 5:21 AM, Foss User <foss...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:39 AM, Chuck Lam <chuck....@gmail.com> wrote:
>> Can you set the number of reducers to zero and see if it becomes a map-only
>> job? If it does, then it's able to read in the mapred.reduce.tasks property
>> correctly but just refuses to have 2 reducers. In that case, it's most likely
>> you're running in local mode, which doesn't allow more than 1 reducer.
>
> As I have already mentioned in my original mail, I am not running it
> in local mode. Quoting from my original mail:
>
> "My configuration file is set as follows:
>
> mapred.map.tasks = 2
> mapred.reduce.tasks = 2
>
> However, the description of these properties mentions that these
> settings would be ignored if mapred.job.tracker is set to 'local'.
> Mine is set properly with an IP address and port number."
>
>>
>> If setting it to zero doesn't change anything, then your config file is not
>> being read, or it's being overridden.
>>
>> As an aside, if you use ToolRunner in your Hadoop program, then it will
>> support generic options such that you can run your program with the option
>> -D mapred.reduce.tasks=2
>> to tell it to use 2 reducers. This allows you to set the number of reducers
>> on a per-job basis.
>>
>>
>
> I understand that it is being overridden by something else. What I
> want to know is which file is overriding it. Also, please note that I
> have these settings only in the conf/hadoop-site.xml of the jobtracker
> node. Is that enough?
>
