On Thu, May 21, 2009 at 5:18 AM, Foss User <foss...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:18 PM, Tom White <t...@cloudera.com> wrote:
>> The number of maps to use is calculated on the client, since splits
>> are computed on the client, so changing the value of mapred.map.tasks
>> only on the jobtracker will not have any effect.
>>
>> Note that the number of map tasks that you set is only a suggestion,
>> and depends on the number of splits actually created. In your case it
>> looks like 4 splits were created. As a rule, you shouldn't set the
>> number of map tasks, since by default one map task is created for each
>> HDFS block, which works well for most applications. This is explained
>> further in the javadoc:
>> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>>
>> The number of reduces to use is determined by the JobConf that is
>> created on the client, so it uses the client's hadoop-site.xml, not
>> the jobtracker's one. This is why it is set to 1, even though you set
>> it to 2 on the jobtracker.
>>
>> If you don't want to set configuration properties in code (and I agree
>> it's often a good idea not to hardcode things like the number of maps
>> or reduces in code), then you can make your driver use Tool and
>> ToolRunner as Chuck explained.
>>
>> Finally, in general you should try to keep hadoop-site.xml the same
>> across your clients and cluster nodes to avoid surprises about which
>> value has been set.
>>
>> Hope this helps,
>>
>> Tom
>
> By client do you mean the machine where I logged in and invoked
> 'hadoop jar' command to submit and run my job?
Yes.
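
For reference, here is a minimal sketch of a driver using Tool and
ToolRunner, written against the old org.apache.hadoop.mapred API from
the 0.19 era. The class name, job name, and input/output handling are
hypothetical placeholders (mapper/reducer setup is omitted), not
something from the original thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver class for illustration only.
    public class MyJobDriver extends Configured implements Tool {

      public int run(String[] args) throws Exception {
        // getConf() already carries any -D properties that ToolRunner
        // parsed from the command line, so things like
        // mapred.reduce.tasks need not be hardcoded here.
        JobConf conf = new JobConf(getConf(), MyJobDriver.class);
        conf.setJobName("my-job");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser over args, stripping
        // standard options such as -D, -conf, -fs and -jt before
        // handing the remaining arguments to run().
        int exitCode = ToolRunner.run(new Configuration(),
                                      new MyJobDriver(), args);
        System.exit(exitCode);
      }
    }

With a driver like this you can override properties at submit time on
the client instead of editing code or config files, e.g.:

    hadoop jar myjob.jar MyJobDriver -D mapred.reduce.tasks=2 input output

ToolRunner's GenericOptionsParser picks up the -D option and sets it on
the Configuration before run() is called.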