On Wed, May 20, 2009 at 3:18 PM, Tom White <t...@cloudera.com> wrote:
> The number of maps to use is calculated on the client, since splits
> are computed on the client, so changing the value of mapred.map.tasks
> only on the jobtracker will not have any effect.
>
> Note that the number of map tasks that you set is only a suggestion,
> and depends on the number of splits actually created. In your case it
> looks like 4 splits were created. As a rule, you shouldn't set the
> number of map tasks, since by default one map task is created for each
> HDFS block, which works well for most applications. This is explained
> further in the javadoc:
> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>
> The number of reduces to use is determined by the JobConf that is
> created on the client, so it uses the client's hadoop-site.xml, not
> the jobtracker's. This is why it is set to 1, even though you set
> it to 2 on the jobtracker.
>
> If you don't want to set configuration properties in code (and I agree
> it's often a good idea not to hardcode things like the number of maps
> or reduces in code), then you can make your driver use Tool and
> ToolRunner as Chuck explained.
>
> Finally, in general you should try to keep hadoop-site.xml the same
> across your clients and cluster nodes to avoid surprises about which
> value has been set.
>
> Hope this helps,
>
> Tom
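
A minimal sketch of the Tool/ToolRunner driver pattern Tom refers to, using the old `org.apache.hadoop.mapred` API that the javadoc link above describes. The class name, job name, and input/output paths are placeholders, not from the thread; the point is that the driver takes its configuration from `getConf()` rather than hardcoding the number of maps or reduces:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class (name is a placeholder).
public class MyDriver extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // getConf() already contains any -D overrides that ToolRunner's
    // GenericOptionsParser picked up from the command line, so nothing
    // here hardcodes mapred.map.tasks or mapred.reduce.tasks.
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    conf.setJobName("my-job");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}
```

With a driver like this, properties can be set per-job from the client without touching any config file, e.g. `hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=2 in out` (jar and path names here are illustrative).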

By "client" do you mean the machine where I logged in and invoked
the 'hadoop jar' command to submit and run my job?