On Wed, May 20, 2009 at 3:18 PM, Tom White <t...@cloudera.com> wrote:
> The number of maps to use is calculated on the client, since splits
> are computed on the client, so changing the value of mapred.map.tasks
> only on the jobtracker will not have any effect.
>
> Note that the number of map tasks that you set is only a suggestion,
> and depends on the number of splits actually created. In your case it
> looks like 4 splits were created. As a rule, you shouldn't set the
> number of map tasks, since by default one map task is created for each
> HDFS block, which works well for most applications. This is explained
> further in the javadoc:
> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>
> The number of reduces to use is determined by the JobConf that is
> created on the client, so it uses the client's hadoop-site.xml, not
> the jobtracker's. This is why it is set to 1, even though you set
> it to 2 on the jobtracker.
>
> If you don't want to set configuration properties in code (and I agree
> it's often a good idea not to hardcode things like the number of maps
> or reduces in code), then you can make your driver use Tool and
> ToolRunner as Chuck explained.
>
> Finally, in general you should try to keep hadoop-site.xml the same
> across your clients and cluster nodes to avoid surprises about which
> value has been set.
>
> Hope this helps,
>
> Tom
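For what it's worth, the Tool/ToolRunner approach Tom and Chuck mention can be sketched roughly like this (old-style JobConf API as in the 0.19 javadoc linked above; the class name, job name, and paths are just placeholders):

```java
// Minimal driver sketch using Tool and ToolRunner, so that properties
// like mapred.reduce.tasks can be supplied with -D at submit time
// instead of being hardcoded in the driver.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D options parsed by
        // ToolRunner, so the number of maps/reduces is not set here.
        JobConf conf = new JobConf(getConf(), MyDriver.class);
        conf.setJobName("my-job");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -conf, -fs, -jt)
        // before passing the remaining arguments to run().
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}
```

With a driver like that, you can then run e.g.

    hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=2 input output

and the value is picked up on the client where the JobConf is built, which is exactly where it matters per Tom's explanation.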
By "client", do you mean the machine where I logged in and invoked the 'hadoop jar' command to submit and run my job?