On Thu, May 21, 2009 at 5:18 AM, Foss User <foss...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:18 PM, Tom White <t...@cloudera.com> wrote:
>> The number of maps to use is calculated on the client, since splits
>> are computed on the client, so changing the value of mapred.map.tasks
>> only on the jobtracker will not have any effect.
>>
>> Note that the number of map tasks that you set is only a suggestion,
>> and depends on the number of splits actually created. In your case it
>> looks like 4 splits were created. As a rule, you shouldn't set the
>> number of map tasks, since by default one map task is created for each
>> HDFS block, which works well for most applications. This is explained
>> further in the javadoc:
>> http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)
>>
>> The number of reduces to use is determined by the JobConf that is
>> created on the client, so it uses the client's hadoop-site.xml, not
>> the jobtracker's one. This is why it is set to 1, even though you set
>> it to 2 on the jobtracker.
>>
>> If you don't want to set configuration properties in code (and I agree
>> it's often a good idea not to hardcode things like the number of maps
>> or reduces in code), then you can make your driver use Tool and
>> ToolRunner as Chuck explained.
>>
>> Finally, in general you should try to keep hadoop-site.xml the same
>> across your clients and cluster nodes to avoid surprises about which
>> value has been set.
>>
>> Hope this helps,
>>
>> Tom
>
> By client do you mean the machine where I logged in and invoked
> 'hadoop jar' command to submit and run my job?
Yes.
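
For reference, here is a minimal sketch of a driver using Tool and
ToolRunner, written against the old org.apache.hadoop.mapred API from
the 0.19 era. The class name, job name, and input/output handling are
hypothetical placeholders (mapper/reducer setup is omitted), not
something from the original thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver class for illustration only.
    public class MyJobDriver extends Configured implements Tool {

      public int run(String[] args) throws Exception {
        // getConf() already carries any -D properties that ToolRunner
        // parsed from the command line, so things like
        // mapred.reduce.tasks need not be hardcoded here.
        JobConf conf = new JobConf(getConf(), MyJobDriver.class);
        conf.setJobName("my-job");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser over args, stripping
        // standard options such as -D, -conf, -fs and -jt before
        // handing the remaining arguments to run().
        int exitCode = ToolRunner.run(new Configuration(),
                                      new MyJobDriver(), args);
        System.exit(exitCode);
      }
    }

With a driver like this you can override properties at submit time on
the client instead of editing code or config files, e.g.:

    hadoop jar myjob.jar MyJobDriver -D mapred.reduce.tasks=2 input output

ToolRunner's GenericOptionsParser picks up the -D option and sets it on
the Configuration before run() is called.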