The number of maps is calculated on the client, because that is where the input splits are computed, so changing the value of mapred.map.tasks only on the jobtracker will not have any effect.
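To make this concrete, here is a minimal driver sketch using the old org.apache.hadoop.mapred API (the same one as the javadoc linked below); the class name and the way the paths are taken from args are placeholders, not your actual job:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SplitDemo {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SplitDemo.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Only a hint: the actual number of map tasks is the number of input
    // splits the client computes, normally one per HDFS block of the input.
    conf.setNumMapTasks(2);

    // By contrast, this value is used as-is, but it comes from the JobConf
    // built here on the client, not from the jobtracker's hadoop-site.xml.
    conf.setNumReduceTasks(2);

    JobClient.runJob(conf);
  }
}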
Note that the number of map tasks that you set is only a hint; the actual number depends on the number of splits created. In your case it looks like 4 splits were created. As a rule, you shouldn't set the number of map tasks, since by default one map task is created for each HDFS block, which works well for most applications. This is explained further in the javadoc:

http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)

The number of reduces to use is determined by the JobConf that is created on the client, so it uses the client's hadoop-site.xml, not the jobtracker's. This is why it is set to 1, even though you set it to 2 on the jobtracker.

If you don't want to set configuration properties in code (and I agree it's often a good idea not to hardcode things like the number of maps or reduces), you can make your driver use Tool and ToolRunner as Chuck explained (a minimal sketch of such a driver appears after the quoted thread below).

Finally, in general you should try to keep hadoop-site.xml the same across your clients and cluster nodes to avoid surprises about which value has been set.

Hope this helps,

Tom

On Wed, May 20, 2009 at 5:21 AM, Foss User <foss...@gmail.com> wrote:
> On Wed, May 20, 2009 at 3:39 AM, Chuck Lam <chuck....@gmail.com> wrote:
>> Can you set the number of reducers to zero and see if it becomes a map only
>> job? If it does, then it's able to read in the mapred.reduce.tasks property
>> correctly but just refuses to have 2 reducers. In that case, it's most likely
>> you're running in local mode, which doesn't allow more than 1 reducer.
>
> As I have already mentioned in my original mail, I am not running it
> in local mode. Quoting from my original mail:
>
> "My configuration file is set as follows:
>
> mapred.map.tasks = 2
> mapred.reduce.tasks = 2
>
> However, the description of these properties mentions that these
> settings would be ignored if mapred.job.tracker is set to 'local'.
> Mine is set properly with IP address and port number."
>
>>
>> If setting zero doesn't change anything, then your config file is not being
>> read, or it's being overridden.
>>
>> As an aside, if you use ToolRunner in your Hadoop program, then it will
>> support generic options such that you can run your program with the option
>> -D mapred.reduce.tasks=2
>> to tell it to use 2 reducers. This allows you to set the number of reducers
>> on a per-job basis.
>>
>
> I understand that it is being overridden by something else. What I
> want to know is which file is overriding it. Also, please note that I
> have these settings only in the conf/hadoop-site.xml of the job tracker
> node. Is that enough?
>
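For reference, here is a minimal sketch of the Tool/ToolRunner driver Chuck described, again assuming the old org.apache.hadoop.mapred API; the class name and path arguments are placeholders:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already contains any generic options parsed by ToolRunner,
    // e.g. -D mapred.reduce.tasks=2, so nothing is hardcoded here.
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}

You would then run it with something like (jar name and paths are placeholders):

hadoop jar yourjob.jar MyDriver -D mapred.reduce.tasks=2 <input> <output>

and ToolRunner puts the -D setting into the configuration before run() is called, so the number of reducers can be chosen per job without touching hadoop-site.xml.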