In 0.20.2, the JobClient will update mapred.map.tasks to be equal to the number
of splits returned by the InputFormat.  The InputFormat usually takes
mapred.map.tasks as a recommendation when deciding what splits to make.
That is the only place in the code I could find that sets the value and
could have any impact on the number of mappers launched.  It could be that
someone changed the number of files that are being read in as input, or that
the block size of the files being read in is now different.  It could also be
that someone started compressing the input files, so now they cannot be split.
If the number of mappers is different it probably means that the input is
different somehow.
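
For what it's worth, here is a rough sketch (mine, not the actual 0.20
source, and the sizes are made up) of how FileInputFormat turns that hint
into a per-file split size:

  public class SplitSizeSketch {
    // Mirrors the shape of FileInputFormat.computeSplitSize() in 0.20:
    // the numSplits hint only sets a goal size; min size and block size win.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
      return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
      long totalSize = 256L * 1024 * 1024 * 1024; // pretend 256 GB of input
      int  numSplits = 4000;                      // the mapred.map.tasks hint
      long goalSize  = totalSize / Math.max(numSplits, 1);
      long minSize   = 1;                         // mapred.min.split.size default
      long blockSize = 64L * 1024 * 1024;         // default HDFS block size
      // Prints 67108864 (64 MB): the block size caps the split, so the
      // hint mostly washes out when there is plenty of splittable input.
      System.out.println(computeSplitSize(goalSize, minSize, blockSize));
    }
  }

Non-splittable (compressed) files bypass all of this and come through as one
split per file, which is why compression is the first thing I would check.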

--Bobby Evans

On 11/4/11 10:12 AM, "Brendan W." <bw8...@gmail.com> wrote:

All the same, no change in that...0.20.2.

Other people do have access to this system to change things like conf
files, but nobody's owning up and I have to figure this out.  I have
verified that the mapred.map.tasks property is not getting set in the
mapred-site.xml files on the cluster or in the job.  I'm just out of other
ideas about where it might be getting set...
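
In case it helps, this is the kind of check I ran on a cluster node (a
quick sketch; the class name is just mine) to see what the client-side
JobConf resolves and which resource files it actually loaded:

  import org.apache.hadoop.mapred.JobConf;

  public class WhoSetIt {
    public static void main(String[] args) {
      // The default constructor pulls in *-default.xml and *-site.xml
      // from the classpath, the same way the job client does.
      JobConf conf = new JobConf();
      System.out.println(conf); // toString() lists the loaded resources
      System.out.println("mapred.map.tasks = " + conf.get("mapred.map.tasks"));
    }
  }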

Thanks,

Brendan

On Fri, Nov 4, 2011 at 11:04 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> What versions of Hadoop were you running with previously, and what version
> are you running with now?
>
> --Bobby Evans
>
> On 11/4/11 9:33 AM, "Brendan W." <bw8...@gmail.com> wrote:
>
> Hi,
>
> In the jobs running on my cluster of 20 machines, I used to run jobs (via
> "hadoop jar ...") that would spawn around 4000 map tasks.  Now when I run
> the same jobs, that number is 20; and I notice that in the job
> configuration, the parameter mapred.map.tasks is set to 20, whereas it
> never used to be present at all in the configuration file.
>
> Changing the input split size in the job doesn't affect this--I get the
> split size I ask for, but the *number* of input splits is still capped at
> 20--i.e., the job isn't reading all of my data.
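>
> For concreteness, this is roughly the shape of the request in my driver
> (paraphrased from memory; MyJob and the 128 MB figure are stand-ins):
>
>   JobConf conf = new JobConf(MyJob.class);
>   // I do see splits of the size I ask for here...
>   conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
>   // ...but the number of splits stays pinned at 20, and mapred.map.tasks
>   // is never set anywhere in this code.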
>
> The mystery to me is where this parameter could be getting set.  It is not
> present in the mapred-site.xml file in <hadoop home>/conf on any machine in
> the cluster, and it is not being set in the job (I'm running out of the
> same jar I always did; no updates).
>
> Is there *anywhere* else this parameter could possibly be getting set?
> I've stopped and restarted map-reduce on the cluster with no effect...it's
> getting re-read in from somewhere, but I can't figure out where.
>
> Thanks a lot,
>
> Brendan
>
>
