Amit,

On Wed, Feb 22, 2012 at 5:08 PM, sangroya <sangroyaa...@gmail.com> wrote:
> Hello,
>
> Could someone please help me to understand these configuration
> parameters in depth:
>
> mapred.map.tasks and mapred.reduce.tasks
>
> It is mentioned that the default value of these parameters is 2 and 1.
>
> *What does it mean?*
>
> Does it mean 2 maps and 1 reduce per node?
>
> Does it mean 2 maps and 1 reduce in total (for the cluster)? Or
>
> Does it mean 2 maps and 1 reduce per job?
These are set per-job, so the defaults mean 2 maps and 1 reducer for the
single job in which you observe the value.

> Can we change maps and reduces for the default example jobs, such as
> WordCount, too?

You can tweak the number of reducers at will. With the default
HashPartitioner, scaling reducers is as easy as increasing that number.

> At the same time, I believe that the total number of maps is dependent
> upon the input data size?

Yes, the number of maps depends on the number of input files and their
sizes (if they are splittable). With FileInputFormat derivatives you get
at least one map per file, and multiple maps per file when it extends
beyond a single block and can be split.

For some more info, take a look at
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
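For concreteness, here is a minimal sketch of the two points above: overriding the per-job reducer count on the command line, and how the map count falls out of the input size instead. The jar name and input/output paths are hypothetical examples, not from the original mail; any Tool-based job parses `-D` options via GenericOptionsParser.

```shell
# Hypothetical invocation: set 4 reducers for one WordCount run.
# (The jar name and paths are placeholders -- adjust for your install.)
#
#   hadoop jar hadoop-examples.jar wordcount \
#     -D mapred.reduce.tasks=4 /input /output
#
# The map count is not set this way: with FileInputFormat derivatives,
# a splittable file yields roughly ceil(file_size / block_size) maps.
size_mb=300    # example: one 300 MB splittable input file
block_mb=128   # example: 128 MB HDFS block size
maps=$(( (size_mb + block_mb - 1) / block_mb ))   # integer ceil division
echo "$maps"   # -> 3 map tasks for this file
```

The division is the whole story for splittable inputs; a gzip-compressed (non-splittable) file would get exactly one map regardless of size.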