Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-23 Thread kaveh minooie



On 02/22/2012 03:38 AM, sangroya wrote:

Hello,

Could someone please help me to understand these configuration parameters in
depth.

mapred.map.tasks and mapred.reduce.tasks

It is mentioned that the default values of these parameters are 2 and 1.

*What does this mean?*

Does it mean 2 maps and 1 reduce per node?

Does it mean 2 maps and 1 reduce in total (for the cluster)? Or

does it mean 2 maps and 1 reduce per job?


These are the suggested numbers of map and reduce tasks that each job will 
create if no other factor affects the situation. In my experience, they 
are useful for setting the minimum number of tasks that you want each job 
to have.


Can we change the maps and reduces for the default example jobs, such as
WordCount, too?



You can of course change the default values in your nutch-site.xml file, 
but if you want to specify them individually for each job, then you have 
to set them on the command line when you run the job:

-Dmapred.map.tasks=..
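
For instance, with the stock Hadoop examples jar (the jar name and the
input/output paths here are illustrative and vary by distribution):

  hadoop jar hadoop-examples.jar wordcount \
    -Dmapred.map.tasks=10 -Dmapred.reduce.tasks=4 input output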

At the same time, I believe that the total number of maps is dependent upon
the input data size?


Yes, the ultimate factor is the number of input files (not their size).

So, for example, a situation that I had trouble figuring out myself was 
that I wanted a different number of map tasks for fetch jobs, and I found 
out that the best way to indicate that is using the -numFetchers switch 
with the generate command:


http://wiki.apache.org/nutch/bin/nutch_generate
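
For example (the crawldb and segments paths are illustrative):

  bin/nutch generate crawl/crawldb crawl/segments -numFetchers 5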
--
Kaveh Minooie

www.plutoz.com


mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread sangroya
Hello,

Could someone please help me to understand these configuration parameters in
depth.

mapred.map.tasks and mapred.reduce.tasks

It is mentioned that the default values of these parameters are 2 and 1.

*What does this mean?*

Does it mean 2 maps and 1 reduce per node?

Does it mean 2 maps and 1 reduce in total (for the cluster)? Or

does it mean 2 maps and 1 reduce per job?

Can we change the maps and reduces for the default example jobs, such as
WordCount, too?

At the same time, I believe that the total number of maps is dependent upon
the input data size?


Please help me understand these two parameters clearly.

Thanks in advance,
Amit

-
Sangroya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/mapred-map-tasks-and-mapred-reduce-tasks-parameter-meaning-tp3766224p3766224.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread Harsh J
Amit,

On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote:
 Hello,

 Could someone please help me to understand these configuration parameters in
 depth.

 mapred.map.tasks and mapred.reduce.tasks

 It is mentioned that the default values of these parameters are 2 and 1.

 *What does this mean?*

 Does it mean 2 maps and 1 reduce per node?

 Does it mean 2 maps and 1 reduce in total (for the cluster)? Or

 does it mean 2 maps and 1 reduce per job?

These are set per job, and therefore mean 2 maps and 1 reducer for the
single job in which you see those values.

 Can we change the maps and reduces for the default example jobs, such as
 WordCount, too?

You can tweak the number of reducers at will. With the default
HashPartitioner, scaling reducers is as easy as increasing that number.
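
For reference, this works because the default partitioner routes each key
by its hash; the sketch below is essentially the stock
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner:

  import org.apache.hadoop.mapreduce.Partitioner;

  // Each key maps deterministically to one of numReduceTasks partitions,
  // so the reducer count can be raised or lowered freely per job.
  public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
      // Mask the sign bit so the modulo result is never negative.
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
  }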

 At the same time, I believe that the total number of maps is dependent upon
 the input data size?

Yes, maps are dependent on the number of input files and their sizes (if
they are splittable). With FileInputFormat derivatives, you will have at
least one map per file, and you can have multiple maps per file if it
extends beyond a single block and can be split.
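
As a rough worked example (assuming the 64 MB default HDFS block size of
that era and a splittable format such as plain text): a single 200 MB file
yields ceil(200 / 64) = 4 splits, hence 4 map tasks, while ten 1 MB files
yield 10 map tasks, one per file, regardless of what mapred.map.tasks
suggests.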

For some more info, take a look at
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread praveenesh kumar
If I am correct:

For setting mappers/node --- mapred.tasktracker.map.tasks.maximum
For setting reducers/node --- mapred.tasktracker.reduce.tasks.maximum

For setting mappers/job --- mapred.map.tasks (the total for the job, across the whole cluster)
For setting reducers/job --- mapred.reduce.tasks (same)
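
In mapred-site.xml terms, the per-node (per-TaskTracker) slot limits look
like this (the values here are illustrative):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>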


You can change these values in your M/R code using the Job / Configuration
object, as in the sketch below.
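
A minimal driver sketch, using the mapreduce API of that era (the class
name, identity mapper/reducer, and argument handling are illustrative;
note that mapred.map.tasks is only a hint, while the reducer count is
honored exactly):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class TaskCountDemo {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "task-count-demo");
      job.setJarByClass(TaskCountDemo.class);
      job.setMapperClass(Mapper.class);    // identity mapper
      job.setReducerClass(Reducer.class);  // identity reducer
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      // Honored exactly: the job runs 4 reduce tasks (one output file each).
      job.setNumReduceTasks(4);
      // Only a hint: the InputFormat computes the actual number of splits.
      job.getConfiguration().setInt("mapred.map.tasks", 2);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }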


Thanks,
Praveenesh


On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote:

 Hello,

 Could someone please help me to understand these configuration parameters
 in
 depth.

 mapred.map.tasks and mapred.reduce.tasks

 It is mentioned that the default values of these parameters are 2 and 1.

 *What does this mean?*

 Does it mean 2 maps and 1 reduce per node?

 Does it mean 2 maps and 1 reduce in total (for the cluster)? Or

 does it mean 2 maps and 1 reduce per job?

 Can we change the maps and reduces for the default example jobs, such as
 WordCount, too?

 At the same time, I believe that the total number of maps is dependent upon
 the input data size?


 Please help me understand these two parameters clearly.

 Thanks in advance,
 Amit

 -
 Sangroya