Hi, I am looking for a config parameter, similar to the following, that would allow us to limit the total number of mapper and reducer tasks:
mapred.tasktracker.tasks.maximum

Please advise.

On Thu, Sep 10, 2009 at 6:31 AM, Chandraprakash Bhagtani <cpbhagt...@gmail.com> wrote:

> Hi,
>
> You should definitely change mapred.tasktracker.map/reduce.tasks.maximum.
> If your tasks are more CPU-bound, you should run a number of tasks equal
> to the number of CPU cores; otherwise you can run more tasks than cores.
> You can determine CPU and memory usage by running the "top" command on the
> datanodes. You should also take care of the following configuration
> parameters to achieve the best performance:
>
> *mapred.compress.map.output:* Faster data transfer (from mappers to
> reducers), saves disk space, faster disk writes. Costs extra time in
> compression and decompression.
>
> *io.sort.mb:* If you have idle physical memory after running all tasks,
> you can increase this value. But swap space should not be used, since that
> makes it slow.
>
> *io.sort.factor:* If your map tasks produce a large number of spills, you
> should increase this value. It also helps with merging at the reducers.
>
> *mapred.job.reuse.jvm.num.tasks:* The overhead of JVM creation for each
> task is around 1 second. So for tasks which live for seconds or a few
> minutes and have lengthy initialization, this value can be increased to
> gain performance.
>
> *mapred.reduce.parallel.copies:* For large jobs (jobs in which the map
> output is very large), the value of this property can be increased,
> keeping in mind that it will increase the total CPU usage.
>
> *mapred.map/reduce.tasks.speculative.execution:* Set to false to gain
> high throughput.
>
> *dfs.block.size* or *mapred.min.split.size* or *mapred.max.split.size*:
> To control the number of maps.
>
> On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey <matthew.kel...@gmail.com> wrote:
>
> > > I've a cluster where every node is a multicore. From doing internet
> > > searches I've figured out that I definitely need to change
> > > mapred.tasktracker.tasks.maximum according to the number of cores.
> > > But there are definitely other things that I would like to change,
> > > for example mapred.map.tasks. Can someone point me to the list of
> > > things I should change to get the best performance out of my cluster?
>
> > nothing will give you better results than benchmarking with some jobs
> > indicative of your domain!
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani,
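As a sketch of the per-tasktracker slot limits discussed above: in classic Hadoop (0.20/1.x) there is no single combined `mapred.tasktracker.tasks.maximum` parameter; the map and reduce limits are set separately in `mapred-site.xml`. The values below are illustrative assumptions for an 8-core node, not recommendations:

```xml
<!-- mapred-site.xml: per-node task slot limits (illustrative values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- e.g. one map slot per core on an 8-core node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>   <!-- fewer reduce slots, since reduces are often I/O-bound -->
</property>
```

The total concurrent tasks per node is then the sum of the two limits; cluster-wide capacity is that sum times the number of tasktrackers.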
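The tuning parameters Chandraprakash lists could be set together in `mapred-site.xml` roughly as follows. This is a sketch with illustrative values (assumptions to be tuned per cluster and workload), using the classic Hadoop 0.20/1.x property names:

```xml
<!-- mapred-site.xml: tuning parameters from the reply above (illustrative values) -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>    <!-- compress map output for faster shuffle -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value>     <!-- sort buffer in MB; raise only if physical RAM is idle -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>50</value>      <!-- streams merged at once; raise if maps spill often -->
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>      <!-- -1 reuses a JVM for unlimited tasks of the same job -->
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value>      <!-- parallel shuffle fetches; raise for large map output -->
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Split-size control (`dfs.block.size`, `mapred.min.split.size`, `mapred.max.split.size`) is usually better set per job than cluster-wide, since the right number of maps depends on the input.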