Hello,

Something Something <[email protected]> wrote:
> PS:  One thing I have noticed is that it goes to 66% very fast and then
> slows down from there..

It seems that only one reducer is running. You should increase the number of reduce tasks.
The default number of reduce tasks is documented in hadoop/docs/mapred-default.html.
The default value of mapred.reduce.tasks is 1, so only one reduce task runs.

There are two ways to increase the number of reduce tasks:
1. Call Job.setNumReduceTasks(int tasks) in your MapReduce job's driver code.
2. Set a larger value for mapred.reduce.tasks in hadoop/conf/mapred-site.xml.
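Here is a minimal sketch of option 1 using the new org.apache.hadoop.mapreduce API. It assumes Hadoop 0.20-era jars are on the classpath. The class name ReduceTasksDriver, the job name, and the value 20 are only illustrative. No mapper or reducer is set, so Hadoop's identity Mapper and Reducer are used. The commented-out conf.setInt line is a per-job alternative to editing mapred-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceTasksDriver {
    // Illustrative value only; tune it for your cluster.
    static final int NUM_REDUCES = 20;

    public static Job createJob(Configuration conf, String in, String out) throws Exception {
        // Per-job alternative to mapred-site.xml. It must be set before the Job
        // is created, because Job copies the Configuration:
        // conf.setInt("mapred.reduce.tasks", NUM_REDUCES);
        Job job = new Job(conf, "reduce-tasks-example");
        job.setJarByClass(ReduceTasksDriver.class);
        // No mapper/reducer set, so the identity Mapper and Reducer are used.
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        // Overrides mapred.reduce.tasks (default 1) for this job only.
        job.setNumReduceTasks(NUM_REDUCES);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = createJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run it as `hadoop jar yourjob.jar ReduceTasksDriver <input> <output>` (the jar name is hypothetical).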

You may get the best performance by running 20 reduce tasks. How to choose the
number of reduce tasks is explained at
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
under "How many Reduces?", as J-D wrote. Note that JobConf.setNumReduceTasks(int)
is already deprecated, so you should use Job.setNumReduceTasks(int tasks) instead.
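The tutorial section referenced above gives a rule of thumb: the number of reduces should be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). With 0.95, all reduces can start as soon as the maps finish. With 1.75, the faster nodes run a second wave of reduces, which balances the load better. The plain-Java sketch below works through that arithmetic. The node count is hypothetical, and 2 is the default of mapred.tasktracker.reduce.tasks.maximum. Rounding down is an assumption, because the tutorial does not say how to round.

```java
public class ReduceCount {
    // factor * (nodes * reduce slots per node), rounded down (assumption), at least 1.
    public static int suggestedReduces(int nodes, int reduceSlotsPerNode, double factor) {
        int totalSlots = nodes * reduceSlotsPerNode;
        return Math.max(1, (int) Math.floor(factor * totalSlots));
    }

    public static void main(String[] args) {
        int nodes = 10;        // hypothetical cluster size
        int slotsPerNode = 2;  // default mapred.tasktracker.reduce.tasks.maximum
        System.out.println("0.95 (one wave):  " + suggestedReduces(nodes, slotsPerNode, 0.95));
        System.out.println("1.75 (two waves): " + suggestedReduces(nodes, slotsPerNode, 1.75));
    }
}
```

For a hypothetical 10-node cluster with the default two reduce slots per node, this prints 19 and 35.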
--
Motohiko Mouri
