Hello,

Something Something <[email protected]> wrote:
> PS: One thing I have noticed is that it goes to 66% very fast and then
> slows down from there..

It looks like only one reducer is running, so you should increase the
number of reduce tasks. The defaults are listed in
hadoop/docs/mapred-default.html. mapred.reduce.tasks defaults to 1, so
only one reduce task runs.

There are two ways to increase the number of reduce tasks:

1. Call Job.setNumReduceTasks(int tasks) in your MapReduce job code.
2. Set a larger value for mapred.reduce.tasks in
   hadoop/conf/mapred-site.xml.

You can get the best performance if you run 20 reduce tasks. How to
choose the number of reduce tasks is explained under "How many Reduces?"
at
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
as J-D wrote.

Note that JobConf.setNumReduceTasks(int) is already deprecated, so you
should use Job.setNumReduceTasks(int tasks) instead.

--
Motohiko Mouri
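
As a rough illustration of the "How many Reduces?" section mentioned above: the tutorial suggests 0.95 or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum). The sketch below only does that arithmetic. The class name, the method name, the rounding, and the node count of 10 are assumptions for illustration, not values taken from your cluster.

```java
public class ReduceCountSketch {

    // Rule of thumb from the tutorial's "How many Reduces?" section:
    // factor * (nodes * reduce slots per node), where factor is
    // 0.95 (all reduces start in one wave) or 1.75 (two waves).
    // Rounding to the nearest integer is a choice made for this sketch.
    static int suggestedReduces(int nodes, int reduceSlotsPerNode, double factor) {
        return (int) Math.round(factor * (nodes * reduceSlotsPerNode));
    }

    public static void main(String[] args) {
        int nodes = 10;             // hypothetical cluster size
        int reduceSlotsPerNode = 2; // default mapred.tasktracker.reduce.tasks.maximum

        int oneWave = suggestedReduces(nodes, reduceSlotsPerNode, 0.95);
        int twoWaves = suggestedReduces(nodes, reduceSlotsPerNode, 1.75);

        // Either value could then be passed to Job.setNumReduceTasks(int)
        // or written as the mapred.reduce.tasks value in mapred-site.xml.
        System.out.println("one wave:  " + oneWave);
        System.out.println("two waves: " + twoWaves);
    }
}
```

Replace the node count and slots per node with the real values for your cluster before using the result.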
