My program is the basic WordCount example, like this:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    Job.Job.setNumReduceTasks(10);  // <-- this line does not compile; there is no Job object here (see my question below)
    JobClient.runJob(conf);
  }
}


How can I use the Job.setNumReduceTasks(int) method here? I am not creating
any Job object in this code.
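From reading the JobConf javadoc, I think the old org.apache.hadoop.mapred
API exposes the same setting directly on JobConf, so presumably the end of
main() could just be the following (a sketch based on the javadoc, not yet
tested on my cluster):

```java
// Sketch assuming JobConf.setNumReduceTasks(int) from the old
// org.apache.hadoop.mapred API -- no Job object is needed at all.
conf.setNumReduceTasks(10);  // request 10 reduce tasks for this job
JobClient.runJob(conf);
```

Or would I have to set mapred.reduce.tasks in mapred-site.xml instead, as
Evert suggests below? The Job.setNumReduceTasks(int) in the linked javadoc
seems to belong to the new org.apache.hadoop.mapreduce API, which this
program does not use.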

Thanks.
Praveenesh


On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts <evert.lamme...@sara.nl> wrote:

> Hi Praveenesh,
>
> * You can set the maximum number of reduce tasks per node in your
> mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set
> to 2).
> * You can set the default number of reduce tasks with mapred.reduce.tasks
> (default set to 1 - this causes your single reducer).
> * Your job can try to override this setting by calling
> Job.setNumReduceTasks(INT) (
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
> ).
>
> Cheers,
> Evert
>
>
> > -----Original Message-----
> > From: modemide [mailto:modem...@gmail.com]
> > Sent: vrijdag 20 mei 2011 15:26
> > To: common-user@hadoop.apache.org
> > Subject: Re: Why Only 1 Reducer is running ??
> >
> > what does your mapred-site.xml file say?
> >
> > I've used wordcount and had close to 12 reduce tasks running on a
> > 6-datanode cluster on a 3 GB file.
> >
> >
> > I have a configuration in there which says:
> > mapred.reduce.tasks = 12
> >
> > I chose 12 because it was recommended to use 2x the number of
> > tasktrackers.
> >
> > On 5/20/11, praveenesh kumar <praveen...@gmail.com> wrote:
> > > Hello everyone,
> > >
> > > I am using the wordcount application to test my Hadoop cluster of 5
> > > nodes. The file size is around 5 GB.
> > > It takes around 2 min 40 sec to execute.
> > > But when I check the JobTracker web portal, I see only one reducer
> > > running. Why is that?
> > > How can I change the code so that multiple reducers run as well?
> > >
> > > Thanks,
> > > Praveenesh
> > >
>
