OK, I figured it out... it was simple:

conf.setNumReduceTasks(10);
My mistake.
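
In case anyone else hits the same thing, here is a minimal sketch of where
the call goes in the driver (old mapred API, same classes as in the code
quoted below):

    JobConf conf = new JobConf(WordCount.class);
    // ... mapper/reducer/input/output setup as before ...
    conf.setNumReduceTasks(10);  // request 10 reduce tasks for this job
    JobClient.runJob(conf);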

Anyhow, when I run 10 reducers for the wordcount problem, I see only a
slight increase in the program's speed. Why is that?
So more reducers do not guarantee faster execution?
How can we decide how many reducers to use so that the program runs as
fast as possible?

Thanks,
Praveenesh

On Mon, May 23, 2011 at 10:08 AM, praveenesh kumar <praveen...@gmail.com> wrote:

> My program is a basic one, like this:
>
> import java.io.IOException;
> import java.util.*;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.conf.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapred.*;
> import org.apache.hadoop.util.*;
>
> public class WordCount {
>
>     public static class Map extends MapReduceBase implements
> Mapper<LongWritable, Text, Text, IntWritable> {
>       private final static IntWritable one = new IntWritable(1);
>       private Text word = new Text();
>
>       public void map(LongWritable key, Text value, OutputCollector<Text,
> IntWritable> output, Reporter reporter) throws IOException {
>         String line = value.toString();
>         StringTokenizer tokenizer = new StringTokenizer(line);
>         while (tokenizer.hasMoreTokens()) {
>           word.set(tokenizer.nextToken());
>           output.collect(word, one);
>         }
>       }
>     }
>
>     public static class Reduce extends MapReduceBase implements
> Reducer<Text, IntWritable, Text, IntWritable> {
>       public void reduce(Text key, Iterator<IntWritable> values,
> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
> IOException {
>         int sum = 0;
>         while (values.hasNext()) {
>           sum += values.next().get();
>         }
>         output.collect(key, new IntWritable(sum));
>       }
>     }
>
>     public static void main(String[] args) throws Exception {
>       JobConf conf = new JobConf(WordCount.class);
>       conf.setJobName("wordcount");
>
>       conf.setOutputKeyClass(Text.class);
>       conf.setOutputValueClass(IntWritable.class);
>
>       conf.setMapperClass(Map.class);
>       conf.setCombinerClass(Reduce.class);
>       conf.setReducerClass(Reduce.class);
>
>       conf.setInputFormat(TextInputFormat.class);
>       conf.setOutputFormat(TextOutputFormat.class);
>
>       FileInputFormat.setInputPaths(conf, new Path(args[0]));
>       FileOutputFormat.setOutputPath(conf, new Path(args[1]));
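>       // NOTE: the next line is the problem - Job is the new (mapreduce) API
>       // class, so it cannot be called like this here; see the question below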
>       Job.setNumReduceTasks(10);
>       JobClient.runJob(conf);
>     }
> }
>
>
> How do I use the Job.setNumReduceTasks(int) method here? I am not using any
> Job class object in this code.
>
> Thanks.
> Praveenesh
>
>
>   On Fri, May 20, 2011 at 7:07 PM, Evert Lammerts <evert.lamme...@sara.nl> wrote:
>
>> Hi Praveenesh,
>>
>> * You can set the maximum number of reducers per node in your
>> mapred-site.xml using mapred.tasktracker.reduce.tasks.maximum (default set
>> to 2).
>> * You can set the default number of reduce tasks with mapred.reduce.tasks
>> (default set to 1 - this is why you see a single reducer).
>> * Your job can try to override this setting by calling
>> Job.setNumReduceTasks(INT) (
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
>> ).
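>>
>> For example, a minimal mapred-site.xml sketch for the first two settings
>> (goes inside <configuration>; the values here are just illustrations, not
>> recommendations):
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>2</value>   <!-- max concurrent reduce slots per tasktracker -->
>>   </property>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>10</value>  <!-- default number of reduce tasks per job -->
>>   </property>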
>>
>> Cheers,
>> Evert
>>
>>
>> > -----Original Message-----
>> > From: modemide [mailto:modem...@gmail.com]
>> > Sent: Friday, 20 May 2011 15:26
>> > To: common-user@hadoop.apache.org
>> > Subject: Re: Why Only 1 Reducer is running ??
>> >
>> > What does your mapred-site.xml file say?
>> >
>> > I've used wordcount and had close to 12 reducers running on a
>> > 6-datanode cluster with a 3 GB file.
>> >
>> >
>> > I have a configuration in there that says:
>> > mapred.reduce.tasks = 12
>> >
>> > The reason I chose 12 is that the recommendation is roughly 2x the
>> > number of tasktrackers (6 tasktrackers x 2 = 12 in my case).
>> >
>> > On 5/20/11, praveenesh kumar <praveen...@gmail.com> wrote:
>> > > Hello everyone,
>> > >
>> > > I am using the wordcount application to test my Hadoop cluster of 5
>> > > nodes. The file size is around 5 GB, and execution takes around
>> > > 2 min 40 sec. But when I check the JobTracker web portal, I see only
>> > > one reducer running. Why is that?
>> > > How can I change the code so that multiple reducers run as well?
>> > >
>> > > Thanks,
>> > > Praveenesh
>> > >
>>
>
>
