Hi,

I am wondering whether there is a built-in function that automatically adds an auto-incrementing line number to the reducer output (like an auto-increment key in a relational database).

I ask because with the 0.19.2 API I incremented a linecount field in the reducer, like this:

public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    private long linecount = 0;

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // ... some code here
        linecount++;
        output.collect(new Text(Long.toString(linecount)), var);
    }
}


However, I found that this does not work with the 0.20.2 API if I write the code like this:

public static class Reduce
        extends org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {
    private long linecount = 0;

    public void reduce(Text key, Iterator<IntWritable> values,
            org.apache.hadoop.mapreduce.Reducer.Context context)
            throws IOException, InterruptedException {
        // some code here
        linecount++;
        context.write(new Text(Long.toString(linecount)), var);
    }
}

This version no longer seems to work.
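
A possible cause, offered as a guess rather than a confirmed diagnosis: in the 0.20.2 API, Reducer.reduce takes an Iterable<VALUEIN> for the values, not an Iterator. A method declared with Iterator would then be an overload rather than an override, so the framework would keep calling the inherited default reduce, which passes each key/value pair through unchanged. The plain-Java sketch below illustrates only that overload-versus-override effect. BaseReducer, IteratorReducer, and IterableReducer are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a framework base class: run() always calls
// reduce(key, Iterable, out), and the default reduce() is an identity pass-through.
class BaseReducer {
    void reduce(String key, Iterable<Integer> values, List<String> out) {
        for (Integer v : values) {
            out.add(key + "\t" + v);
        }
    }

    List<String> run(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            reduce(key, Arrays.asList(1), out); // one dummy value per key
        }
        return out;
    }
}

// Declares Iterator instead of Iterable: this is an overload, so run()
// never calls it and the identity reduce() above is used instead.
class IteratorReducer extends BaseReducer {
    private long linecount = 0;

    void reduce(String key, Iterator<Integer> values, List<String> out) {
        linecount++;
        out.add(linecount + "\t" + values.next());
    }
}

// Declares Iterable and uses @Override: run() now reaches this method,
// so the key is replaced by the running line number.
class IterableReducer extends BaseReducer {
    private long linecount = 0;

    @Override
    void reduce(String key, Iterable<Integer> values, List<String> out) {
        linecount++;
        out.add(linecount + "\t" + values.iterator().next());
    }
}

class OverloadSketch {
    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b");
        // Keys pass through unchanged: the Iterator overload is ignored.
        System.out.println(new IteratorReducer().run(keys));
        // Keys are replaced by line numbers 1 and 2.
        System.out.println(new IterableReducer().run(keys));
    }
}
```

If this is what is happening, adding @Override to the reduce method in the real reducer should make the compiler reject the Iterator signature.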


I would also like to know: if both a combiner and a reducer are used, how can I avoid the line number being written twice? I only want it added in the reducer, not in the combiner. Thanks!


Shi

