Re: Automatic line number in reducer output

Shi Yu Thu, 09 Jun 2011 10:58:20 -0700

Hi,

Thanks for the reply. The line count works fine in the new API now; it was a
bug in my code. In the new API, Iterator is changed to Iterable, but I didn't
pay attention to that and was still using Iterator with the hasNext() and
next() methods. Surprisingly, the wrong code still ran and produced output,
but the line number count did not work, and I think it was a null value. After
fixing that Iterable mistake, the code works fine.
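
A likely explanation for why the wrong code still ran: a reduce() that takes an
Iterator does not override the new API's reduce(), which takes an Iterable. It
is only an overload, so the framework never calls it and the base class's
default pass-through reduce() runs instead. The sketch below reproduces this in
plain Java. BaseReducer, WrongReducer and FixedReducer are hypothetical
stand-ins written for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverloadPitfall {

    // Hypothetical stand-in for a framework base class whose default
    // reduce() passes every value through unchanged.
    static class BaseReducer {
        protected final List<String> out = new ArrayList<>();

        public void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                out.add(key + "\t" + v);
            }
        }

        // The "framework" only ever calls the Iterable version.
        public List<String> run(String key, List<Integer> values) {
            reduce(key, values);
            return out;
        }
    }

    // Mistake: Iterator instead of Iterable. This is a new overload, not an
    // override, so run() never reaches it and linecount stays at 0.
    static class WrongReducer extends BaseReducer {
        long linecount = 0;

        public void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            linecount++;
            out.add(linecount + "\t" + sum);
        }
    }

    // Fix: same parameter types as the base method. @Override makes the
    // compiler reject the class if the signature does not match.
    static class FixedReducer extends BaseReducer {
        long linecount = 0;

        @Override
        public void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            linecount++;
            out.add(linecount + "\t" + sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(new WrongReducer().run("k", values));
        System.out.println(new FixedReducer().run("k", values));
    }
}
```

Adding @Override to the Iterator version would have turned the silent
misbehaviour into a compile error.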

The remaining problem is that when both a combiner and a reducer are
implemented, the output looks like this:

00000   00000   value1
00001   00000   value2
00002   00000   value3
00003   00001   value4
00004   00001   value5

The first column is the count from the reducer; the second column is the count
from the combiner. I want to avoid the line counter in the combiner, so my plan
is to create another class that is almost the same as the Reducer but without
the line count. I think it is possible to set the Combiner and the Reducer to
different classes in the JobConf, but I haven't tried it yet.
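
A minimal sketch of that plan in plain Java, assuming the aggregation is a
simple sum: the combiner step only pre-aggregates, and only the reducer step
keeps the counter. The names combine and NumberingReducer are hypothetical, and
nothing here uses Hadoop types.

```java
import java.util.Arrays;
import java.util.List;

public class SeparateCombiner {

    // Hypothetical combiner step: pre-aggregates partial values and
    // deliberately adds no line number.
    static int combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Hypothetical reducer step: same aggregation, plus the line counter.
    static class NumberingReducer {
        private long linecount = 0;

        String reduce(String key, List<Integer> partialSums) {
            String line = String.format("%05d\t%s\t%d",
                    linecount, key, combine(partialSums));
            linecount++;
            return line;
        }
    }

    public static void main(String[] args) {
        NumberingReducer reducer = new NumberingReducer();

        // Two map-side partial results for "apple", one for "pear".
        int appleA = combine(Arrays.asList(1, 1));
        int appleB = combine(Arrays.asList(1));
        int pear = combine(Arrays.asList(1, 1, 1));

        System.out.println(reducer.reduce("apple", Arrays.asList(appleA, appleB)));
        System.out.println(reducer.reduce("pear", Arrays.asList(pear)));
    }
}
```

In Hadoop itself the two classes are registered separately with
setCombinerClass() and setReducerClass(), which exist on both JobConf (old API)
and Job (new API). Keeping the counter out of the combiner also matters because
the framework may run the combiner zero or more times.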

Best,

Shi

On 6/9/2011 8:49 AM, Robert Evans wrote:

What exactly is linecount being output as in the new APIs?

--Bobby

On 6/7/11 11:21 AM, "Shi Yu"<sh...@uchicago.edu>  wrote:

Hi,

I am wondering whether there is any built-in function to automatically add a
self-incrementing line number to the reducer output (like an auto-increment
key in a relational DB).

I have this problem because in the 0.19.2 API, I used a variable linecount
that is incremented in the reducer, like this:

   public static class Reduce extends MapReduceBase
           implements Reducer<Text, IntWritable, Text, IntWritable> {
       private long linecount = 0;

       public void reduce(Text key, Iterator<IntWritable> values,
               OutputCollector<Text, IntWritable> output, Reporter reporter)
               throws IOException {

           //.....some code here
           linecount++;
           output.collect(new Text(Long.toString(linecount)), var);
       }
   }


However, I found that this does not work in the 0.20.2 API if I write the
code like this:

   public static class Reduce extends
           org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {
       private long linecount = 0;

       public void reduce(Text key, Iterator<IntWritable> values,
               org.apache.hadoop.mapreduce.Reducer.Context context)
               throws IOException, InterruptedException {

           //some code here
           linecount++;
           context.write(new Text(Long.toString(linecount)), var);
       }
   }

but it no longer seems to work.


I would also like to know, if both a combiner and a reducer are implemented,
how to avoid the line number being written twice (because I only want it in
the reducer, not in the combiner). Thanks!


Shi
