Hey Saptarshi,

In fact, there is an interesting wrapper that can help you output many
different types of values: org.apache.hadoop.io.GenericWritable.
You can write your own Writable class that inherits from it. Here is its
documentation:

A wrapper for Writable instances.

When two sequence files, which have the same Key type but different Value
types, are mapped out to reduce, multiple Value types are not allowed. In this
case, this class can help you wrap instances with different types.

Compared with ObjectWritable, this class is much more efficient, because
ObjectWritable appends the class declaration as a String to the output
file in every Key-Value pair.
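To see why that per-pair overhead matters, here is a stdlib-only sketch (not
Hadoop code; the class name and the type index are just illustrative) that
compares the two encodings for a single int value:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OverheadDemo {
    // ObjectWritable-style: write the full class name before every value.
    static byte[] withClassName(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF("org.apache.hadoop.io.IntWritable"); // class declaration per pair
        out.writeInt(value);                              // the payload itself
        return bytes.toByteArray();
    }

    // GenericWritable-style: a one-byte index into a fixed type table.
    static byte[] withTypeIndex(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(0);   // position of IntWritable in getTypes()
        out.writeInt(value);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // writeUTF costs 2 length bytes + 32 name bytes, plus 4 payload bytes.
        System.out.println(withClassName(42).length);  // prints 38
        System.out.println(withTypeIndex(42).length);  // prints 5
    }
}
```

The class-name string is repeated for every Key-Value pair, while the type
index stays one byte no matter how long the class name is.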

GenericWritable implements the Configurable interface, so it will be
configured by the framework. The configuration is passed to the wrapped
objects implementing the Configurable interface *before deserialization*.

How to use it:
1. Write your own class, such as GenericObject, which extends
GenericWritable.
2. Implement the abstract method getTypes(), which defines the classes that
will be wrapped in GenericObject in your application. Note: the classes
defined in the getTypes() method must implement the Writable interface.

The code looks like this:

 public class GenericObject extends GenericWritable {

   private static Class[] CLASSES = {
               ClassType1.class,
               ClassType2.class,
               ClassType3.class,
               };

   protected Class[] getTypes() {
       return CLASSES;
   }

 }

For example, in your case:

public class YourWritable extends GenericWritable {

  @SuppressWarnings("unchecked")
  private static Class<? extends Writable>[] CLASSES =
      (Class<? extends Writable>[]) new Class[] {
          org.apache.hadoop.io.IntWritable.class,
          org.apache.hadoop.io.BytesWritable.class };

  public YourWritable() {
  }

  public YourWritable(Writable instance) {
    set(instance);
  }

  @Override
  protected Class<? extends Writable>[] getTypes() {
    return CLASSES;
  }
}
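Under the hood, GenericWritable.write() just emits the index of the wrapped
class within getTypes() as a single byte and then delegates to the wrapped
instance, and readFields() reads that byte back and instantiates the matching
class. Here is a stdlib-only mimic of that dispatch (SimpleWritable and the
Box classes are made up for illustration, not Hadoop types):

```java
import java.io.*;

interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class IntBox implements SimpleWritable {
    int value;
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class TextBox implements SimpleWritable {
    String value = "";
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

// The GenericWritable-style wrapper: one byte of type index, then the payload.
class GenericBox implements SimpleWritable {
    private static final Class<?>[] CLASSES = { IntBox.class, TextBox.class };
    SimpleWritable instance;

    public void write(DataOutput out) throws IOException {
        for (byte i = 0; i < CLASSES.length; i++) {
            if (CLASSES[i] == instance.getClass()) {
                out.writeByte(i);     // type index instead of a class name
                instance.write(out);  // delegate the payload
                return;
            }
        }
        throw new IOException("type not registered: " + instance.getClass());
    }

    public void readFields(DataInput in) throws IOException {
        byte i = in.readByte();       // which registered type follows?
        try {
            instance = (SimpleWritable) CLASSES[i].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

public class RoundTrip {
    public static void main(String[] args) throws IOException {
        IntBox payload = new IntBox();
        payload.value = 7;
        GenericBox outBox = new GenericBox();
        outBox.instance = payload;

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        outBox.write(new DataOutputStream(bytes));

        GenericBox inBox = new GenericBox();
        inBox.readFields(new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(((IntBox) inBox.instance).value); // prints 7
    }
}
```

This is also why the classes in getTypes() need a no-argument constructor:
readFields() has to instantiate them reflectively before delegating.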

Then modify your JobConf like this:

  theJob.setOutputKeyClass(IntWritable.class);
  theJob.setOutputValueClass(YourWritable.class);
  ...
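One caveat (an assumption about your job layout, since the snippet only sets
the final output classes): in the flow below, the intermediate map output
value is Text rather than YourWritable, so on the old JobConf API the map
output types may need to be declared separately:

```java
// Only needed when the map output types differ from the final output types;
// here the intermediate value is Text while the job's output value is YourWritable.
theJob.setMapOutputKeyClass(IntWritable.class);
theJob.setMapOutputValueClass(Text.class);
```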

After that, your reducer classes can be written like this (the first wraps
the value on output; the second unwraps it on input):

       public static class ClosestCenterCB extends MapReduceBase implements
           Reducer<IntWritable, Text, IntWritable, YourWritable> {
         public void reduce(IntWritable key, Iterator<Text> values,
             OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
             throws IOException {
           BytesWritable outValue = .... ;
           output.collect(key, new YourWritable(outValue)); // wrap it
         }
       }

       public static class YourReducer extends MapReduceBase implements
           Reducer<IntWritable, YourWritable, IntWritable, YourWritable> {
         public void reduce(IntWritable key, Iterator<YourWritable> values,
             OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
             throws IOException {
           // retrieve the wrapped value like this
           BytesWritable realValue = (BytesWritable) values.next().get();
           // generate the output value, then wrap it
           Text outValue = ...;
           output.collect(key, new YourWritable(outValue));
         }
       }


You can also check out http://coderplay.javaeye.com/blog/259880; it walks
through a complete real example.


-- 
My research interests are distributed systems, parallel computing and
bytecode-based virtual machines.

http://coderplay.javaeye.com
