I would like to create a hierarchy of output files based on the keys passed to
the reducer. The first folder level is the first few digits of the key, the
next level is the next few, etc. I had written a very ugly hack that achieved
this by passing a filesystem object into the record writer. It seems, however,
that this use case is what the MultipleOutputs API was designed to handle. I
began to implement it based on examples I found, but I am getting stuck.
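For concreteness, the tree would come from splitting the key into fixed-width
prefixes, along these lines (the split widths and the pathForKey helper are
just an illustration, not part of my actual code):
----------------
// Hypothetical helper, shown only to illustrate the intended layout:
// a key such as "1234567" maps to the relative path "12/345/1234567"
// (assumes keys are at least five characters long; widths are arbitrary).
private static String pathForKey(String key) {
    return key.substring(0, 2) + "/" + key.substring(2, 5) + "/" + key;
}
----------------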
In my Tool I have the following:
----------------
MultipleOutputs.addNamedOutput(job, "namedOutput",
        SlightlyModifiedTextOutputFormat.class, keyClass, valueClass);
----------------
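The surrounding job setup is roughly the following (a minimal sketch of the
standard new-API Tool boilerplate; MyTool, MyReducer, the job name, and the
path arguments are placeholders, and Key/Value are the same types the reducer
uses):
----------------
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.util.Tool;

public class MyTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "hierarchical output"); // name illustrative
        job.setJarByClass(MyTool.class);
        job.setReducerClass(MyReducer.class);                // placeholder reducer
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Value.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        MultipleOutputs.addNamedOutput(job, "namedOutput",
                SlightlyModifiedTextOutputFormat.class, Key.class, Value.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
----------------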
In my Reducer I have the following:
----------------
private MultipleOutputs<Key, Value> mo_context;

@Override
public void setup(Context context) {
    mo_context = new MultipleOutputs<Key, Value>(context);
}

@Override
protected void reduce(Key key, Iterable<Value> values, Context context)
        throws IOException, InterruptedException {
    for (Value value : values) {
        //context.write(key, value);
        // I can change key.toString() to include the folder tree if needed
        mo_context.write(key, value, key.toString());
        context.progress();
    }
}

@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    if (mo_context != null) {
        mo_context.close();
    }
}
----------------
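My understanding is that the three-argument write(key, value, baseOutputPath)
overload allows '/' separators in baseOutputPath, so the tree would be built
like this (pathForKey is the hypothetical helper sketched above):
----------------
// Assumption: baseOutputPath may contain '/' separators, producing files
// such as <outputdir>/12/345/1234567-r-00000 (names illustrative).
mo_context.write(key, value, pathForKey(key.toString()));
----------------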
When I run it I receive the following stack trace just as reducing begins:
----------------
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputName(Lorg/apache/hadoop/mapreduce/JobContext;Ljava/lang/String;)V
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:439)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:408)
    at xxxxxx.xxxxxxxxxxx.xxxx.xxxxx.xxxxxxxxxxxxxxxxxReducer.reduce(xxxxxxxxxxxxxxxxxReducer.java:54)
    at xxxxxx.xxxxxxxxxxx.xxxx.xxxxx.xxxxxxxxxxxxxxxxxReducer.reduce(xxxxxxxxxxxxxxxxxReducer.java:27)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
----------------
I must be setting this up incorrectly somehow. Does anyone have a solid
example of using MultipleOutputs that shows the job setup, the reducer, and
possibly the output format, and that works with a version around 0.20.205.0?