I would like to create a hierarchy of output files based on the keys passed to
the reducer. The first folder level is the first few digits of the key, the
next level is the next few, etc. I had written a very ugly hack that achieved
this by passing a filesystem object into the record writer. It seems, however,
that this use case is what the MultipleOutputs API was designed to handle. I
began to implement it based on examples I found, but I am getting stuck.
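For concreteness, the tree would come from splitting the key into fixed-width
prefixes, along these lines (the split widths and the pathForKey helper are
just an illustration, not part of my actual code):
----------------
// Hypothetical helper, shown only to illustrate the intended layout:
// a key such as "1234567" maps to the relative path "12/345/1234567"
// (assumes keys are at least five characters long; widths are arbitrary).
private static String pathForKey(String key) {
    return key.substring(0, 2) + "/" + key.substring(2, 5) + "/" + key;
}
----------------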
In my Tool I have the following:
----------------
MultipleOutputs.addNamedOutput(job, "namedOutput",
        SlightlyModifiedTextOutputFormat.class, keyClass, valueClass);
----------------
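The surrounding job setup is roughly the following (a minimal sketch of the
standard new-API Tool boilerplate; MyTool, MyReducer, the job name, and the
path arguments are placeholders, and Key/Value are the same types the reducer
uses):
----------------
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.util.Tool;

public class MyTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "hierarchical output"); // name illustrative
        job.setJarByClass(MyTool.class);
        job.setReducerClass(MyReducer.class);                // placeholder reducer
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Value.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        MultipleOutputs.addNamedOutput(job, "namedOutput",
                SlightlyModifiedTextOutputFormat.class, Key.class, Value.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
----------------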
In my Reducer I have the following:
----------------
private MultipleOutputs<Key, Value> mo_context;

@Override
public void setup(Context context) {
    mo_context = new MultipleOutputs<Key, Value>(context);
}

@Override
protected void reduce(Key key, Iterable<Value> values, Context context)
        throws IOException, InterruptedException {
    for (Value value : values) {
        //context.write(key, value);
        // I can change key.toString() to include the folder tree if needed
        mo_context.write(key, value, key.toString());
        context.progress();
    }
}

@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    if (mo_context != null) {
        mo_context.close();
    }
}
----------------
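My understanding is that the three-argument write(key, value, baseOutputPath)
overload allows '/' separators in baseOutputPath, so the tree would be built
like this (pathForKey is the hypothetical helper sketched above):
----------------
// Assumption: baseOutputPath may contain '/' separators, producing files
// such as <outputdir>/12/345/1234567-r-00000 (names illustrative).
mo_context.write(key, value, pathForKey(key.toString()));
----------------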
When I run it I receive the following stack trace just as reducing begins:
----------------
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputName(Lorg/apache/hadoop/mapreduce/JobContext;Ljava/lang/String;)V
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:439)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:408)
    at xxxxxx.xxxxxxxxxxx.xxxx.xxxxx.xxxxxxxxxxxxxxxxxReducer.reduce(xxxxxxxxxxxxxxxxxReducer.java:54)
    at xxxxxx.xxxxxxxxxxx.xxxx.xxxxx.xxxxxxxxxxxxxxxxxReducer.reduce(xxxxxxxxxxxxxxxxxReducer.java:27)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
----------------
I must be setting this up incorrectly somehow. Does anyone have a solid
example of using MultipleOutputs that shows the job setup, the reducer, and
possibly the output format, and that works with a version around 0.20.205.0?