Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem

Mikhail Yakshin Wed, 28 Jan 2009 07:54:58 -0800

Hi,

We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now
trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
problem I've run into is that we use MultipleOutputs extensively in
our jobs, which work with sequence files that store Cascading's Tuples.


Since Cascading 0.9, Tuples are no longer WritableComparable; they rely
on the generic Hadoop serialization framework instead. However, in
Hadoop 0.19, MultipleOutputs still requires the older
WritableComparable interface. Thus, trying to do something like:

MultipleOutputs.addNamedOutput(conf, "output-name",
    MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
mos = new MultipleOutputs(conf);
...
mos.getCollector("output-name", reporter).collect(tuple1, tuple2);

yields an error:

java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
        at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
        at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
        at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
        at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
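
Judging by the message and the top frame, the failure looks like a plain
assignability check on the configured key class. Below is a minimal
standalone sketch of that kind of check. It does not use Hadoop or
Cascading: the nested WritableComparable, Tuple and Text types are
stand-ins, checkedClass is a hypothetical helper, and the assumption is
that Configuration.getClass does roughly the same thing internally.

```java
public class AssignabilityCheck {
    // Stand-in for org.apache.hadoop.io.WritableComparable (assumption, not the real type).
    interface WritableComparable {}

    // Stand-in for cascading.tuple.Tuple: deliberately does NOT implement the interface.
    static class Tuple {}

    // Stand-in for a key type that does implement the interface.
    static class Text implements WritableComparable {}

    // Hypothetical helper: reject any class that is not assignable to the required interface.
    static Class<?> checkedClass(Class<?> theClass, Class<?> xface) {
        if (theClass != null && !xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
        return theClass;
    }

    public static void main(String[] args) {
        // Passes: Text implements the stand-in interface.
        System.out.println(checkedClass(Text.class, WritableComparable.class));
        try {
            // Fails: Tuple does not implement it, regardless of how it is serialized.
            checkedClass(Tuple.class, WritableComparable.class);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If that reading is right, registering a serialization for Tuple does not
help here, because the class is rejected before any serializer is looked up.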

Is there a known workaround for this? Is any work under way to make
MultipleOutputs use the generic Hadoop serialization?

-- 
WBR, Mikhail Yakshin
