I've been futzing with Crunch a bit, trying to understand how to build a
pipeline that outputs Avro data files. Roughly, I'm doing something along
these lines:

    Schema.Parser schemaParser = new Schema.Parser();
    final Schema avroObjSchema = schemaParser.parse(schemaJsonString);

    AvroType avroType = new AvroType<MyAvroObject>(
        MyAvroObject.class,
        avroObjSchema,
        new AvroDeepCopier.AvroReflectDeepCopier<MyAvroObject>(
            MyAvroObject.class, avroObjSchema));

    PCollection<MyAvroObject> words = logs.parallelDo(
        new DoFn<String, MyAvroObject>() {
            public void process(String line, Emitter<MyAvroObject> emitter) {
                emitter.emit(convertStringToAvroObj(line));
            }
        }, avroType);

However, this results in a class cast exception:

    Exception in thread "main" java.lang.ClassCastException: class com.company.MyAvroObject
        at java.lang.Class.asSubclass(Class.java:3039)
        at org.apache.crunch.types.writable.Writables.records(Writables.java:250)
        at org.apache.crunch.types.writable.WritableTypeFamily.records(WritableTypeFamily.java:86)
        at org.apache.crunch.types.PTypeUtils.convert(PTypeUtils.java:61)
        at org.apache.crunch.types.writable.WritableTypeFamily.as(WritableTypeFamily.java:135)
        at org.apache.crunch.impl.mr.MRPipeline.writeTextFile(MRPipeline.java:319)

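For what it's worth, the top frame is Class.asSubclass, and the trace runs through MRPipeline.writeTextFile and the Writable type family, so my guess is that something is trying to treat MyAvroObject as a Writable. Here is a minimal sketch of that failure mode using only the JDK. FakeWritable, WritableRecord, PlainRecord and narrow are made-up stand-ins for illustration, not Crunch or Hadoop classes, and I'm only assuming this mirrors what happens inside Crunch:

```java
public class AsSubclassDemo {
    // Hypothetical stand-in for a Writable-style marker interface (not Hadoop's).
    public interface FakeWritable {}

    // Hypothetical record class that does implement the marker interface.
    public static class WritableRecord implements FakeWritable {}

    // Hypothetical reflect-style class that does not implement it,
    // playing the role of MyAvroObject.
    public static class PlainRecord {}

    // Tries to narrow the class to FakeWritable. Returns "ok" on success,
    // otherwise a string starting with "ClassCastException".
    public static String narrow(Class<?> clazz) {
        try {
            clazz.asSubclass(FakeWritable.class);
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(narrow(WritableRecord.class));
        System.out.println(narrow(PlainRecord.class));
    }
}
```

The second call fails in the same shape as my trace: asSubclass throws because the class is not assignable to the requested type.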
Anybody have any thoughts? There's got to be a magical incantation that I
have slightly off.