Hi all,
Something I find myself doing reasonably often in MapReduce is using
the reduce step as nothing more than a means of merging data into larger
files. Unless I've missed something in the API, there doesn't appear
to be a neat way to do this with Crunch. Here's what I have now:

// Key each record by itself with a null value, so that groupByKey
// forces a reduce phase; 4 is the number of partitions (reducers).
PGroupedTable<MyAvroRecord, Void> grouped =
    collection.parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
      @Override
      public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
        return Pair.of(input, null);
      }
    }, Avros.tableOf(Avros.specifics(MyAvroRecord.class),
        Avros.nulls())).groupByKey(4);
pipeline.write(grouped, At.avroFile(MyAvroRecord.class));

Is there a better way? If not, maybe we could add a utility
function to do this or something similar?
Thanks,
Dave