Merging files using identity reduce

Dave Beech Tue, 05 Feb 2013 09:00:21 -0800

Hi all,

Something I find myself doing reasonably often in MapReduce is using
the reduce step as nothing more than a means of merging data into larger
files. Unless I've missed something in the API, there doesn't appear
to be a neat way to do this with Crunch. Here's what I have now:


PGroupedTable<MyAvroRecord, Void> grouped = collection
    .parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
        @Override
        public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
            return Pair.of(input, null);
        }
    }, Avros.tableOf(Avros.specifics(MyAvroRecord.class), Avros.nulls()))
    .groupByKey(4);

pipeline.write(grouped, At.avroFile(MyAvroRecord.class));

Is there a better way? Or if not, maybe we could have a utility
function to do this or similar?
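
To make the intent concrete, here is a rough plain-Java sketch of what the
identity reduce boils down to. It uses no Crunch or Hadoop classes, and the
names IdentityReduceSketch and merge are made up for illustration. Records
from many small inputs are hash-partitioned into a fixed number of buckets,
one per reducer, and each bucket stands in for one larger output file. The
partition formula is the one Hadoop's default HashPartitioner uses; the sort
and the grouping of equal keys that a real shuffle performs are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Crunch or Hadoop.
final class IdentityReduceSketch {

    // Redistributes the records of many small inputs into numReducers buckets.
    // Each bucket plays the role of one merged output file.
    static <T> List<List<T>> merge(List<List<T>> smallInputs, int numReducers) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            outputs.add(new ArrayList<>());
        }
        for (List<T> input : smallInputs) {
            for (T record : input) {
                // Same formula as Hadoop's default HashPartitioner:
                // clear the sign bit, then take the remainder.
                int partition = (record.hashCode() & Integer.MAX_VALUE) % numReducers;
                // The "reduce" does nothing but pass the record through.
                outputs.get(partition).add(record);
            }
        }
        return outputs;
    }
}
```

With numReducers set to 4, as in groupByKey(4) above, any number of small
inputs ends up in at most four outputs, which is all I'm after.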

Thanks,
Dave
