Re: Merging files using identity reduce

Dave Beech Tue, 05 Feb 2013 09:15:45 -0800

Thanks, Josh. I'll open a JIRA.

On 5 Feb 2013, at 17:13, Josh Wills <[email protected]> wrote:


> Sounds useful, no way to do it now, I think.
> 
> On Feb 5, 2013 12:00 PM, "Dave Beech" <[email protected]> wrote:
>> Hi all,
>> 
>> Something I find myself doing reasonably often in MapReduce is to use
>> the reduce step as nothing more than a means to merge data into larger
>> files. Unless I've missed something in the API, there doesn't appear
>> to be a neat way to do this with Crunch. Here's what I have now:
>> 
>> PGroupedTable<MyAvroRecord, Void> grouped =
>>     collection.parallelDo(new MapFn<MyAvroRecord, Pair<MyAvroRecord, Void>>() {
>>         @Override
>>         public Pair<MyAvroRecord, Void> map(MyAvroRecord input) {
>>             return Pair.of(input, null);
>>         }
>>     }, Avros.tableOf(Avros.specifics(MyAvroRecord.class), Avros.nulls()))
>>     .groupByKey(4);
>> 
>> pipeline.write(grouped, At.avroFile(MyAvroRecord.class));
>> 
>> Is there a better way? Or if not, maybe we could have a utility
>> function to do this or similar?
>> 
>> Thanks,
>> Dave
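
For readers unfamiliar with the technique discussed above, here is a minimal plain-Java sketch of why an identity reduce merges many small inputs into a fixed number of larger outputs. It does not use Crunch or Hadoop; the class name IdentityReduceSketch and the helper mergeIntoPartitions are hypothetical names made up for illustration. The partition formula mirrors the one used by Hadoop's default HashPartitioner, and the count of 4 matches the groupByKey(4) call in the original message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdentityReduceSketch {

    // Hypothetical helper (not a Crunch or Hadoop API): simulates a shuffle
    // into numPartitions reducers followed by an identity reduce, so the
    // result always contains exactly numPartitions outputs.
    static <T> List<List<T>> mergeIntoPartitions(List<List<T>> smallFiles, int numPartitions) {
        List<List<T>> merged = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            merged.add(new ArrayList<T>());
        }
        for (List<T> file : smallFiles) {
            for (T record : file) {
                // Same arithmetic as Hadoop's default HashPartitioner:
                // mask off the sign bit, then take the remainder.
                int partition = (record.hashCode() & Integer.MAX_VALUE) % numPartitions;
                // Identity "reduce": the record is emitted unchanged.
                merged.get(partition).add(record);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three small "files" standing in for many small input files.
        List<List<String>> smallFiles = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"),
            Arrays.asList("d", "e", "f"));

        List<List<String>> merged = mergeIntoPartitions(smallFiles, 4);
        for (int i = 0; i < merged.size(); i++) {
            System.out.println("part-" + i + ": " + merged.get(i));
        }
    }
}
```

The number of output files is set by the partition count alone, regardless of how many input files there were, which is why the reduce step in the original message does nothing except regroup the records.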
