[ https://issues.apache.org/jira/browse/CRUNCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572313#comment-13572313 ]

Dave Beech commented on CRUNCH-162:
-----------------------------------

I have a patch for this, but I've just realised the code is almost identical
to something in the Sort class (or at least to how that class looked before the
hardcoded single reducer was introduced in CRUNCH-23). The "sort-pre" function
followed by the GBK (group-by-key) and ungroup is what I need, but with a
configurable number of reducers rather than a configurable sort order. I've
also just noticed that the "sort-post" function inside Sort duplicates
PTables.keys.

I don't want to add any further duplication. Any ideas?
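
To make the shape of the pattern concrete, here is a minimal sketch in plain
Java. It is a model only and does not use the Crunch API: the class
IdentityReduceSketch, the method mergeByIdentityReduce and the parameter
numReducers are hypothetical names, and hash partitioning is an assumption
about how records would reach the reducers. A real GBK would also sort keys
within each reducer, which this model skips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "merge output by identity reduce". Hypothetical names;
// none of this is Crunch API.
final class IdentityReduceSketch {

    // Takes many small map-side outputs and returns numReducers merged outputs.
    // Each record acts as its own key (the "sort-pre" step), is routed to a
    // partition by hash (standing in for the GBK with a configurable number of
    // reducers), and is then emitted unchanged (ungroup followed by keys).
    static <T> List<List<T>> mergeByIdentityReduce(List<List<T>> mapOutputs, int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        List<List<T>> reducerOutputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerOutputs.add(new ArrayList<>());
        }
        for (List<T> part : mapOutputs) {
            for (T record : part) {
                // floorMod keeps the partition index non-negative for negative hashes.
                int partition = Math.floorMod(record.hashCode(), numReducers);
                reducerOutputs.get(partition).add(record);
            }
        }
        return reducerOutputs;
    }

    public static void main(String[] args) {
        // Three small "files" merged into two larger ones.
        List<List<String>> small = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e"));
        List<List<String>> merged = mergeByIdentityReduce(small, 2);
        for (int i = 0; i < merged.size(); i++) {
            System.out.println("reducer " + i + ": " + merged.get(i).size() + " records");
        }
    }
}
```

The only knob is numReducers, which sets how many output files result. Sort
order plays no part, which is the difference from the Sort code described
above.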

> Add utility function for merging output by identity reduce
> ----------------------------------------------------------
>
>                 Key: CRUNCH-162
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-162
>             Project: Crunch
>          Issue Type: Improvement
>          Components: MapReduce Patterns
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Priority: Minor
>
> Something I find myself doing reasonably often in MapReduce is using
> the reduce step as nothing more than a means of merging data into larger
> files (using the identity reducer).
> There doesn't appear to be a neat way to do this with Crunch at the moment.
> Ref: 
> http://mail-archives.apache.org/mod_mbox/incubator-crunch-user/201302.mbox/%3CCAFZSZPsXRxWT45c9w4ef7Ruij2exE4HP2CDNMjd%2BVc%3D9RWX-Jw%40mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
