[ https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904607#comment-13904607 ]

Gabriel Reid commented on CRUNCH-347:
-------------------------------------

I think that Shard is indeed the best way to take care of something like this.

[~jgmath2000], regarding the granularity of crunch.max.reducers: 
PTable#groupByKey (which triggers a reduce) can take a number of partitions as 
a parameter, which lets you specify how many reducers are used for that 
specific reduce rather than for the whole job. 
Does that resolve your issue with the reducer count granularity?
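
To illustrate why a per-group partition count is enough here, the following is 
a minimal plain-Java sketch of hash partitioning. It is a model of the idea, not 
Crunch code: the class PartitionSketch and the helper partition(...) are 
hypothetical names made up for this example. Each bucket stands in for one 
reducer, and therefore one output file, so asking for a single partition on 
the final grouping yields a single output while other stages keep their own 
parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical helper: assigns each key to one of numPartitions buckets,
    // similar in spirit to how a hash partitioner assigns keys to reducers.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            // Mask the sign bit so the bucket index is never negative.
            int index = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            buckets.get(index).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // One partition: every key lands in the same bucket (one "output file").
        System.out.println(partition(keys, 1).size());
        // Four partitions: up to four buckets receive keys.
        System.out.println(partition(keys, 4).size());
    }
}
```

In Crunch terms, the single-bucket case corresponds to passing 1 as the number 
of partitions to PTable#groupByKey for the last reduce only.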

> Allow writing of single file outputs
> ------------------------------------
>
>                 Key: CRUNCH-347
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-347
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>    Affects Versions: 0.9.0
>            Reporter: Jason Gauci
>            Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a 
> system that is ingesting the data downstream.  We currently run the job and 
> then cat the output files together to create the final output, but it would 
> be nice if we could pass a flag to the write(...) function to handle this 
> case.
> Note that setting the number of reducers globally for the entire job doesn't 
> work in this case because of the significant performance implications.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
