[
https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904563#comment-13904563
]
Jason Gauci commented on CRUNCH-347:
------------------------------------
I guess what the issue is asking for is more granularity on
crunch.max.reducers. If I set this configuration parameter to '1', it would
enforce a single reducer and thus produce one file. It would be nice if I
could force one reducer on the final MapReduce job that needs to output a
single file without affecting the other MapReduce jobs in the pipeline.
Another approach would be a utility function that takes a materialized
PCollection, which could be composed of many files on HDFS, and merges them
into one file by running an identity mapper and reducer with the number of
reducers in that MapReduce job set to 1.
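As a rough illustration of that second approach, the merge could be expressed
with the existing Crunch API by keying every element to a single constant key
and grouping with GroupingOptions.numReducers(1), so only that one job is
forced down to a single reducer. This is just a sketch of the idea, not a
proposed patch; the PCollection<String> input, the output path, and the
choice of a constant Integer key are all placeholders:

```java
import org.apache.crunch.MapFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.GroupingOptions;
import org.apache.crunch.types.writable.Writables;

public class SingleFileMerge {
  // Merges a (possibly multi-file) PCollection into one output file by
  // sending every record to the same reducer. Only this stage runs with
  // a single reducer; upstream stages keep their own parallelism.
  public static PCollection<String> mergeToOneFile(PCollection<String> input) {
    // Key every element to the same constant so all records land on one reducer.
    PTable<Integer, String> keyed = input.parallelDo(
        new MapFn<String, Pair<Integer, String>>() {
          @Override
          public Pair<Integer, String> map(String value) {
            return Pair.of(0, value);
          }
        },
        Writables.tableOf(Writables.ints(), Writables.strings()));

    // Force exactly one reducer for this grouping only, then drop the key.
    return keyed
        .groupByKey(GroupingOptions.builder().numReducers(1).build())
        .ungroup()
        .values();
  }
}
```

The resulting collection could then be written with something like
pipeline.writeTextFile(mergeToOneFile(data), outputPath), producing a single
part file instead of one per upstream reducer.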
> Allow writing of single file outputs
> ------------------------------------
>
> Key: CRUNCH-347
> URL: https://issues.apache.org/jira/browse/CRUNCH-347
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Affects Versions: 0.9.0
> Reporter: Jason Gauci
> Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a
> system that is ingesting the data downstream. We currently run the job and
> then cat the output files together to create the final output, but it would
> be nice if we could pass a flag to the write(...) function to handle this
> case.
> Note that setting the number of reducers globally for the entire job doesn't
> work in this case because of the significant performance implications.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)