[
https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904592#comment-13904592
]
Dominique Dierickx commented on CRUNCH-347:
-------------------------------------------
We face a similar problem: we need to merge the final output of the
pipeline using a "cat"-like process. If the output is simply text, then "hadoop
fs -getmerge" would do the trick, but when using Avro or SequenceFile you
basically have to write your own logic.
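To make the "cat"-like step concrete, here is a minimal sketch of it for plain
text output, using only the Java standard library against a local copy of the
output directory. The class name PartFileMerger, the method merge, and the
"part-*" file-name pattern are assumptions made for illustration; none of this
is part of Crunch or Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Crunch or Hadoop class.
public class PartFileMerger {

    /**
     * Concatenates every file matching "part-*" in outputDir into target,
     * in file-name order, and returns the number of files merged.
     * Marker files such as _SUCCESS do not match the pattern and are skipped.
     */
    public static int merge(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listing order is unspecified, so sort by path.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                // Raw byte copy: only safe for formats without per-file headers.
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <outputDir> <targetFile>");
            return;
        }
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + merged + " part files");
    }
}
```

A byte-level copy like this is exactly what does not carry over to Avro or
SequenceFile: each of those files starts with its own header, so the records
have to be read and re-written through the format's reader and writer instead.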
One thing we're investigating is using either Crunch's Shard (see
http://crunch.apache.org/user-guide.html#shard) or running a Sort on the final
output in a separate pipeline. However, if sorting is not a requirement, then
this may just be too much overhead.
> Allow writing of single file outputs
> ------------------------------------
>
> Key: CRUNCH-347
> URL: https://issues.apache.org/jira/browse/CRUNCH-347
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Affects Versions: 0.9.0
> Reporter: Jason Gauci
> Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a
> system that is ingesting the data downstream. We currently run the job and
> then cat the output files together to create the final output, but it would
> be nice if we could pass a flag to the write(...) function to handle this
> case.
> Note that setting the number of reducers globally for the entire job doesn't
> work in this case because of the significant performance implications.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)