[jira] [Commented] (CRUNCH-347) Allow writing of single file outputs

Dominique Dierickx (JIRA) Tue, 18 Feb 2014 13:08:37 -0800

    [ 
https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904592#comment-13904592
 ]


Dominique Dierickx commented on CRUNCH-347:
-------------------------------------------

We face a similar problem: we need to merge the final output of the pipeline 
using a "cat"-like process. If the output is plain text, then "hadoop fs 
-getmerge" does the trick, but with Avro or SequenceFile output you basically 
have to write your own merge logic.
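
For the plain-text case, the "cat"-like step can be sketched roughly as below. 
This is only an illustration, assuming the part files have already been copied 
to a local directory and follow the usual part-* naming; the class and method 
names (MergeTextParts, merge) are made up for this example and are not part of 
Crunch or Hadoop. It concatenates raw bytes, so it is not suitable for Avro or 
SequenceFile output, where every part file carries its own header.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: byte-level concatenation of text part files.
class MergeTextParts {

    // Appends every part-* file in outputDir to target, in file-name order.
    // Markers such as _SUCCESS and hidden .crc files do not match the glob.
    // Returns the number of part files that were merged.
    static int merge(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listing order is unspecified, so sort by path.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // stream the whole file into the target
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MergeTextParts.java <local-output-dir> <target-file>
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " part files");
    }
}
```

The target name must not itself match part-*, otherwise a rerun would pick up 
the previously merged file as an input.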

One thing we're investigating is using either Crunch's Shard (see 
http://crunch.apache.org/user-guide.html#shard) or running a Sort on the final 
output in a separate pipeline. However, if sorting is not a requirement, then 
this may just be too much overhead.

> Allow writing of single file outputs
> ------------------------------------
>
>                 Key: CRUNCH-347
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-347
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>    Affects Versions: 0.9.0
>            Reporter: Jason Gauci
>            Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a 
> system that is ingesting the data downstream.  We currently run the job and 
> then cat the output files together to create the final output, but it would 
> be nice if we could pass a flag to the write(...) function to handle this 
> case.
> Note that setting the number of reducers globally for the entire job doesn't 
> work in this case because of the significant performance implications.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
