[
https://issues.apache.org/jira/browse/CRUNCH-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877547#comment-15877547
]
Josh Wills commented on CRUNCH-636:
-----------------------------------
Yeah, I'm not wild about this one to be honest. I see the appeal for certain
use cases, but we also have ways of configuring custom per-output settings via
the conf/outputConf methods on the Target API, and we should always let folks
have enough control over how things are configured on a per-job basis to be
able to do what they want. Like, I think the Crunch philosophy should be that
anything is possible (i.e., there's nothing you can do in vanilla MR that isn't
possible in Crunch), but that sane/stable defaults are also good, so let's not
make it all that easy to do something that is going to yield a bad/unreliable
user experience.
> Make replication factor for temporary files configurable
> --------------------------------------------------------
>
> Key: CRUNCH-636
> URL: https://issues.apache.org/jira/browse/CRUNCH-636
> Project: Crunch
> Issue Type: New Feature
> Reporter: Attila Sasvari
> Assignee: Attila Sasvari
>
> As of now, Crunch does not allow different replication factors for
> temporary files and non-temporary files (e.g. final output data of leaf
> nodes) at the same time. If a user has a large amount of data (say hundreds
> of gigabytes) to process, they might want a lower replication factor
> for large temporary files between Crunch jobs.
> We could make this configurable via a new setting (e.g.
> {{crunch.tmp.dir.replication}}).
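
For illustration only, here is a minimal sketch of how such a setting might be resolved. It assumes that temporary files fall back to the standard dfs.replication value when crunch.tmp.dir.replication is unset; that fallback rule is an assumption, not something the issue specifies. The TmpReplication class and its resolve method are hypothetical and not part of Crunch, and a plain Map stands in for Hadoop's Configuration so the example runs without any dependencies.

```java
import java.util.Map;

// Hypothetical helper, NOT part of Crunch. A Map<String, String> stands in
// for Hadoop's Configuration object to keep the sketch dependency-free.
final class TmpReplication {
    // Setting name proposed in the issue description.
    static final String TMP_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS replication setting.
    static final String DFS_KEY = "dfs.replication";
    // HDFS ships with a default replication factor of 3.
    static final int HDFS_DEFAULT = 3;

    private TmpReplication() {}

    // Returns the replication factor to use for a file.
    // Final outputs always use dfs.replication; temporary files use the
    // proposed key when it is set and otherwise fall back (assumed behavior).
    static int resolve(Map<String, String> conf, boolean temporary) {
        int base = Integer.parseInt(
            conf.getOrDefault(DFS_KEY, String.valueOf(HDFS_DEFAULT)));
        if (!temporary) {
            return base;
        }
        String tmp = conf.get(TMP_KEY);
        return tmp == null ? base : Integer.parseInt(tmp);
    }
}
```

With crunch.tmp.dir.replication set to 1 and dfs.replication left unset, resolve(conf, true) gives 1 for intermediate files while resolve(conf, false) stays at the HDFS default of 3 for final outputs.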