[ https://issues.apache.org/jira/browse/CRUNCH-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Sasvari updated CRUNCH-636:
----------------------------------
Attachment: CRUNCH-636.04-amendment.patch
- Fixed the test failure in
{{JobPrototypeTest#initialReplicationFactorUsedFromFileSystem}}
- Fixed one JLint warning in JobPrototype ("Local variable 'job' shadows component
of class"): the Configuration argument of {{setInitialJobReplicationConfig()}} was
named job.
> Make replication factor for temporary files configurable
> --------------------------------------------------------
>
> Key: CRUNCH-636
> URL: https://issues.apache.org/jira/browse/CRUNCH-636
> Project: Crunch
> Issue Type: New Feature
> Reporter: Attila Sasvari
> Assignee: Attila Sasvari
> Fix For: 1.0.0
>
> Attachments: CRUNCH-636.01.patch, CRUNCH-636.02.patch,
> CRUNCH-636.03.patch, CRUNCH-636.04-amendment.patch, CRUNCH-636.04.patch,
> test.WordCount_2017-03-08_16.31.55.737_jobplan.dot.png,
> test.WordCount_2017-03-08_16.31.55.737.log
>
>
> As of now, Crunch does not allow different replication factors for
> temporary files and non-temporary files (e.g. final output data of leaf
> nodes) at the same time. A user with a large amount of data (say, hundreds
> of gigabytes) to process might want a lower replication factor
> for large temporary files between Crunch jobs.
> We could make this configurable via a new setting (e.g.
> {{crunch.tmp.dir.replication}}).
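
A minimal sketch of how such a setting could be resolved, for illustration only. It is not the code from the attached patches. It assumes a plain Map as a stand-in for the Hadoop Configuration, and it assumes that temporary outputs fall back to the standard HDFS key {{dfs.replication}} when {{crunch.tmp.dir.replication}} is unset. The class name TmpReplicationSketch and the method replicationFor() are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TmpReplicationSketch {
    // Setting name proposed in the issue description.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS key for the default block replication.
    static final String DFS_REPLICATION_KEY = "dfs.replication";
    // HDFS ships with a default replication factor of 3.
    static final int DFS_REPLICATION_DEFAULT = 3;

    // Hypothetical helper: choose the replication factor for one output.
    // Final outputs always use the file system default; temporary outputs
    // use the dedicated setting when present, else the same default.
    static int replicationFor(Map<String, String> conf, boolean temporary) {
        int dfsReplication =
            parseOrDefault(conf.get(DFS_REPLICATION_KEY), DFS_REPLICATION_DEFAULT);
        if (!temporary) {
            return dfsReplication;
        }
        return parseOrDefault(conf.get(TMP_REPLICATION_KEY), dfsReplication);
    }

    // Parse an integer setting, returning the fallback when it is absent.
    static int parseOrDefault(String value, int fallback) {
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TMP_REPLICATION_KEY, "1");
        System.out.println("temporary: " + replicationFor(conf, true));  // temporary: 1
        System.out.println("final: " + replicationFor(conf, false));     // final: 3
    }
}
```

With this fallback, pipelines that do not set the new key keep their current behaviour, and only users who opt in get a lower replication factor for intermediate data.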
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)