[ 
https://issues.apache.org/jira/browse/CRUNCH-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939137#comment-15939137
 ] 

Attila Sasvari commented on CRUNCH-636:
---------------------------------------

[~joshwills] One of the new unit tests 
([initialReplicationFactorUsedFromFileSystem() | 
https://github.com/apache/crunch/blob/master/crunch-core/src/test/java/org/apache/crunch/impl/mr/plan/JobPrototypeTest.java#L88]
 ) fails on CentOS 6.4 with java version "1.8.0_121" when all the unit 
tests are run together with {{mvn test}}. The issue is that a new 
Hadoop {{Configuration()}} loads the default resource files unless told otherwise (see the 
[constructor | 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L714]).
 Some tests that run earlier in crunch-core leave something behind that confuses 
{{initialReplicationFactorUsedFromFileSystem()}}. Please note that I could not 
reproduce this test failure on Mac OS X.
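
To illustrate the kind of leakage described above, here is a minimal, stdlib-only sketch. {{MiniConf}} is a hypothetical stand-in, not Hadoop's {{Configuration}}; it only models the two properties that matter here: default resources are registered JVM-wide, and a {{loadDefaults}} flag on the constructor lets a caller skip them. The key {{dfs.replication}} and the values are chosen purely for illustration, and the actual source of the leftover state in crunch-core has not been identified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DefaultsLeakDemo {

    /** Hypothetical stand-in for a config class with JVM-wide default resources. */
    static final class MiniConf {
        // Static, so anything registered by one test is still visible to later tests.
        private static final List<Map<String, String>> DEFAULTS = new ArrayList<>();

        private final Map<String, String> props = new HashMap<>();

        MiniConf() {
            this(true); // the no-arg constructor loads defaults
        }

        MiniConf(boolean loadDefaults) {
            if (loadDefaults) {
                for (Map<String, String> resource : DEFAULTS) {
                    props.putAll(resource);
                }
            }
        }

        static void addDefaultResource(Map<String, String> resource) {
            DEFAULTS.add(new HashMap<>(resource));
        }

        int getInt(String key, int fallback) {
            String value = props.get(key);
            return value == null ? fallback : Integer.parseInt(value);
        }
    }

    public static void main(String[] args) {
        // An "earlier test" registers a default and never cleans it up.
        MiniConf.addDefaultResource(Collections.singletonMap("dfs.replication", "1"));

        // A "later test" that expects the fallback of 3 now sees the leaked value.
        int withDefaults = new MiniConf().getInt("dfs.replication", 3);

        // Skipping defaults isolates the later test from the leftover state.
        int withoutDefaults = new MiniConf(false).getInt("dfs.replication", 3);

        System.out.println("with defaults: " + withDefaults);
        System.out.println("without defaults: " + withoutDefaults);
    }
}
```

In this model the first lookup returns the leaked value and the second returns the fallback. Hadoop's {{Configuration}} does provide a {{Configuration(boolean loadDefaults)}} constructor, so constructing the test's configuration with {{false}} might be one way to make the test independent of execution order, assuming the leftover state arrives through default resources.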

Does Crunch use amendment patches? Are there plans to use Jenkins to catch issues 
like this?

> Make replication factor for temporary files configurable
> --------------------------------------------------------
>
>                 Key: CRUNCH-636
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-636
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>            Assignee: Attila Sasvari
>             Fix For: 1.0.0
>
>         Attachments: CRUNCH-636.01.patch, CRUNCH-636.02.patch, 
> CRUNCH-636.03.patch, CRUNCH-636.04.patch, 
> test.WordCount_2017-03-08_16.31.55.737_jobplan.dot.png, 
> test.WordCount_2017-03-08_16.31.55.737.log
>
>
> As of now, Crunch does not allow a different replication factor for 
> temporary files and non-temporary files (e.g. final output data of leaf 
> nodes) at the same time. If a user has a large amount of data (say hundreds 
> of gigabytes) to process, they might want a lower replication factor 
> for large temporary files between Crunch jobs. 
> We could make this configurable via a new setting (e.g. 
> {{crunch.tmp.dir.replication}}).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
