Clément MATHIEU created CRUNCH-637:
--------------------------------------

             Summary: crunch.bytes.per.reduce.task cannot be used with GroupingOptions
                 Key: CRUNCH-637
                 URL: https://issues.apache.org/jira/browse/CRUNCH-637
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.14.0
            Reporter: Clément MATHIEU
            Assignee: Josh Wills


I had expected to be able to use {{crunch.bytes.per.reduce.task}} in 
{{GroupingOptions}} to fine-tune job parallelism:
           
{code:java}
    .groupByKey(
        GroupingOptions.builder()
            .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
            .partitionerClass(RoundRobinPartitioner.class)
            .build())
{code}

However, {{PGroupedTableImpl}} ignores {{GroupingOptions.extraConf}} when 
estimating the number of reducers and reads {{crunch.bytes.per.reduce.task}} 
from the pipeline configuration instead.

{code:java}
public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {

    public void configureShuffle(Job job) {
        this.ptype.configureShuffle(job, this.groupingOptions);
        if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
            int numReduceTasks = PartitionUtils.getRecommendedPartitions(
                    this, this.getPipeline().getConfiguration());
            if (numReduceTasks > 0) {
                // [...]
{code}

Is there any reason not to take {{GroupingOptions.extraConf}} into account here?
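
To illustrate the behavior being asked for, here is a minimal, dependency-free sketch. It is not Crunch code: plain maps stand in for the Hadoop {{Configuration}} and for {{GroupingOptions.extraConf}}, and {{overlay}}, {{recommendedPartitions}} and the default value are hypothetical names and numbers chosen for the example. The reducer estimate is a simple ceiling division, which may differ from what {{PartitionUtils.getRecommendedPartitions}} actually computes.

```java
import java.util.HashMap;
import java.util.Map;

final class ExtraConfSketch {
    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";

    // Assumed default, for illustration only; not taken from the Crunch source.
    static final long ASSUMED_DEFAULT_BYTES_PER_REDUCE_TASK = 1_000_000_000L;

    /** Returns a copy of pipelineConf with the per-grouping entries layered on top. */
    static Map<String, String> overlay(Map<String, String> pipelineConf,
                                       Map<String, String> extraConf) {
        Map<String, String> merged = new HashMap<>(pipelineConf);
        if (extraConf != null) {
            merged.putAll(extraConf); // per-grouping values win over pipeline-wide ones
        }
        return merged;
    }

    /** Simplified stand-in for a size-based reducer estimate (ceiling division, at least 1). */
    static int recommendedPartitions(long inputBytes, Map<String, String> conf) {
        String raw = conf.get(BYTES_PER_REDUCE_TASK);
        long bytesPerTask = (raw == null)
                ? ASSUMED_DEFAULT_BYTES_PER_REDUCE_TASK
                : Long.parseLong(raw);
        long partitions = (inputBytes + bytesPerTask - 1) / bytesPerTask;
        return (int) Math.max(1L, partitions);
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = new HashMap<>();
        Map<String, String> extraConf = new HashMap<>();
        extraConf.put(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));

        long inputBytes = 1_000_000_000L;

        // Current behavior as described above: only the pipeline configuration is consulted.
        System.out.println(recommendedPartitions(inputBytes, pipelineConf)); // prints 1

        // Requested behavior: the grouping's extra configuration takes precedence.
        System.out.println(
                recommendedPartitions(inputBytes, overlay(pipelineConf, extraConf))); // prints 20
    }
}
```

With 1 GB of input, the pipeline-only lookup yields a single reducer under the assumed default, while honoring the 50 MB setting from the grouping yields 20.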



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
