Xavier created CRUNCH-642:
-----------------------------

             Summary: Enable numReducers option for methods in Distinct
                 Key: CRUNCH-642
                 URL: https://issues.apache.org/jira/browse/CRUNCH-642
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.14.0
            Reporter: Xavier
            Assignee: Josh Wills
            Priority: Trivial


The {{groupByKey}} invocation in the {{Distinct}} class currently uses the
default (recommended) number of reducers and provides no way to override it:

{code}
public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
  Preconditions.checkArgument(flushEvery > 0);
  PType<S> pt = input.getPType();
  PTypeFamily ptf = pt.getFamily();
  return input
      .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt),
          ptf.tableOf(pt, ptf.nulls()))
      .groupByKey()
      .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
}
{code}

Would it be possible to enhance this method so that the number of reducers
can be customized, either explicitly (a {{numReducers}} argument) or via a
{{GroupingOptions}} object?
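For reference, Crunch's {{PTable}} interface already exposes {{groupByKey(int numPartitions)}} and {{groupByKey(GroupingOptions)}}, so an overload such as a hypothetical {{distinct(input, flushEvery, numReducers)}} could presumably just forward its argument to one of them. The sketch below does not use Crunch at all: it is a plain-Java simulation (class and method names are hypothetical) of the pre-distinct / shuffle / post-distinct pipeline above, assuming a single mapper and a simple hash partitioner. It only illustrates why the reducer count can be tuned freely: every copy of a key lands in the same partition, so the distinct result does not depend on {{numReducers}}.

```java
import java.util.*;

/**
 * Plain-Java simulation (no Crunch dependency) of the two-phase distinct.
 * numReducers is a hypothetical parameter controlling the shuffle fan-out.
 */
public class DistinctSimulation {

  // Phase 1 ("pre-distinct"): the mapper drops duplicates locally and flushes
  // its buffer whenever it holds flushEvery unique elements. Duplicates can
  // still be emitted across flushes, which is why phase 2 is needed.
  static <S> List<S> preDistinct(List<S> mapperInput, int flushEvery) {
    List<S> emitted = new ArrayList<>();
    Set<S> buffer = new LinkedHashSet<>();
    for (S s : mapperInput) {
      buffer.add(s);
      if (buffer.size() >= flushEvery) {
        emitted.addAll(buffer);
        buffer.clear();
      }
    }
    emitted.addAll(buffer);
    return emitted;
  }

  // Shuffle: route each key to one of numReducers partitions by its hash,
  // so all copies of the same key end up in the same partition.
  static <S> List<List<S>> shuffle(List<S> keys, int numReducers) {
    List<List<S>> partitions = new ArrayList<>();
    for (int i = 0; i < numReducers; i++) {
      partitions.add(new ArrayList<>());
    }
    for (S key : keys) {
      int p = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
      partitions.get(p).add(key);
    }
    return partitions;
  }

  // Phase 2 ("post-distinct"): each reducer emits every grouped key once.
  static <S> Set<S> postDistinct(List<S> partition) {
    return new LinkedHashSet<>(partition);
  }

  public static <S> List<S> distinct(List<S> input, int flushEvery, int numReducers) {
    if (flushEvery <= 0 || numReducers <= 0) {
      throw new IllegalArgumentException("flushEvery and numReducers must be positive");
    }
    List<S> output = new ArrayList<>();
    for (List<S> partition : shuffle(preDistinct(input, flushEvery), numReducers)) {
      // No cross-reducer deduplication is needed: the hash step already
      // sent every copy of a key to this single partition.
      output.addAll(postDistinct(partition));
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("a", "b", "a", "c", "b", "d", "a");
    System.out.println(distinct(input, 2, 1));
    System.out.println(distinct(input, 2, 4));
  }
}
```

Running {{main}} should print the same four elements for 1 and 4 reducers, possibly in a different order; only the parallelism of the reduce phase changes.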



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
