Xavier created CRUNCH-642:
-----------------------------
Summary: Enable numReducers option for methods in Distinct
Key: CRUNCH-642
URL: https://issues.apache.org/jira/browse/CRUNCH-642
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.14.0
Reporter: Xavier
Assignee: Josh Wills
Priority: Trivial
The {{groupByKey}} invocation in the {{Distinct}} class currently uses the
default (recommended) number of reducers and offers no way to override it:
{code}
public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
  Preconditions.checkArgument(flushEvery > 0);
  PType<S> pt = input.getPType();
  PTypeFamily ptf = pt.getFamily();
  return input
      .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt),
          ptf.tableOf(pt, ptf.nulls()))
      .groupByKey()
      .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
}
{code}
Would it be possible to enhance this method so that the number of reducers
can be customized, either explicitly or via a {{GroupingOptions}} object?
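
To make the request concrete, here is a rough sketch of the overload shape I have in mind. It is a plain-Java model and not Crunch code: {{DistinctSketch}}, {{DEFAULT_REDUCERS}} and the in-memory "shuffle" are hypothetical stand-ins, and the fallback to a single partition only imitates "let the framework decide". In Crunch itself the new overload would presumably just forward its extra argument to {{groupByKey(int)}} or {{groupByKey(GroupingOptions)}} on the intermediate {{PTable}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the Distinct class; models the API shape only.
final class DistinctSketch {
    // Assumed sentinel meaning "use the default number of reducers".
    static final int DEFAULT_REDUCERS = -1;

    // Existing-style signature: keeps today's behaviour by delegating.
    static <S> List<S> distinct(List<S> input, int flushEvery) {
        return distinct(input, flushEvery, DEFAULT_REDUCERS);
    }

    // Proposed-style signature: the caller may pick the reducer count.
    static <S> List<S> distinct(List<S> input, int flushEvery, int numReducers) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        // Assumption: one partition stands in for the framework's default.
        int partitions = numReducers > 0 ? numReducers : 1;

        List<Set<S>> reducers = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            reducers.add(new LinkedHashSet<>());
        }

        // "pre-distinct": buffer values and flush once flushEvery is reached.
        Set<S> buffer = new LinkedHashSet<>();
        for (S value : input) {
            buffer.add(value);
            if (buffer.size() >= flushEvery) {
                shuffle(buffer, reducers);
            }
        }
        shuffle(buffer, reducers);

        // "post-distinct": each reducer emits every key it received once.
        List<S> out = new ArrayList<>();
        for (Set<S> reducer : reducers) {
            out.addAll(reducer);
        }
        return out;
    }

    // Routes each buffered value to a reducer by hash, then clears the buffer.
    private static <S> void shuffle(Set<S> buffer, List<Set<S>> reducers) {
        for (S value : buffer) {
            int target = Math.floorMod(Objects.hashCode(value), reducers.size());
            reducers.get(target).add(value);
        }
        buffer.clear();
    }
}
```

Because the two-argument method delegates to the three-argument one, existing callers would be unaffected, and a {{GroupingOptions}} variant could be added alongside in the same way.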