[ https://issues.apache.org/jira/browse/CRUNCH-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964179#comment-15964179 ]
Xavier commented on CRUNCH-642:
-------------------------------
Thank you for the swift response!
> Enable numReducers option for methods in Distinct
> -------------------------------------------------
>
> Key: CRUNCH-642
> URL: https://issues.apache.org/jira/browse/CRUNCH-642
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.14.0
> Reporter: Xavier
> Assignee: Josh Wills
> Priority: Trivial
> Attachments: CRUNCH-642.patch
>
>
> The {{groupByKey}} invocation in the {{Distinct}} class currently uses the
> default (recommended) number of reducers without providing an option to
> override this:
> {code}
>   public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
>     Preconditions.checkArgument(flushEvery > 0);
>     PType<S> pt = input.getPType();
>     PTypeFamily ptf = pt.getFamily();
>     return input
>         .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt),
>             ptf.tableOf(pt, ptf.nulls()))
>         .groupByKey()
>         .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
>   }
> {code}
> Would it be possible to extend this method so that the number of reducers
> can be customized, either explicitly or via a {{GroupingOptions}} object?
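
To illustrate the shape of the request, here is a minimal stand-alone sketch of the overload pattern. It deliberately does not use Crunch: the class DistinctSketch, the constants DEFAULT_PARTITIONS and RECOMMENDED_PARTITIONS, and the in-memory hash partitioning are all hypothetical stand-ins, and the flushEvery parameter is left out for brevity. In Crunch itself the count would presumably be handed to the groupByKey call instead, which is an assumption rather than something the issue confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Stand-alone model of a distinct() overload with a caller-chosen partition count. Not the Crunch API. */
final class DistinctSketch {
    /** Hypothetical sentinel meaning "use the recommended count", as the current method does. */
    static final int DEFAULT_PARTITIONS = -1;
    /** Hypothetical stand-in for the planner's recommended number of reducers. */
    static final int RECOMMENDED_PARTITIONS = 4;

    private DistinctSketch() {}

    /** Existing-style entry point: keeps the default partition count. */
    static <S> List<S> distinct(List<S> input) {
        return distinct(input, DEFAULT_PARTITIONS);
    }

    /** Proposed-style overload: the caller picks the number of partitions ("reducers"). */
    static <S> List<S> distinct(List<S> input, int numPartitions) {
        if (numPartitions != DEFAULT_PARTITIONS && numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int n = (numPartitions == DEFAULT_PARTITIONS) ? RECOMMENDED_PARTITIONS : numPartitions;

        // One set per partition; equal values hash to the same partition,
        // so de-duplicating within each partition is enough.
        List<Set<S>> partitions = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            partitions.add(new LinkedHashSet<>());
        }
        for (S value : input) {
            int p = Math.floorMod(Objects.hashCode(value), n);
            partitions.get(p).add(value);
        }

        // Concatenate the per-partition results; order depends on the partitioning.
        List<S> out = new ArrayList<>();
        for (Set<S> partition : partitions) {
            out.addAll(partition);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(distinct(input).size());    // default partition count
        System.out.println(distinct(input, 2).size()); // caller-chosen partition count
    }
}
```

Keeping the existing signature and delegating to the new overload means current callers would see no change in behavior, while callers who know their data size can pick the count themselves.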
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)