For now, I think it is worth documenting and leaving it as it is.

A while back we thought about adding a static code analysis rule to find
such cases and emit a warning. For Reduce, that is quite straightforward;
for GroupReduce it is quite tricky...
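For illustration, a minimal sketch of the pattern such a rule would enforce: the reduce function must carry the keyed field through unchanged, so that a combiner applied before the final reduce cannot produce incorrect groupings. This is a standalone example without Flink dependencies; the `WC` class and `reduce` method are hypothetical stand-ins for the word-count POJO and `ReduceFunction` discussed below.

    // Hypothetical, Flink-free sketch: a reduce over word-count records
    // that preserves the grouping key ("word") unmodified.
    public class SafeReduceSketch {
        static class WC {
            final String word;
            final long count;
            WC(String word, long count) {
                this.word = word;
                this.count = count;
            }
        }

        // Safe: the keyed field `word` is passed through as-is; only the
        // non-key field `count` is aggregated. A combiner may therefore
        // pre-aggregate partial groups without corrupting the grouping.
        static WC reduce(WC in1, WC in2) {
            return new WC(in1.word, in1.count + in2.count);
        }

        public static void main(String[] args) {
            WC out = reduce(new WC("flink", 2), new WC("flink", 3));
            System.out.println(out.word + ":" + out.count); // prints flink:5
        }
    }

The quoted example below violates exactly this invariant by concatenating the two `word` fields, which changes the key between the combine and reduce phases.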
On 02.02.2016 21:55, "Greg Hogan" <c...@greghogan.com> wrote:

> If a user modifies keyed fields of a grouped reduce during a combine, then
> the reduce will receive incorrect groupings. For example, a useless
> modification to word count:
>
>   public WC reduce(WC in1, WC in2) {
>     return new WC(in1.word + " " + in2.word, in1.count + in2.count);
>   }
>
> I don't see an efficient means to prevent this. Is this limitation worth
> documenting, or can we safely assume that no one will ever attempt this?
> MapReduce also has this limitation, and Spark gets around this by
> separating keys and values and only presenting values to reduce.
>
> "Reduce on Grouped DataSet: A Reduce transformation that is applied on a
> grouped DataSet reduces each group to a single element using a user-defined
> reduce function. For each group of input elements, a reduce function
> successively combines pairs of elements into one element until only a
> single element for each group remains."
>
> Greg
>