Fabian,
Thanks for the clear response. You answered my question, and your
suggestions give clear context on how to address the issue.
Best,
Will
On Fri, Aug 10, 2018 at 5:52 AM Fabian Hueske wrote:
Hi Will,
The distinct operator is implemented as a groupBy(distinctKeys) and a
ReduceFunction that returns the first argument.
Hence, it depends on the order in which the records are processed by the
ReduceFunction.
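
As a rough illustration of the point above, here is a minimal sketch in plain Python. It is not Flink API: the function name and the sample records are made up for this example, and it only mimics the "group by key, then reduce by keeping the first argument" behavior described here.

```python
def distinct_by_key(records, key_index=0):
    """Mimic distinct-on-key as group-by-key plus a reduce that keeps its first argument."""
    def keep_first(first, _second):
        # Stand-in for a reduce function that always returns its first argument.
        return first

    reduced = {}
    for record in records:
        key = record[key_index]
        if key in reduced:
            # A record for this key already exists, so the newcomer is discarded.
            reduced[key] = keep_first(reduced[key], record)
        else:
            reduced[key] = record
    return list(reduced.values())


# Hypothetical records shaped like (key, val1, val2, val3).
records = [(1, 0, 0, 0), (1, 0, 0, 1), (2, 5, 5, 5)]

in_order = distinct_by_key(records)
reversed_order = distinct_by_key(list(reversed(records)))

print(in_order)        # key 1 keeps (1, 0, 0, 0)
print(reversed_order)  # key 1 keeps (1, 0, 0, 1)
```

The same input yields a different surviving record for key 1 depending only on arrival order, which is why the result is not deterministic when the processing order is not fixed.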
Flink does not maintain a deterministic order because it is quite expensive
in
I'm working with a data set that poses two challenges:
1. A single key may have multiple entries
and
2. For a single key, there may be multiple unique value-tuples
For example:
key, val1, val2, val3
1, 0,0,0
1, 0,0,0
1,