subject:"Dataset.distinct - Question on deterministic results"

Re: Dataset.distinct - Question on deterministic results

2018-08-10 Thread Will Bastian

Fabian,

Thanks for the clear response. You addressed my question, and the suggestions provide clear context on how to address.

Best,
Will

On Fri, Aug 10, 2018 at 5:52 AM Fabian Hueske wrote:
> Hi Will,
>
> The distinct operator is implemented as a groupBy(distinctKeys) and a
> ReduceFunction

Re: Dataset.distinct - Question on deterministic results

2018-08-10 Thread Fabian Hueske

Hi Will,

The distinct operator is implemented as a groupBy(distinctKeys) and a ReduceFunction that returns the first argument. Hence, it depends on the order in which the records are processed by the ReduceFunction. Flink does not maintain a deterministic order because it is quite expensive in
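To illustrate the point about ordering, here is a minimal plain-Python sketch of a "group by key, then reduce by keeping the first argument" step. It is a simulation only and does not use the Flink API; the function names, the sample records, and the min-based alternative are all hypothetical and are not taken from the thread.

```python
from functools import reduce


def keep_first(a, b):
    # Stand-in for a reduce function that always returns its first argument.
    return a


def group_by_key(records, key_index=0):
    # Collect records per key, preserving the order in which they arrive.
    groups = {}
    for rec in records:
        groups.setdefault(rec[key_index], []).append(rec)
    return groups


def distinct_by_key(records, key_index=0):
    # Order-dependent: whichever record arrives first for a key survives.
    return {k: reduce(keep_first, recs)
            for k, recs in group_by_key(records, key_index).items()}


def deterministic_distinct_by_key(records, key_index=0):
    # Assumed alternative (not from the thread): pick the smallest record
    # per key, so the result does not depend on arrival order.
    return {k: min(recs)
            for k, recs in group_by_key(records, key_index).items()}


# Hypothetical rows of the form (key, val1, val2, val3), in two arrival orders.
order_a = [(1, 0, 0, 0), (1, 0, 0, 1), (2, 5, 5, 5)]
order_b = [(1, 0, 0, 1), (1, 0, 0, 0), (2, 5, 5, 5)]

print(distinct_by_key(order_a)[1])
print(distinct_by_key(order_b)[1])
print(deterministic_distinct_by_key(order_a) == deterministic_distinct_by_key(order_b))
```

With the keep-first reduce, the surviving record for key 1 differs between the two arrival orders, while the min-based variant gives the same result for both.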

Dataset.distinct - Question on deterministic results

2018-08-09 Thread Will Bastian

I'm operating on a data set with some challenges to overcome. They are:

1. There is a possibility of multiple entries for a single key, and
2. For a single key, there may be multiple unique value-tuples

For example:

key, val1, val2, val3
1, 0,0,0
1, 0,0,0
1,


3 matches
