On Fri, May 18, 2018 at 12:21 PM Robert Bradshaw <rober...@google.com> wrote:
> On Fri, May 18, 2018 at 11:46 AM Raghu Angadi <rang...@google.com> wrote:
>
>> Thanks Kenn.
>>
>> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:
>>
>>> The fact that its usage has grown probably indicates that we have a
>>> large number of transforms that can easily cause data loss / duplication.
>>>
>>
>> Is this specific to Reshuffle or it is true for any GroupByKey? I see
>> Reshuffle as just a wrapper around GBK.
>>
> The issue is when it's used in such a way that data corruption can occur
> when the underlying GBK output is not stable.
>

Could you describe this breakage in a bit more detail or give an example?
Apologies in advance, I know this came up in multiple contexts in the past,
but I haven't grokked the issue well. Is it the window rewrite that Reshuffle
does that causes misuse of GBK?

Thanks.