On Fri, May 18, 2018 at 11:46 AM Raghu Angadi <[email protected]> wrote:
> Thanks Kenn.
>
> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <[email protected]> wrote:
>
>> The fact that its usage has grown probably indicates that we have a large
>> number of transforms that can easily cause data loss / duplication.
>>
>
> Is this specific to Reshuffle or it is true for any GroupByKey? I see
> Reshuffle as just a wrapper around GBK.
>

The issue is when it's used in such a way that data corruption can occur
when the underlying GBK output is not stable.
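
To make that failure mode concrete, here is a toy sketch in plain Python. It is not Beam code: the function name and the position-keyed sink are invented for illustration, and it assumes a downstream write that deduplicates retries by element position. That is safe only if a retry sees the same elements in the same order.

```python
def write_with_retry(first_attempt, retry_attempt, fail_after):
    """Hypothetical sink: values are keyed by position, retries skip committed slots."""
    sink = {}
    # First attempt commits positions 0..fail_after-1, then the worker "dies".
    for i, value in enumerate(first_attempt[:fail_after]):
        sink[i] = value
    # Retry replays the whole input; slots that were already committed are kept.
    for i, value in enumerate(retry_attempt):
        sink.setdefault(i, value)
    return [sink[i] for i in sorted(sink)]


# Stable upstream output: the retry replays the same order.
stable = write_with_retry(["a", "b", "c", "d"], ["a", "b", "c", "d"], fail_after=2)

# Unstable upstream output: the retry sees the same elements in another order.
unstable = write_with_retry(["a", "b", "c", "d"], ["c", "d", "a", "b"], fail_after=2)

print(stable)    # ['a', 'b', 'c', 'd']
print(unstable)  # ['a', 'b', 'a', 'b']
```

In the unstable case "c" and "d" are lost and "a" and "b" are written twice, even though each attempt on its own saw every element exactly once.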
