Thanks Kenn.

On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:

> The fact that its usage has grown probably indicates that we have a large
> number of transforms that can easily cause data loss / duplication.

Is this specific to Reshuffle, or is it true for any GroupByKey? I see
Reshuffle as just a wrapper around GBK.

Raghu.

> Yes, it is deprecated because it is primarily used as a Dataflow-specific
> way to ensure stable input. My understanding is that the SparkRunner also
> materializes at every GBK so it works there too (is this still the case?).
> It doesn't work at all for other runners AFAIK. So it is @Deprecated not
> because there is a replacement, but because it is kind of dangerous to use.
> Beam could just say "GBK must ensure stable output" and "a composite
> containing a GBK has to ensure stable output even if replaced" and that
> would solve the issue, but I think it would make Beam on Flink impossibly
> slow - I could be wrong about that. Generally stable input is tied to
> durability model which is a key design point for engines.
>
> True that it isn't the only use, and I know you have been trying to nail
> down what the uses actually are. Ben wrote up various uses in a portable
> manner at https://beam.apache.org/documentation/execution-model.
>
> - Coupled failure is the use where Reshuffle is to provide stable input
> - Breaking dependent parallelism is more portable - but since it is the
>   identity transform a runner may just elide it; it is a hint, basically,
>   and that seems OK (but can we do it more directly?)
>
> What I don't want is to build something where the implementation details
> are the spec, and not fundamental, which is sort of where Reshuffle lies.
> This thread highlights that this is a pretty urgent problem with our SDKs
> and runners that it would be very helpful to work on.
>
> Kenn
>
> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> Agreed that it should be undeprecated, many users are getting confused by
>> this.
>> I know that some people are working on a replacement for at least one of
>> its use cases (RequiresStableInput), but the use case of breaking fusion
>> is, as of yet, unaddressed, and there's not much to be gained by keeping it
>> deprecated.
>>
>> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <rang...@google.com> wrote:
>>
>>> I am interested in more clarity on this as well. It has been deprecated
>>> for a long time without a replacement, and its usage has only grown, both
>>> within Beam code base as well as in user applications.
>>>
>>> If we are certain that it will not be removed before there is a good
>>> replacement for it, can we undeprecate it until there are proper plans for
>>> replacement?
>>>
>>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>
>>>> I saw in a recent thread that the use of the Reshuffle transform was
>>>> recommended to solve a user issue:
>>>>
>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>
>>>> I can see why it may fix the reported issue. I am just curious about
>>>> the fact that the Reshuffle transform is marked as both @Internal and
>>>> @Deprecated in Beam's SDK.
>>>>
>>>> Do we have some alternative? So far the class documentation does not
>>>> recommend any replacement.
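
For readers following the "Reshuffle is just a wrapper around GBK" remark in this thread, here is a minimal sketch of that idea in plain Python. It is not the Beam API and not Beam's actual implementation; the function name `reshuffle_sketch` and the parameters `num_keys` and `seed` are hypothetical, chosen only for illustration. It pairs each element with an arbitrary key, groups by that key, then drops the keys, so the output holds the same elements as the input (possibly in a different order). Any stable-input or fusion-breaking effect would come from how a runner executes the grouping step, which this sketch does not model.

```python
import random
from collections import defaultdict


def reshuffle_sketch(elements, num_keys=4, seed=None):
    """Illustrative only: key randomly, group by key, then drop the keys."""
    rng = random.Random(seed)

    # Step 1: attach an arbitrary key to every element.
    keyed = [(rng.randrange(num_keys), element) for element in elements]

    # Step 2: group by key. In a real runner this is the GroupByKey step,
    # and whether the grouped data is materialized here is runner-specific.
    groups = defaultdict(list)
    for key, value in keyed:
        groups[key].append(value)

    # Step 3: discard the keys and flatten the groups back into one list.
    return [value for values in groups.values() for value in values]


if __name__ == "__main__":
    data = list(range(10))
    out = reshuffle_sketch(data, seed=0)
    # Same elements come out; only the order may differ.
    print(sorted(out) == data)
```

Seen this way, the transform is logically the identity, which matches the point made in the thread that a runner could elide it when it is used only as a parallelism hint.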