The fact that its usage has grown probably indicates that we have a large number of transforms that can easily cause data loss / duplication.
Yes, it is deprecated because it is primarily used as a Dataflow-specific way to ensure stable input. My understanding is that the SparkRunner also materializes at every GBK so it works there too (is this still the case?). It doesn't work at all for other runners AFAIK. So it is @Deprecated not because there is a replacement, but because it is kind of dangerous to use.

Beam could just say "GBK must ensure stable output" and "a composite containing a GBK has to ensure stable output even if replaced" and that would solve the issue, but I think it would make Beam on Flink impossibly slow - I could be wrong about that. Generally stable input is tied to the durability model, which is a key design point for engines.

True that it isn't the only use, and I know you have been trying to nail down what the uses actually are. Ben wrote up various uses in a portable manner at https://beam.apache.org/documentation/execution-model.

 - Coupled failure is the use where Reshuffle is to provide stable input
 - Breaking dependent parallelism is more portable - but since it is the identity transform a runner may just elide it; it is a hint, basically, and that seems OK (but can we do it more directly?)

What I don't want is to build something where the implementation details are the spec, and not fundamental, which is sort of where Reshuffle lies.

This thread highlights that this is a pretty urgent problem with our SDKs and runners that it would be very helpful to work on.

Kenn

On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <[email protected]> wrote:

> Agreed that it should be undeprecated, many users are getting confused by
> this.
> I know that some people are working on a replacement for at least one of
> its use cases (RequiresStableInput), but the use case of breaking fusion
> is, as of yet, unaddressed, and there's not much to be gained by keeping it
> deprecated.
>
> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <[email protected]> wrote:
>
>> I am interested in more clarity on this as well. It has been deprecated
>> for a long time without a replacement, and its usage has only grown, both
>> within Beam code base as well as in user applications.
>>
>> If we are certain that it will not be removed before there is a good
>> replacement for it, can we undeprecate it until there are proper plans for
>> replacement?
>>
>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <[email protected]> wrote:
>>
>>> I saw in a recent thread that the use of the Reshuffle transform was
>>> recommended to solve a user issue:
>>>
>>>
>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>
>>> I can see why it may fix the reported issue. I am just curious about
>>> the fact that the Reshuffle transform is marked as both @Internal and
>>> @Deprecated in Beam's SDK.
>>>
>>> Do we have some alternative? So far the class documentation does not
>>> recommend any replacement.
>>>
>>
