On Fri, May 18, 2018 at 12:22 PM Robert Bradshaw <rober...@google.com> wrote:
> [resending]
> Agreed that keeping this deprecated without a clear replacement for so long
> is not ideal.
>
> I would at least break this into two separate transforms, the
> parallelism-breaking one (which seems OK) and the stable input one (which
> may just call the parallelism-breaking one, but should be decorated with
> lots of caveats and maybe even still have the deprecated annotation).
>

+1. The parallelism-breaking one is the most relevant to many users. Would
love to see that part undeprecated, ideally keeping the name Reshuffle.

Raghu.

>
> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:
>
>> The fact that its usage has grown probably indicates that we have a large
>> number of transforms that can easily cause data loss / duplication.
>>
>> Yes, it is deprecated because it is primarily used as a Dataflow-specific
>> way to ensure stable input. My understanding is that the SparkRunner also
>> materializes at every GBK so it works there too (is this still the case?).
>> It doesn't work at all for other runners AFAIK. So it is @Deprecated not
>> because there is a replacement, but because it is kind of dangerous to use.
>> Beam could just say "GBK must ensure stable output" and "a composite
>> containing a GBK has to ensure stable output even if replaced" and that
>> would solve the issue, but I think it would make Beam on Flink impossibly
>> slow - I could be wrong about that. Generally stable input is tied to the
>> durability model, which is a key design point for engines.
>>
>> True that it isn't the only use, and I know you have been trying to nail
>> down what the uses actually are. Ben wrote up various uses in a portable
>> manner at https://beam.apache.org/documentation/execution-model.
>>
>> - Coupled failure is the use where Reshuffle is used to provide stable input
>> - Breaking dependent parallelism is more portable - but since it is the
>> identity transform a runner may just elide it; it is a hint, basically, and
>> that seems OK (but can we do it more directly?)
>>
>> What I don't want is to build something where the implementation details
>> are the spec, and not fundamental, which is sort of where Reshuffle lies.
>> This thread highlights that this is a pretty urgent problem with our SDKs
>> and runners that it would be very helpful to work on.
>>
>> Kenn
>>
>> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Agreed that it should be undeprecated; many users are getting confused
>>> by this.
>>> I know that some people are working on a replacement for at least one of
>>> its use cases (RequiresStableInput), but the use case of breaking fusion
>>> is, as of yet, unaddressed, and there's not much to be gained by keeping it
>>> deprecated.
>>>
>>> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <rang...@google.com> wrote:
>>>
>>>> I am interested in more clarity on this as well. It has been deprecated
>>>> for a long time without a replacement, and its usage has only grown, both
>>>> within the Beam code base and in user applications.
>>>>
>>>> If we are certain that it will not be removed before there is a good
>>>> replacement for it, can we undeprecate it until there are proper plans for
>>>> replacement?
>>>>
>>>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>>> I saw in a recent thread that the use of the Reshuffle transform was
>>>>> recommended to solve a user issue:
>>>>>
>>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>>
>>>>> I can see why it may fix the reported issue.
>>>>> I am just curious about the fact that the Reshuffle transform is marked
>>>>> as both @Internal and @Deprecated in Beam's SDK.
>>>>>
>>>>> Do we have some alternative? So far the class documentation does not
>>>>> recommend any replacement.
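
For readers following the thread, here is a rough sketch of why the quoted
messages call Reshuffle "the identity transform" whose value lies in the GBK
it contains. This is plain Python, not Beam API: the function name
`reshuffle_sketch` and the parameters `num_keys` and `seed` are hypothetical,
and the key-then-group-then-ungroup shape is an assumption about how such a
step is typically built, not a statement of Beam's actual implementation.

```python
import random
from collections import defaultdict


def reshuffle_sketch(elements, num_keys=4, seed=None):
    """Conceptual stand-in for a Reshuffle-like step (hypothetical, not Beam API).

    Returns the same elements it was given, possibly in a different order.
    """
    rng = random.Random(seed)

    # 1. Pair every element with an arbitrary key, so the grouping below is
    #    unrelated to how upstream work happened to be partitioned.
    keyed = [(rng.randrange(num_keys), element) for element in elements]

    # 2. Group by key. In a runner, this is where a GBK would sit: the point
    #    where data may be redistributed and, on some runners, materialized.
    groups = defaultdict(list)
    for key, element in keyed:
        groups[key].append(element)

    # 3. Drop the keys again. The output is the same collection of elements,
    #    which is why a runner could elide the whole step.
    return [element for bucket in groups.values() for element in bucket]


if __name__ == "__main__":
    data = list(range(10))
    out = reshuffle_sketch(data, seed=1)
    # Same elements, order not guaranteed.
    print(sorted(out) == data)
```

Nothing in the output distinguishes this from doing nothing at all, which is
the concern raised above: any stable-input or fusion-breaking effect comes
from how a particular runner executes the grouping step, not from the
transform's semantics.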