The fact that its usage has grown probably indicates that we have a large
number of transforms that can easily cause data loss / duplication.

Yes, it is deprecated because it is primarily used as a Dataflow-specific
way to ensure stable input. My understanding is that the SparkRunner also
materializes at every GBK so it works there too (is this still the case?).
It doesn't work at all for other runners AFAIK. So it is @Deprecated not
because there is a replacement, but because it is kind of dangerous to
use. Beam could just say "GBK must ensure stable output" and "a composite
containing a GBK has to ensure stable output even if replaced" and that
would solve the issue, but I think it would make Beam on Flink impossibly
slow - I could be wrong about that. Generally, stable input is tied to the
durability model, which is a key design point for engines.

True that it isn't the only use, and I know you have been trying to nail
down what the uses actually are. Ben wrote up various uses in a portable
manner at https://beam.apache.org/documentation/execution-model.

 - Coupled failure is the use where Reshuffle is there to provide stable
input
 - Breaking dependent parallelism is more portable - but since it is the
identity transform a runner may just elide it; it is a hint, basically, and
that seems OK (but can we do it more directly?)
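
To make the "identity transform" point concrete, here is a minimal
pure-Python sketch. It is not Beam code; it only models the usual shape of
pairing each element with a random key, grouping by key, and dropping the
keys again. The names `reshuffle`, `num_keys` and `seed` are hypothetical and
exist only for this illustration.

```python
import random
from collections import defaultdict


def reshuffle(elements, num_keys=4, seed=None):
    """Toy model: random key -> group by key -> ungroup (not a Beam API)."""
    rng = random.Random(seed)
    # Step 1: pair each element with a random key.
    keyed = [(rng.randrange(num_keys), e) for e in elements]
    # Step 2: group by key. In a real pipeline this is the GBK, which is
    # where some runners materialize (checkpoint) the data.
    groups = defaultdict(list)
    for key, element in keyed:
        groups[key].append(element)
    # Step 3: drop the keys and flatten the groups back out.
    return [e for key in sorted(groups) for e in groups[key]]


data = ["a", "b", "c", "d", "e"]
out = reshuffle(data, seed=0)
# Same elements come out; only order and grouping may differ.
assert sorted(out) == sorted(data)
```

Nothing in the output depends on the grouping step having happened, which
is why a runner can elide it, and why any stable-input guarantee comes from
how a runner executes the grouping rather than from the transform itself.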

What I don't want is to build something where the implementation details
are the spec, and not fundamental, which is sort of where Reshuffle lies.
This thread highlights that this is a pretty urgent problem with our SDKs
and runners that it would be very helpful to work on.

Kenn



On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <[email protected]>
wrote:

> Agreed that it should be undeprecated; many users are getting confused by
> this.
> I know that some people are working on a replacement for at least one of
> its use cases (RequiresStableInput), but the use case of breaking fusion
> is, as of yet, unaddressed, and there's not much to be gained by keeping it
> deprecated.
>
> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <[email protected]> wrote:
>
>> I am interested in more clarity on this as well. It has been deprecated
>> for a long time without a replacement, and its usage has only grown, both
>> within Beam code base as well as in user applications.
>>
>> If we are certain that it will not be removed before there is a good
>> replacement for it, can we undeprecate it until there are proper plans for
>> replacement?
>>
>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <[email protected]> wrote:
>>
>>> I saw in a recent thread that the use of the Reshuffle transform was
>>> recommended to solve a user issue:
>>>
>>>
>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>
>>> I can see why it may fix the reported issue. I am just curious about
>>> the fact that the Reshuffle transform is marked as both @Internal and
>>> @Deprecated in Beam's SDK.
>>>
>>> Do we have some alternative? So far the class documentation does not
>>> recommend any replacement.
>>>
>>
