Thanks Kenn.

On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:

> The fact that its usage has grown probably indicates that we have a large
> number of transforms that can easily cause data loss / duplication.
>

Is this specific to Reshuffle, or is it true for any GroupByKey? I see
Reshuffle as just a wrapper around GBK.

Raghu.

>
>
> Yes, it is deprecated because it is primarily used as a Dataflow-specific
> way to ensure stable input. My understanding is that the SparkRunner also
> materializes at every GBK so it works there too (is this still the case?).
> It doesn't work at all for other runners AFAIK. So it is @Deprecated not
> because there is a replacement, but because it is kind of dangerous to use.
> Beam could just say "GBK must ensure stable output" and "a composite containing
> a GBK has to ensure stable output even if replaced" and that would solve
> the issue, but I think it would make Beam on Flink impossibly slow - I
> could be wrong about that. Generally, stable input is tied to the durability
> model, which is a key design point for engines.
>
> True that it isn't the only use, and I know you have been trying to nail
> down what the uses actually are. Ben wrote up various uses in a portable
> manner at https://beam.apache.org/documentation/execution-model.
>
>  - Coupled failure is the use where Reshuffle is to provide stable input
>  - Breaking dependent parallelism is more portable - but since it is the
> identity transform, a runner may just elide it; it is a hint, basically, and
> that seems OK (but can we do it more directly?)
>
> What I don't want is to build something where the implementation details
> are the spec, and not fundamental, which is sort of where Reshuffle lies.
> This thread highlights that this is a pretty urgent problem in our SDKs
> and runners, and it would be very helpful to work on it.
>
> Kenn
>
>
>
> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> Agreed that it should be undeprecated; many users are getting confused by
>> this.
>> I know that some people are working on a replacement for at least one of
>> its use cases (RequiresStableInput), but the use case of breaking fusion
>> is, as of yet, unaddressed, and there's not much to be gained by keeping it
>> deprecated.
>>
>> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <rang...@google.com> wrote:
>>
>>> I am interested in more clarity on this as well. It has been deprecated
>>> for a long time without a replacement, and its usage has only grown, both
>>> within Beam code base as well as in user applications.
>>>
>>> If we are certain that it will not be removed before there is a good
>>> replacement for it, can we undeprecate it until there are proper plans for
>>> replacement?
>>>
>>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>
>>>> I saw in a recent thread that the use of the Reshuffle transform was
>>>> recommended to solve a user issue:
>>>>
>>>>
>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>
>>>> I can see why it may fix the reported issue. I am just curious about
>>>> the fact that the Reshuffle transform is marked as both @Internal and
>>>> @Deprecated in Beam's SDK.
>>>>
>>>> Do we have some alternative? So far the class documentation does not
>>>> recommend any replacement.
>>>>
>>>
