On Fri, May 18, 2018 at 12:22 PM Robert Bradshaw <rober...@google.com>
wrote:

> [resending]
>
> Agreed that keeping this deprecated without a clear replacement for so long
> is not ideal.
>
> I would at least break this into two separate transforms, the
> parallelism-breaking one (which seems OK) and the stable input one (which
> may just call the parallelism-breaking one, but should be decorated with
> lots of caveats and maybe even still have the deprecated annotation).
>

+1. The parallelism-breaking one is the most relevant to many users. Would love
to see that part undeprecated, ideally keeping the name Reshuffle.

Raghu.


>
>
> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:
>
>> The fact that its usage has grown probably indicates that we have a large
>> number of transforms that can easily cause data loss / duplication.
>>
>> Yes, it is deprecated because it is primarily used as a Dataflow-specific
>> way to ensure stable input. My understanding is that the SparkRunner also
>> materializes at every GBK so it works there too (is this still the case?).
>> It doesn't work at all for other runners AFAIK. So it is @Deprecated not
>> because there is a replacement, but because it is kind of dangerous to use.
>> Beam could just say "GBK must ensure stable output" and "a composite containing
>> a GBK has to ensure stable output even if replaced" and that would solve
>> the issue, but I think it would make Beam on Flink impossibly slow - I
>> could be wrong about that. Generally, stable input is tied to the durability
>> model, which is a key design point for engines.
>>
>> True that it isn't the only use, and I know you have been trying to nail
>> down what the uses actually are. Ben wrote up various uses in a portable
>> manner at https://beam.apache.org/documentation/execution-model.
>>
>>  - Coupled failure is the use where Reshuffle is to provide stable input
>>  - Breaking dependent parallelism is more portable - but since it is the
>> identity transform a runner may just elide it; it is a hint, basically, and
>> that seems OK (but can we do it more directly?)
>>
>> What I don't want is to build something where the implementation details
>> are the spec, and not fundamental, which is sort of where Reshuffle lies.
>> This thread highlights that this is a pretty urgent problem with our SDKs
>> and runners that it would be very helpful to work on.
>>
>> Kenn
>>
>>
>>
>> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Agreed that it should be undeprecated, many users are getting confused
>>> by this.
>>> I know that some people are working on a replacement for at least one of
>>> its use cases (RequiresStableInput), but the use case of breaking fusion
>>> is, as of yet, unaddressed, and there's not much to be gained by keeping it
>>> deprecated.
>>>
>>> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <rang...@google.com> wrote:
>>>
>>>> I am interested in more clarity on this as well. It has been deprecated
>>>> for a long time without a replacement, and its usage has only grown, both
>>>> within the Beam code base as well as in user applications.
>>>>
>>>> If we are certain that it will not be removed before there is a good
>>>> replacement for it, can we undeprecate it until there are proper plans for
>>>> replacement?
>>>>
>>>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>>> I saw in a recent thread that the use of the Reshuffle transform was
>>>>> recommended to solve a user issue:
>>>>>
>>>>>
>>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>>
>>>>> I can see why it may fix the reported issue. I am just curious about
>>>>> the fact that the Reshuffle transform is marked as both @Internal and
>>>>> @Deprecated in Beam's SDK.
>>>>>
>>>>> Do we have some alternative? So far the class documentation does not
>>>>> recommend any replacement.
>>>>>
>>>>
