Agreed that keeping this deprecated without a clear replacement for so
long is not ideal.

I would at least break this into two separate transforms, the
parallelism-breaking one (which seems OK) and the stable input one
(which may just call the parallelism-breaking one, but should be
decorated with lots of caveats and maybe even still have the deprecated
annotation).

On Fri, May 18, 2018 at 11:02 AM
Kenneth Knowles <k...@google.com> wrote:
>
> The fact that its usage has grown probably indicates that we have a
> large number of transforms that can easily cause data loss /
> duplication.
>
> Yes, it is deprecated because it is primarily used as a
> Dataflow-specific way to ensure stable input. My understanding is that
> the SparkRunner also materializes at every GBK so it works there too
> (is this still the case?). It doesn't work at all for other runners
> AFAIK. So it is @Deprecated not because there is a replacement, but
> because it is kind of dangerous to use. Beam could just say "GBK must
> ensure stable output" and "a composite containing a GBK has to ensure
> stable output even if replaced" and that would solve the issue, but I
> think it would make Beam on Flink impossibly slow - I could be wrong
> about that. Generally, stable input is tied to the durability model,
> which is a key design point for engines.
>
> True that it isn't the only use, and I know you have been trying to
> nail down what the uses actually are. Ben wrote up various uses in a
> portable manner at
> https://beam.apache.org/documentation/execution-model.
>
> - Coupled failure is the use where Reshuffle is used to provide stable
>   input
> - Breaking dependent parallelism is more portable - but since it is
>   the identity transform a runner may just elide it; it is a hint,
>   basically, and that seems OK (but can we do it more directly?)
>
> What I don't want is to build something where the implementation
> details are the spec, and not fundamental, which is sort of where
> Reshuffle lies. This thread highlights that this is a pretty urgent
> problem with our SDKs and runners that it would be very helpful to
> work on.
>
> Kenn
>
> On Fri, May 18, 2018 at 7:50 AM
> Eugene Kirpichov <kirpic...@google.com> wrote:
>>
>> Agreed that it should be undeprecated; many users are getting
>> confused by this.
>> I know that some people are working on a replacement for at least one
>> of its use cases (RequiresStableInput), but the use case of breaking
>> fusion is, as of yet, unaddressed, and there's not much to be gained
>> by keeping it deprecated.
>>
>> On Fri, May 18, 2018 at 7:45 AM
>> Raghu Angadi <rang...@google.com> wrote:
>>>
>>> I am interested in more clarity on this as well. It has been
>>> deprecated for a long time without a replacement, and its usage has
>>> only grown, both within the Beam code base and in user applications.
>>>
>>> If we are certain that it will not be removed before there is a good
>>> replacement for it, can we undeprecate it until there are proper
>>> plans for replacement?
>>>
>>> On Fri, May 18, 2018
>>> at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>> I saw in a recent thread that the use of the Reshuffle transform was
>>>> recommended to solve a user issue:
>>>>
>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>
>>>> I can see why it may fix the reported issue. I am just curious about
>>>> the fact that the Reshuffle transform is marked as both @Internal and
>>>> @Deprecated in Beam's SDK.
>>>>
>>>> Do we have some alternative? So far the class documentation does not
>>>> recommend any replacement.