Agreed that keeping this deprecated without a clear replacement for so
long is not ideal.

I would at least break this into two separate transforms, the
parallelism-breaking one (which seems OK) and the stable input one
(which may just call the parallelism-breaking one, but should be
decorated with lots of caveats and maybe even still have the deprecated
annotation).

On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles <k...@google.com> wrote:
>
> The fact
> that its usage has grown probably indicates that we have a large
> number of transforms that can easily cause data loss / duplication.
>
> Yes, it is deprecated because it is primarily used as a
> Dataflow-specific way to ensure stable input. My understanding is that
> the SparkRunner also materializes at every GBK so it works there too
> (is this still the case?). It doesn't work at all for other runners
> AFAIK. So it is @Deprecated not because there is a replacement, but
> because it is kind of dangerous to use. Beam could just say "GBK must
> ensure stable output" and "a composite containing a GBK has to ensure
> stable output even if replaced" and that would solve the issue, but I
> think it would make Beam on Flink impossibly slow - I could be wrong
> about that. Generally stable input is tied to the durability model,
> which is a key design point for engines.
>
> True that it isn't the only use, and I know you have been trying to
> nail down what the uses actually are. Ben wrote up various uses in a
> portable manner at
> https://beam.apache.org/documentation/execution-model.
>
>  - Coupled failure is the use where Reshuffle is used to provide
>    stable input
>  - Breaking dependent parallelism is more portable - but since it is
>    the identity transform a runner may just elide it; it is a hint,
>    basically, and that seems OK (but can we do it more directly?)
>
> What I don't want is to build something where the implementation
> details are the spec, and not fundamental, which is sort of where
> Reshuffle lies. This thread highlights that this is a pretty urgent
> problem with our SDKs and
> runners that it would be very helpful to work on.
>
> Kenn
>
> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com> wrote:
>>
>> Agreed that it should be undeprecated, many users are getting
>> confused by this.
>> I know that some people are working on a replacement for at least one
>> of its use cases (RequiresStableInput), but the use case of breaking
>> fusion is, as of yet, unaddressed, and there's not much to be gained
>> by keeping it deprecated.
>>
>> On Fri, May 18, 2018 at 7:45 AM Raghu Angadi <rang...@google.com> wrote:
>>>
>>> I am interested in more clarity on this as well. It has been
>>> deprecated for a long time without a replacement, and its usage has
>>> only grown, both within Beam code base as well as in user
>>> applications.
>>>
>>> If we are certain that it will not be removed before there is a good
>>> replacement for it, can we undeprecate it until there are proper
>>> plans for replacement?
>>>
>>> On Fri, May 18, 2018 at 7:12 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>> I saw in a recent
>>>> thread that the use of the Reshuffle transform was
>>>> recommended to solve a user issue:
>>>>
>>>> https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
>>>>
>>>> I can see why it may fix the reported issue. I am just curious about
>>>> the fact that the Reshuffle transform is marked as both @Internal and
>>>> @Deprecated in Beam's SDK.
>>>>
>>>> Do we have some alternative? So far the class documentation does not
>>>> recommend any replacement.
