Filed https://issues.apache.org/jira/browse/BEAM-4372 (unassigned).
On Mon, May 21, 2018 at 10:22 AM Raghu Angadi wrote:
>
>
> On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw
> wrote:
>
>> We should probably keep the warning and all the caveats until we
>> introduce the alternative (and migrate
On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw wrote:
> We should probably keep the warning and all the caveats until we introduce
> the alternative (and migrate to it for the non-parallelism uses of
> reshuffle). I was just proposing we do this via a separate transform that
> just calls Reshuff
+1 to introducing alternative transforms even if they wrap Reshuffle
The benefits of making them distinct is that we can put appropriate Javadoc
in place and runners can figure out what the user is intending and whether
Reshuffle or some other implementation is appropriate. We can also see
which o
We should probably keep the warning and all the caveats until we introduce
the alternative (and migrate to it for the non-parallelism uses of
reshuffle). I was just proposing we do this via a separate transform that
just calls Reshuffle until we have the new story fully fleshed out (I don't
know if
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw
wrote:
> On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote:
>
>> [...]
>>
>> I think it would be much more user friendly to un-deprecate it to add a
>> warning for advanced users about non-portability of durability/replay
>> guarantees/stable inpu
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw
wrote:
> On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote:
>
>> On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw
>> wrote:
>>
>>> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote:
>>>
True. I am still failing to see what is broken about
On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote:
> On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw
> wrote:
>
>> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote:
>>
>>> True. I am still failing to see what is broken about Reshuffle that is
>>> also not broken with GroupByKey transform. If
On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw wrote:
> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote:
>
>> True. I am still failing to see what is broken about Reshuffle that is
>> also not broken with GroupByKey transform. If someone depends on GroupByKey
>> to get stable input, isn't th
On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote:
>
> On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw
> wrote:
>
>>> Ah, thanks, that makes sense. That implies to me Reshuffle is no more
>>> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could have a clear
>>> caveat. Reshuffle's JavaD
On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw wrote:
>> Ah, thanks, that makes sense. That implies to me Reshuffle is no more
>> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could have a clear
>> caveat. Reshuffle's JavaDoc could add a caveat too about non-deterministic
>> keys and re
On Fri, May 18, 2018 at 4:24 PM Raghu Angadi wrote:
> On Fri, May 18, 2018 at 4:07 PM Kenneth Knowles wrote:
>
>> It isn't any particular logic in Reshuffle - it is, semantically, an
>> identity transform. It is the fact that other runners are perfectly able to
>> re-run transforms prior to a GBK
On Fri, May 18, 2018 at 4:07 PM Kenneth Knowles wrote:
> It isn't any particular logic in Reshuffle - it is, semantically, an
> identity transform. It is the fact that other runners are perfectly able to
> re-run transforms prior to a GBK. So, for example, randomly generated IDs
> will be re-gener
It isn't any particular logic in Reshuffle - it is, semantically, an
identity transform. It is the fact that other runners are perfectly able to
re-run transforms prior to a GBK. So, for example, randomly generated IDs
will be re-generated. We tend to put in reshuffles in order to "commit"
these ran
On Fri, May 18, 2018 at 12:22 PM Robert Bradshaw
wrote:
> [resending]
>
> Agreed that keeping this deprecated without a clear replacement for so long
> is not ideal.
>
> I would at least break this into two separate transforms, the
> parallelism-breaking one (which seems OK) and the stable input on
On Fri, May 18, 2018 at 12:21 PM Robert Bradshaw
wrote:
> On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote:
>
>> Thanks Kenn.
>>
>> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote:
>>
>>> The fact that its usage has grown probably indicates that we have a
>>> large number of transform
[resending]
Agreed that keeping this deprecated without a clear replacement for so long
is not ideal.
I would at least break this into two separate transforms, the
parallelism-breaking one (which seems OK) and the stable input one (which
may just call the parallelism-breaking one, but should be d
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote:
> Thanks Kenn.
>
> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote:
>
>> The fact that its usage has grown probably indicates that we have a large
>> number of transforms that can easily cause data loss / duplication.
>>
>
> Is this spe
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote:
> Thanks Kenn.
>
> On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote:
>> The fact that its usage has grown probably indicates that we have a large
>> number of transforms that can easily cause data loss / d
Agreed that keeping this deprecated without a clear replacement for so
long is not ideal. I would at least break this into two
separate transforms, the parallelism-breaking one (which seems OK) and
the stable input one (which may just call the parallelism-breaking
one, but should be decorated with
Thanks Kenn.
On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote:
> The fact that its usage has grown probably indicates that we have a large
> number of transforms that can easily cause data loss / duplication.
>
Is this specific to Reshuffle or is it true for any GroupByKey? I see
Reshuffl
The fact that its usage has grown probably indicates that we have a large
number of transforms that can easily cause data loss / duplication.
Yes, it is deprecated because it is primarily used as a Dataflow-specific
way to ensure stable input. My understanding is that the SparkRunner also
material
Agreed that it should be undeprecated; many users are getting confused by
this.
I know that some people are working on a replacement for at least one of
its use cases (RequiresStableInput), but the use case of breaking fusion
is, as of yet, unaddressed, and there's not much to be gained by keeping
I am interested in more clarity on this as well. It has been deprecated for
a long time without a replacement, and its usage has only grown, both
within the Beam code base and in user applications.
If we are certain that it will not be removed before there is a good
replacement for it, can we u
I saw in a recent thread that the use of the Reshuffle transform was
recommended to solve a user issue:
https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E
I can see why it may fix the reported issue. I am just curious about
the