Let me add two new urns representing reshuffle via random key and reshuffle 
using key. I will share the PR later here, would need some help on Python/Go 
SDK changes too since I am not very familiar with them.

Best,
Ke


> On Oct 4, 2021, at 3:11 PM, Robert Bradshaw <rober...@google.com> wrote:
> 
> On Mon, Oct 4, 2021 at 3:08 PM Jan Lukavský <je...@seznam.cz 
> <mailto:je...@seznam.cz>> wrote:
>> 
>> On 10/4/21 11:43 PM, Robert Bradshaw wrote:
>>> Oh, yes.
>>> 
>>> Java Reshuffle.of() = Python ReshufflePerKey()
>>> Java Reshuffle.viaRandomKey() == Python Reshuffle()
>>> 
>>> We generally try to avoid this kind of discrepancy. It could make
>>> sense to rename Reshuffle.of() to Reshuffle.viaKey().
>> 
>> I'd suggest Reshuffle.usingKey(), but I'm not native speaker, so that
>> might be opinionated.
> 
> usingKey does sound better. (And, FWIW, usingRandomKey() sounds better
> to me than vaiRandomKey(), but probably not worth changing so the
> question becomes whether to be stilted or consistent.)
> 
>> More importantly - could we undeprecate Reshuffle
>> (in Java SDK)? I believe it was deprecated for wrong reasons - yes, it
>> has undocumented and non-portable side-effects, but is still makes sense
>> for various use-cases (e.g. fan-out, or SDF).
> 
> +1
> 
>>> 
>>> On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <ke.wu...@gmail.com> wrote:
>>>> I should have said that the descrepency lives in SDK not Class vs Portable.
>>>> 
>>>> Correct me if I am wrong, Reshuffle transform in Java SDK requires the 
>>>> input to be KV [1] while Reshuffle in Python [2] and Go [3] does not.
>>>> 
>>>> 
>>>> [1] 
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53
>>>> [2] 
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
>>>> [3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122
>>>> 
>>>> On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <rober...@google.com> wrote:
>>>> 
>>>> Reshuffle is not keyed, there is a separate reshuffle-per-key for
>>>> that. This is true for both Java and Python. This shouldn't depend on
>>>> classic vs. portable mode. It sounds like there's an issue in
>>>> translation.
>>>> 
>>>> On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <ke.wu...@gmail.com> wrote:
>>>> 
>>>> 
>>>> Hello All,
>>>> 
>>>> Recent Samza Runner tests failure in python/xlang [1][2] reveals an 
>>>> interesting fact that Reshuffle Transform in classic pipeline requires the 
>>>> input to be KV while portable pipeline does not, where Reshuffle in 
>>>> portable mode it has an extra step to append a random key [3].
>>>> 
>>>> This suggests that Reshuffle in classic mode is, sort of, equivalent to 
>>>> ReshufflePerKey in potable mode instead of Reshuffle itself. Couple of 
>>>> questions on this:
>>>> 
>>>> 1. Is such SDK/API discrepancy expected?
>>>> 2. If Yes, then, what are the advised approach for runners to implement 
>>>> translators for such transforms?
>>>> 3. If No, is this something we can improve?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> 
>>>> [1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/
>>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/
>>>> [3] 
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730

Reply via email to