Re: MapState API reloaded

Jan Lukavský Mon, 28 Mar 2022 03:07:24 -0700

I think that - besides the fact that Hyrum's law holds - this is aboutthe _severity_ of the impact of a change. Does it cause Pipeline tofail? Or does it cause incorrect outputs that might be realized aftersubstantial time without the ability to repair it anymore?

This is the second case, I'm afraid. Therefore I'd be +1 to replace themethod with some transitioning period, but to be able to return to theoriginal method with fixed semantics.


 Jan

On 3/23/22 17:54, Kenneth Knowles wrote:

In my view this method is like "add" or "subtract". Even if youaccurately document the wrong behavior, you are still wrong and youshould still fix it.

Also the converse of Hyrum's law applies, always. (Hyrum's law:eventually some user depends on every behavior, even if it is private,so every change is breaking. Converse: eventually every change isbreaking so every bugfix is a judgment call about the pro/con, even ifit is public)

A new method that has better behavior is an OK workaround, anddeprecate the old. But it is unnecessary IMO.

We focus too much on "breaking" changes that are easy to understandversus those that are important (dependency issues, performanceproblems, state changes).


Kenn

On Mon, Feb 28, 2022 at 2:34 AM Jan Lukavský <[email protected]> wrote:

    Agree that this might be an issue, though somewhat limited,
    because I (strong emphasize) *suppose* that users skipping
    multiple releases do care to test the upgrade appropriately. We
    can wait for more releases between deprecating and removing the
    API. But I don't think adding a different version of the API is
    any better, we should have a way to fix design bugs without
    introducing additional APIs.

     Jan

    On 2/26/22 01:34, Luke Cwik wrote:

    I have seen enough customers skip 10+ minor versions when
    updating to pick up a bug fix.

    On Fri, Feb 25, 2022 at 3:00 PM Brian Hulette
    <[email protected]> wrote:

        Is there a number of versions to wait that would make this
        acceptable?

        I'm curious if there's general guidance here.

        On Fri, Feb 25, 2022 at 1:30 PM Luke Cwik <[email protected]>
        wrote:

            There are enough people that skip many versions so they
            would get the changed semantics without realizing.

            On Mon, Feb 21, 2022 at 12:49 AM Jan Lukavský
            <[email protected]> wrote:

                I think we cannot change (documented) semantics
                without any notice. That could cause issues for
                upgraded pipelines (without proper testing, but can
                we rely on that?). I'd therefore suggest to:

                 a) deprecate computeIfAbsent()

                 b) create compute() as a new method and temporary
                replacement for computeIfAbsent()

                 c) after several (3, 4?) releases remove
                computeIfAbsent()

                 d) after next several releases return it back with
                changed semantics

                This way the unexpected change would affect only
                users skipping many releases.

                 Jan

                On 2/17/22 19:08, Kenneth Knowles wrote:

                I think this is a design & documentation bug. The
                behavior does not make sense. Deviating from
                Map#computeIfAbsent cannot really be fixed by
                documentation in bold, even though that is
                definitely better than nothing. Fixing this should
                not be considered a breaking change. It should be
                identical to Map#computeIfAbsent but deferred, hence
                a ReadableState.

                Kenn

                On Wed, Feb 16, 2022 at 2:12 PM Luke Cwik
                <[email protected]> wrote:

                    The least we could do is add documentation to
                    the current MapState.computeIfAbsent API in bold
                    that this differs from Java's
                    Map#computeIfAbsent since it returns the prior
                    value and not the new value.

                    You can add a computeIfAbsent2(...) and
                    deprecate the old one specifying the difference
                    in the API.

                    Adding compute() would also be nice but would
                    add complexity for users who only want the
                    computeIfAbsent2() implementation.



                    On Wed, Feb 16, 2022 at 1:01 AM Jan Lukavský
                    <[email protected]> wrote:

                        That behavior is documented, so if that is a
                        bug, I suppose it is a design bug [1]. I'm
                        not sure how to fix that, because we cannot
                        simply change the semantics without changing
                        API, that could result in user-code bugs. If
                        I understand it correctly, the only reason
                        for not returning the actual value, but a
                        ReadableState is to enable calling
                        readLater() to batch multiple gets to single
                        call. That makes sense and is sort of
                        consistent with the other usages of state in
                        general. But makes it impossible to return
                        the actual value V instead of ReadableState<V>.

                        Maybe we could deprecate 'computeIfAbsent()'
                        and introduce 'compute()' with the
                        JDK-compliant semantics? It is not the same,
                        though, but should be more practical than
                        the current version of 'computeIfAbsent',
                        which enforces various hacks to actually
                        retrieve the computed (that is the current!)
                        value.

                        WDYT?

                         Jan

                        [1]
                        
https://beam.apache.org/releases/javadoc/2.36.0/org/apache/beam/sdk/state/MapState.html#computeIfAbsent-K-java.util.function.Function-

                        On 2/16/22 02:10, Reuven Lax wrote:

                        Sounds like a bug, probably related to the
                        bug mentioned in that thread.

                        On Fri, Feb 11, 2022 at 8:58 AM
                        <[email protected]> wrote:

                            Yes, exactly, it returns null. The Java
                            contract is to return the existing
                            value or the new (computed value) instead.

                            Dne 11. 2. 2022 16:24 napsal uživatel
                            Reuven Lax <[email protected]>:

                                I'm not sure I understand what you
                                mean. Do you mean it always returns
                                null? Since computeIfAbsent doesn't
                                modify the value unless it is not
                                present, I'm not sure what the "old
                                value" means, unless you mean that
                                it returns null.

                                On Fri, Feb 11, 2022 at 4:54 AM Jan
                                Lukavský <[email protected]> wrote:

                                    Hi,

                                    I just found another weirdness
                                    in the MapState API - method
                                    computeIfAbsent returns the
                                    _old_ value associated with the
                                    key. That is
                                    inconsistent with Java API and
                                    frankly with what users should
                                    expect
                                    when using this method. The
                                    semantics should be "retrieve
                                    current value,
                                    if not present associate the
                                    key with this one and return me
                                    the current
                                    value". Is this behavior
                                    intentional? What is the
                                    motivation? In
                                    combination with the previous
                                    thread [1] that states, that
                                    the value is
                                    not updated until read() of the
                                    state returned from
                                    computeIfAbsent is
                                    called this seems to force
                                    users to either fetch the state
                                    twice or do
                                    various workarounds.

                                      Jan

                                    [1]
                                    
https://lists.apache.org/thread/mp979twlgm76gjg014f5r0k2p0nlcj2q

Re: MapState API reloaded

Reply via email to