I think we cannot change (documented) semantics without any notice. That could cause issues for upgraded pipelines (without proper testing, but can we rely on that?). I'd therefore suggest to:

 a) deprecate computeIfAbsent()

 b) create compute() as a new method and temporary replacement for computeIfAbsent()

 c) after several (3, 4?) releases remove computeIfAbsent()

 d) after next several releases return it back with changed semantics

This way the unexpected change would affect only users skipping many releases.

 Jan

On 2/17/22 19:08, Kenneth Knowles wrote:
I think this is a design & documentation bug. The behavior does not make sense. Deviating from Map#computeIfAbsent cannot really be fixed by documentation in bold, even though that is definitely better than nothing. Fixing this should not be considered a breaking change. It should be identical to Map#computeIfAbsent but deferred, hence a ReadableState.

Kenn

On Wed, Feb 16, 2022 at 2:12 PM Luke Cwik <[email protected]> wrote:

    The least we could do is add documentation to the current
    MapState.computeIfAbsent API in bold that this differs from Java's
    Map#computeIfAbsent since it returns the prior value and not the
    new value.

    You can add a computeIfAbsent2(...) and deprecate the old one
    specifying the difference in the API.

    Adding compute() would also be nice but would add complexity for
    users who only want the computeIfAbsent2() implementation.



    On Wed, Feb 16, 2022 at 1:01 AM Jan Lukavský <[email protected]> wrote:

        That behavior is documented, so if that is a bug, I suppose it
        is a design bug [1]. I'm not sure how to fix that, because we
        cannot simply change the semantics without changing API, that
        could result in user-code bugs. If I understand it correctly,
        the only reason for not returning the actual value, but a
        ReadableState is to enable calling readLater() to batch
        multiple gets to single call. That makes sense and is sort of
        consistent with the other usages of state in general. But
        makes it impossible to return the actual value V instead of
        ReadableState<V>.

        Maybe we could deprecate 'computeIfAbsent()' and introduce
        'compute()' with the JDK-compliant semantics? It is not the
        same, though, but should be more practical than the current
        version of 'computeIfAbsent', which enforces various hacks to
        actually retrieve the computed (that is the current!) value.

        WDYT?

         Jan

        [1]
        
https://beam.apache.org/releases/javadoc/2.36.0/org/apache/beam/sdk/state/MapState.html#computeIfAbsent-K-java.util.function.Function-

        On 2/16/22 02:10, Reuven Lax wrote:
        Sounds like a bug, probably related to the bug mentioned in
        that thread.

        On Fri, Feb 11, 2022 at 8:58 AM <[email protected]> wrote:

            Yes, exactly, it returns null. The Java contract is to
            return the existing value or the new (computed value)
            instead.

            Dne 11. 2. 2022 16:24 napsal uživatel Reuven Lax
            <[email protected]>:

                I'm not sure I understand what you mean. Do you mean
                it always returns null? Since computeIfAbsent doesn't
                modify the value unless it is not present, I'm not
                sure what the "old value" means, unless you mean that
                it returns null.

                On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský
                <[email protected]> wrote:

                    Hi,

                    I just found another weirdness in the MapState
                    API - method
                    computeIfAbsent returns the _old_ value
                    associated with the key. That is
                    inconsistent with Java API and frankly with what
                    users should expect
                    when using this method. The semantics should be
                    "retrieve current value,
                    if not present associate the key with this one
                    and return me the current
                    value". Is this behavior intentional? What is the
                    motivation? In
                    combination with the previous thread [1] that
                    states, that the value is
                    not updated until read() of the state returned
                    from computeIfAbsent is
                    called this seems to force users to either fetch
                    the state twice or do
                    various workarounds.

                      Jan

                    [1]
                    
https://lists.apache.org/thread/mp979twlgm76gjg014f5r0k2p0nlcj2q

Reply via email to