I think that - besides the fact that Hyrum's law holds - this is about
the _severity_ of the impact of a change. Does it cause Pipeline to
fail? Or does it cause incorrect outputs that might be realized after
substantial time without the ability to repair it anymore?
This is the second case, I'm afraid. Therefore I'd be +1 to replace the
method with some transitioning period, but to be able to return to the
original method with fixed semantics.
Jan
On 3/23/22 17:54, Kenneth Knowles wrote:
In my view this method is like "add" or "subtract". Even if you
accurately document the wrong behavior, you are still wrong and you
should still fix it.
Also the converse of Hyrum's law applies, always. (Hyrum's law:
eventually some user depends on every behavior, even if it is private,
so every change is breaking. Converse: eventually every change is
breaking so every bugfix is a judgment call about the pro/con, even if
it is public)
A new method that has better behavior is an OK workaround, and
deprecate the old. But it is unnecessary IMO.
We focus too much on "breaking" changes that are easy to understand
versus those that are important (dependency issues, performance
problems, state changes).
Kenn
On Mon, Feb 28, 2022 at 2:34 AM Jan Lukavský <[email protected]> wrote:
Agree that this might be an issue, though somewhat limited,
because I (strong emphasize) *suppose* that users skipping
multiple releases do care to test the upgrade appropriately. We
can wait for more releases between deprecating and removing the
API. But I don't think adding a different version of the API is
any better, we should have a way to fix design bugs without
introducing additional APIs.
Jan
On 2/26/22 01:34, Luke Cwik wrote:
I have seen enough customers skip 10+ minor versions when
updating to pick up a bug fix.
On Fri, Feb 25, 2022 at 3:00 PM Brian Hulette
<[email protected]> wrote:
Is there a number of versions to wait that would make this
acceptable?
I'm curious if there's general guidance here.
On Fri, Feb 25, 2022 at 1:30 PM Luke Cwik <[email protected]>
wrote:
There are enough people that skip many versions so they
would get the changed semantics without realizing.
On Mon, Feb 21, 2022 at 12:49 AM Jan Lukavský
<[email protected]> wrote:
I think we cannot change (documented) semantics
without any notice. That could cause issues for
upgraded pipelines (without proper testing, but can
we rely on that?). I'd therefore suggest to:
a) deprecate computeIfAbsent()
b) create compute() as a new method and temporary
replacement for computeIfAbsent()
c) after several (3, 4?) releases remove
computeIfAbsent()
d) after next several releases return it back with
changed semantics
This way the unexpected change would affect only
users skipping many releases.
Jan
On 2/17/22 19:08, Kenneth Knowles wrote:
I think this is a design & documentation bug. The
behavior does not make sense. Deviating from
Map#computeIfAbsent cannot really be fixed by
documentation in bold, even though that is
definitely better than nothing. Fixing this should
not be considered a breaking change. It should be
identical to Map#computeIfAbsent but deferred, hence
a ReadableState.
Kenn
On Wed, Feb 16, 2022 at 2:12 PM Luke Cwik
<[email protected]> wrote:
The least we could do is add documentation to
the current MapState.computeIfAbsent API in bold
that this differs from Java's
Map#computeIfAbsent since it returns the prior
value and not the new value.
You can add a computeIfAbsent2(...) and
deprecate the old one specifying the difference
in the API.
Adding compute() would also be nice but would
add complexity for users who only want the
computeIfAbsent2() implementation.
On Wed, Feb 16, 2022 at 1:01 AM Jan Lukavský
<[email protected]> wrote:
That behavior is documented, so if that is a
bug, I suppose it is a design bug [1]. I'm
not sure how to fix that, because we cannot
simply change the semantics without changing
API, that could result in user-code bugs. If
I understand it correctly, the only reason
for not returning the actual value, but a
ReadableState is to enable calling
readLater() to batch multiple gets to single
call. That makes sense and is sort of
consistent with the other usages of state in
general. But makes it impossible to return
the actual value V instead of ReadableState<V>.
Maybe we could deprecate 'computeIfAbsent()'
and introduce 'compute()' with the
JDK-compliant semantics? It is not the same,
though, but should be more practical than
the current version of 'computeIfAbsent',
which enforces various hacks to actually
retrieve the computed (that is the current!)
value.
WDYT?
Jan
[1]
https://beam.apache.org/releases/javadoc/2.36.0/org/apache/beam/sdk/state/MapState.html#computeIfAbsent-K-java.util.function.Function-
On 2/16/22 02:10, Reuven Lax wrote:
Sounds like a bug, probably related to the
bug mentioned in that thread.
On Fri, Feb 11, 2022 at 8:58 AM
<[email protected]> wrote:
Yes, exactly, it returns null. The Java
contract is to return the existing
value or the new (computed value) instead.
Dne 11. 2. 2022 16:24 napsal uživatel
Reuven Lax <[email protected]>:
I'm not sure I understand what you
mean. Do you mean it always returns
null? Since computeIfAbsent doesn't
modify the value unless it is not
present, I'm not sure what the "old
value" means, unless you mean that
it returns null.
On Fri, Feb 11, 2022 at 4:54 AM Jan
Lukavský <[email protected]> wrote:
Hi,
I just found another weirdness
in the MapState API - method
computeIfAbsent returns the
_old_ value associated with the
key. That is
inconsistent with Java API and
frankly with what users should
expect
when using this method. The
semantics should be "retrieve
current value,
if not present associate the
key with this one and return me
the current
value". Is this behavior
intentional? What is the
motivation? In
combination with the previous
thread [1] that states, that
the value is
not updated until read() of the
state returned from
computeIfAbsent is
called this seems to force
users to either fetch the state
twice or do
various workarounds.
Jan
[1]
https://lists.apache.org/thread/mp979twlgm76gjg014f5r0k2p0nlcj2q