Re: Existing transactionality inconsistency in the Beam Java State API

Charles Chen Wed, 23 May 2018 19:14:26 -0700

Thanks Kenn.  I think there are two issues to highlight: (1) the API should
allow for some sort of prefetching / batching / background I/O for state;
and (2) it should be clear what the semantics are for reading (e.g. so we
don't have confusing read-after-write behavior).


The approach I'm leaning towards for (1) is to allow a state.prefetch()
method (to prefetch a value, iterable or [entire] map state) and maybe
something like state.prefetch_key(key) to prefetch a specific KV in the
map.  Issue (2) seems to be okay under either of Kenn's positions.
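
To make the idea concrete, here is a rough sketch of how a hint-only
prefetch_key() could coexist with unsurprising read-after-write behavior.
Everything in it is an assumption for illustration: FakeBackend,
MapStateSketch, and their methods are hypothetical names, not part of any
existing Beam API, and only the per-key hint is shown (a whole-map
prefetch() would follow the same pattern).

```python
class FakeBackend:
    """Hypothetical stand-in for a runner's persistent state store."""

    def __init__(self, data):
        self.data = dict(data)
        self.fetches = 0  # number of round trips made to the store

    def fetch(self, keys):
        # One round trip returns every requested key that exists.
        self.fetches += 1
        return {k: self.data[k] for k in keys if k in self.data}


class MapStateSketch:
    """Hypothetical map state: prefetch_key() is only a hint, and get()
    always reflects local writes made earlier in the same bundle."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}       # values already fetched from the backend
        self._known = set()    # keys whose backend value (or absence) is cached
        self._pending = set()  # keys hinted via prefetch_key(), not yet fetched
        self._writes = {}      # local writes not yet committed

    def prefetch_key(self, key):
        # Hint only: remember the key so a later read can batch it.
        if key not in self._known:
            self._pending.add(key)

    def put(self, key, value):
        self._writes[key] = value

    def get(self, key, default=None):
        # Local writes win, so a read after a write sees the written value.
        if key in self._writes:
            return self._writes[key]
        if key not in self._known:
            # Fetch this key together with all hinted keys in one round trip.
            batch = self._pending | {key}
            self._cache.update(self._backend.fetch(batch))
            self._known |= batch
            self._pending.clear()
        return self._cache.get(key, default)
```

In this sketch, hinting two keys and then reading one of them costs a
single round trip, and a put() followed by a get() on the same key returns
the written value whether or not the key was prefetched.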

On Wed, May 23, 2018 at 5:33 PM Robert Bradshaw <[email protected]> wrote:

> Thanks for laying this out so well, Kenn. I'm also leaning towards the
> second option, despite its drawbacks. (In particular, readLater should
> not influence what's returned at read(), it's just a hint.)
>
> On Wed, May 23, 2018 at 4:43 PM Kenneth Knowles <[email protected]> wrote:
>
>> Great idea to bring it to dev@. I think it is better to focus here than
>> in long doc comment threads.
>>
>> I had strong opinions that I think were a bit confused and wrong. Sorry
>> for that. I stated this position:
>>
>>  - XYZState class is a handle to a mutable location
>>  - its methods like isEmpty() or contents() should return immutable
>> future values (which implicitly means their contents are semantically
>> frozen when they are created)
>>  - the fact that you created the future is a hint that all necessary
>> fetching/computation should be kicked off
>>  - later forced with get()
>>  - when it was designed, pure async style was not a viable option
>>
>> I see now that the actual position of some of its original designers is:
>>
>>  - XYZState class is a view on a mutable location
>>  - its methods return new views on that mutable location
>>  - calling readLater() is a hint that some fetching/computation should be
>> kicked off
>>  - later read() will combine whatever readLater() did with additional
>> local info to give the current value
>>  - async style is neither applicable nor desirable, as per Beam's focus
>> on naive straight-line coding + autoscaling
>>
>> These are both internally consistent I think. In fact, I like the second
>> perspective better than the one I have been promoting. There are some
>> weaknesses: readLater() is pretty tightly coupled to a particular
>> implementation style, and futures are decades old so you can get good APIs
>> and performance without inventing anything. But I still like the non-future
>> version a little better.
>>
>> Kenn
>>
>> On Wed, May 23, 2018 at 4:05 PM Charles Chen <[email protected]> wrote:
>>
>>> During the design of the Beam Python State API, we noticed some
>>> transactionality inconsistencies in the existing Beam Java State API (these
>>> are the unresolved bugs BEAM-2980
>>> <https://issues.apache.org/jira/browse/BEAM-2980> and BEAM-2975
>>> <https://issues.apache.org/jira/browse/BEAM-2975>).  We are therefore
>>> having a discussion about this API which can have implications for its
>>> future development in all Beam languages:
>>> https://docs.google.com/document/d/1GadEkAmtbJQjmqiqfSzGw3b66TKerm8tyn6TK4blAys/edit#heading=h.ofyl9jspiz3b
>>>
>>> If you have an opinion on the possible design approaches, it would be
>>> very helpful to bring it up in the doc or on this thread.  Thanks!
>>>
>>> Best,
>>> Charles
>>>
>>
