Great, I was confused in the description that was provided and then the follow up by Ben. I think its worthwhile to describe the differences with actual examples of what happens.
On Fri, May 25, 2018 at 10:54 AM Kenneth Knowles <[email protected]> wrote: > I think the return value of read() should always be an immutable value. > > Kenn > > On Fri, May 25, 2018 at 10:44 AM Lukasz Cwik <[email protected]> wrote: > >> Kenn, in the second example where we are creating views whenever read() >> is called, is it that the view's underlying data is immutable. For example: >> Iterable<String> values = state.read(); >> state.append("newValue"); >> If I iterate over values, does values now contain "newValues"? >> >> >> On Thu, May 24, 2018 at 10:38 AM Kenneth Knowles <[email protected]> wrote: >> >>> I see what you mean but I don't agree that futures imply anything other >>> than "it is a value that you have to force", with deliberately many >>> possible implementations. When at the point in 1 and you've got >>> >>> interface ReadableState<T> { >>> T read() >>> } >>> >>> and you want to improve performance, both approaches "void readLater()" >>> and "StateFuture<T> read()" are natural evolutions. They both gain the same >>> 10x and they both support the "unchanging committed state plus buffered >>> mutations" implementation well. And snapshots are essentially free for that >>> implementation if the buffered mutations are stored in a decent data >>> structure. >>> >>> My recollection was that futures were seen as more cumbersome. They >>> affect the types even for simple uses. The only appealing future API was >>> Guava's, but we didn't want that on the API surface. And we did not intend >>> for these to be used in complex ways, so the usability & perf benefits of a >>> future-based API weren't really realized anyhow. >>> >>> The only reason I belabor this is that if we ever wanted to support more >>> complex use cases, such as concurrent use of state, then my preference >>> would flip. I wouldn't want to make XYZState a synchronized monitor. At >>> that point I'd favor using a snapshots-are-free concurrent data structure >>> under the hood of a future-based API. >>> >>> Since there is really only one implementation in mind for this, maybe >>> only one that works reasonably, we should just document it as such. The >>> docs on ReadableState do make it sound like writes will invalidate the >>> usefulness of readLater, even though that isn't the case for the intended >>> implementation strategy. >>> >>> Kenn >>> >>> On Thu, May 24, 2018 at 9:40 AM Ben Chambers <[email protected]> >>> wrote: >>> >>>> I think Kenn's second option accurately reflects my memory of the >>>> original intentions: >>>> >>>> 1. I remember we we considered either using the Future interface or >>>> calling the ReadableState interface a future, and explicitly said "no, >>>> future implies asynchrony and that the value returned by `get` won't change >>>> over multiple calls, but we want the latest value each time". So, I >>>> remember us explicitly considering and rejecting Future, thus the name >>>> "ReadableState". >>>> >>>> 2. The intuition behind the implementation was analogous to a >>>> mutable-reference cell in languages like ML / Scheme / etc. The >>>> ReadableState is just a pointer to the the reference cell. Calling read >>>> returns the value currently in the cell. If we have 100 ReadableStates >>>> pointing at the same cell, they all get the same value regardless of when >>>> they were created. This avoids needing to duplicate/snapshot values at any >>>> point in time. >>>> >>>> 3. ReadLater was added, as noted by Charles, to suggest prefetching the >>>> associated value. This was added after benchmarks showed 10x (if I remember >>>> correctly) performance improvements in things like GroupAlsoByWindows by >>>> minimizing round-trips asking for more state. The intuition being -- if we >>>> need to make an RPC to load one state value, we are better off making an >>>> RPC to load all the values we need. >>>> >>>> Overall, I too lean towards maintaining the second interpretation since >>>> it seems to be consistent and I believe we had additional reasons for >>>> preferring it over futures. >>>> >>>> Given the confusion, I think strengthening the class documentation >>>> makes sense -- I note the only hint of the current behavior is that >>>> ReadableState indicates it gets the *current* value (emphasis mine). We >>>> should emphasize that and perhaps even mention that the ReadableState >>>> should be understood as just a reference or handle to the underlying state, >>>> and thus its value will reflect the latest write. >>>> >>>> Charles, if it helps, the plan I remember regarding prefetching was >>>> something like: >>>> >>>> interface ReadableMapState<K, V> { >>>> ReadableState<V> get(K key); >>>> ReadableState<Iterable<V>> getIterable(); >>>> ReadableState<Map<K, V>> get(); >>>> // ... more things ... >>>> } >>>> >>>> Then prefetching a value is `mapState.get(key).readLater()` and >>>> prefetching the entire map is `mapState.get().readLater()`, etc. >>>> >>>> On Wed, May 23, 2018 at 7:13 PM Charles Chen <[email protected]> wrote: >>>> >>>>> Thanks Kenn. I think there are two issues to highlight: (1) the API >>>>> should allow for some sort of prefetching / batching / background I/O for >>>>> state; and (2) it should be clear what the semantics are for reading (e.g. >>>>> so we don't have confusing read after write behavior). >>>>> >>>>> The approach I'm leaning towards for (1) is to allow a >>>>> state.prefetch() method (to prefetch a value, iterable or [entire] map >>>>> state) and maybe something like state.prefetch_key(key) to prefetch a >>>>> specific KV in the map. Issue (2) seems to be okay in either of Kenn's >>>>> positions. >>>>> >>>>> On Wed, May 23, 2018 at 5:33 PM Robert Bradshaw <[email protected]> >>>>> wrote: >>>>> >>>>>> Thanks for laying this out so well, Kenn. I'm also leaning towards >>>>>> the second option, despite its drawbacks. (In particular, readLater >>>>>> should not influence what's returned at read(), it's just a hint.) >>>>>> >>>>>> On Wed, May 23, 2018 at 4:43 PM Kenneth Knowles <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Great idea to bring it to dev@. I think it is better to focus here >>>>>>> than long doc comment threads. >>>>>>> >>>>>>> I had strong opinions that I think were a bit confused and wrong. >>>>>>> Sorry for that. I stated this position: >>>>>>> >>>>>>> - XYZState class is a handle to a mutable location >>>>>>> - its methods like isEmpty() or contents() should return immutable >>>>>>> future values (implicitly means their contents are semantically frozen >>>>>>> when >>>>>>> they are created) >>>>>>> - the fact that you created the future is a hint that all necessary >>>>>>> fetching/computation should be kicked off >>>>>>> - later forced with get() >>>>>>> - when it was designed, pure async style was not a viable option >>>>>>> >>>>>>> I see now that the actual position of some of its original designers >>>>>>> is: >>>>>>> >>>>>>> - XYZState class is a view on a mutable location >>>>>>> - its methods return new views on that mutable location >>>>>>> - calling readLater() is a hint that some fetching/computation >>>>>>> should be kicked off >>>>>>> - later read() will combine whatever readLater() did with >>>>>>> additional local info to give the current value >>>>>>> - async style not applicable nor desirable as per Beam's focus on >>>>>>> naive straight-line coding + autoscaling >>>>>>> >>>>>>> These are both internally consistent I think. In fact, I like the >>>>>>> second perspective better than the one I have been promoting. There are >>>>>>> some weaknesses: readLater() is pretty tightly coupled to a particular >>>>>>> implementation style, and futures are decades old so you can get good >>>>>>> APIs >>>>>>> and performance without inventing anything. But I still like the >>>>>>> non-future >>>>>>> version a little better. >>>>>>> >>>>>>> Kenn >>>>>>> >>>>>>> On Wed, May 23, 2018 at 4:05 PM Charles Chen <[email protected]> wrote: >>>>>>> >>>>>>>> During the design of the Beam Python State API, we noticed some >>>>>>>> transactionality inconsistencies in the existing Beam Java State API >>>>>>>> (these >>>>>>>> are the unresolved bugs BEAM-2980 >>>>>>>> <https://issues.apache.org/jira/browse/BEAM-2980> and BEAM-2975 >>>>>>>> <https://issues.apache.org/jira/browse/BEAM-2975>). We are >>>>>>>> therefore having a discussion about this API which can have >>>>>>>> implications >>>>>>>> for its future development in all Beam languages: >>>>>>>> https://docs.google.com/document/d/1GadEkAmtbJQjmqiqfSzGw3b66TKerm8tyn6TK4blAys/edit#heading=h.ofyl9jspiz3b >>>>>>>> >>>>>>>> If you have an opinion on the possible design approaches, it would >>>>>>>> be very helpful to bring up in the doc or on this thread. Thanks! >>>>>>>> >>>>>>>> Best, >>>>>>>> Charles >>>>>>>> >>>>>>>
