Brian,

I just want to follow up. Let me know if you are working on this.
Otherwise, I can implement ReadModifyWriteState.

On Mon, May 6, 2019 at 4:52 PM Reza Rokni <[email protected]> wrote:

> When used as metadata I think the ReadModifyWrite naming is very accurate
> for the majority of cases.
>
> The only case that does not follow that pattern is if its being used as a
> Boolean to indicate that something should be done in the OnTimer call based
> on an event that has been seen in the OnProcess code. But even then calling
> it ReadModifyWrite would still be ok and easily understood.
>
> *From: *Kenneth Knowles <[email protected]>
> *Date: *Fri, 3 May 2019 at 10:58
> *To: *dev
>
> Agree with all of your points about the drawbacks of ValueState. It is
>> definitely a pro/con weighing sort of situation. Considering the number of
>> users who are new to the orthogonality of event time and processing time,
>> ValueState could certainly lead to confusion about why things are not in
>> any particular order. Perhaps a middle ground is to have a bad of "written
>> values" with sequence numbers, and document some patterns/examples of how a
>> user may care to resolve these sequences numbers on read, or perhaps some
>> combiners like you describe. Overall, I'd defer to Reza and Reuven as they
>> seem very familiar with real-world use cases here.
>>
>> Kenn
>>
>> On Thu, May 2, 2019 at 2:53 AM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> On Wed, May 1, 2019 at 8:09 PM Kenneth Knowles <[email protected]> wrote:
>>> >
>>> > On Wed, May 1, 2019 at 8:51 AM Reuven Lax <[email protected]> wrote:
>>> >>
>>> >> ValueState is not  necessarily racy if you're doing a
>>> read-modify-write. It's only racy if you're doing something like writing
>>> last element seen.
>>> >
>>> > Race conditions are not inherently a problem. They are neither
>>> necessary nor sufficient for correctness. In this case, it is not the
>>> classic sense of race condition anyhow, it is simply a nondeterministic
>>> result, which may often be perfectly fine.
>>>
>>> One can write correct code with ValueState, but it's harder to do.
>>> This is exacerbated by the fact that at first glance it looks easier
>>> to use.
>>>
>>> >>>>> On Wed, May 1, 2019 at 8:30 AM Lukasz Cwik <[email protected]>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Isn't a value state just a bag state with at most one element and
>>> the usage pattern would be?
>>> >>>>>> 1) value_state.get == bag_state.read.next() (both have to handle
>>> the case when neither have been set)
>>> >>>>>> 2) user logic on what to do with current state + additional
>>> information to produce new state
>>> >>>>>> 3) value_state.set == bag_state.clear + bag_state.append? (note
>>> that Runners should optimize clear + append to become a single
>>> transaction/write)
>>> >
>>> > Your unpacking is accurate, but "X is just a Y" is not accurate. In
>>> this case you've demonstrated that value state *can be implemented using*
>>> bag state / has a workaround. But it is not subsumed by bag state. One
>>> important feature of ValueState is that it is statically determined that
>>> the transform cannot be used with merging windows.
>>>
>>> The flip side is that it makes it easy to write code that cannot be
>>> used with merging windows. Which hurts composition (especially if
>>> these operations are used as part of larger composite operations).
>>>
>>> > Another feature is that it is impossible to accidentally write more
>>> than one value. And a third important feature is that it declares what it
>>> is so that the code is more readable.
>>>
>>> +1. Which is why I think CombiningState is a better substitute than
>>> BagState where it makes sense (and often does, and often can even be
>>> an improvement over ValueState for performance and readability).
>>>
>>> Perhaps instead we could call it ReadModifyWrite state. It could make
>>> sense, as well as a read() and write() operation, that we even offer a
>>> modify(I, (I, S) -> S) operation.
>>>
>>> (Also, yes, when I said Latest I too meant a hypothetical "throw away
>>> everything else when a new element is written" one, not the specific
>>> one in the code. Sorry for the confusion.)
>>>
>>> >>>>>> For example, the blog post with the counter example would be:
>>> >>>>>>   @StateId("buffer")
>>> >>>>>>   private final StateSpec<BagState<Event>> bufferedEvents =
>>> StateSpecs.bag();
>>> >>>>>>
>>> >>>>>>   @StateId("count")
>>> >>>>>>   private final StateSpec<BagState<Integer>> countState =
>>> StateSpecs.bag();
>>> >>>>>>
>>> >>>>>>   @ProcessElement
>>> >>>>>>   public void process(
>>> >>>>>>       ProcessContext context,
>>> >>>>>>       @StateId("buffer") BagState<Event> bufferState,
>>> >>>>>>       @StateId("count") BagState<Integer> countState) {
>>> >>>>>>
>>> >>>>>>     int count = Iterables.getFirst(countState.read(), 0);
>>> >>>>>>     count = count + 1;
>>> >>>>>>     countState.clear();
>>> >>>>>>     countState.append(count);
>>> >>>>>>     bufferState.add(context.element());
>>> >>>>>>
>>> >>>>>>     if (count > MAX_BUFFER_SIZE) {
>>> >>>>>>       for (EnrichedEvent enrichedEvent :
>>> enrichEvents(bufferState.read())) {
>>> >>>>>>         context.output(enrichedEvent);
>>> >>>>>>       }
>>> >>>>>>       bufferState.clear();
>>> >>>>>>       countState.clear();
>>> >>>>>>     }
>>> >>>>>>   }
>>> >>>>>>
>>> >>>>>> On Tue, Apr 30, 2019 at 5:39 PM Kenneth Knowles <[email protected]>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Anything where the state evolves serially but arbitrarily - the
>>> toy example is the integer counter in my blog post - needs ValueState. You
>>> can't do it with AnyCombineFn. And I think LatestCombineFn is dangerous,
>>> especially when it comes to CombingState. ValueState is more explicit, and
>>> I still maintain that it is status quo, modulo unimplemented features in
>>> the Python SDK. The reads and writes are explicitly serial per (key,
>>> window), unlike for a CombiningState. Because of a CombineFn's requirement
>>> to be associative and commutative, I would interpret it that the runner is
>>> allowed to reorder inputs even after they are "written" by the user's DoFn
>>> - for example doing blind writes to an unordered bag, and only later
>>> reading the elements out and combining them in arbitrary order. Or any
>>> other strategy mixing addInput and mergeAccumulators. It would be a
>>> violation of the meaning of CombineFn to overspecify CombiningState further
>>> than this. So you cannot actually implement a consistent
>>> CombiningState(LatestCombineFn).
>>> >>>>>>>
>>> >>>>>>> Kenn
>>> >>>>>>>
>>> >>>>>>> On Tue, Apr 30, 2019 at 5:19 PM Robert Bradshaw <
>>> [email protected]> wrote:
>>> >>>>>>>>
>>> >>>>>>>> On Wed, May 1, 2019 at 1:55 AM Brian Hulette <
>>> [email protected]> wrote:
>>> >>>>>>>> >
>>> >>>>>>>> > Reza - you're definitely not derailing, that's exactly what I
>>> was looking for!
>>> >>>>>>>> >
>>> >>>>>>>> > I've actually recently encountered an additional use case
>>> where I'd like to use ValueState in the Python SDK. I'm experimenting with
>>> an ArrowBatchingDoFn that uses state and timers to batch up python
>>> dictionaries into arrow record batches (actually my entire purpose for
>>> jumping down this python state rabbit hole).
>>> >>>>>>>> >
>>> >>>>>>>> > At first blush it seems like the best way to do this would be
>>> to just replicate the batching approach in the timely processing post [1],
>>> but when the bag is full combine the elements into an arrow record batch,
>>> rather than enriching all of the elements and writing them out separately.
>>> However, if possible I'd like to pre-allocate buffers for each column and
>>> populate them as elements arrive (at least for columns with a fixed size
>>> type), so a bag state wouldn't be ideal.
>>> >>>>>>>>
>>> >>>>>>>> It seems it'd be preferable to do the conversion from a bag of
>>> >>>>>>>> elements to a single arrow frame all at once, when emitting,
>>> rather
>>> >>>>>>>> than repeatedly reading and writing the partial batch to and
>>> from
>>> >>>>>>>> state with every element that comes in. (Bag state has blind
>>> append.)
>>> >>>>>>>>
>>> >>>>>>>> > Also, a CombiningValueState is not ideal because I'd need to
>>> implement a merge_accumulators function that combines several in-progress
>>> batches. I could certainly implement that, but I'd prefer that it never be
>>> called unless absolutely necessary, which doesn't seem to be the case for
>>> CombiningValueState. (As an aside, maybe there's some room there for a
>>> middle ground between ValueState and CombiningValueState
>>> >>>>>>>>
>>> >>>>>>>> This does actually feel natural (to me), because you're
>>> repeatedly
>>> >>>>>>>> adding elements to build something up. merge_accumulators would
>>> >>>>>>>> probably be pretty easy (concatenation) but unless your windows
>>> are
>>> >>>>>>>> merging could just throw a not implemented error to really guard
>>> >>>>>>>> against it being used.
>>> >>>>>>>>
>>> >>>>>>>> > I suppose you could argue that this is a pretty low-level
>>> optimization we should be able to shield our users from, but right now I
>>> just wish I had ValueState in python so I didn't have to hack it up with a
>>> BagState :)
>>> >>>>>>>> >
>>> >>>>>>>> > Anyway, in light of this and all the other use-cases
>>> mentioned here, I think the resolution is to just implement ValueState in
>>> python, and document the danger with ValueState in both Python and Java.
>>> Just to be clear, the danger I'm referring to is that users might easily
>>> forget that data can be out of order, and use ValueState in a way that
>>> assumes it's been populated with data from the most recent element in event
>>> time, then in practice out of order data clobbers their state. I'm happy to
>>> write up a PR for this - are there any objections to that?
>>> >>>>>>>>
>>> >>>>>>>> I still haven't seen a good case for it (though I haven't
>>> looked at
>>> >>>>>>>> Reza's BiTemporalStream yet). Much harder to remove things once
>>> >>>>>>>> they're in. Can we just add a Any and/or LatestCombineFn and
>>> use (and
>>> >>>>>>>> point to) that instead? With the comment that if you're doing
>>> >>>>>>>> read-modify-write, an add_input may be better.
>>> >>>>>>>>
>>> >>>>>>>> > [1]
>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>> >>>>>>>> >
>>> >>>>>>>> > On Mon, Apr 29, 2019 at 12:23 AM Robert Bradshaw <
>>> [email protected]> wrote:
>>> >>>>>>>> >>
>>> >>>>>>>> >> On Mon, Apr 29, 2019 at 3:43 AM Reza Rokni <[email protected]>
>>> wrote:
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > @Robert Bradshaw Some examples, mostly built out from
>>> cases around Timeseries data, don't want to derail this thread so at a hi
>>> level  :
>>> >>>>>>>> >>
>>> >>>>>>>> >> Thanks. Perfectly on-topic for the thread.
>>> >>>>>>>> >>
>>> >>>>>>>> >> > Looping timers, a timer which allows for creation of a
>>> value within a window when no external input has been seen. Requires
>>> metadata like "is timer set".
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > BiTemporalStream join, where we need to match
>>> leftCol.timestamp to a value ==  (max(rightCol.timestamp) where
>>> rightCol.timestamp <= leftCol.timestamp)) , this if for a application
>>> matching trades to quotes.
>>> >>>>>>>> >>
>>> >>>>>>>> >> I'd be interested in seeing the code here. The fact that you
>>> have a
>>> >>>>>>>> >> max here makes me wonder if combining would be applicable.
>>> >>>>>>>> >>
>>> >>>>>>>> >> (FWIW, I've long thought it would be useful to do this kind
>>> of thing
>>> >>>>>>>> >> with Windows. Basically, it'd be like session windows with
>>> one side
>>> >>>>>>>> >> being the window from the timestamp forward into the future,
>>> and the
>>> >>>>>>>> >> other side being from the timestamp back a certain amount in
>>> the past.
>>> >>>>>>>> >> This seems a common join pattern.)
>>> >>>>>>>> >>
>>> >>>>>>>> >> > Metadata is used for
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > Taking the Key from the KV  for use within the OnTimer
>>> call.
>>> >>>>>>>> >> > Knowing where we are in watermarks for GC of objects in
>>> state.
>>> >>>>>>>> >> > More timer metadata (min timer ..)
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > It could be argued that what we are using state for mostly
>>> workarounds for things that could eventually end up in the API itself. For
>>> example
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > There is a Jira for OnTimer Context to have Key.
>>> >>>>>>>> >> >  The GC needs are mostly due to not having a Map State
>>> object in all runners yet.
>>> >>>>>>>> >>
>>> >>>>>>>> >> Yeah. GC could probably be done with a max combine. The Key
>>> (which
>>> >>>>>>>> >> should be in the API) could be an AnyCombine for now (safe to
>>> >>>>>>>> >> overwrite because it's always the same).
>>> >>>>>>>> >>
>>> >>>>>>>> >> > However I think as folks explore Beam there will always be
>>> little things that require Metadata and so having access to something which
>>> gives us fine grain control ( as Kenneth mentioned) is useful.
>>> >>>>>>>> >>
>>> >>>>>>>> >> Likely. I guess in line with making easy things easy, I'd
>>> like to make
>>> >>>>>>>> >> dangerous things hard(er). As Kenn says, we'll probably need
>>> some kind
>>> >>>>>>>> >> of lower-level thing, especially if we introduce OnMerge.
>>> >>>>>>>> >>
>>> >>>>>>>> >> > Cheers
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > Reza
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > On Sat, 27 Apr 2019 at 02:59, Kenneth Knowles <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>
>>> >>>>>>>> >> >> To be clear, the intent was always that ValueState would
>>> be not usable in merging pipelines. So no danger of clobbering, but also
>>> limited functionality. Is there a runner than accepts it and clobbers? The
>>> whole idea of the new DoFn is that it is easy to do the construction-time
>>> analysis and reject the invalid pipeline. It is actually runner independent
>>> and I think already implemented in ParDo's validation, no?
>>> >>>>>>>> >> >>
>>> >>>>>>>> >> >> Kenn
>>> >>>>>>>> >> >>
>>> >>>>>>>> >> >> On Fri, Apr 26, 2019 at 10:14 AM Lukasz Cwik <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>
>>> >>>>>>>> >> >>> I am in the camp where we should only support merging
>>> state (either naturally via things like bags or via combiners). I believe
>>> that having the wrapper that Brian suggests is useful for users. As for the
>>> @OnMerge method, I believe combiners should have the ability to look at the
>>> window information and we should treat @OnMerge as syntactic sugar over a
>>> combiner if the combiner API is too cumbersome.
>>> >>>>>>>> >> >>>
>>> >>>>>>>> >> >>> I believe using combiners can also extend to side inputs
>>> and help us deal with singleton and map like side inputs when multiple
>>> firings occur. I also like treating everything like a combiner because it
>>> will give us a lot reuse of combiner implementations across all the places
>>> they could be used and will be especially useful when we start exposing
>>> APIs related to retractions on combiners.
>>> >>>>>>>> >> >>>
>>> >>>>>>>> >> >>> On Fri, Apr 26, 2019 at 9:43 AM Brian Hulette <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>> Yeah the danger with out of order processing concerns
>>> me more than the merging as well. As a new Beam user, I immediately
>>> gravitated towards ValueState since it was easy to think about and I just
>>> assumed there wasn't anything to be concerned about. So it was shocking to
>>> learn that there is this dangerous edge-case.
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>> What if ValueState were just implemented as a wrapper
>>> of CombiningState with a LatestCombineFn and documented as such (and
>>> perhaps we encourage users to consider using a CombiningState explicitly if
>>> at all possible)?
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>> Brian
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>>
>>> >>>>>>>> >> >>>> On Fri, Apr 26, 2019 at 2:29 AM Robert Bradshaw <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> On Fri, Apr 26, 2019 at 6:40 AM Kenneth Knowles <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>> >
>>> >>>>>>>> >> >>>>> > You could use a CombiningState with a CombineFn that
>>> returns the minimum for this case.
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> We've also wanted to be able to set data when setting
>>> a timer that
>>> >>>>>>>> >> >>>>> would be returned when the timer fires. (It's in the
>>> FnAPI, but not
>>> >>>>>>>> >> >>>>> the SDKs yet.)
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> The metadata is an interesting usecase, do you have
>>> some more specific
>>> >>>>>>>> >> >>>>> examples? Might boil down to not having a rich enough
>>> (single) state
>>> >>>>>>>> >> >>>>> type.
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> > But I've come to feel there is a mismatch. On the
>>> one hand, ParDo(<stateful DoFn>) is a way to drop to a lower level and
>>> write logic that does not fit a more general computational pattern, really
>>> taking fine control. On the other hand, automatically merging state via
>>> CombiningState or BagState is more of a no-knobs higher level of
>>> programming. To me there seems to be a bit of a philosophical conflict.
>>> >>>>>>>> >> >>>>> >
>>> >>>>>>>> >> >>>>> > These days, I feel like an @OnMerge method would be
>>> more natural. If you are using state and timers, you probably often want
>>> more direct control over how state from windows gets merged. An of course
>>> we don't even have a design for timers - you would need some kind of
>>> timestamp CombineFn but I think setting/unsetting timers manually makes
>>> more sense. Especially considering the trickiness around merging windows in
>>> the absence of retractions, you really need this callback, so you can issue
>>> retractions manually for any output your stateful DoFn emitted in windows
>>> that no longer exist.
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> I agree we'll probably need an @OnMerge. On the other
>>> hand, I like
>>> >>>>>>>> >> >>>>> being able to have good defaults. The high/low level
>>> thing is a
>>> >>>>>>>> >> >>>>> continuum (the indexing example falling towards the
>>> high end).
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> Actually, the merging questions bother me less than
>>> how easy it is to
>>> >>>>>>>> >> >>>>> accidentally clobber previous values. It looks so easy
>>> (like the
>>> >>>>>>>> >> >>>>> easiest state to use) but is actually the most
>>> dangerous. If one wants
>>> >>>>>>>> >> >>>>> this behavior, I would rather an explicit AnyCombineFn
>>> or
>>> >>>>>>>> >> >>>>> LatestCombineFn which makes you think about the
>>> semantics.
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> - Robert
>>> >>>>>>>> >> >>>>>
>>> >>>>>>>> >> >>>>> > On Thu, Apr 25, 2019 at 5:49 PM Reza Rokni <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> +1 on the metadata use case.
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> For performance reasons the Timer API does not
>>> support a read() operation, which for the  vast majority of use cases is
>>> not a required feature. In the small set of use cases where it is needed,
>>> for example when you need to set a Timer in EventTime based on the smallest
>>> timestamp seen in the elements within a DoFn, we can make use of a
>>> ValueState object to keep track of the value.
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> On Fri, 26 Apr 2019 at 00:38, Reuven Lax <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>
>>> >>>>>>>> >> >>>>> >>> I see examples of people using ValueState that I
>>> think are not captured CombiningState. For example, one common one is users
>>> who set a timer and then record the timestamp of that timer in a
>>> ValueState. In general when you store state that is metadata about other
>>> state you store, then ValueState will usually make more sense than
>>> CombiningState.
>>> >>>>>>>> >> >>>>> >>>
>>> >>>>>>>> >> >>>>> >>> On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> Currently the Python SDK does not make ValueState
>>> available to users. My initial inclination was to go ahead and implement it
>>> there to be consistent with Java, but Robert brings up a great point here
>>> that ValueState has an inherent race condition for out of order data, and a
>>> lot of it's use cases can actually be implemented with a CombiningState
>>> instead.
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> It seems to me that at the very least we should
>>> discourage the use of ValueState by noting the danger in the documentation
>>> and preferring CombiningState in examples, and perhaps we should go further
>>> and deprecate it in Java and not implement it in python. Either way I think
>>> we should be consistent between Java and Python.
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> I'm curious what people think about this, are
>>> there use cases that we really need to keep ValueState around for?
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> Brian
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> ---------- Forwarded message ---------
>>> >>>>>>>> >> >>>>> >>>> From: Robert Bradshaw <[email protected]>
>>> >>>>>>>> >> >>>>> >>>> Date: Thu, Apr 25, 2019, 08:31
>>> >>>>>>>> >> >>>>> >>>> Subject: Re: [docs] Python State & Timers
>>> >>>>>>>> >> >>>>> >>>> To: dev <[email protected]>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels <
>>> [email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>>
>>> >>>>>>>> >> >>>>> >>>>> Completely agree that CombiningState is nicer in
>>> this example. Users may
>>> >>>>>>>> >> >>>>> >>>>> still want to use ValueState when there is
>>> nothing to combine.
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> I've always had trouble coming up with any good
>>> examples of this.
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> Also,
>>> >>>>>>>> >> >>>>> >>>>> users already know ValueState from the Java SDK.
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>> Maybe we should deprecate that :)
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> On 25.04.19 17:12, Robert Bradshaw wrote:
>>> >>>>>>>> >> >>>>> >>>>> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian
>>> Michels <[email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >> I forgot to give an example, just to clarify
>>> for others:
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >>> What was the specific example that was less
>>> natural?
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >> Basically every time we use ListState to
>>> express ValueState, e.g.
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >>     next_index, = list(state.read()) or [0]
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >> Taken from:
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > Yes, ListState is much less natural here. I
>>> think generally
>>> >>>>>>>> >> >>>>> >>>>> > CombiningValue is often a better replacement.
>>> E.g. the Java example
>>> >>>>>>>> >> >>>>> >>>>> > reads
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > public void processElement(
>>> >>>>>>>> >> >>>>> >>>>> >        ProcessContext context,
>>> @StateId("index") ValueState<Integer> index) {
>>> >>>>>>>> >> >>>>> >>>>> >      int current = firstNonNull(index.read(),
>>> 0);
>>> >>>>>>>> >> >>>>> >>>>> >      context.output(KV.of(current,
>>> context.element()));
>>> >>>>>>>> >> >>>>> >>>>> >      index.write(current+1);
>>> >>>>>>>> >> >>>>> >>>>> > }
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > which is replaced with bag state
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > def process(self, element,
>>> state=DoFn.StateParam(INDEX_STATE)):
>>> >>>>>>>> >> >>>>> >>>>> >      next_index, = list(state.read()) or [0]
>>> >>>>>>>> >> >>>>> >>>>> >      yield (element, next_index)
>>> >>>>>>>> >> >>>>> >>>>> >      state.clear()
>>> >>>>>>>> >> >>>>> >>>>> >      state.add(next_index + 1)
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > whereas CombiningState would be more natural
>>> (than ListState, and
>>> >>>>>>>> >> >>>>> >>>>> > arguably than even ValueState), giving
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> > def process(self, element,
>>> index=DoFn.StateParam(INDEX_STATE)):
>>> >>>>>>>> >> >>>>> >>>>> >      yield element, index.read()
>>> >>>>>>>> >> >>>>> >>>>> >      index.add(1)
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >> -Max
>>> >>>>>>>> >> >>>>> >>>>> >>
>>> >>>>>>>> >> >>>>> >>>>> >> On 25.04.19 16:40, Robert Bradshaw wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>> https://github.com/apache/beam/pull/8402
>>> >>>>>>>> >> >>>>> >>>>> >>>
>>> >>>>>>>> >> >>>>> >>>>> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert
>>> Bradshaw <[email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>> Oh, this is for the indexing example.
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>> I actually think using CombiningState is
>>> more cleaner than ValueState.
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>> (The fact that one must specify the
>>> accumulator coder is, however,
>>> >>>>>>>> >> >>>>> >>>>> >>>> unfortunate. We should probably infer that
>>> if we can.)
>>> >>>>>>>> >> >>>>> >>>>> >>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert
>>> Bradshaw <[email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>> The desire was to avoid the implicit
>>> disallowed combination wart in
>>> >>>>>>>> >> >>>>> >>>>> >>>>> Python (until we could make sense of it),
>>> and also ValueState could be
>>> >>>>>>>> >> >>>>> >>>>> >>>>> surprising with respect to older values
>>> overwriting newer ones. What
>>> >>>>>>>> >> >>>>> >>>>> >>>>> was the specific example that was less
>>> natural?
>>> >>>>>>>> >> >>>>> >>>>> >>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian
>>> Michels <[email protected]> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Pablo: Thanks for following up with the
>>> PR! :)
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Brian: I was wondering about this as
>>> well. It makes the Python state
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> code a bit unnatural. I'd suggest to add
>>> a ValueState wrapper around
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> ListState/CombiningState.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Robert: Like Reuven pointed out, we can
>>> disallow ValueState for merging
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> windows with state.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Reza: Great. Let's make sure it has
>>> Python examples out of the box.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> Either Pablo or me could help there.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> Thanks,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> Max
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni
>>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Pablo, Kenneth and I have a new blog
>>> ready for publication which covers
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> how to create a "looping timer" it
>>> allows for default values to be
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> created in a window when no incoming
>>> elements exists. We just need to
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> clear a few bits before publication, but
>>> would be great to have that
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> also include a python example, I wrote
>>> it in java...
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Cheers
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Reza
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax
>>> <[email protected]
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:[email protected]>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>       Well state is still not
>>> implemented for merging windows even for
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>       Java (though I believe the idea
>>> was to disallow ValueState there).
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>       On Wed, Apr 24, 2019 at 1:11 PM
>>> Robert Bradshaw <[email protected]
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>       <mailto:[email protected]>>
>>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           It was unclear what the
>>> semantics were for ValueState for merging
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           windows. (It's also a bit
>>> weird as it's inherently a race condition
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           wrt element ordering, unlike
>>> Bag and CombineState, though you can
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           always implement it as a
>>> CombineState that always returns the latest
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           value which is a bit more
>>> explicit about the dangers here.)
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           On Wed, Apr 24, 2019 at 10:08
>>> PM Brian Hulette
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>> [email protected]>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            > That's a great idea! I
>>> thought about this too after those
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           posts came up on the list
>>> recently. I started to look into it,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           but I noticed that there's
>>> actually no implementation of
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           ValueState in userstate. Is
>>> there a reason for that? I started
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           to work on a patch to add it
>>> but I was just curious if there was
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           some reason it was omitted
>>> that I should be aware of.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            > We could certainly
>>> replicate the example without ValueState
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           by using BagState and clearing
>>> it before each write, but it
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           would be nice if we could draw
>>> a direct parallel.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            > Brian
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            > On Fri, Apr 12, 2019 at
>>> 7:05 AM Maximilian Michels
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>> [email protected]>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > It would probably be
>>> pretty easy to add the corresponding
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           code snippets to the docs as
>>> well.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> It's probably a bit more
>>> work because there is no section
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           dedicated to
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> state/timer yet in the
>>> documentation. Tracked here:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> https://jira.apache.org/jira/browse/BEAM-2472
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > I've been going over
>>> this topic a bit. I'll add the
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           snippets next week, if that's
>>> fine by y'all.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> That would be great. The
>>> blog posts are a great way to get
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           started with
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> state/timers.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> Thanks,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> Max
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> On 11.04.19 20:21, Pablo
>>> Estrada wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > I've been going over
>>> this topic a bit. I'll add the
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           snippets next week,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > if that's fine by y'all.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > Best
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > -P.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > On Thu, Apr 11, 2019 at
>>> 5:27 AM Robert Bradshaw
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>> [email protected]>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > <mailto:
>>> [email protected] <mailto:[email protected]>>>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     That's a great idea!
>>> It would probably be pretty easy
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           to add the
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     corresponding code
>>> snippets to the docs as well.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     On Thu, Apr 11, 2019
>>> at 2:00 PM Maximilian Michels
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>> [email protected]>
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     <mailto:
>>> [email protected] <mailto:[email protected]>>> wrote:
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Hi everyone,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > The Python SDK
>>> still lacks documentation on state
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           and timers.
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > As a first step,
>>> what do you think about updating
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>           these two blog
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     posts
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > with the
>>> corresponding Python code?
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Thanks,
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Max
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>> >>>>>>>> >> >>>>> >>>>> >>>>>>>
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> --
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> This email may be confidential and privileged. If
>>> you received this communication by mistake, please don't forward it to
>>> anyone else, please erase all copies and attachments, and please let me
>>> know that it has gone to the wrong person.
>>> >>>>>>>> >> >>>>> >>
>>> >>>>>>>> >> >>>>> >> The above terms reflect a potential business
>>> arrangement, are provided solely as a basis for further discussion, and are
>>> not intended to be and do not constitute a legally binding obligation. No
>>> legally binding obligations will be created, implied, or inferred until an
>>> agreement in final form is executed in writing by all parties involved.
>>> >>>>>>>> >> >
>>> >>>>>>>> >> >
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > --
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > This email may be confidential and privileged. If you
>>> received this communication by mistake, please don't forward it to anyone
>>> else, please erase all copies and attachments, and please let me know that
>>> it has gone to the wrong person.
>>> >>>>>>>> >> >
>>> >>>>>>>> >> > The above terms reflect a potential business arrangement,
>>> are provided solely as a basis for further discussion, and are not intended
>>> to be and do not constitute a legally binding obligation. No legally
>>> binding obligations will be created, implied, or inferred until an
>>> agreement in final form is executed in writing by all parties involved.
>>>
>>
>
> --
>
> This email may be confidential and privileged. If you received this
> communication by mistake, please don't forward it to anyone else, please
> erase all copies and attachments, and please let me know that it has gone
> to the wrong person.
>
> The above terms reflect a potential business arrangement, are provided
> solely as a basis for further discussion, and are not intended to be and do
> not constitute a legally binding obligation. No legally binding obligations
> will be created, implied, or inferred until an agreement in final form is
> executed in writing by all parties involved.
>

Reply via email to