I guess it depends on how we define LatestCombineFn. My assumption was
LatestCombineFn meant "throw away everything else when a new element is
written" i.e. latest means latest in processing time. Then a
CombiningValueState(LatestCombineFn) would be the same as ValueState I
think - and you could do a read-modify-write and get similar non-racy
behavior.

But when I look for existing "Latest" CombineFns I just find
`Latest.combineFn()` [1] which seems to be based on event timestamps. So
maybe I'm operating on a different definition from everyone else. I agree
that that would be a very confusing replacement for ValueState.

[1]
https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/Latest.html

On Wed, May 1, 2019 at 8:51 AM Reuven Lax <[email protected]> wrote:

> ValueState is not  necessarily racy if you're doing a read-modify-write.
> It's only racy if you're doing something like writing last element seen.
>
> While it's true that many read-modify-write patterns can be expressed via
> a combiner, having to write a full combiner to do something simple will be
> a step back in usability. In addition, as I mentioned above people often
> use ValueState to store metadata about other state; e.g. adding elements to
> BagState, and storing information about that BagState in a ValueState
> (minimum element in the bag, histogram of elements in the bag, last element
> processed in the bag, etc.). This is not racy at all.
>
> Reuven
>
> On Wed, May 1, 2019 at 8:46 AM Brian Hulette <[email protected]> wrote:
>
>> > LatestCombineFn sounds to me like the worst possible world. It will
>> almost always be racy and confusing.
>> But isn't that Robert's point? ValueState is already racy and confusing,
>> the LatestCombineFn just makes it explicit.
>>
>> On Wed, May 1, 2019 at 8:39 AM Reuven Lax <[email protected]> wrote:
>>
>>> I also think that ValueState is very useful, for all the reasons
>>> mentioned in this thread. Also keep in mind that even for cases where
>>> CombiningState can be used, that will be much more cumbersome unless a
>>> preexisting combiner is already written. Writing .a custom combiner is a
>>> lot of boilerplate that I think users would like to avoid, while doing a
>>> simple read-modify-write on a ValueState is pretty explicit and simple.
>>>
>>> LatestCombineFn sounds to me like the worst possible world. It will
>>> almost always be racy and confusing.
>>>
>>> Reuven
>>>
>>> On Wed, May 1, 2019 at 8:34 AM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> Note, the example didn't support merging windows so I also ignored it.
>>>> In the case of merging windows, your solution would depend on whether you
>>>> needed to know from what window the enriched event was from.
>>>>
>>>> On Wed, May 1, 2019 at 8:30 AM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> Isn't a value state just a bag state with at most one element and the
>>>>> usage pattern would be?
>>>>> 1) value_state.get == bag_state.read.next() (both have to handle the
>>>>> case when neither have been set)
>>>>> 2) user logic on what to do with current state + additional
>>>>> information to produce new state
>>>>> 3) value_state.set == bag_state.clear + bag_state.append? (note that
>>>>> Runners should optimize clear + append to become a single 
>>>>> transaction/write)
>>>>>
>>>>> For example, the blog post with the counter example would be:
>>>>>   @StateId("buffer")
>>>>>   private final StateSpec<BagState<Event>> bufferedEvents =
>>>>> StateSpecs.bag();
>>>>>
>>>>>   @StateId("count")
>>>>>   private final StateSpec<BagState<Integer>> countState =
>>>>> StateSpecs.bag();
>>>>>
>>>>>   @ProcessElement
>>>>>   public void process(
>>>>>       ProcessContext context,
>>>>>       @StateId("buffer") BagState<Event> bufferState,
>>>>>       @StateId("count") BagState<Integer> countState) {
>>>>>
>>>>>     int count = Iterables.getFirst(countState.read(), 0);
>>>>>     count = count + 1;
>>>>>     countState.clear();
>>>>>     countState.append(count);
>>>>>     bufferState.add(context.element());
>>>>>
>>>>>     if (count > MAX_BUFFER_SIZE) {
>>>>>       for (EnrichedEvent enrichedEvent :
>>>>> enrichEvents(bufferState.read())) {
>>>>>         context.output(enrichedEvent);
>>>>>       }
>>>>>       bufferState.clear();
>>>>>       countState.clear();
>>>>>     }
>>>>>   }
>>>>>
>>>>> On Tue, Apr 30, 2019 at 5:39 PM Kenneth Knowles <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Anything where the state evolves serially but arbitrarily - the toy
>>>>>> example is the integer counter in my blog post - needs ValueState. You
>>>>>> can't do it with AnyCombineFn. And I think LatestCombineFn is dangerous,
>>>>>> especially when it comes to CombingState. ValueState is more explicit, 
>>>>>> and
>>>>>> I still maintain that it is status quo, modulo unimplemented features in
>>>>>> the Python SDK. The reads and writes are explicitly serial per (key,
>>>>>> window), unlike for a CombiningState. Because of a CombineFn's 
>>>>>> requirement
>>>>>> to be associative and commutative, I would interpret it that the runner 
>>>>>> is
>>>>>> allowed to reorder inputs even after they are "written" by the user's 
>>>>>> DoFn
>>>>>> - for example doing blind writes to an unordered bag, and only later
>>>>>> reading the elements out and combining them in arbitrary order. Or any
>>>>>> other strategy mixing addInput and mergeAccumulators. It would be a
>>>>>> violation of the meaning of CombineFn to overspecify CombiningState 
>>>>>> further
>>>>>> than this. So you cannot actually implement a consistent
>>>>>> CombiningState(LatestCombineFn).
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Tue, Apr 30, 2019 at 5:19 PM Robert Bradshaw <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, May 1, 2019 at 1:55 AM Brian Hulette <[email protected]>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Reza - you're definitely not derailing, that's exactly what I was
>>>>>>> looking for!
>>>>>>> >
>>>>>>> > I've actually recently encountered an additional use case where
>>>>>>> I'd like to use ValueState in the Python SDK. I'm experimenting with an
>>>>>>> ArrowBatchingDoFn that uses state and timers to batch up python
>>>>>>> dictionaries into arrow record batches (actually my entire purpose for
>>>>>>> jumping down this python state rabbit hole).
>>>>>>> >
>>>>>>> > At first blush it seems like the best way to do this would be to
>>>>>>> just replicate the batching approach in the timely processing post [1], 
>>>>>>> but
>>>>>>> when the bag is full combine the elements into an arrow record batch,
>>>>>>> rather than enriching all of the elements and writing them out 
>>>>>>> separately.
>>>>>>> However, if possible I'd like to pre-allocate buffers for each column 
>>>>>>> and
>>>>>>> populate them as elements arrive (at least for columns with a fixed size
>>>>>>> type), so a bag state wouldn't be ideal.
>>>>>>>
>>>>>>> It seems it'd be preferable to do the conversion from a bag of
>>>>>>> elements to a single arrow frame all at once, when emitting, rather
>>>>>>> than repeatedly reading and writing the partial batch to and from
>>>>>>> state with every element that comes in. (Bag state has blind append.)
>>>>>>>
>>>>>>> > Also, a CombiningValueState is not ideal because I'd need to
>>>>>>> implement a merge_accumulators function that combines several 
>>>>>>> in-progress
>>>>>>> batches. I could certainly implement that, but I'd prefer that it never 
>>>>>>> be
>>>>>>> called unless absolutely necessary, which doesn't seem to be the case 
>>>>>>> for
>>>>>>> CombiningValueState. (As an aside, maybe there's some room there for a
>>>>>>> middle ground between ValueState and CombiningValueState
>>>>>>>
>>>>>>> This does actually feel natural (to me), because you're repeatedly
>>>>>>> adding elements to build something up. merge_accumulators would
>>>>>>> probably be pretty easy (concatenation) but unless your windows are
>>>>>>> merging could just throw a not implemented error to really guard
>>>>>>> against it being used.
>>>>>>>
>>>>>>> > I suppose you could argue that this is a pretty low-level
>>>>>>> optimization we should be able to shield our users from, but right now I
>>>>>>> just wish I had ValueState in python so I didn't have to hack it up 
>>>>>>> with a
>>>>>>> BagState :)
>>>>>>> >
>>>>>>> > Anyway, in light of this and all the other use-cases mentioned
>>>>>>> here, I think the resolution is to just implement ValueState in python, 
>>>>>>> and
>>>>>>> document the danger with ValueState in both Python and Java. Just to be
>>>>>>> clear, the danger I'm referring to is that users might easily forget 
>>>>>>> that
>>>>>>> data can be out of order, and use ValueState in a way that assumes it's
>>>>>>> been populated with data from the most recent element in event time, 
>>>>>>> then
>>>>>>> in practice out of order data clobbers their state. I'm happy to write 
>>>>>>> up a
>>>>>>> PR for this - are there any objections to that?
>>>>>>>
>>>>>>> I still haven't seen a good case for it (though I haven't looked at
>>>>>>> Reza's BiTemporalStream yet). Much harder to remove things once
>>>>>>> they're in. Can we just add a Any and/or LatestCombineFn and use (and
>>>>>>> point to) that instead? With the comment that if you're doing
>>>>>>> read-modify-write, an add_input may be better.
>>>>>>>
>>>>>>> > [1] https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>>>>>> >
>>>>>>> > On Mon, Apr 29, 2019 at 12:23 AM Robert Bradshaw <
>>>>>>> [email protected]> wrote:
>>>>>>> >>
>>>>>>> >> On Mon, Apr 29, 2019 at 3:43 AM Reza Rokni <[email protected]>
>>>>>>> wrote:
>>>>>>> >> >
>>>>>>> >> > @Robert Bradshaw Some examples, mostly built out from cases
>>>>>>> around Timeseries data, don't want to derail this thread so at a hi 
>>>>>>> level  :
>>>>>>> >>
>>>>>>> >> Thanks. Perfectly on-topic for the thread.
>>>>>>> >>
>>>>>>> >> > Looping timers, a timer which allows for creation of a value
>>>>>>> within a window when no external input has been seen. Requires metadata
>>>>>>> like "is timer set".
>>>>>>> >> >
>>>>>>> >> > BiTemporalStream join, where we need to match leftCol.timestamp
>>>>>>> to a value ==  (max(rightCol.timestamp) where rightCol.timestamp <=
>>>>>>> leftCol.timestamp)) , this if for a application matching trades to 
>>>>>>> quotes.
>>>>>>> >>
>>>>>>> >> I'd be interested in seeing the code here. The fact that you have
>>>>>>> a
>>>>>>> >> max here makes me wonder if combining would be applicable.
>>>>>>> >>
>>>>>>> >> (FWIW, I've long thought it would be useful to do this kind of
>>>>>>> thing
>>>>>>> >> with Windows. Basically, it'd be like session windows with one
>>>>>>> side
>>>>>>> >> being the window from the timestamp forward into the future, and
>>>>>>> the
>>>>>>> >> other side being from the timestamp back a certain amount in the
>>>>>>> past.
>>>>>>> >> This seems a common join pattern.)
>>>>>>> >>
>>>>>>> >> > Metadata is used for
>>>>>>> >> >
>>>>>>> >> > Taking the Key from the KV  for use within the OnTimer call.
>>>>>>> >> > Knowing where we are in watermarks for GC of objects in state.
>>>>>>> >> > More timer metadata (min timer ..)
>>>>>>> >> >
>>>>>>> >> > It could be argued that what we are using state for mostly
>>>>>>> workarounds for things that could eventually end up in the API itself. 
>>>>>>> For
>>>>>>> example
>>>>>>> >> >
>>>>>>> >> > There is a Jira for OnTimer Context to have Key.
>>>>>>> >> >  The GC needs are mostly due to not having a Map State object
>>>>>>> in all runners yet.
>>>>>>> >>
>>>>>>> >> Yeah. GC could probably be done with a max combine. The Key (which
>>>>>>> >> should be in the API) could be an AnyCombine for now (safe to
>>>>>>> >> overwrite because it's always the same).
>>>>>>> >>
>>>>>>> >> > However I think as folks explore Beam there will always be
>>>>>>> little things that require Metadata and so having access to something 
>>>>>>> which
>>>>>>> gives us fine grain control ( as Kenneth mentioned) is useful.
>>>>>>> >>
>>>>>>> >> Likely. I guess in line with making easy things easy, I'd like to
>>>>>>> make
>>>>>>> >> dangerous things hard(er). As Kenn says, we'll probably need some
>>>>>>> kind
>>>>>>> >> of lower-level thing, especially if we introduce OnMerge.
>>>>>>> >>
>>>>>>> >> > Cheers
>>>>>>> >> >
>>>>>>> >> > Reza
>>>>>>> >> >
>>>>>>> >> > On Sat, 27 Apr 2019 at 02:59, Kenneth Knowles <[email protected]>
>>>>>>> wrote:
>>>>>>> >> >>
>>>>>>> >> >> To be clear, the intent was always that ValueState would be
>>>>>>> not usable in merging pipelines. So no danger of clobbering, but also
>>>>>>> limited functionality. Is there a runner than accepts it and clobbers? 
>>>>>>> The
>>>>>>> whole idea of the new DoFn is that it is easy to do the 
>>>>>>> construction-time
>>>>>>> analysis and reject the invalid pipeline. It is actually runner 
>>>>>>> independent
>>>>>>> and I think already implemented in ParDo's validation, no?
>>>>>>> >> >>
>>>>>>> >> >> Kenn
>>>>>>> >> >>
>>>>>>> >> >> On Fri, Apr 26, 2019 at 10:14 AM Lukasz Cwik <[email protected]>
>>>>>>> wrote:
>>>>>>> >> >>>
>>>>>>> >> >>> I am in the camp where we should only support merging state
>>>>>>> (either naturally via things like bags or via combiners). I believe that
>>>>>>> having the wrapper that Brian suggests is useful for users. As for the
>>>>>>> @OnMerge method, I believe combiners should have the ability to look at 
>>>>>>> the
>>>>>>> window information and we should treat @OnMerge as syntactic sugar over 
>>>>>>> a
>>>>>>> combiner if the combiner API is too cumbersome.
>>>>>>> >> >>>
>>>>>>> >> >>> I believe using combiners can also extend to side inputs and
>>>>>>> help us deal with singleton and map like side inputs when multiple 
>>>>>>> firings
>>>>>>> occur. I also like treating everything like a combiner because it will 
>>>>>>> give
>>>>>>> us a lot reuse of combiner implementations across all the places they 
>>>>>>> could
>>>>>>> be used and will be especially useful when we start exposing APIs 
>>>>>>> related
>>>>>>> to retractions on combiners.
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Apr 26, 2019 at 9:43 AM Brian Hulette <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>
>>>>>>> >> >>>> Yeah the danger with out of order processing concerns me
>>>>>>> more than the merging as well. As a new Beam user, I immediately 
>>>>>>> gravitated
>>>>>>> towards ValueState since it was easy to think about and I just assumed
>>>>>>> there wasn't anything to be concerned about. So it was shocking to learn
>>>>>>> that there is this dangerous edge-case.
>>>>>>> >> >>>>
>>>>>>> >> >>>> What if ValueState were just implemented as a wrapper of
>>>>>>> CombiningState with a LatestCombineFn and documented as such (and 
>>>>>>> perhaps
>>>>>>> we encourage users to consider using a CombiningState explicitly if at 
>>>>>>> all
>>>>>>> possible)?
>>>>>>> >> >>>>
>>>>>>> >> >>>> Brian
>>>>>>> >> >>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Fri, Apr 26, 2019 at 2:29 AM Robert Bradshaw <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> On Fri, Apr 26, 2019 at 6:40 AM Kenneth Knowles <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >
>>>>>>> >> >>>>> > You could use a CombiningState with a CombineFn that
>>>>>>> returns the minimum for this case.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> We've also wanted to be able to set data when setting a
>>>>>>> timer that
>>>>>>> >> >>>>> would be returned when the timer fires. (It's in the FnAPI,
>>>>>>> but not
>>>>>>> >> >>>>> the SDKs yet.)
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> The metadata is an interesting usecase, do you have some
>>>>>>> more specific
>>>>>>> >> >>>>> examples? Might boil down to not having a rich enough
>>>>>>> (single) state
>>>>>>> >> >>>>> type.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> > But I've come to feel there is a mismatch. On the one
>>>>>>> hand, ParDo(<stateful DoFn>) is a way to drop to a lower level and write
>>>>>>> logic that does not fit a more general computational pattern, really 
>>>>>>> taking
>>>>>>> fine control. On the other hand, automatically merging state via
>>>>>>> CombiningState or BagState is more of a no-knobs higher level of
>>>>>>> programming. To me there seems to be a bit of a philosophical conflict.
>>>>>>> >> >>>>> >
>>>>>>> >> >>>>> > These days, I feel like an @OnMerge method would be more
>>>>>>> natural. If you are using state and timers, you probably often want more
>>>>>>> direct control over how state from windows gets merged. An of course we
>>>>>>> don't even have a design for timers - you would need some kind of 
>>>>>>> timestamp
>>>>>>> CombineFn but I think setting/unsetting timers manually makes more 
>>>>>>> sense.
>>>>>>> Especially considering the trickiness around merging windows in the 
>>>>>>> absence
>>>>>>> of retractions, you really need this callback, so you can issue 
>>>>>>> retractions
>>>>>>> manually for any output your stateful DoFn emitted in windows that no
>>>>>>> longer exist.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> I agree we'll probably need an @OnMerge. On the other hand,
>>>>>>> I like
>>>>>>> >> >>>>> being able to have good defaults. The high/low level thing
>>>>>>> is a
>>>>>>> >> >>>>> continuum (the indexing example falling towards the high
>>>>>>> end).
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> Actually, the merging questions bother me less than how
>>>>>>> easy it is to
>>>>>>> >> >>>>> accidentally clobber previous values. It looks so easy
>>>>>>> (like the
>>>>>>> >> >>>>> easiest state to use) but is actually the most dangerous.
>>>>>>> If one wants
>>>>>>> >> >>>>> this behavior, I would rather an explicit AnyCombineFn or
>>>>>>> >> >>>>> LatestCombineFn which makes you think about the semantics.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> - Robert
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> > On Thu, Apr 25, 2019 at 5:49 PM Reza Rokni <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> +1 on the metadata use case.
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> For performance reasons the Timer API does not support a
>>>>>>> read() operation, which for the  vast majority of use cases is not a
>>>>>>> required feature. In the small set of use cases where it is needed, for
>>>>>>> example when you need to set a Timer in EventTime based on the smallest
>>>>>>> timestamp seen in the elements within a DoFn, we can make use of a
>>>>>>> ValueState object to keep track of the value.
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> On Fri, 26 Apr 2019 at 00:38, Reuven Lax <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>
>>>>>>> >> >>>>> >>> I see examples of people using ValueState that I think
>>>>>>> are not captured CombiningState. For example, one common one is users 
>>>>>>> who
>>>>>>> set a timer and then record the timestamp of that timer in a 
>>>>>>> ValueState. In
>>>>>>> general when you store state that is metadata about other state you 
>>>>>>> store,
>>>>>>> then ValueState will usually make more sense than CombiningState.
>>>>>>> >> >>>>> >>>
>>>>>>> >> >>>>> >>> On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> Currently the Python SDK does not make ValueState
>>>>>>> available to users. My initial inclination was to go ahead and 
>>>>>>> implement it
>>>>>>> there to be consistent with Java, but Robert brings up a great point 
>>>>>>> here
>>>>>>> that ValueState has an inherent race condition for out of order data, 
>>>>>>> and a
>>>>>>> lot of it's use cases can actually be implemented with a CombiningState
>>>>>>> instead.
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> It seems to me that at the very least we should
>>>>>>> discourage the use of ValueState by noting the danger in the 
>>>>>>> documentation
>>>>>>> and preferring CombiningState in examples, and perhaps we should go 
>>>>>>> further
>>>>>>> and deprecate it in Java and not implement it in python. Either way I 
>>>>>>> think
>>>>>>> we should be consistent between Java and Python.
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> I'm curious what people think about this, are there
>>>>>>> use cases that we really need to keep ValueState around for?
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> Brian
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> ---------- Forwarded message ---------
>>>>>>> >> >>>>> >>>> From: Robert Bradshaw <[email protected]>
>>>>>>> >> >>>>> >>>> Date: Thu, Apr 25, 2019, 08:31
>>>>>>> >> >>>>> >>>> Subject: Re: [docs] Python State & Timers
>>>>>>> >> >>>>> >>>> To: dev <[email protected]>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>>>
>>>>>>> >> >>>>> >>>>> Completely agree that CombiningState is nicer in this
>>>>>>> example. Users may
>>>>>>> >> >>>>> >>>>> still want to use ValueState when there is nothing to
>>>>>>> combine.
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> I've always had trouble coming up with any good
>>>>>>> examples of this.
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> Also,
>>>>>>> >> >>>>> >>>>> users already know ValueState from the Java SDK.
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>> Maybe we should deprecate that :)
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> On 25.04.19 17:12, Robert Bradshaw wrote:
>>>>>>> >> >>>>> >>>>> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >> I forgot to give an example, just to clarify for
>>>>>>> others:
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >>> What was the specific example that was less
>>>>>>> natural?
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >> Basically every time we use ListState to express
>>>>>>> ValueState, e.g.
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >>     next_index, = list(state.read()) or [0]
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >> Taken from:
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > Yes, ListState is much less natural here. I think
>>>>>>> generally
>>>>>>> >> >>>>> >>>>> > CombiningValue is often a better replacement. E.g.
>>>>>>> the Java example
>>>>>>> >> >>>>> >>>>> > reads
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > public void processElement(
>>>>>>> >> >>>>> >>>>> >        ProcessContext context, @StateId("index")
>>>>>>> ValueState<Integer> index) {
>>>>>>> >> >>>>> >>>>> >      int current = firstNonNull(index.read(), 0);
>>>>>>> >> >>>>> >>>>> >      context.output(KV.of(current,
>>>>>>> context.element()));
>>>>>>> >> >>>>> >>>>> >      index.write(current+1);
>>>>>>> >> >>>>> >>>>> > }
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > which is replaced with bag state
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > def process(self, element,
>>>>>>> state=DoFn.StateParam(INDEX_STATE)):
>>>>>>> >> >>>>> >>>>> >      next_index, = list(state.read()) or [0]
>>>>>>> >> >>>>> >>>>> >      yield (element, next_index)
>>>>>>> >> >>>>> >>>>> >      state.clear()
>>>>>>> >> >>>>> >>>>> >      state.add(next_index + 1)
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > whereas CombiningState would be more natural (than
>>>>>>> ListState, and
>>>>>>> >> >>>>> >>>>> > arguably than even ValueState), giving
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> > def process(self, element,
>>>>>>> index=DoFn.StateParam(INDEX_STATE)):
>>>>>>> >> >>>>> >>>>> >      yield element, index.read()
>>>>>>> >> >>>>> >>>>> >      index.add(1)
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >> -Max
>>>>>>> >> >>>>> >>>>> >>
>>>>>>> >> >>>>> >>>>> >> On 25.04.19 16:40, Robert Bradshaw wrote:
>>>>>>> >> >>>>> >>>>> >>> https://github.com/apache/beam/pull/8402
>>>>>>> >> >>>>> >>>>> >>>
>>>>>>> >> >>>>> >>>>> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> >>>> Oh, this is for the indexing example.
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> >>>> I actually think using CombiningState is more
>>>>>>> cleaner than ValueState.
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> >>>> (The fact that one must specify the accumulator
>>>>>>> coder is, however,
>>>>>>> >> >>>>> >>>>> >>>> unfortunate. We should probably infer that if we
>>>>>>> can.)
>>>>>>> >> >>>>> >>>>> >>>>
>>>>>>> >> >>>>> >>>>> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <
>>>>>>> [email protected]> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>
>>>>>>> >> >>>>> >>>>> >>>>> The desire was to avoid the implicit disallowed
>>>>>>> combination wart in
>>>>>>> >> >>>>> >>>>> >>>>> Python (until we could make sense of it), and
>>>>>>> also ValueState could be
>>>>>>> >> >>>>> >>>>> >>>>> surprising with respect to older values
>>>>>>> overwriting newer ones. What
>>>>>>> >> >>>>> >>>>> >>>>> was the specific example that was less natural?
>>>>>>> >> >>>>> >>>>> >>>>>
>>>>>>> >> >>>>> >>>>> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian
>>>>>>> Michels <[email protected]> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> @Pablo: Thanks for following up with the PR! :)
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> @Brian: I was wondering about this as well. It
>>>>>>> makes the Python state
>>>>>>> >> >>>>> >>>>> >>>>>> code a bit unnatural. I'd suggest to add a
>>>>>>> ValueState wrapper around
>>>>>>> >> >>>>> >>>>> >>>>>> ListState/CombiningState.
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> @Robert: Like Reuven pointed out, we can
>>>>>>> disallow ValueState for merging
>>>>>>> >> >>>>> >>>>> >>>>>> windows with state.
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> @Reza: Great. Let's make sure it has Python
>>>>>>> examples out of the box.
>>>>>>> >> >>>>> >>>>> >>>>>> Either Pablo or me could help there.
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> Thanks,
>>>>>>> >> >>>>> >>>>> >>>>>> Max
>>>>>>> >> >>>>> >>>>> >>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>> Pablo, Kenneth and I have a new blog ready
>>>>>>> for publication which covers
>>>>>>> >> >>>>> >>>>> >>>>>>> how to create a "looping timer" it allows for
>>>>>>> default values to be
>>>>>>> >> >>>>> >>>>> >>>>>>> created in a window when no incoming elements
>>>>>>> exists. We just need to
>>>>>>> >> >>>>> >>>>> >>>>>>> clear a few bits before publication, but
>>>>>>> would be great to have that
>>>>>>> >> >>>>> >>>>> >>>>>>> also include a python example, I wrote it in
>>>>>>> java...
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>> Cheers
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>> Reza
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax <
>>>>>>> [email protected]
>>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>>       Well state is still not implemented for
>>>>>>> merging windows even for
>>>>>>> >> >>>>> >>>>> >>>>>>>       Java (though I believe the idea was to
>>>>>>> disallow ValueState there).
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>>       On Wed, Apr 24, 2019 at 1:11 PM Robert
>>>>>>> Bradshaw <[email protected]
>>>>>>> >> >>>>> >>>>> >>>>>>>       <mailto:[email protected]>> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>>           It was unclear what the semantics
>>>>>>> were for ValueState for merging
>>>>>>> >> >>>>> >>>>> >>>>>>>           windows. (It's also a bit weird as
>>>>>>> it's inherently a race condition
>>>>>>> >> >>>>> >>>>> >>>>>>>           wrt element ordering, unlike Bag
>>>>>>> and CombineState, though you can
>>>>>>> >> >>>>> >>>>> >>>>>>>           always implement it as a
>>>>>>> CombineState that always returns the latest
>>>>>>> >> >>>>> >>>>> >>>>>>>           value which is a bit more explicit
>>>>>>> about the dangers here.)
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>>>> >>>>>>>           On Wed, Apr 24, 2019 at 10:08 PM
>>>>>>> Brian Hulette
>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>>>>>> [email protected]>> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>>>>>> >> >>>>> >>>>> >>>>>>>            > That's a great idea! I thought
>>>>>>> about this too after those
>>>>>>> >> >>>>> >>>>> >>>>>>>           posts came up on the list recently.
>>>>>>> I started to look into it,
>>>>>>> >> >>>>> >>>>> >>>>>>>           but I noticed that there's actually
>>>>>>> no implementation of
>>>>>>> >> >>>>> >>>>> >>>>>>>           ValueState in userstate. Is there a
>>>>>>> reason for that? I started
>>>>>>> >> >>>>> >>>>> >>>>>>>           to work on a patch to add it but I
>>>>>>> was just curious if there was
>>>>>>> >> >>>>> >>>>> >>>>>>>           some reason it was omitted that I
>>>>>>> should be aware of.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>>>>>> >> >>>>> >>>>> >>>>>>>            > We could certainly replicate the
>>>>>>> example without ValueState
>>>>>>> >> >>>>> >>>>> >>>>>>>           by using BagState and clearing it
>>>>>>> before each write, but it
>>>>>>> >> >>>>> >>>>> >>>>>>>           would be nice if we could draw a
>>>>>>> direct parallel.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>>>>>> >> >>>>> >>>>> >>>>>>>            > Brian
>>>>>>> >> >>>>> >>>>> >>>>>>>            >
>>>>>>> >> >>>>> >>>>> >>>>>>>            > On Fri, Apr 12, 2019 at 7:05 AM
>>>>>>> Maximilian Michels
>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>>>>>> [email protected]>> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > It would probably be pretty
>>>>>>> easy to add the corresponding
>>>>>>> >> >>>>> >>>>> >>>>>>>           code snippets to the docs as well.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> It's probably a bit more work
>>>>>>> because there is no section
>>>>>>> >> >>>>> >>>>> >>>>>>>           dedicated to
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> state/timer yet in the
>>>>>>> documentation. Tracked here:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> https://jira.apache.org/jira/browse/BEAM-2472
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > I've been going over this
>>>>>>> topic a bit. I'll add the
>>>>>>> >> >>>>> >>>>> >>>>>>>           snippets next week, if that's fine
>>>>>>> by y'all.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> That would be great. The blog
>>>>>>> posts are a great way to get
>>>>>>> >> >>>>> >>>>> >>>>>>>           started with
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> state/timers.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> Thanks,
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> Max
>>>>>>> >> >>>>> >>>>> >>>>>>>            >>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> On 11.04.19 20:21, Pablo
>>>>>>> Estrada wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > I've been going over this
>>>>>>> topic a bit. I'll add the
>>>>>>> >> >>>>> >>>>> >>>>>>>           snippets next week,
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > if that's fine by y'all.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > Best
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > -P.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > On Thu, Apr 11, 2019 at 5:27
>>>>>>> AM Robert Bradshaw
>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>>>>>> [email protected]>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> > <mailto:[email protected]
>>>>>>> <mailto:[email protected]>>>
>>>>>>> >> >>>>> >>>>> >>>>>>>           wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     That's a great idea! It
>>>>>>> would probably be pretty easy
>>>>>>> >> >>>>> >>>>> >>>>>>>           to add the
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     corresponding code
>>>>>>> snippets to the docs as well.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     On Thu, Apr 11, 2019 at
>>>>>>> 2:00 PM Maximilian Michels
>>>>>>> >> >>>>> >>>>> >>>>>>>           <[email protected] <mailto:
>>>>>>> [email protected]>
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     <mailto:[email protected]
>>>>>>> <mailto:[email protected]>>> wrote:
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Hi everyone,
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > The Python SDK still
>>>>>>> lacks documentation on state
>>>>>>> >> >>>>> >>>>> >>>>>>>           and timers.
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > As a first step, what
>>>>>>> do you think about updating
>>>>>>> >> >>>>> >>>>> >>>>>>>           these two blog
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >     posts
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > with the corresponding
>>>>>>> Python code?
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      >
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Thanks,
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >      > Max
>>>>>>> >> >>>>> >>>>> >>>>>>>            >> >
>>>>>>> >> >>>>> >>>>> >>>>>>>
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> --
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> This email may be confidential and privileged. If you
>>>>>>> received this communication by mistake, please don't forward it to 
>>>>>>> anyone
>>>>>>> else, please erase all copies and attachments, and please let me know 
>>>>>>> that
>>>>>>> it has gone to the wrong person.
>>>>>>> >> >>>>> >>
>>>>>>> >> >>>>> >> The above terms reflect a potential business
>>>>>>> arrangement, are provided solely as a basis for further discussion, and 
>>>>>>> are
>>>>>>> not intended to be and do not constitute a legally binding obligation. 
>>>>>>> No
>>>>>>> legally binding obligations will be created, implied, or inferred until 
>>>>>>> an
>>>>>>> agreement in final form is executed in writing by all parties involved.
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > --
>>>>>>> >> >
>>>>>>> >> > This email may be confidential and privileged. If you received
>>>>>>> this communication by mistake, please don't forward it to anyone else,
>>>>>>> please erase all copies and attachments, and please let me know that it 
>>>>>>> has
>>>>>>> gone to the wrong person.
>>>>>>> >> >
>>>>>>> >> > The above terms reflect a potential business arrangement, are
>>>>>>> provided solely as a basis for further discussion, and are not intended 
>>>>>>> to
>>>>>>> be and do not constitute a legally binding obligation. No legally 
>>>>>>> binding
>>>>>>> obligations will be created, implied, or inferred until an agreement in
>>>>>>> final form is executed in writing by all parties involved.
>>>>>>>
>>>>>>

Reply via email to