I guess it depends on how we define LatestCombineFn. My assumption was LatestCombineFn meant "throw away everything else when a new element is written" i.e. latest means latest in processing time. Then a CombiningValueState(LatestCombineFn) would be the same as ValueState I think - and you could do a read-modify-write and get similar non-racy behavior.
But when I look for existing "Latest" CombineFns I just find `Latest.combineFn()` [1] which seems to be based on event timestamps. So maybe I'm operating on a different definition from everyone else. I agree that that would be a very confusing replacement for ValueState. [1] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/Latest.html On Wed, May 1, 2019 at 8:51 AM Reuven Lax <[email protected]> wrote: > ValueState is not necessarily racy if you're doing a read-modify-write. > It's only racy if you're doing something like writing last element seen. > > While it's true that many read-modify-write patterns can be expressed via > a combiner, having to write a full combiner to do something simple will be > a step back in usability. In addition, as I mentioned above people often > use ValueState to store metadata about other state; e.g. adding elements to > BagState, and storing information about that BagState in a ValueState > (minimum element in the bag, histogram of elements in the bag, last element > processed in the bag, etc.). This is not racy at all. > > Reuven > > On Wed, May 1, 2019 at 8:46 AM Brian Hulette <[email protected]> wrote: > >> > LatestCombineFn sounds to me like the worst possible world. It will >> almost always be racy and confusing. >> But isn't that Robert's point? ValueState is already racy and confusing, >> the LatestCombineFn just makes it explicit. >> >> On Wed, May 1, 2019 at 8:39 AM Reuven Lax <[email protected]> wrote: >> >>> I also think that ValueState is very useful, for all the reasons >>> mentioned in this thread. Also keep in mind that even for cases where >>> CombiningState can be used, that will be much more cumbersome unless a >>> preexisting combiner is already written. Writing .a custom combiner is a >>> lot of boilerplate that I think users would like to avoid, while doing a >>> simple read-modify-write on a ValueState is pretty explicit and simple. >>> >>> LatestCombineFn sounds to me like the worst possible world. It will >>> almost always be racy and confusing. >>> >>> Reuven >>> >>> On Wed, May 1, 2019 at 8:34 AM Lukasz Cwik <[email protected]> wrote: >>> >>>> Note, the example didn't support merging windows so I also ignored it. >>>> In the case of merging windows, your solution would depend on whether you >>>> needed to know from what window the enriched event was from. >>>> >>>> On Wed, May 1, 2019 at 8:30 AM Lukasz Cwik <[email protected]> wrote: >>>> >>>>> Isn't a value state just a bag state with at most one element and the >>>>> usage pattern would be? >>>>> 1) value_state.get == bag_state.read.next() (both have to handle the >>>>> case when neither have been set) >>>>> 2) user logic on what to do with current state + additional >>>>> information to produce new state >>>>> 3) value_state.set == bag_state.clear + bag_state.append? (note that >>>>> Runners should optimize clear + append to become a single >>>>> transaction/write) >>>>> >>>>> For example, the blog post with the counter example would be: >>>>> @StateId("buffer") >>>>> private final StateSpec<BagState<Event>> bufferedEvents = >>>>> StateSpecs.bag(); >>>>> >>>>> @StateId("count") >>>>> private final StateSpec<BagState<Integer>> countState = >>>>> StateSpecs.bag(); >>>>> >>>>> @ProcessElement >>>>> public void process( >>>>> ProcessContext context, >>>>> @StateId("buffer") BagState<Event> bufferState, >>>>> @StateId("count") BagState<Integer> countState) { >>>>> >>>>> int count = Iterables.getFirst(countState.read(), 0); >>>>> count = count + 1; >>>>> countState.clear(); >>>>> countState.append(count); >>>>> bufferState.add(context.element()); >>>>> >>>>> if (count > MAX_BUFFER_SIZE) { >>>>> for (EnrichedEvent enrichedEvent : >>>>> enrichEvents(bufferState.read())) { >>>>> context.output(enrichedEvent); >>>>> } >>>>> bufferState.clear(); >>>>> countState.clear(); >>>>> } >>>>> } >>>>> >>>>> On Tue, Apr 30, 2019 at 5:39 PM Kenneth Knowles <[email protected]> >>>>> wrote: >>>>> >>>>>> Anything where the state evolves serially but arbitrarily - the toy >>>>>> example is the integer counter in my blog post - needs ValueState. You >>>>>> can't do it with AnyCombineFn. And I think LatestCombineFn is dangerous, >>>>>> especially when it comes to CombingState. ValueState is more explicit, >>>>>> and >>>>>> I still maintain that it is status quo, modulo unimplemented features in >>>>>> the Python SDK. The reads and writes are explicitly serial per (key, >>>>>> window), unlike for a CombiningState. Because of a CombineFn's >>>>>> requirement >>>>>> to be associative and commutative, I would interpret it that the runner >>>>>> is >>>>>> allowed to reorder inputs even after they are "written" by the user's >>>>>> DoFn >>>>>> - for example doing blind writes to an unordered bag, and only later >>>>>> reading the elements out and combining them in arbitrary order. Or any >>>>>> other strategy mixing addInput and mergeAccumulators. It would be a >>>>>> violation of the meaning of CombineFn to overspecify CombiningState >>>>>> further >>>>>> than this. So you cannot actually implement a consistent >>>>>> CombiningState(LatestCombineFn). >>>>>> >>>>>> Kenn >>>>>> >>>>>> On Tue, Apr 30, 2019 at 5:19 PM Robert Bradshaw <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> On Wed, May 1, 2019 at 1:55 AM Brian Hulette <[email protected]> >>>>>>> wrote: >>>>>>> > >>>>>>> > Reza - you're definitely not derailing, that's exactly what I was >>>>>>> looking for! >>>>>>> > >>>>>>> > I've actually recently encountered an additional use case where >>>>>>> I'd like to use ValueState in the Python SDK. I'm experimenting with an >>>>>>> ArrowBatchingDoFn that uses state and timers to batch up python >>>>>>> dictionaries into arrow record batches (actually my entire purpose for >>>>>>> jumping down this python state rabbit hole). >>>>>>> > >>>>>>> > At first blush it seems like the best way to do this would be to >>>>>>> just replicate the batching approach in the timely processing post [1], >>>>>>> but >>>>>>> when the bag is full combine the elements into an arrow record batch, >>>>>>> rather than enriching all of the elements and writing them out >>>>>>> separately. >>>>>>> However, if possible I'd like to pre-allocate buffers for each column >>>>>>> and >>>>>>> populate them as elements arrive (at least for columns with a fixed size >>>>>>> type), so a bag state wouldn't be ideal. >>>>>>> >>>>>>> It seems it'd be preferable to do the conversion from a bag of >>>>>>> elements to a single arrow frame all at once, when emitting, rather >>>>>>> than repeatedly reading and writing the partial batch to and from >>>>>>> state with every element that comes in. (Bag state has blind append.) >>>>>>> >>>>>>> > Also, a CombiningValueState is not ideal because I'd need to >>>>>>> implement a merge_accumulators function that combines several >>>>>>> in-progress >>>>>>> batches. I could certainly implement that, but I'd prefer that it never >>>>>>> be >>>>>>> called unless absolutely necessary, which doesn't seem to be the case >>>>>>> for >>>>>>> CombiningValueState. (As an aside, maybe there's some room there for a >>>>>>> middle ground between ValueState and CombiningValueState >>>>>>> >>>>>>> This does actually feel natural (to me), because you're repeatedly >>>>>>> adding elements to build something up. merge_accumulators would >>>>>>> probably be pretty easy (concatenation) but unless your windows are >>>>>>> merging could just throw a not implemented error to really guard >>>>>>> against it being used. >>>>>>> >>>>>>> > I suppose you could argue that this is a pretty low-level >>>>>>> optimization we should be able to shield our users from, but right now I >>>>>>> just wish I had ValueState in python so I didn't have to hack it up >>>>>>> with a >>>>>>> BagState :) >>>>>>> > >>>>>>> > Anyway, in light of this and all the other use-cases mentioned >>>>>>> here, I think the resolution is to just implement ValueState in python, >>>>>>> and >>>>>>> document the danger with ValueState in both Python and Java. Just to be >>>>>>> clear, the danger I'm referring to is that users might easily forget >>>>>>> that >>>>>>> data can be out of order, and use ValueState in a way that assumes it's >>>>>>> been populated with data from the most recent element in event time, >>>>>>> then >>>>>>> in practice out of order data clobbers their state. I'm happy to write >>>>>>> up a >>>>>>> PR for this - are there any objections to that? >>>>>>> >>>>>>> I still haven't seen a good case for it (though I haven't looked at >>>>>>> Reza's BiTemporalStream yet). Much harder to remove things once >>>>>>> they're in. Can we just add a Any and/or LatestCombineFn and use (and >>>>>>> point to) that instead? With the comment that if you're doing >>>>>>> read-modify-write, an add_input may be better. >>>>>>> >>>>>>> > [1] https://beam.apache.org/blog/2017/08/28/timely-processing.html >>>>>>> > >>>>>>> > On Mon, Apr 29, 2019 at 12:23 AM Robert Bradshaw < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>>>> >> On Mon, Apr 29, 2019 at 3:43 AM Reza Rokni <[email protected]> >>>>>>> wrote: >>>>>>> >> > >>>>>>> >> > @Robert Bradshaw Some examples, mostly built out from cases >>>>>>> around Timeseries data, don't want to derail this thread so at a hi >>>>>>> level : >>>>>>> >> >>>>>>> >> Thanks. Perfectly on-topic for the thread. >>>>>>> >> >>>>>>> >> > Looping timers, a timer which allows for creation of a value >>>>>>> within a window when no external input has been seen. Requires metadata >>>>>>> like "is timer set". >>>>>>> >> > >>>>>>> >> > BiTemporalStream join, where we need to match leftCol.timestamp >>>>>>> to a value == (max(rightCol.timestamp) where rightCol.timestamp <= >>>>>>> leftCol.timestamp)) , this if for a application matching trades to >>>>>>> quotes. >>>>>>> >> >>>>>>> >> I'd be interested in seeing the code here. The fact that you have >>>>>>> a >>>>>>> >> max here makes me wonder if combining would be applicable. >>>>>>> >> >>>>>>> >> (FWIW, I've long thought it would be useful to do this kind of >>>>>>> thing >>>>>>> >> with Windows. Basically, it'd be like session windows with one >>>>>>> side >>>>>>> >> being the window from the timestamp forward into the future, and >>>>>>> the >>>>>>> >> other side being from the timestamp back a certain amount in the >>>>>>> past. >>>>>>> >> This seems a common join pattern.) >>>>>>> >> >>>>>>> >> > Metadata is used for >>>>>>> >> > >>>>>>> >> > Taking the Key from the KV for use within the OnTimer call. >>>>>>> >> > Knowing where we are in watermarks for GC of objects in state. >>>>>>> >> > More timer metadata (min timer ..) >>>>>>> >> > >>>>>>> >> > It could be argued that what we are using state for mostly >>>>>>> workarounds for things that could eventually end up in the API itself. >>>>>>> For >>>>>>> example >>>>>>> >> > >>>>>>> >> > There is a Jira for OnTimer Context to have Key. >>>>>>> >> > The GC needs are mostly due to not having a Map State object >>>>>>> in all runners yet. >>>>>>> >> >>>>>>> >> Yeah. GC could probably be done with a max combine. The Key (which >>>>>>> >> should be in the API) could be an AnyCombine for now (safe to >>>>>>> >> overwrite because it's always the same). >>>>>>> >> >>>>>>> >> > However I think as folks explore Beam there will always be >>>>>>> little things that require Metadata and so having access to something >>>>>>> which >>>>>>> gives us fine grain control ( as Kenneth mentioned) is useful. >>>>>>> >> >>>>>>> >> Likely. I guess in line with making easy things easy, I'd like to >>>>>>> make >>>>>>> >> dangerous things hard(er). As Kenn says, we'll probably need some >>>>>>> kind >>>>>>> >> of lower-level thing, especially if we introduce OnMerge. >>>>>>> >> >>>>>>> >> > Cheers >>>>>>> >> > >>>>>>> >> > Reza >>>>>>> >> > >>>>>>> >> > On Sat, 27 Apr 2019 at 02:59, Kenneth Knowles <[email protected]> >>>>>>> wrote: >>>>>>> >> >> >>>>>>> >> >> To be clear, the intent was always that ValueState would be >>>>>>> not usable in merging pipelines. So no danger of clobbering, but also >>>>>>> limited functionality. Is there a runner than accepts it and clobbers? >>>>>>> The >>>>>>> whole idea of the new DoFn is that it is easy to do the >>>>>>> construction-time >>>>>>> analysis and reject the invalid pipeline. It is actually runner >>>>>>> independent >>>>>>> and I think already implemented in ParDo's validation, no? >>>>>>> >> >> >>>>>>> >> >> Kenn >>>>>>> >> >> >>>>>>> >> >> On Fri, Apr 26, 2019 at 10:14 AM Lukasz Cwik <[email protected]> >>>>>>> wrote: >>>>>>> >> >>> >>>>>>> >> >>> I am in the camp where we should only support merging state >>>>>>> (either naturally via things like bags or via combiners). I believe that >>>>>>> having the wrapper that Brian suggests is useful for users. As for the >>>>>>> @OnMerge method, I believe combiners should have the ability to look at >>>>>>> the >>>>>>> window information and we should treat @OnMerge as syntactic sugar over >>>>>>> a >>>>>>> combiner if the combiner API is too cumbersome. >>>>>>> >> >>> >>>>>>> >> >>> I believe using combiners can also extend to side inputs and >>>>>>> help us deal with singleton and map like side inputs when multiple >>>>>>> firings >>>>>>> occur. I also like treating everything like a combiner because it will >>>>>>> give >>>>>>> us a lot reuse of combiner implementations across all the places they >>>>>>> could >>>>>>> be used and will be especially useful when we start exposing APIs >>>>>>> related >>>>>>> to retractions on combiners. >>>>>>> >> >>> >>>>>>> >> >>> On Fri, Apr 26, 2019 at 9:43 AM Brian Hulette < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>> >>>>>>> >> >>>> Yeah the danger with out of order processing concerns me >>>>>>> more than the merging as well. As a new Beam user, I immediately >>>>>>> gravitated >>>>>>> towards ValueState since it was easy to think about and I just assumed >>>>>>> there wasn't anything to be concerned about. So it was shocking to learn >>>>>>> that there is this dangerous edge-case. >>>>>>> >> >>>> >>>>>>> >> >>>> What if ValueState were just implemented as a wrapper of >>>>>>> CombiningState with a LatestCombineFn and documented as such (and >>>>>>> perhaps >>>>>>> we encourage users to consider using a CombiningState explicitly if at >>>>>>> all >>>>>>> possible)? >>>>>>> >> >>>> >>>>>>> >> >>>> Brian >>>>>>> >> >>>> >>>>>>> >> >>>> >>>>>>> >> >>>> >>>>>>> >> >>>> On Fri, Apr 26, 2019 at 2:29 AM Robert Bradshaw < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>>>>> >> >>>>> On Fri, Apr 26, 2019 at 6:40 AM Kenneth Knowles < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> > >>>>>>> >> >>>>> > You could use a CombiningState with a CombineFn that >>>>>>> returns the minimum for this case. >>>>>>> >> >>>>> >>>>>>> >> >>>>> We've also wanted to be able to set data when setting a >>>>>>> timer that >>>>>>> >> >>>>> would be returned when the timer fires. (It's in the FnAPI, >>>>>>> but not >>>>>>> >> >>>>> the SDKs yet.) >>>>>>> >> >>>>> >>>>>>> >> >>>>> The metadata is an interesting usecase, do you have some >>>>>>> more specific >>>>>>> >> >>>>> examples? Might boil down to not having a rich enough >>>>>>> (single) state >>>>>>> >> >>>>> type. >>>>>>> >> >>>>> >>>>>>> >> >>>>> > But I've come to feel there is a mismatch. On the one >>>>>>> hand, ParDo(<stateful DoFn>) is a way to drop to a lower level and write >>>>>>> logic that does not fit a more general computational pattern, really >>>>>>> taking >>>>>>> fine control. On the other hand, automatically merging state via >>>>>>> CombiningState or BagState is more of a no-knobs higher level of >>>>>>> programming. To me there seems to be a bit of a philosophical conflict. >>>>>>> >> >>>>> > >>>>>>> >> >>>>> > These days, I feel like an @OnMerge method would be more >>>>>>> natural. If you are using state and timers, you probably often want more >>>>>>> direct control over how state from windows gets merged. An of course we >>>>>>> don't even have a design for timers - you would need some kind of >>>>>>> timestamp >>>>>>> CombineFn but I think setting/unsetting timers manually makes more >>>>>>> sense. >>>>>>> Especially considering the trickiness around merging windows in the >>>>>>> absence >>>>>>> of retractions, you really need this callback, so you can issue >>>>>>> retractions >>>>>>> manually for any output your stateful DoFn emitted in windows that no >>>>>>> longer exist. >>>>>>> >> >>>>> >>>>>>> >> >>>>> I agree we'll probably need an @OnMerge. On the other hand, >>>>>>> I like >>>>>>> >> >>>>> being able to have good defaults. The high/low level thing >>>>>>> is a >>>>>>> >> >>>>> continuum (the indexing example falling towards the high >>>>>>> end). >>>>>>> >> >>>>> >>>>>>> >> >>>>> Actually, the merging questions bother me less than how >>>>>>> easy it is to >>>>>>> >> >>>>> accidentally clobber previous values. It looks so easy >>>>>>> (like the >>>>>>> >> >>>>> easiest state to use) but is actually the most dangerous. >>>>>>> If one wants >>>>>>> >> >>>>> this behavior, I would rather an explicit AnyCombineFn or >>>>>>> >> >>>>> LatestCombineFn which makes you think about the semantics. >>>>>>> >> >>>>> >>>>>>> >> >>>>> - Robert >>>>>>> >> >>>>> >>>>>>> >> >>>>> > On Thu, Apr 25, 2019 at 5:49 PM Reza Rokni < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> +1 on the metadata use case. >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> For performance reasons the Timer API does not support a >>>>>>> read() operation, which for the vast majority of use cases is not a >>>>>>> required feature. In the small set of use cases where it is needed, for >>>>>>> example when you need to set a Timer in EventTime based on the smallest >>>>>>> timestamp seen in the elements within a DoFn, we can make use of a >>>>>>> ValueState object to keep track of the value. >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> On Fri, 26 Apr 2019 at 00:38, Reuven Lax < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>> >>>>>>> >> >>>>> >>> I see examples of people using ValueState that I think >>>>>>> are not captured CombiningState. For example, one common one is users >>>>>>> who >>>>>>> set a timer and then record the timestamp of that timer in a >>>>>>> ValueState. In >>>>>>> general when you store state that is metadata about other state you >>>>>>> store, >>>>>>> then ValueState will usually make more sense than CombiningState. >>>>>>> >> >>>>> >>> >>>>>>> >> >>>>> >>> On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> Currently the Python SDK does not make ValueState >>>>>>> available to users. My initial inclination was to go ahead and >>>>>>> implement it >>>>>>> there to be consistent with Java, but Robert brings up a great point >>>>>>> here >>>>>>> that ValueState has an inherent race condition for out of order data, >>>>>>> and a >>>>>>> lot of it's use cases can actually be implemented with a CombiningState >>>>>>> instead. >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> It seems to me that at the very least we should >>>>>>> discourage the use of ValueState by noting the danger in the >>>>>>> documentation >>>>>>> and preferring CombiningState in examples, and perhaps we should go >>>>>>> further >>>>>>> and deprecate it in Java and not implement it in python. Either way I >>>>>>> think >>>>>>> we should be consistent between Java and Python. >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> I'm curious what people think about this, are there >>>>>>> use cases that we really need to keep ValueState around for? >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> Brian >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> ---------- Forwarded message --------- >>>>>>> >> >>>>> >>>> From: Robert Bradshaw <[email protected]> >>>>>>> >> >>>>> >>>> Date: Thu, Apr 25, 2019, 08:31 >>>>>>> >> >>>>> >>>> Subject: Re: [docs] Python State & Timers >>>>>>> >> >>>>> >>>> To: dev <[email protected]> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>> >>>>> Completely agree that CombiningState is nicer in this >>>>>>> example. Users may >>>>>>> >> >>>>> >>>>> still want to use ValueState when there is nothing to >>>>>>> combine. >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> I've always had trouble coming up with any good >>>>>>> examples of this. >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>>> Also, >>>>>>> >> >>>>> >>>>> users already know ValueState from the Java SDK. >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> Maybe we should deprecate that :) >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>> >>>>>>> >> >>>>> >>>>> On 25.04.19 17:12, Robert Bradshaw wrote: >>>>>>> >> >>>>> >>>>> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> I forgot to give an example, just to clarify for >>>>>>> others: >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >>> What was the specific example that was less >>>>>>> natural? >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> Basically every time we use ListState to express >>>>>>> ValueState, e.g. >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> next_index, = list(state.read()) or [0] >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> Taken from: >>>>>>> >> >>>>> >>>>> >> >>>>>>> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301 >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > Yes, ListState is much less natural here. I think >>>>>>> generally >>>>>>> >> >>>>> >>>>> > CombiningValue is often a better replacement. E.g. >>>>>>> the Java example >>>>>>> >> >>>>> >>>>> > reads >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > public void processElement( >>>>>>> >> >>>>> >>>>> > ProcessContext context, @StateId("index") >>>>>>> ValueState<Integer> index) { >>>>>>> >> >>>>> >>>>> > int current = firstNonNull(index.read(), 0); >>>>>>> >> >>>>> >>>>> > context.output(KV.of(current, >>>>>>> context.element())); >>>>>>> >> >>>>> >>>>> > index.write(current+1); >>>>>>> >> >>>>> >>>>> > } >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > which is replaced with bag state >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > def process(self, element, >>>>>>> state=DoFn.StateParam(INDEX_STATE)): >>>>>>> >> >>>>> >>>>> > next_index, = list(state.read()) or [0] >>>>>>> >> >>>>> >>>>> > yield (element, next_index) >>>>>>> >> >>>>> >>>>> > state.clear() >>>>>>> >> >>>>> >>>>> > state.add(next_index + 1) >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > whereas CombiningState would be more natural (than >>>>>>> ListState, and >>>>>>> >> >>>>> >>>>> > arguably than even ValueState), giving >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > def process(self, element, >>>>>>> index=DoFn.StateParam(INDEX_STATE)): >>>>>>> >> >>>>> >>>>> > yield element, index.read() >>>>>>> >> >>>>> >>>>> > index.add(1) >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> > >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> -Max >>>>>>> >> >>>>> >>>>> >> >>>>>>> >> >>>>> >>>>> >> On 25.04.19 16:40, Robert Bradshaw wrote: >>>>>>> >> >>>>> >>>>> >>> https://github.com/apache/beam/pull/8402 >>>>>>> >> >>>>> >>>>> >>> >>>>>>> >> >>>>> >>>>> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> >> >>>>> >>>>> >>>> Oh, this is for the indexing example. >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> >> >>>>> >>>>> >>>> I actually think using CombiningState is more >>>>>>> cleaner than ValueState. >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262 >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> >> >>>>> >>>>> >>>> (The fact that one must specify the accumulator >>>>>>> coder is, however, >>>>>>> >> >>>>> >>>>> >>>> unfortunate. We should probably infer that if we >>>>>>> can.) >>>>>>> >> >>>>> >>>>> >>>> >>>>>>> >> >>>>> >>>>> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw < >>>>>>> [email protected]> wrote: >>>>>>> >> >>>>> >>>>> >>>>> >>>>>>> >> >>>>> >>>>> >>>>> The desire was to avoid the implicit disallowed >>>>>>> combination wart in >>>>>>> >> >>>>> >>>>> >>>>> Python (until we could make sense of it), and >>>>>>> also ValueState could be >>>>>>> >> >>>>> >>>>> >>>>> surprising with respect to older values >>>>>>> overwriting newer ones. What >>>>>>> >> >>>>> >>>>> >>>>> was the specific example that was less natural? >>>>>>> >> >>>>> >>>>> >>>>> >>>>>>> >> >>>>> >>>>> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian >>>>>>> Michels <[email protected]> wrote: >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> @Pablo: Thanks for following up with the PR! :) >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> @Brian: I was wondering about this as well. It >>>>>>> makes the Python state >>>>>>> >> >>>>> >>>>> >>>>>> code a bit unnatural. I'd suggest to add a >>>>>>> ValueState wrapper around >>>>>>> >> >>>>> >>>>> >>>>>> ListState/CombiningState. >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> @Robert: Like Reuven pointed out, we can >>>>>>> disallow ValueState for merging >>>>>>> >> >>>>> >>>>> >>>>>> windows with state. >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> @Reza: Great. Let's make sure it has Python >>>>>>> examples out of the box. >>>>>>> >> >>>>> >>>>> >>>>>> Either Pablo or me could help there. >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> Thanks, >>>>>>> >> >>>>> >>>>> >>>>>> Max >>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>> >> >>>>> >>>>> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> Pablo, Kenneth and I have a new blog ready >>>>>>> for publication which covers >>>>>>> >> >>>>> >>>>> >>>>>>> how to create a "looping timer" it allows for >>>>>>> default values to be >>>>>>> >> >>>>> >>>>> >>>>>>> created in a window when no incoming elements >>>>>>> exists. We just need to >>>>>>> >> >>>>> >>>>> >>>>>>> clear a few bits before publication, but >>>>>>> would be great to have that >>>>>>> >> >>>>> >>>>> >>>>>>> also include a python example, I wrote it in >>>>>>> java... >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> Cheers >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> Reza >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax < >>>>>>> [email protected] >>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:[email protected]>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> Well state is still not implemented for >>>>>>> merging windows even for >>>>>>> >> >>>>> >>>>> >>>>>>> Java (though I believe the idea was to >>>>>>> disallow ValueState there). >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> On Wed, Apr 24, 2019 at 1:11 PM Robert >>>>>>> Bradshaw <[email protected] >>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:[email protected]>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> It was unclear what the semantics >>>>>>> were for ValueState for merging >>>>>>> >> >>>>> >>>>> >>>>>>> windows. (It's also a bit weird as >>>>>>> it's inherently a race condition >>>>>>> >> >>>>> >>>>> >>>>>>> wrt element ordering, unlike Bag >>>>>>> and CombineState, though you can >>>>>>> >> >>>>> >>>>> >>>>>>> always implement it as a >>>>>>> CombineState that always returns the latest >>>>>>> >> >>>>> >>>>> >>>>>>> value which is a bit more explicit >>>>>>> about the dangers here.) >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >>>>> >>>>>>> On Wed, Apr 24, 2019 at 10:08 PM >>>>>>> Brian Hulette >>>>>>> >> >>>>> >>>>> >>>>>>> <[email protected] <mailto: >>>>>>> [email protected]>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>> >> >>>>> >>>>> >>>>>>> > That's a great idea! I thought >>>>>>> about this too after those >>>>>>> >> >>>>> >>>>> >>>>>>> posts came up on the list recently. >>>>>>> I started to look into it, >>>>>>> >> >>>>> >>>>> >>>>>>> but I noticed that there's actually >>>>>>> no implementation of >>>>>>> >> >>>>> >>>>> >>>>>>> ValueState in userstate. Is there a >>>>>>> reason for that? I started >>>>>>> >> >>>>> >>>>> >>>>>>> to work on a patch to add it but I >>>>>>> was just curious if there was >>>>>>> >> >>>>> >>>>> >>>>>>> some reason it was omitted that I >>>>>>> should be aware of. >>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>> >> >>>>> >>>>> >>>>>>> > We could certainly replicate the >>>>>>> example without ValueState >>>>>>> >> >>>>> >>>>> >>>>>>> by using BagState and clearing it >>>>>>> before each write, but it >>>>>>> >> >>>>> >>>>> >>>>>>> would be nice if we could draw a >>>>>>> direct parallel. >>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>> >> >>>>> >>>>> >>>>>>> > Brian >>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>> >> >>>>> >>>>> >>>>>>> > On Fri, Apr 12, 2019 at 7:05 AM >>>>>>> Maximilian Michels >>>>>>> >> >>>>> >>>>> >>>>>>> <[email protected] <mailto: >>>>>>> [email protected]>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> > It would probably be pretty >>>>>>> easy to add the corresponding >>>>>>> >> >>>>> >>>>> >>>>>>> code snippets to the docs as well. >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> It's probably a bit more work >>>>>>> because there is no section >>>>>>> >> >>>>> >>>>> >>>>>>> dedicated to >>>>>>> >> >>>>> >>>>> >>>>>>> >> state/timer yet in the >>>>>>> documentation. Tracked here: >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> https://jira.apache.org/jira/browse/BEAM-2472 >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> > I've been going over this >>>>>>> topic a bit. I'll add the >>>>>>> >> >>>>> >>>>> >>>>>>> snippets next week, if that's fine >>>>>>> by y'all. >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> That would be great. The blog >>>>>>> posts are a great way to get >>>>>>> >> >>>>> >>>>> >>>>>>> started with >>>>>>> >> >>>>> >>>>> >>>>>>> >> state/timers. >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> Thanks, >>>>>>> >> >>>>> >>>>> >>>>>>> >> Max >>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>> >> >>>>> >>>>> >>>>>>> >> On 11.04.19 20:21, Pablo >>>>>>> Estrada wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >> > I've been going over this >>>>>>> topic a bit. I'll add the >>>>>>> >> >>>>> >>>>> >>>>>>> snippets next week, >>>>>>> >> >>>>> >>>>> >>>>>>> >> > if that's fine by y'all. >>>>>>> >> >>>>> >>>>> >>>>>>> >> > Best >>>>>>> >> >>>>> >>>>> >>>>>>> >> > -P. >>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > On Thu, Apr 11, 2019 at 5:27 >>>>>>> AM Robert Bradshaw >>>>>>> >> >>>>> >>>>> >>>>>>> <[email protected] <mailto: >>>>>>> [email protected]> >>>>>>> >> >>>>> >>>>> >>>>>>> >> > <mailto:[email protected] >>>>>>> <mailto:[email protected]>>> >>>>>>> >> >>>>> >>>>> >>>>>>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > That's a great idea! It >>>>>>> would probably be pretty easy >>>>>>> >> >>>>> >>>>> >>>>>>> to add the >>>>>>> >> >>>>> >>>>> >>>>>>> >> > corresponding code >>>>>>> snippets to the docs as well. >>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > On Thu, Apr 11, 2019 at >>>>>>> 2:00 PM Maximilian Michels >>>>>>> >> >>>>> >>>>> >>>>>>> <[email protected] <mailto: >>>>>>> [email protected]> >>>>>>> >> >>>>> >>>>> >>>>>>> >> > <mailto:[email protected] >>>>>>> <mailto:[email protected]>>> wrote: >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Hi everyone, >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > The Python SDK still >>>>>>> lacks documentation on state >>>>>>> >> >>>>> >>>>> >>>>>>> and timers. >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > As a first step, what >>>>>>> do you think about updating >>>>>>> >> >>>>> >>>>> >>>>>>> these two blog >>>>>>> >> >>>>> >>>>> >>>>>>> >> > posts >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > with the corresponding >>>>>>> Python code? >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Thanks, >>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Max >>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> -- >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> This email may be confidential and privileged. If you >>>>>>> received this communication by mistake, please don't forward it to >>>>>>> anyone >>>>>>> else, please erase all copies and attachments, and please let me know >>>>>>> that >>>>>>> it has gone to the wrong person. >>>>>>> >> >>>>> >> >>>>>>> >> >>>>> >> The above terms reflect a potential business >>>>>>> arrangement, are provided solely as a basis for further discussion, and >>>>>>> are >>>>>>> not intended to be and do not constitute a legally binding obligation. >>>>>>> No >>>>>>> legally binding obligations will be created, implied, or inferred until >>>>>>> an >>>>>>> agreement in final form is executed in writing by all parties involved. >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > -- >>>>>>> >> > >>>>>>> >> > This email may be confidential and privileged. If you received >>>>>>> this communication by mistake, please don't forward it to anyone else, >>>>>>> please erase all copies and attachments, and please let me know that it >>>>>>> has >>>>>>> gone to the wrong person. >>>>>>> >> > >>>>>>> >> > The above terms reflect a potential business arrangement, are >>>>>>> provided solely as a basis for further discussion, and are not intended >>>>>>> to >>>>>>> be and do not constitute a legally binding obligation. No legally >>>>>>> binding >>>>>>> obligations will be created, implied, or inferred until an agreement in >>>>>>> final form is executed in writing by all parties involved. >>>>>>> >>>>>>
