On Wed, May 1, 2019 at 11:09 AM Kenneth Knowles <k...@apache.org> wrote:
> On Wed, May 1, 2019 at 8:51 AM Reuven Lax <re...@google.com> wrote: > >> ValueState is not necessarily racy if you're doing a read-modify-write. >> It's only racy if you're doing something like writing last element seen. >> > > Race conditions are not inherently a problem. They are neither necessary > nor sufficient for correctness. In this case, it is not the classic sense > of race condition anyhow, it is simply a nondeterministic result, which may > often be perfectly fine. > > >> On Wed, May 1, 2019 at 8:30 AM Lukasz Cwik <lc...@google.com> wrote: >>>>> >>>>>> Isn't a value state just a bag state with at most one element and the >>>>>> usage pattern would be? >>>>>> 1) value_state.get == bag_state.read.next() (both have to handle the >>>>>> case when neither have been set) >>>>>> 2) user logic on what to do with current state + additional >>>>>> information to produce new state >>>>>> 3) value_state.set == bag_state.clear + bag_state.append? (note that >>>>>> Runners should optimize clear + append to become a single >>>>>> transaction/write) >>>>>> >>>>> > Your unpacking is accurate, but "X is just a Y" is not accurate. In this > case you've demonstrated that value state *can be implemented using* bag > state / has a workaround. But it is not subsumed by bag state. One > important feature of ValueState is that it is statically determined that > the transform cannot be used with merging windows. Another feature is that > it is impossible to accidentally write more than one value. And a third > important feature is that it declares what it is so that the code is more > readable. > Good points. > Kenn > > > >> >>>>>> For example, the blog post with the counter example would be: >>>>>> @StateId("buffer") >>>>>> private final StateSpec<BagState<Event>> bufferedEvents = >>>>>> StateSpecs.bag(); >>>>>> >>>>>> @StateId("count") >>>>>> private final StateSpec<BagState<Integer>> countState = >>>>>> StateSpecs.bag(); >>>>>> >>>>>> @ProcessElement >>>>>> public void process( >>>>>> ProcessContext context, >>>>>> @StateId("buffer") BagState<Event> bufferState, >>>>>> @StateId("count") BagState<Integer> countState) { >>>>>> >>>>>> int count = Iterables.getFirst(countState.read(), 0); >>>>>> count = count + 1; >>>>>> countState.clear(); >>>>>> countState.append(count); >>>>>> bufferState.add(context.element()); >>>>>> >>>>>> if (count > MAX_BUFFER_SIZE) { >>>>>> for (EnrichedEvent enrichedEvent : >>>>>> enrichEvents(bufferState.read())) { >>>>>> context.output(enrichedEvent); >>>>>> } >>>>>> bufferState.clear(); >>>>>> countState.clear(); >>>>>> } >>>>>> } >>>>>> >>>>>> On Tue, Apr 30, 2019 at 5:39 PM Kenneth Knowles <k...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Anything where the state evolves serially but arbitrarily - the toy >>>>>>> example is the integer counter in my blog post - needs ValueState. You >>>>>>> can't do it with AnyCombineFn. And I think LatestCombineFn is dangerous, >>>>>>> especially when it comes to CombingState. ValueState is more explicit, >>>>>>> and >>>>>>> I still maintain that it is status quo, modulo unimplemented features in >>>>>>> the Python SDK. The reads and writes are explicitly serial per (key, >>>>>>> window), unlike for a CombiningState. Because of a CombineFn's >>>>>>> requirement >>>>>>> to be associative and commutative, I would interpret it that the runner >>>>>>> is >>>>>>> allowed to reorder inputs even after they are "written" by the user's >>>>>>> DoFn >>>>>>> - for example doing blind writes to an unordered bag, and only later >>>>>>> reading the elements out and combining them in arbitrary order. Or any >>>>>>> other strategy mixing addInput and mergeAccumulators. It would be a >>>>>>> violation of the meaning of CombineFn to overspecify CombiningState >>>>>>> further >>>>>>> than this. So you cannot actually implement a consistent >>>>>>> CombiningState(LatestCombineFn). >>>>>>> >>>>>>> Kenn >>>>>>> >>>>>>> On Tue, Apr 30, 2019 at 5:19 PM Robert Bradshaw <rober...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> On Wed, May 1, 2019 at 1:55 AM Brian Hulette <bhule...@google.com> >>>>>>>> wrote: >>>>>>>> > >>>>>>>> > Reza - you're definitely not derailing, that's exactly what I was >>>>>>>> looking for! >>>>>>>> > >>>>>>>> > I've actually recently encountered an additional use case where >>>>>>>> I'd like to use ValueState in the Python SDK. I'm experimenting with an >>>>>>>> ArrowBatchingDoFn that uses state and timers to batch up python >>>>>>>> dictionaries into arrow record batches (actually my entire purpose for >>>>>>>> jumping down this python state rabbit hole). >>>>>>>> > >>>>>>>> > At first blush it seems like the best way to do this would be to >>>>>>>> just replicate the batching approach in the timely processing post >>>>>>>> [1], but >>>>>>>> when the bag is full combine the elements into an arrow record batch, >>>>>>>> rather than enriching all of the elements and writing them out >>>>>>>> separately. >>>>>>>> However, if possible I'd like to pre-allocate buffers for each column >>>>>>>> and >>>>>>>> populate them as elements arrive (at least for columns with a fixed >>>>>>>> size >>>>>>>> type), so a bag state wouldn't be ideal. >>>>>>>> >>>>>>>> It seems it'd be preferable to do the conversion from a bag of >>>>>>>> elements to a single arrow frame all at once, when emitting, rather >>>>>>>> than repeatedly reading and writing the partial batch to and from >>>>>>>> state with every element that comes in. (Bag state has blind >>>>>>>> append.) >>>>>>>> >>>>>>>> > Also, a CombiningValueState is not ideal because I'd need to >>>>>>>> implement a merge_accumulators function that combines several >>>>>>>> in-progress >>>>>>>> batches. I could certainly implement that, but I'd prefer that it >>>>>>>> never be >>>>>>>> called unless absolutely necessary, which doesn't seem to be the case >>>>>>>> for >>>>>>>> CombiningValueState. (As an aside, maybe there's some room there for a >>>>>>>> middle ground between ValueState and CombiningValueState >>>>>>>> >>>>>>>> This does actually feel natural (to me), because you're repeatedly >>>>>>>> adding elements to build something up. merge_accumulators would >>>>>>>> probably be pretty easy (concatenation) but unless your windows are >>>>>>>> merging could just throw a not implemented error to really guard >>>>>>>> against it being used. >>>>>>>> >>>>>>>> > I suppose you could argue that this is a pretty low-level >>>>>>>> optimization we should be able to shield our users from, but right now >>>>>>>> I >>>>>>>> just wish I had ValueState in python so I didn't have to hack it up >>>>>>>> with a >>>>>>>> BagState :) >>>>>>>> > >>>>>>>> > Anyway, in light of this and all the other use-cases mentioned >>>>>>>> here, I think the resolution is to just implement ValueState in >>>>>>>> python, and >>>>>>>> document the danger with ValueState in both Python and Java. Just to be >>>>>>>> clear, the danger I'm referring to is that users might easily forget >>>>>>>> that >>>>>>>> data can be out of order, and use ValueState in a way that assumes it's >>>>>>>> been populated with data from the most recent element in event time, >>>>>>>> then >>>>>>>> in practice out of order data clobbers their state. I'm happy to write >>>>>>>> up a >>>>>>>> PR for this - are there any objections to that? >>>>>>>> >>>>>>>> I still haven't seen a good case for it (though I haven't looked at >>>>>>>> Reza's BiTemporalStream yet). Much harder to remove things once >>>>>>>> they're in. Can we just add a Any and/or LatestCombineFn and use >>>>>>>> (and >>>>>>>> point to) that instead? With the comment that if you're doing >>>>>>>> read-modify-write, an add_input may be better. >>>>>>>> >>>>>>>> > [1] >>>>>>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html >>>>>>>> > >>>>>>>> > On Mon, Apr 29, 2019 at 12:23 AM Robert Bradshaw < >>>>>>>> rober...@google.com> wrote: >>>>>>>> >> >>>>>>>> >> On Mon, Apr 29, 2019 at 3:43 AM Reza Rokni <r...@google.com> >>>>>>>> wrote: >>>>>>>> >> > >>>>>>>> >> > @Robert Bradshaw Some examples, mostly built out from cases >>>>>>>> around Timeseries data, don't want to derail this thread so at a hi >>>>>>>> level : >>>>>>>> >> >>>>>>>> >> Thanks. Perfectly on-topic for the thread. >>>>>>>> >> >>>>>>>> >> > Looping timers, a timer which allows for creation of a value >>>>>>>> within a window when no external input has been seen. Requires metadata >>>>>>>> like "is timer set". >>>>>>>> >> > >>>>>>>> >> > BiTemporalStream join, where we need to match >>>>>>>> leftCol.timestamp to a value == (max(rightCol.timestamp) where >>>>>>>> rightCol.timestamp <= leftCol.timestamp)) , this if for a application >>>>>>>> matching trades to quotes. >>>>>>>> >> >>>>>>>> >> I'd be interested in seeing the code here. The fact that you >>>>>>>> have a >>>>>>>> >> max here makes me wonder if combining would be applicable. >>>>>>>> >> >>>>>>>> >> (FWIW, I've long thought it would be useful to do this kind of >>>>>>>> thing >>>>>>>> >> with Windows. Basically, it'd be like session windows with one >>>>>>>> side >>>>>>>> >> being the window from the timestamp forward into the future, and >>>>>>>> the >>>>>>>> >> other side being from the timestamp back a certain amount in the >>>>>>>> past. >>>>>>>> >> This seems a common join pattern.) >>>>>>>> >> >>>>>>>> >> > Metadata is used for >>>>>>>> >> > >>>>>>>> >> > Taking the Key from the KV for use within the OnTimer call. >>>>>>>> >> > Knowing where we are in watermarks for GC of objects in state. >>>>>>>> >> > More timer metadata (min timer ..) >>>>>>>> >> > >>>>>>>> >> > It could be argued that what we are using state for mostly >>>>>>>> workarounds for things that could eventually end up in the API itself. >>>>>>>> For >>>>>>>> example >>>>>>>> >> > >>>>>>>> >> > There is a Jira for OnTimer Context to have Key. >>>>>>>> >> > The GC needs are mostly due to not having a Map State object >>>>>>>> in all runners yet. >>>>>>>> >> >>>>>>>> >> Yeah. GC could probably be done with a max combine. The Key >>>>>>>> (which >>>>>>>> >> should be in the API) could be an AnyCombine for now (safe to >>>>>>>> >> overwrite because it's always the same). >>>>>>>> >> >>>>>>>> >> > However I think as folks explore Beam there will always be >>>>>>>> little things that require Metadata and so having access to something >>>>>>>> which >>>>>>>> gives us fine grain control ( as Kenneth mentioned) is useful. >>>>>>>> >> >>>>>>>> >> Likely. I guess in line with making easy things easy, I'd like >>>>>>>> to make >>>>>>>> >> dangerous things hard(er). As Kenn says, we'll probably need >>>>>>>> some kind >>>>>>>> >> of lower-level thing, especially if we introduce OnMerge. >>>>>>>> >> >>>>>>>> >> > Cheers >>>>>>>> >> > >>>>>>>> >> > Reza >>>>>>>> >> > >>>>>>>> >> > On Sat, 27 Apr 2019 at 02:59, Kenneth Knowles <k...@apache.org> >>>>>>>> wrote: >>>>>>>> >> >> >>>>>>>> >> >> To be clear, the intent was always that ValueState would be >>>>>>>> not usable in merging pipelines. So no danger of clobbering, but also >>>>>>>> limited functionality. Is there a runner than accepts it and clobbers? >>>>>>>> The >>>>>>>> whole idea of the new DoFn is that it is easy to do the >>>>>>>> construction-time >>>>>>>> analysis and reject the invalid pipeline. It is actually runner >>>>>>>> independent >>>>>>>> and I think already implemented in ParDo's validation, no? >>>>>>>> >> >> >>>>>>>> >> >> Kenn >>>>>>>> >> >> >>>>>>>> >> >> On Fri, Apr 26, 2019 at 10:14 AM Lukasz Cwik < >>>>>>>> lc...@google.com> wrote: >>>>>>>> >> >>> >>>>>>>> >> >>> I am in the camp where we should only support merging state >>>>>>>> (either naturally via things like bags or via combiners). I believe >>>>>>>> that >>>>>>>> having the wrapper that Brian suggests is useful for users. As for the >>>>>>>> @OnMerge method, I believe combiners should have the ability to look >>>>>>>> at the >>>>>>>> window information and we should treat @OnMerge as syntactic sugar >>>>>>>> over a >>>>>>>> combiner if the combiner API is too cumbersome. >>>>>>>> >> >>> >>>>>>>> >> >>> I believe using combiners can also extend to side inputs and >>>>>>>> help us deal with singleton and map like side inputs when multiple >>>>>>>> firings >>>>>>>> occur. I also like treating everything like a combiner because it will >>>>>>>> give >>>>>>>> us a lot reuse of combiner implementations across all the places they >>>>>>>> could >>>>>>>> be used and will be especially useful when we start exposing APIs >>>>>>>> related >>>>>>>> to retractions on combiners. >>>>>>>> >> >>> >>>>>>>> >> >>> On Fri, Apr 26, 2019 at 9:43 AM Brian Hulette < >>>>>>>> bhule...@google.com> wrote: >>>>>>>> >> >>>> >>>>>>>> >> >>>> Yeah the danger with out of order processing concerns me >>>>>>>> more than the merging as well. As a new Beam user, I immediately >>>>>>>> gravitated >>>>>>>> towards ValueState since it was easy to think about and I just assumed >>>>>>>> there wasn't anything to be concerned about. So it was shocking to >>>>>>>> learn >>>>>>>> that there is this dangerous edge-case. >>>>>>>> >> >>>> >>>>>>>> >> >>>> What if ValueState were just implemented as a wrapper of >>>>>>>> CombiningState with a LatestCombineFn and documented as such (and >>>>>>>> perhaps >>>>>>>> we encourage users to consider using a CombiningState explicitly if at >>>>>>>> all >>>>>>>> possible)? >>>>>>>> >> >>>> >>>>>>>> >> >>>> Brian >>>>>>>> >> >>>> >>>>>>>> >> >>>> >>>>>>>> >> >>>> >>>>>>>> >> >>>> On Fri, Apr 26, 2019 at 2:29 AM Robert Bradshaw < >>>>>>>> rober...@google.com> wrote: >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> On Fri, Apr 26, 2019 at 6:40 AM Kenneth Knowles < >>>>>>>> k...@apache.org> wrote: >>>>>>>> >> >>>>> > >>>>>>>> >> >>>>> > You could use a CombiningState with a CombineFn that >>>>>>>> returns the minimum for this case. >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> We've also wanted to be able to set data when setting a >>>>>>>> timer that >>>>>>>> >> >>>>> would be returned when the timer fires. (It's in the >>>>>>>> FnAPI, but not >>>>>>>> >> >>>>> the SDKs yet.) >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> The metadata is an interesting usecase, do you have some >>>>>>>> more specific >>>>>>>> >> >>>>> examples? Might boil down to not having a rich enough >>>>>>>> (single) state >>>>>>>> >> >>>>> type. >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> > But I've come to feel there is a mismatch. On the one >>>>>>>> hand, ParDo(<stateful DoFn>) is a way to drop to a lower level and >>>>>>>> write >>>>>>>> logic that does not fit a more general computational pattern, really >>>>>>>> taking >>>>>>>> fine control. On the other hand, automatically merging state via >>>>>>>> CombiningState or BagState is more of a no-knobs higher level of >>>>>>>> programming. To me there seems to be a bit of a philosophical conflict. >>>>>>>> >> >>>>> > >>>>>>>> >> >>>>> > These days, I feel like an @OnMerge method would be more >>>>>>>> natural. If you are using state and timers, you probably often want >>>>>>>> more >>>>>>>> direct control over how state from windows gets merged. An of course we >>>>>>>> don't even have a design for timers - you would need some kind of >>>>>>>> timestamp >>>>>>>> CombineFn but I think setting/unsetting timers manually makes more >>>>>>>> sense. >>>>>>>> Especially considering the trickiness around merging windows in the >>>>>>>> absence >>>>>>>> of retractions, you really need this callback, so you can issue >>>>>>>> retractions >>>>>>>> manually for any output your stateful DoFn emitted in windows that no >>>>>>>> longer exist. >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> I agree we'll probably need an @OnMerge. On the other >>>>>>>> hand, I like >>>>>>>> >> >>>>> being able to have good defaults. The high/low level thing >>>>>>>> is a >>>>>>>> >> >>>>> continuum (the indexing example falling towards the high >>>>>>>> end). >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> Actually, the merging questions bother me less than how >>>>>>>> easy it is to >>>>>>>> >> >>>>> accidentally clobber previous values. It looks so easy >>>>>>>> (like the >>>>>>>> >> >>>>> easiest state to use) but is actually the most dangerous. >>>>>>>> If one wants >>>>>>>> >> >>>>> this behavior, I would rather an explicit AnyCombineFn or >>>>>>>> >> >>>>> LatestCombineFn which makes you think about the semantics. >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> - Robert >>>>>>>> >> >>>>> >>>>>>>> >> >>>>> > On Thu, Apr 25, 2019 at 5:49 PM Reza Rokni < >>>>>>>> r...@google.com> wrote: >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> +1 on the metadata use case. >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> For performance reasons the Timer API does not support >>>>>>>> a read() operation, which for the vast majority of use cases is not a >>>>>>>> required feature. In the small set of use cases where it is needed, for >>>>>>>> example when you need to set a Timer in EventTime based on the smallest >>>>>>>> timestamp seen in the elements within a DoFn, we can make use of a >>>>>>>> ValueState object to keep track of the value. >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> On Fri, 26 Apr 2019 at 00:38, Reuven Lax < >>>>>>>> re...@google.com> wrote: >>>>>>>> >> >>>>> >>> >>>>>>>> >> >>>>> >>> I see examples of people using ValueState that I think >>>>>>>> are not captured CombiningState. For example, one common one is users >>>>>>>> who >>>>>>>> set a timer and then record the timestamp of that timer in a >>>>>>>> ValueState. In >>>>>>>> general when you store state that is metadata about other state you >>>>>>>> store, >>>>>>>> then ValueState will usually make more sense than CombiningState. >>>>>>>> >> >>>>> >>> >>>>>>>> >> >>>>> >>> On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette < >>>>>>>> bhule...@google.com> wrote: >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> Currently the Python SDK does not make ValueState >>>>>>>> available to users. My initial inclination was to go ahead and >>>>>>>> implement it >>>>>>>> there to be consistent with Java, but Robert brings up a great point >>>>>>>> here >>>>>>>> that ValueState has an inherent race condition for out of order data, >>>>>>>> and a >>>>>>>> lot of it's use cases can actually be implemented with a CombiningState >>>>>>>> instead. >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> It seems to me that at the very least we should >>>>>>>> discourage the use of ValueState by noting the danger in the >>>>>>>> documentation >>>>>>>> and preferring CombiningState in examples, and perhaps we should go >>>>>>>> further >>>>>>>> and deprecate it in Java and not implement it in python. Either way I >>>>>>>> think >>>>>>>> we should be consistent between Java and Python. >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> I'm curious what people think about this, are there >>>>>>>> use cases that we really need to keep ValueState around for? >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> Brian >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> ---------- Forwarded message --------- >>>>>>>> >> >>>>> >>>> From: Robert Bradshaw <rober...@google.com> >>>>>>>> >> >>>>> >>>> Date: Thu, Apr 25, 2019, 08:31 >>>>>>>> >> >>>>> >>>> Subject: Re: [docs] Python State & Timers >>>>>>>> >> >>>>> >>>> To: dev <dev@beam.apache.org> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels < >>>>>>>> m...@apache.org> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>>> >> >>>>> >>>>> Completely agree that CombiningState is nicer in >>>>>>>> this example. Users may >>>>>>>> >> >>>>> >>>>> still want to use ValueState when there is nothing >>>>>>>> to combine. >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> I've always had trouble coming up with any good >>>>>>>> examples of this. >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> Also, >>>>>>>> >> >>>>> >>>>> users already know ValueState from the Java SDK. >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> Maybe we should deprecate that :) >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> On 25.04.19 17:12, Robert Bradshaw wrote: >>>>>>>> >> >>>>> >>>>> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels >>>>>>>> <m...@apache.org> wrote: >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> I forgot to give an example, just to clarify for >>>>>>>> others: >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >>> What was the specific example that was less >>>>>>>> natural? >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> Basically every time we use ListState to express >>>>>>>> ValueState, e.g. >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> next_index, = list(state.read()) or [0] >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> Taken from: >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301 >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > Yes, ListState is much less natural here. I think >>>>>>>> generally >>>>>>>> >> >>>>> >>>>> > CombiningValue is often a better replacement. E.g. >>>>>>>> the Java example >>>>>>>> >> >>>>> >>>>> > reads >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > public void processElement( >>>>>>>> >> >>>>> >>>>> > ProcessContext context, @StateId("index") >>>>>>>> ValueState<Integer> index) { >>>>>>>> >> >>>>> >>>>> > int current = firstNonNull(index.read(), 0); >>>>>>>> >> >>>>> >>>>> > context.output(KV.of(current, >>>>>>>> context.element())); >>>>>>>> >> >>>>> >>>>> > index.write(current+1); >>>>>>>> >> >>>>> >>>>> > } >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > which is replaced with bag state >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > def process(self, element, >>>>>>>> state=DoFn.StateParam(INDEX_STATE)): >>>>>>>> >> >>>>> >>>>> > next_index, = list(state.read()) or [0] >>>>>>>> >> >>>>> >>>>> > yield (element, next_index) >>>>>>>> >> >>>>> >>>>> > state.clear() >>>>>>>> >> >>>>> >>>>> > state.add(next_index + 1) >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > whereas CombiningState would be more natural (than >>>>>>>> ListState, and >>>>>>>> >> >>>>> >>>>> > arguably than even ValueState), giving >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > def process(self, element, >>>>>>>> index=DoFn.StateParam(INDEX_STATE)): >>>>>>>> >> >>>>> >>>>> > yield element, index.read() >>>>>>>> >> >>>>> >>>>> > index.add(1) >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> > >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> -Max >>>>>>>> >> >>>>> >>>>> >> >>>>>>>> >> >>>>> >>>>> >> On 25.04.19 16:40, Robert Bradshaw wrote: >>>>>>>> >> >>>>> >>>>> >>> https://github.com/apache/beam/pull/8402 >>>>>>>> >> >>>>> >>>>> >>> >>>>>>>> >> >>>>> >>>>> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw < >>>>>>>> rober...@google.com> wrote: >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> >>>> Oh, this is for the indexing example. >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> >>>> I actually think using CombiningState is more >>>>>>>> cleaner than ValueState. >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262 >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> >>>> (The fact that one must specify the accumulator >>>>>>>> coder is, however, >>>>>>>> >> >>>>> >>>>> >>>> unfortunate. We should probably infer that if >>>>>>>> we can.) >>>>>>>> >> >>>>> >>>>> >>>> >>>>>>>> >> >>>>> >>>>> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw >>>>>>>> <rober...@google.com> wrote: >>>>>>>> >> >>>>> >>>>> >>>>> >>>>>>>> >> >>>>> >>>>> >>>>> The desire was to avoid the implicit >>>>>>>> disallowed combination wart in >>>>>>>> >> >>>>> >>>>> >>>>> Python (until we could make sense of it), and >>>>>>>> also ValueState could be >>>>>>>> >> >>>>> >>>>> >>>>> surprising with respect to older values >>>>>>>> overwriting newer ones. What >>>>>>>> >> >>>>> >>>>> >>>>> was the specific example that was less natural? >>>>>>>> >> >>>>> >>>>> >>>>> >>>>>>>> >> >>>>> >>>>> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian >>>>>>>> Michels <m...@apache.org> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Pablo: Thanks for following up with the PR! >>>>>>>> :) >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Brian: I was wondering about this as well. >>>>>>>> It makes the Python state >>>>>>>> >> >>>>> >>>>> >>>>>> code a bit unnatural. I'd suggest to add a >>>>>>>> ValueState wrapper around >>>>>>>> >> >>>>> >>>>> >>>>>> ListState/CombiningState. >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Robert: Like Reuven pointed out, we can >>>>>>>> disallow ValueState for merging >>>>>>>> >> >>>>> >>>>> >>>>>> windows with state. >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> @Reza: Great. Let's make sure it has Python >>>>>>>> examples out of the box. >>>>>>>> >> >>>>> >>>>> >>>>>> Either Pablo or me could help there. >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> Thanks, >>>>>>>> >> >>>>> >>>>> >>>>>> Max >>>>>>>> >> >>>>> >>>>> >>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> Pablo, Kenneth and I have a new blog ready >>>>>>>> for publication which covers >>>>>>>> >> >>>>> >>>>> >>>>>>> how to create a "looping timer" it allows >>>>>>>> for default values to be >>>>>>>> >> >>>>> >>>>> >>>>>>> created in a window when no incoming >>>>>>>> elements exists. We just need to >>>>>>>> >> >>>>> >>>>> >>>>>>> clear a few bits before publication, but >>>>>>>> would be great to have that >>>>>>>> >> >>>>> >>>>> >>>>>>> also include a python example, I wrote it in >>>>>>>> java... >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Cheers >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Reza >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax < >>>>>>>> re...@google.com >>>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:re...@google.com>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> Well state is still not implemented >>>>>>>> for merging windows even for >>>>>>>> >> >>>>> >>>>> >>>>>>> Java (though I believe the idea was to >>>>>>>> disallow ValueState there). >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> On Wed, Apr 24, 2019 at 1:11 PM Robert >>>>>>>> Bradshaw <rober...@google.com >>>>>>>> >> >>>>> >>>>> >>>>>>> <mailto:rober...@google.com>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> It was unclear what the semantics >>>>>>>> were for ValueState for merging >>>>>>>> >> >>>>> >>>>> >>>>>>> windows. (It's also a bit weird as >>>>>>>> it's inherently a race condition >>>>>>>> >> >>>>> >>>>> >>>>>>> wrt element ordering, unlike Bag >>>>>>>> and CombineState, though you can >>>>>>>> >> >>>>> >>>>> >>>>>>> always implement it as a >>>>>>>> CombineState that always returns the latest >>>>>>>> >> >>>>> >>>>> >>>>>>> value which is a bit more explicit >>>>>>>> about the dangers here.) >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> On Wed, Apr 24, 2019 at 10:08 PM >>>>>>>> Brian Hulette >>>>>>>> >> >>>>> >>>>> >>>>>>> <bhule...@google.com <mailto: >>>>>>>> bhule...@google.com>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>>> >> >>>>> >>>>> >>>>>>> > That's a great idea! I thought >>>>>>>> about this too after those >>>>>>>> >> >>>>> >>>>> >>>>>>> posts came up on the list >>>>>>>> recently. I started to look into it, >>>>>>>> >> >>>>> >>>>> >>>>>>> but I noticed that there's >>>>>>>> actually no implementation of >>>>>>>> >> >>>>> >>>>> >>>>>>> ValueState in userstate. Is there >>>>>>>> a reason for that? I started >>>>>>>> >> >>>>> >>>>> >>>>>>> to work on a patch to add it but I >>>>>>>> was just curious if there was >>>>>>>> >> >>>>> >>>>> >>>>>>> some reason it was omitted that I >>>>>>>> should be aware of. >>>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>>> >> >>>>> >>>>> >>>>>>> > We could certainly replicate >>>>>>>> the example without ValueState >>>>>>>> >> >>>>> >>>>> >>>>>>> by using BagState and clearing it >>>>>>>> before each write, but it >>>>>>>> >> >>>>> >>>>> >>>>>>> would be nice if we could draw a >>>>>>>> direct parallel. >>>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>>> >> >>>>> >>>>> >>>>>>> > Brian >>>>>>>> >> >>>>> >>>>> >>>>>>> > >>>>>>>> >> >>>>> >>>>> >>>>>>> > On Fri, Apr 12, 2019 at 7:05 AM >>>>>>>> Maximilian Michels >>>>>>>> >> >>>>> >>>>> >>>>>>> <m...@apache.org <mailto: >>>>>>>> m...@apache.org>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > It would probably be pretty >>>>>>>> easy to add the corresponding >>>>>>>> >> >>>>> >>>>> >>>>>>> code snippets to the docs as well. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> It's probably a bit more work >>>>>>>> because there is no section >>>>>>>> >> >>>>> >>>>> >>>>>>> dedicated to >>>>>>>> >> >>>>> >>>>> >>>>>>> >> state/timer yet in the >>>>>>>> documentation. Tracked here: >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> https://jira.apache.org/jira/browse/BEAM-2472 >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > I've been going over this >>>>>>>> topic a bit. I'll add the >>>>>>>> >> >>>>> >>>>> >>>>>>> snippets next week, if that's fine >>>>>>>> by y'all. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> That would be great. The blog >>>>>>>> posts are a great way to get >>>>>>>> >> >>>>> >>>>> >>>>>>> started with >>>>>>>> >> >>>>> >>>>> >>>>>>> >> state/timers. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> Thanks, >>>>>>>> >> >>>>> >>>>> >>>>>>> >> Max >>>>>>>> >> >>>>> >>>>> >>>>>>> >> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> On 11.04.19 20:21, Pablo >>>>>>>> Estrada wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > I've been going over this >>>>>>>> topic a bit. I'll add the >>>>>>>> >> >>>>> >>>>> >>>>>>> snippets next week, >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > if that's fine by y'all. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > Best >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > -P. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > On Thu, Apr 11, 2019 at 5:27 >>>>>>>> AM Robert Bradshaw >>>>>>>> >> >>>>> >>>>> >>>>>>> <rober...@google.com <mailto: >>>>>>>> rober...@google.com> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > <mailto:rober...@google.com >>>>>>>> <mailto:rober...@google.com>>> >>>>>>>> >> >>>>> >>>>> >>>>>>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > That's a great idea! It >>>>>>>> would probably be pretty easy >>>>>>>> >> >>>>> >>>>> >>>>>>> to add the >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > corresponding code >>>>>>>> snippets to the docs as well. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > On Thu, Apr 11, 2019 at >>>>>>>> 2:00 PM Maximilian Michels >>>>>>>> >> >>>>> >>>>> >>>>>>> <m...@apache.org <mailto: >>>>>>>> m...@apache.org> >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > <mailto:m...@apache.org >>>>>>>> <mailto:m...@apache.org>>> wrote: >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Hi everyone, >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > The Python SDK still >>>>>>>> lacks documentation on state >>>>>>>> >> >>>>> >>>>> >>>>>>> and timers. >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > As a first step, what >>>>>>>> do you think about updating >>>>>>>> >> >>>>> >>>>> >>>>>>> these two blog >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > posts >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > with the >>>>>>>> corresponding Python code? >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> https://beam.apache.org/blog/2017/08/28/timely-processing.html >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Thanks, >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > > Max >>>>>>>> >> >>>>> >>>>> >>>>>>> >> > >>>>>>>> >> >>>>> >>>>> >>>>>>> >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> -- >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> This email may be confidential and privileged. If you >>>>>>>> received this communication by mistake, please don't forward it to >>>>>>>> anyone >>>>>>>> else, please erase all copies and attachments, and please let me know >>>>>>>> that >>>>>>>> it has gone to the wrong person. >>>>>>>> >> >>>>> >> >>>>>>>> >> >>>>> >> The above terms reflect a potential business >>>>>>>> arrangement, are provided solely as a basis for further discussion, >>>>>>>> and are >>>>>>>> not intended to be and do not constitute a legally binding obligation. >>>>>>>> No >>>>>>>> legally binding obligations will be created, implied, or inferred >>>>>>>> until an >>>>>>>> agreement in final form is executed in writing by all parties involved. >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > -- >>>>>>>> >> > >>>>>>>> >> > This email may be confidential and privileged. If you received >>>>>>>> this communication by mistake, please don't forward it to anyone else, >>>>>>>> please erase all copies and attachments, and please let me know that >>>>>>>> it has >>>>>>>> gone to the wrong person. >>>>>>>> >> > >>>>>>>> >> > The above terms reflect a potential business arrangement, are >>>>>>>> provided solely as a basis for further discussion, and are not >>>>>>>> intended to >>>>>>>> be and do not constitute a legally binding obligation. No legally >>>>>>>> binding >>>>>>>> obligations will be created, implied, or inferred until an agreement in >>>>>>>> final form is executed in writing by all parties involved. >>>>>>>> >>>>>>>