I see examples of people using ValueState that I think are not captured CombiningState. For example, one common one is users who set a timer and then record the timestamp of that timer in a ValueState. In general when you store state that is metadata about other state you store, then ValueState will usually make more sense than CombiningState.
On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette <bhule...@google.com> wrote: > Currently the Python SDK does not make ValueState available to users. My > initial inclination was to go ahead and implement it there to be consistent > with Java, but Robert brings up a great point here that ValueState has an > inherent race condition for out of order data, and a lot of it's use cases > can actually be implemented with a CombiningState instead. > > It seems to me that at the very least we should discourage the use of > ValueState by noting the danger in the documentation and preferring > CombiningState in examples, and perhaps we should go further and deprecate > it in Java and not implement it in python. Either way I think we should be > consistent between Java and Python. > > I'm curious what people think about this, are there use cases that we > really need to keep ValueState around for? > > Brian > > ---------- Forwarded message --------- > From: Robert Bradshaw <rober...@google.com> > Date: Thu, Apr 25, 2019, 08:31 > Subject: Re: [docs] Python State & Timers > To: dev <dev@beam.apache.org> > > > > > On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels <m...@apache.org> wrote: > >> Completely agree that CombiningState is nicer in this example. Users may >> still want to use ValueState when there is nothing to combine. > > > I've always had trouble coming up with any good examples of this. > > Also, >> users already know ValueState from the Java SDK. >> > > Maybe we should deprecate that :) > > > On 25.04.19 17:12, Robert Bradshaw wrote: >> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <m...@apache.org> >> wrote: >> >> >> >> I forgot to give an example, just to clarify for others: >> >> >> >>> What was the specific example that was less natural? >> >> >> >> Basically every time we use ListState to express ValueState, e.g. >> >> >> >> next_index, = list(state.read()) or [0] >> >> >> >> Taken from: >> >> >> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301 >> > >> > Yes, ListState is much less natural here. I think generally >> > CombiningValue is often a better replacement. E.g. the Java example >> > reads >> > >> > >> > public void processElement( >> > ProcessContext context, @StateId("index") ValueState<Integer> >> index) { >> > int current = firstNonNull(index.read(), 0); >> > context.output(KV.of(current, context.element())); >> > index.write(current+1); >> > } >> > >> > >> > which is replaced with bag state >> > >> > >> > def process(self, element, state=DoFn.StateParam(INDEX_STATE)): >> > next_index, = list(state.read()) or [0] >> > yield (element, next_index) >> > state.clear() >> > state.add(next_index + 1) >> > >> > >> > whereas CombiningState would be more natural (than ListState, and >> > arguably than even ValueState), giving >> > >> > >> > def process(self, element, index=DoFn.StateParam(INDEX_STATE)): >> > yield element, index.read() >> > index.add(1) >> > >> > >> > >> > >> >> >> >> -Max >> >> >> >> On 25.04.19 16:40, Robert Bradshaw wrote: >> >>> https://github.com/apache/beam/pull/8402 >> >>> >> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <rober...@google.com> >> wrote: >> >>>> >> >>>> Oh, this is for the indexing example. >> >>>> >> >>>> I actually think using CombiningState is more cleaner than >> ValueState. >> >>>> >> >>>> >> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262 >> >>>> >> >>>> (The fact that one must specify the accumulator coder is, however, >> >>>> unfortunate. We should probably infer that if we can.) >> >>>> >> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <rober...@google.com> >> wrote: >> >>>>> >> >>>>> The desire was to avoid the implicit disallowed combination wart in >> >>>>> Python (until we could make sense of it), and also ValueState could >> be >> >>>>> surprising with respect to older values overwriting newer ones. What >> >>>>> was the specific example that was less natural? >> >>>>> >> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian Michels <m...@apache.org> >> wrote: >> >>>>>> >> >>>>>> @Pablo: Thanks for following up with the PR! :) >> >>>>>> >> >>>>>> @Brian: I was wondering about this as well. It makes the Python >> state >> >>>>>> code a bit unnatural. I'd suggest to add a ValueState wrapper >> around >> >>>>>> ListState/CombiningState. >> >>>>>> >> >>>>>> @Robert: Like Reuven pointed out, we can disallow ValueState for >> merging >> >>>>>> windows with state. >> >>>>>> >> >>>>>> @Reza: Great. Let's make sure it has Python examples out of the >> box. >> >>>>>> Either Pablo or me could help there. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Max >> >>>>>> >> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote: >> >>>>>>> Pablo, Kenneth and I have a new blog ready for publication which >> covers >> >>>>>>> how to create a "looping timer" it allows for default values to be >> >>>>>>> created in a window when no incoming elements exists. We just >> need to >> >>>>>>> clear a few bits before publication, but would be great to have >> that >> >>>>>>> also include a python example, I wrote it in java... >> >>>>>>> >> >>>>>>> Cheers >> >>>>>>> >> >>>>>>> Reza >> >>>>>>> >> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax <re...@google.com >> >>>>>>> <mailto:re...@google.com>> wrote: >> >>>>>>> >> >>>>>>> Well state is still not implemented for merging windows >> even for >> >>>>>>> Java (though I believe the idea was to disallow ValueState >> there). >> >>>>>>> >> >>>>>>> On Wed, Apr 24, 2019 at 1:11 PM Robert Bradshaw < >> rober...@google.com >> >>>>>>> <mailto:rober...@google.com>> wrote: >> >>>>>>> >> >>>>>>> It was unclear what the semantics were for ValueState >> for merging >> >>>>>>> windows. (It's also a bit weird as it's inherently a >> race condition >> >>>>>>> wrt element ordering, unlike Bag and CombineState, >> though you can >> >>>>>>> always implement it as a CombineState that always >> returns the latest >> >>>>>>> value which is a bit more explicit about the dangers >> here.) >> >>>>>>> >> >>>>>>> On Wed, Apr 24, 2019 at 10:08 PM Brian Hulette >> >>>>>>> <bhule...@google.com <mailto:bhule...@google.com>> >> wrote: >> >>>>>>> > >> >>>>>>> > That's a great idea! I thought about this too after >> those >> >>>>>>> posts came up on the list recently. I started to look >> into it, >> >>>>>>> but I noticed that there's actually no implementation of >> >>>>>>> ValueState in userstate. Is there a reason for that? I >> started >> >>>>>>> to work on a patch to add it but I was just curious if >> there was >> >>>>>>> some reason it was omitted that I should be aware of. >> >>>>>>> > >> >>>>>>> > We could certainly replicate the example without >> ValueState >> >>>>>>> by using BagState and clearing it before each write, >> but it >> >>>>>>> would be nice if we could draw a direct parallel. >> >>>>>>> > >> >>>>>>> > Brian >> >>>>>>> > >> >>>>>>> > On Fri, Apr 12, 2019 at 7:05 AM Maximilian Michels >> >>>>>>> <m...@apache.org <mailto:m...@apache.org>> wrote: >> >>>>>>> >> >> >>>>>>> >> > It would probably be pretty easy to add the >> corresponding >> >>>>>>> code snippets to the docs as well. >> >>>>>>> >> >> >>>>>>> >> It's probably a bit more work because there is no >> section >> >>>>>>> dedicated to >> >>>>>>> >> state/timer yet in the documentation. Tracked here: >> >>>>>>> >> https://jira.apache.org/jira/browse/BEAM-2472 >> >>>>>>> >> >> >>>>>>> >> > I've been going over this topic a bit. I'll add >> the >> >>>>>>> snippets next week, if that's fine by y'all. >> >>>>>>> >> >> >>>>>>> >> That would be great. The blog posts are a great way >> to get >> >>>>>>> started with >> >>>>>>> >> state/timers. >> >>>>>>> >> >> >>>>>>> >> Thanks, >> >>>>>>> >> Max >> >>>>>>> >> >> >>>>>>> >> On 11.04.19 20:21, Pablo Estrada wrote: >> >>>>>>> >> > I've been going over this topic a bit. I'll add >> the >> >>>>>>> snippets next week, >> >>>>>>> >> > if that's fine by y'all. >> >>>>>>> >> > Best >> >>>>>>> >> > -P. >> >>>>>>> >> > >> >>>>>>> >> > On Thu, Apr 11, 2019 at 5:27 AM Robert Bradshaw >> >>>>>>> <rober...@google.com <mailto:rober...@google.com> >> >>>>>>> >> > <mailto:rober...@google.com <mailto: >> rober...@google.com>>> >> >>>>>>> wrote: >> >>>>>>> >> > >> >>>>>>> >> > That's a great idea! It would probably be >> pretty easy >> >>>>>>> to add the >> >>>>>>> >> > corresponding code snippets to the docs as >> well. >> >>>>>>> >> > >> >>>>>>> >> > On Thu, Apr 11, 2019 at 2:00 PM Maximilian >> Michels >> >>>>>>> <m...@apache.org <mailto:m...@apache.org> >> >>>>>>> >> > <mailto:m...@apache.org >> >>>>>>> <mailto:m...@apache.org>>> >> wrote: >> >>>>>>> >> > > >> >>>>>>> >> > > Hi everyone, >> >>>>>>> >> > > >> >>>>>>> >> > > The Python SDK still lacks documentation >> on state >> >>>>>>> and timers. >> >>>>>>> >> > > >> >>>>>>> >> > > As a first step, what do you think about >> updating >> >>>>>>> these two blog >> >>>>>>> >> > posts >> >>>>>>> >> > > with the corresponding Python code? >> >>>>>>> >> > > >> >>>>>>> >> > > >> >>>>>>> >> https://beam.apache.org/blog/2017/02/13/stateful-processing.html >> >>>>>>> >> > > >> >>>>>>> >> https://beam.apache.org/blog/2017/08/28/timely-processing.html >> >>>>>>> >> > > >> >>>>>>> >> > > Thanks, >> >>>>>>> >> > > Max >> >>>>>>> >> > >> >>>>>>> >> >