On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <[email protected]> wrote:
>
> I forgot to give an example, just to clarify for others:
>
> > What was the specific example that was less natural?
>
> Basically every time we use ListState to express ValueState, e.g.
>
> next_index, = list(state.read()) or [0]
>
> Taken from:
> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301

Yes, ListState is much less natural here. I think generally
CombiningValue is often a better replacement. E.g. the Java example
reads

    public void processElement(
        ProcessContext context, @StateId("index") ValueState<Integer> index) {
      int current = firstNonNull(index.read(), 0);
      context.output(KV.of(current, context.element()));
      index.write(current+1);
    }

which is replaced with bag state

    def process(self, element, state=DoFn.StateParam(INDEX_STATE)):
      next_index, = list(state.read()) or [0]
      yield (element, next_index)
      state.clear()
      state.add(next_index + 1)

whereas CombiningState would be more natural (than ListState, and
arguably than even ValueState), giving

    def process(self, element, index=DoFn.StateParam(INDEX_STATE)):
      yield element, index.read()
      index.add(1)
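
To make the comparison concrete without needing a runner, here is a minimal plain-Python sketch of the two approaches. The `BagState` and `CombiningState` classes and both helper functions are hypothetical in-memory stand-ins that only model the read/add/clear semantics discussed here; they are not Beam's userstate API, and a real pipeline would declare a state spec and use `DoFn.StateParam` as in the snippets above.

```python
# Hypothetical in-memory stand-ins for per-key state cells. They only model
# read/add/clear semantics and are NOT Beam's userstate classes.


class BagState:
    """Unordered collection: add() appends, read() yields everything added."""

    def __init__(self):
        self._items = []

    def read(self):
        return iter(self._items)

    def add(self, value):
        self._items.append(value)

    def clear(self):
        self._items = []


class CombiningState:
    """Combines every added value with combine_fn when read."""

    def __init__(self, combine_fn):
        self._combine_fn = combine_fn
        self._inputs = []

    def read(self):
        # For combine_fn=sum, an empty state reads as 0, so no default is needed.
        return self._combine_fn(self._inputs)

    def add(self, value):
        self._inputs.append(value)


def index_with_bag(element, state):
    # Emulates a single value with a bag: read, fall back to 0, clear, rewrite.
    next_index, = list(state.read()) or [0]
    state.clear()
    state.add(next_index + 1)
    return element, next_index


def index_with_combining(element, index):
    # The counter is just a sum of increments: read it, then add one.
    current = index.read()
    index.add(1)
    return element, current


bag = BagState()
counter = CombiningState(sum)
bag_out = [index_with_bag(e, bag) for e in "abc"]
combining_out = [index_with_combining(e, counter) for e in "abc"]
```

Both lists pair each element with consecutive indices starting at 0. The combining version needs neither the default-value fallback nor the clear-before-write step, which is the point being made above.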
>
> -Max
>
> On 25.04.19 16:40, Robert Bradshaw wrote:
> > https://github.com/apache/beam/pull/8402
> >
> > On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <[email protected]> wrote:
> >>
> >> Oh, this is for the indexing example.
> >>
> >> I actually think using CombiningState is cleaner than ValueState.
> >>
> >> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262
> >>
> >> (The fact that one must specify the accumulator coder is, however,
> >> unfortunate. We should probably infer that if we can.)
> >>
> >> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <[email protected]>
> >> wrote:
> >>>
> >>> The desire was to avoid the implicit disallowed combination wart in
> >>> Python (until we could make sense of it), and also ValueState could be
> >>> surprising with respect to older values overwriting newer ones. What
> >>> was the specific example that was less natural?
> >>>
> >>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian Michels <[email protected]>
> >>> wrote:
> >>>>
> >>>> @Pablo: Thanks for following up with the PR! :)
> >>>>
> >>>> @Brian: I was wondering about this as well. It makes the Python state
> >>>> code a bit unnatural. I'd suggest adding a ValueState wrapper around
> >>>> ListState/CombiningState.
> >>>>
> >>>> @Robert: Like Reuven pointed out, we can disallow ValueState for merging
> >>>> windows with state.
> >>>>
> >>>> @Reza: Great. Let's make sure it has Python examples out of the box.
> >>>> Either Pablo or I could help there.
> >>>>
> >>>> Thanks,
> >>>> Max
> >>>>
> >>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote:
> >>>>> Pablo, Kenneth and I have a new blog post ready for publication covering
> >>>>> how to create a "looping timer", which allows default values to be
> >>>>> created in a window when no incoming elements exist. We just need to
> >>>>> clear a few bits before publication, but it would be great to have that
> >>>>> also include a Python example; I wrote it in Java.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> Reza
> >>>>>
> >>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax <[email protected]> wrote:
> >>>>>
> >>>>> Well state is still not implemented for merging windows even for
> >>>>> Java (though I believe the idea was to disallow ValueState there).
> >>>>>
> >>>>> On Wed, Apr 24, 2019 at 1:11 PM Robert Bradshaw <[email protected]> wrote:
> >>>>>
> >>>>> It was unclear what the semantics were for ValueState for merging
> >>>>> windows. (It's also a bit weird as it's inherently a race condition
> >>>>> wrt element ordering, unlike Bag and CombineState, though you can
> >>>>> always implement it as a CombineState that always returns the latest
> >>>>> value which is a bit more explicit about the dangers here.)
> >>>>>
> >>>>> On Wed, Apr 24, 2019 at 10:08 PM Brian Hulette <[email protected]> wrote:
> >>>>> >
> >>>>> > That's a great idea! I thought about this too after those
> >>>>> > posts came up on the list recently. I started to look into it,
> >>>>> > but I noticed that there's actually no implementation of
> >>>>> > ValueState in userstate. Is there a reason for that? I started
> >>>>> > to work on a patch to add it but I was just curious if there was
> >>>>> > some reason it was omitted that I should be aware of.
> >>>>> >
> >>>>> > We could certainly replicate the example without ValueState
> >>>>> > by using BagState and clearing it before each write, but it
> >>>>> > would be nice if we could draw a direct parallel.
> >>>>> >
> >>>>> > Brian
> >>>>> >
> >>>>> > On Fri, Apr 12, 2019 at 7:05 AM Maximilian Michels <[email protected]> wrote:
> >>>>> >>
> >>>>> >> > It would probably be pretty easy to add the corresponding
> >>>>> >> > code snippets to the docs as well.
> >>>>> >>
> >>>>> >> It's probably a bit more work because there is no section dedicated to
> >>>>> >> state/timer yet in the documentation. Tracked here:
> >>>>> >> https://jira.apache.org/jira/browse/BEAM-2472
> >>>>> >>
> >>>>> >> > I've been going over this topic a bit. I'll add the snippets
> >>>>> >> > next week, if that's fine by y'all.
> >>>>> >>
> >>>>> >> That would be great. The blog posts are a great way to get started with
> >>>>> >> state/timers.
> >>>>> >>
> >>>>> >> Thanks,
> >>>>> >> Max
> >>>>> >>
> >>>>> >> On 11.04.19 20:21, Pablo Estrada wrote:
> >>>>> >> > I've been going over this topic a bit. I'll add the snippets next week,
> >>>>> >> > if that's fine by y'all.
> >>>>> >> > Best
> >>>>> >> > -P.
> >>>>> >> >
> >>>>> >> > On Thu, Apr 11, 2019 at 5:27 AM Robert Bradshaw <[email protected]> wrote:
> >>>>> >> >
> >>>>> >> > That's a great idea! It would probably be pretty easy to add the
> >>>>> >> > corresponding code snippets to the docs as well.
> >>>>> >> >
> >>>>> >> > On Thu, Apr 11, 2019 at 2:00 PM Maximilian Michels <[email protected]> wrote:
> >>>>> >> > >
> >>>>> >> > > Hi everyone,
> >>>>> >> > >
> >>>>> >> > > The Python SDK still lacks documentation on state and timers.
> >>>>> >> > >
> >>>>> >> > > As a first step, what do you think about updating these two blog
> >>>>> >> > > posts with the corresponding Python code?
> >>>>> >> > >
> >>>>> >> > > https://beam.apache.org/blog/2017/02/13/stateful-processing.html
> >>>>> >> > > https://beam.apache.org/blog/2017/08/28/timely-processing.html
> >>>>> >> > >
> >>>>> >> > > Thanks,
> >>>>> >> > > Max
> >>>>> >> >
> >>>>>