I am in the camp where we should only support merging state (either
naturally via things like bags or via combiners). I believe that having the
wrapper that Brian suggests is useful for users. As for the @OnMerge
method, I believe combiners should have the ability to look at the window
information and we should treat @OnMerge as syntactic sugar over a combiner
if the combiner API is too cumbersome.

I believe using combiners can also extend to side inputs and help us deal
with singleton and map like side inputs when multiple firings occur. I also
like treating everything like a combiner because it will give us a lot
reuse of combiner implementations across all the places they could be used
and will be especially useful when we start exposing APIs related to
retractions on combiners.

On Fri, Apr 26, 2019 at 9:43 AM Brian Hulette <bhule...@google.com> wrote:

> Yeah the danger with out of order processing concerns me more than the
> merging as well. As a new Beam user, I immediately gravitated towards
> ValueState since it was easy to think about and I just assumed there wasn't
> anything to be concerned about. So it was shocking to learn that there is
> this dangerous edge-case.
>
> What if ValueState were just implemented as a wrapper of CombiningState
> with a LatestCombineFn and documented as such (and perhaps we encourage
> users to consider using a CombiningState explicitly if at all possible)?
>
> Brian
>
>
>
> On Fri, Apr 26, 2019 at 2:29 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Fri, Apr 26, 2019 at 6:40 AM Kenneth Knowles <k...@apache.org> wrote:
>> >
>> > You could use a CombiningState with a CombineFn that returns the
>> minimum for this case.
>>
>> We've also wanted to be able to set data when setting a timer that
>> would be returned when the timer fires. (It's in the FnAPI, but not
>> the SDKs yet.)
>>
>> The metadata is an interesting usecase, do you have some more specific
>> examples? Might boil down to not having a rich enough (single) state
>> type.
>>
>> > But I've come to feel there is a mismatch. On the one hand,
>> ParDo(<stateful DoFn>) is a way to drop to a lower level and write logic
>> that does not fit a more general computational pattern, really taking fine
>> control. On the other hand, automatically merging state via CombiningState
>> or BagState is more of a no-knobs higher level of programming. To me there
>> seems to be a bit of a philosophical conflict.
>> >
>> > These days, I feel like an @OnMerge method would be more natural. If
>> you are using state and timers, you probably often want more direct control
>> over how state from windows gets merged. An of course we don't even have a
>> design for timers - you would need some kind of timestamp CombineFn but I
>> think setting/unsetting timers manually makes more sense. Especially
>> considering the trickiness around merging windows in the absence of
>> retractions, you really need this callback, so you can issue retractions
>> manually for any output your stateful DoFn emitted in windows that no
>> longer exist.
>>
>> I agree we'll probably need an @OnMerge. On the other hand, I like
>> being able to have good defaults. The high/low level thing is a
>> continuum (the indexing example falling towards the high end).
>>
>> Actually, the merging questions bother me less than how easy it is to
>> accidentally clobber previous values. It looks so easy (like the
>> easiest state to use) but is actually the most dangerous. If one wants
>> this behavior, I would rather an explicit AnyCombineFn or
>> LatestCombineFn which makes you think about the semantics.
>>
>> - Robert
>>
>> > On Thu, Apr 25, 2019 at 5:49 PM Reza Rokni <r...@google.com> wrote:
>> >>
>> >> +1 on the metadata use case.
>> >>
>> >> For performance reasons the Timer API does not support a read()
>> operation, which for the  vast majority of use cases is not a required
>> feature. In the small set of use cases where it is needed, for example when
>> you need to set a Timer in EventTime based on the smallest timestamp seen
>> in the elements within a DoFn, we can make use of a ValueState object to
>> keep track of the value.
>> >>
>> >> On Fri, 26 Apr 2019 at 00:38, Reuven Lax <re...@google.com> wrote:
>> >>>
>> >>> I see examples of people using ValueState that I think are not
>> captured CombiningState. For example, one common one is users who set a
>> timer and then record the timestamp of that timer in a ValueState. In
>> general when you store state that is metadata about other state you store,
>> then ValueState will usually make more sense than CombiningState.
>> >>>
>> >>> On Thu, Apr 25, 2019 at 9:32 AM Brian Hulette <bhule...@google.com>
>> wrote:
>> >>>>
>> >>>> Currently the Python SDK does not make ValueState available to
>> users. My initial inclination was to go ahead and implement it there to be
>> consistent with Java, but Robert brings up a great point here that
>> ValueState has an inherent race condition for out of order data, and a lot
>> of it's use cases can actually be implemented with a CombiningState instead.
>> >>>>
>> >>>> It seems to me that at the very least we should discourage the use
>> of ValueState by noting the danger in the documentation and preferring
>> CombiningState in examples, and perhaps we should go further and deprecate
>> it in Java and not implement it in python. Either way I think we should be
>> consistent between Java and Python.
>> >>>>
>> >>>> I'm curious what people think about this, are there use cases that
>> we really need to keep ValueState around for?
>> >>>>
>> >>>> Brian
>> >>>>
>> >>>> ---------- Forwarded message ---------
>> >>>> From: Robert Bradshaw <rober...@google.com>
>> >>>> Date: Thu, Apr 25, 2019, 08:31
>> >>>> Subject: Re: [docs] Python State & Timers
>> >>>> To: dev <dev@beam.apache.org>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Thu, Apr 25, 2019, 5:26 PM Maximilian Michels <m...@apache.org>
>> wrote:
>> >>>>>
>> >>>>> Completely agree that CombiningState is nicer in this example.
>> Users may
>> >>>>> still want to use ValueState when there is nothing to combine.
>> >>>>
>> >>>>
>> >>>> I've always had trouble coming up with any good examples of this.
>> >>>>
>> >>>>> Also,
>> >>>>> users already know ValueState from the Java SDK.
>> >>>>
>> >>>>
>> >>>> Maybe we should deprecate that :)
>> >>>>
>> >>>>
>> >>>>> On 25.04.19 17:12, Robert Bradshaw wrote:
>> >>>>> > On Thu, Apr 25, 2019 at 4:58 PM Maximilian Michels <
>> m...@apache.org> wrote:
>> >>>>> >>
>> >>>>> >> I forgot to give an example, just to clarify for others:
>> >>>>> >>
>> >>>>> >>> What was the specific example that was less natural?
>> >>>>> >>
>> >>>>> >> Basically every time we use ListState to express ValueState, e.g.
>> >>>>> >>
>> >>>>> >>     next_index, = list(state.read()) or [0]
>> >>>>> >>
>> >>>>> >> Taken from:
>> >>>>> >>
>> https://github.com/apache/beam/pull/8363/files#diff-ba1a2aed98079ccce869cd660ca9d97dR301
>> >>>>> >
>> >>>>> > Yes, ListState is much less natural here. I think generally
>> >>>>> > CombiningValue is often a better replacement. E.g. the Java
>> example
>> >>>>> > reads
>> >>>>> >
>> >>>>> >
>> >>>>> > public void processElement(
>> >>>>> >        ProcessContext context, @StateId("index")
>> ValueState<Integer> index) {
>> >>>>> >      int current = firstNonNull(index.read(), 0);
>> >>>>> >      context.output(KV.of(current, context.element()));
>> >>>>> >      index.write(current+1);
>> >>>>> > }
>> >>>>> >
>> >>>>> >
>> >>>>> > which is replaced with bag state
>> >>>>> >
>> >>>>> >
>> >>>>> > def process(self, element, state=DoFn.StateParam(INDEX_STATE)):
>> >>>>> >      next_index, = list(state.read()) or [0]
>> >>>>> >      yield (element, next_index)
>> >>>>> >      state.clear()
>> >>>>> >      state.add(next_index + 1)
>> >>>>> >
>> >>>>> >
>> >>>>> > whereas CombiningState would be more natural (than ListState, and
>> >>>>> > arguably than even ValueState), giving
>> >>>>> >
>> >>>>> >
>> >>>>> > def process(self, element, index=DoFn.StateParam(INDEX_STATE)):
>> >>>>> >      yield element, index.read()
>> >>>>> >      index.add(1)
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> >>
>> >>>>> >> -Max
>> >>>>> >>
>> >>>>> >> On 25.04.19 16:40, Robert Bradshaw wrote:
>> >>>>> >>> https://github.com/apache/beam/pull/8402
>> >>>>> >>>
>> >>>>> >>> On Thu, Apr 25, 2019 at 4:26 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >>>>> >>>>
>> >>>>> >>>> Oh, this is for the indexing example.
>> >>>>> >>>>
>> >>>>> >>>> I actually think using CombiningState is more cleaner than
>> ValueState.
>> >>>>> >>>>
>> >>>>> >>>>
>> https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L262
>> >>>>> >>>>
>> >>>>> >>>> (The fact that one must specify the accumulator coder is,
>> however,
>> >>>>> >>>> unfortunate. We should probably infer that if we can.)
>> >>>>> >>>>
>> >>>>> >>>> On Thu, Apr 25, 2019 at 4:19 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> The desire was to avoid the implicit disallowed combination
>> wart in
>> >>>>> >>>>> Python (until we could make sense of it), and also ValueState
>> could be
>> >>>>> >>>>> surprising with respect to older values overwriting newer
>> ones. What
>> >>>>> >>>>> was the specific example that was less natural?
>> >>>>> >>>>>
>> >>>>> >>>>> On Thu, Apr 25, 2019 at 3:01 PM Maximilian Michels <
>> m...@apache.org> wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>> @Pablo: Thanks for following up with the PR! :)
>> >>>>> >>>>>>
>> >>>>> >>>>>> @Brian: I was wondering about this as well. It makes the
>> Python state
>> >>>>> >>>>>> code a bit unnatural. I'd suggest to add a ValueState
>> wrapper around
>> >>>>> >>>>>> ListState/CombiningState.
>> >>>>> >>>>>>
>> >>>>> >>>>>> @Robert: Like Reuven pointed out, we can disallow ValueState
>> for merging
>> >>>>> >>>>>> windows with state.
>> >>>>> >>>>>>
>> >>>>> >>>>>> @Reza: Great. Let's make sure it has Python examples out of
>> the box.
>> >>>>> >>>>>> Either Pablo or me could help there.
>> >>>>> >>>>>>
>> >>>>> >>>>>> Thanks,
>> >>>>> >>>>>> Max
>> >>>>> >>>>>>
>> >>>>> >>>>>> On 25.04.19 04:14, Reza Ardeshir Rokni wrote:
>> >>>>> >>>>>>> Pablo, Kenneth and I have a new blog ready for publication
>> which covers
>> >>>>> >>>>>>> how to create a "looping timer" it allows for default
>> values to be
>> >>>>> >>>>>>> created in a window when no incoming elements exists. We
>> just need to
>> >>>>> >>>>>>> clear a few bits before publication, but would be great to
>> have that
>> >>>>> >>>>>>> also include a python example, I wrote it in java...
>> >>>>> >>>>>>>
>> >>>>> >>>>>>> Cheers
>> >>>>> >>>>>>>
>> >>>>> >>>>>>> Reza
>> >>>>> >>>>>>>
>> >>>>> >>>>>>> On Thu, 25 Apr 2019 at 04:34, Reuven Lax <re...@google.com
>> >>>>> >>>>>>> <mailto:re...@google.com>> wrote:
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>       Well state is still not implemented for merging
>> windows even for
>> >>>>> >>>>>>>       Java (though I believe the idea was to disallow
>> ValueState there).
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>       On Wed, Apr 24, 2019 at 1:11 PM Robert Bradshaw <
>> rober...@google.com
>> >>>>> >>>>>>>       <mailto:rober...@google.com>> wrote:
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>           It was unclear what the semantics were for
>> ValueState for merging
>> >>>>> >>>>>>>           windows. (It's also a bit weird as it's
>> inherently a race condition
>> >>>>> >>>>>>>           wrt element ordering, unlike Bag and
>> CombineState, though you can
>> >>>>> >>>>>>>           always implement it as a CombineState that always
>> returns the latest
>> >>>>> >>>>>>>           value which is a bit more explicit about the
>> dangers here.)
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>           On Wed, Apr 24, 2019 at 10:08 PM Brian Hulette
>> >>>>> >>>>>>>           <bhule...@google.com <mailto:bhule...@google.com>>
>> wrote:
>> >>>>> >>>>>>>            >
>> >>>>> >>>>>>>            > That's a great idea! I thought about this too
>> after those
>> >>>>> >>>>>>>           posts came up on the list recently. I started to
>> look into it,
>> >>>>> >>>>>>>           but I noticed that there's actually no
>> implementation of
>> >>>>> >>>>>>>           ValueState in userstate. Is there a reason for
>> that? I started
>> >>>>> >>>>>>>           to work on a patch to add it but I was just
>> curious if there was
>> >>>>> >>>>>>>           some reason it was omitted that I should be aware
>> of.
>> >>>>> >>>>>>>            >
>> >>>>> >>>>>>>            > We could certainly replicate the example
>> without ValueState
>> >>>>> >>>>>>>           by using BagState and clearing it before each
>> write, but it
>> >>>>> >>>>>>>           would be nice if we could draw a direct parallel.
>> >>>>> >>>>>>>            >
>> >>>>> >>>>>>>            > Brian
>> >>>>> >>>>>>>            >
>> >>>>> >>>>>>>            > On Fri, Apr 12, 2019 at 7:05 AM Maximilian
>> Michels
>> >>>>> >>>>>>>           <m...@apache.org <mailto:m...@apache.org>> wrote:
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> > It would probably be pretty easy to add the
>> corresponding
>> >>>>> >>>>>>>           code snippets to the docs as well.
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> It's probably a bit more work because there
>> is no section
>> >>>>> >>>>>>>           dedicated to
>> >>>>> >>>>>>>            >> state/timer yet in the documentation. Tracked
>> here:
>> >>>>> >>>>>>>            >> https://jira.apache.org/jira/browse/BEAM-2472
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> > I've been going over this topic a bit. I'll
>> add the
>> >>>>> >>>>>>>           snippets next week, if that's fine by y'all.
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> That would be great. The blog posts are a
>> great way to get
>> >>>>> >>>>>>>           started with
>> >>>>> >>>>>>>            >> state/timers.
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> Thanks,
>> >>>>> >>>>>>>            >> Max
>> >>>>> >>>>>>>            >>
>> >>>>> >>>>>>>            >> On 11.04.19 20:21, Pablo Estrada wrote:
>> >>>>> >>>>>>>            >> > I've been going over this topic a bit. I'll
>> add the
>> >>>>> >>>>>>>           snippets next week,
>> >>>>> >>>>>>>            >> > if that's fine by y'all.
>> >>>>> >>>>>>>            >> > Best
>> >>>>> >>>>>>>            >> > -P.
>> >>>>> >>>>>>>            >> >
>> >>>>> >>>>>>>            >> > On Thu, Apr 11, 2019 at 5:27 AM Robert
>> Bradshaw
>> >>>>> >>>>>>>           <rober...@google.com <mailto:rober...@google.com>
>> >>>>> >>>>>>>            >> > <mailto:rober...@google.com <mailto:
>> rober...@google.com>>>
>> >>>>> >>>>>>>           wrote:
>> >>>>> >>>>>>>            >> >
>> >>>>> >>>>>>>            >> >     That's a great idea! It would probably
>> be pretty easy
>> >>>>> >>>>>>>           to add the
>> >>>>> >>>>>>>            >> >     corresponding code snippets to the docs
>> as well.
>> >>>>> >>>>>>>            >> >
>> >>>>> >>>>>>>            >> >     On Thu, Apr 11, 2019 at 2:00 PM
>> Maximilian Michels
>> >>>>> >>>>>>>           <m...@apache.org <mailto:m...@apache.org>
>> >>>>> >>>>>>>            >> >     <mailto:m...@apache.org <mailto:
>> m...@apache.org>>> wrote:
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>            >> >      > Hi everyone,
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>            >> >      > The Python SDK still lacks
>> documentation on state
>> >>>>> >>>>>>>           and timers.
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>            >> >      > As a first step, what do you think
>> about updating
>> >>>>> >>>>>>>           these two blog
>> >>>>> >>>>>>>            >> >     posts
>> >>>>> >>>>>>>            >> >      > with the corresponding Python code?
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>
>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>
>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>> >>>>> >>>>>>>            >> >      >
>> >>>>> >>>>>>>            >> >      > Thanks,
>> >>>>> >>>>>>>            >> >      > Max
>> >>>>> >>>>>>>            >> >
>> >>>>> >>>>>>>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>> >>
>> >> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

Reply via email to