2) is simply a bug that nobody has ever gotten around to fixing. Stateful ParDo should support merging windows such as sessions.
On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu <[email protected]> wrote:

> We do use stateful ParDo in the same job for a different use case (and we
> did read through Kenn's blogs :) ). Here are the reasons why we prefer
> using aggregation:
>
> 1) It's much more convenient for the user to define the window and trigger
> and have the Combine on top of it. It's not very clear how early firing
> works in stateful ParDo, and it does seem to require more user effort to
> set up the states/timers.
>
> 2) It seems stateful ParDo doesn't support merging windows, e.g.
> session windows. This is actually one of our use cases.
>
> 3) It seems quite general and more flexible for our users to allow updating
> state after firing. I don't want to tell our future users to stay away
> from Combine for this and handle the state explicitly themselves.
>
> Thanks,
> Xinyu
>
> On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <[email protected]> wrote:
>
>> Hi Xinyu,
>>
>> There are two nice articles on the Beam website about stateful processing
>> that you may want to check out:
>>
>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>> https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>
>> -Rui
>>
>> On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <[email protected]> wrote:
>>
>>> Have you considered using Beam's state API for this?
>>>
>>> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <[email protected]> wrote:
>>>
>>>> Hi, guys,
>>>>
>>>> Current triggering allows us to either discard the state or accumulate
>>>> the state after a window pane is fired. We use extractOutput() in
>>>> CombineFn to return the output value after the firing. All of this has
>>>> been working well for us. We do have a use case which does not seem to
>>>> be handled here: we would like to update the state after the firing.
>>>> Let me illustrate this use case with an example: we have a 10-min fixed
>>>> window with a repeated early trigger of 1 min over an input stream which
>>>> contains events of user id and page id. The accumulator for the window
>>>> has two parts: 1) the set of page ids already seen; 2) the set of user
>>>> ids who first view a page in this window (this is done by looking up
>>>> #1). For each early firing, we want to output #2 and clear the second
>>>> part of the state. But we would like to keep #1 around for later
>>>> calculations in this window. This example might be too simple to make
>>>> sense, but it comes from one of our real use cases, which is needed for
>>>> some anti-abuse scenarios.
>>>>
>>>> To address this use case, is it OK to add an
>>>> AccumT updateAfterFiring(AccumT accumulator) to the current CombineFn?
>>>> That way the user can choose to update the state partially if needed,
>>>> e.g. for our use case. Any feedback is very welcome.
>>>>
>>>> Thanks,
>>>> Xinyu
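
For readers following the thread, here is a minimal sketch of the behavior requested in the original message. It is plain Java and deliberately does not extend Beam's CombineFn: updateAfterFiring is the hook proposed in this thread, not an existing Beam method, and the names FirstViewSketch, Accum, and fire are made up for illustration. The fire helper only simulates what a runner might do at each early firing under that proposal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only: this is NOT Beam's CombineFn API.
public class FirstViewSketch {

    /** Accumulator with the two parts described in the original message. */
    static final class Accum {
        // Part 1: page ids already seen; kept for the whole window.
        final Set<String> seenPages = new HashSet<>();
        // Part 2: users who were first to view a page; emitted, then cleared.
        // A TreeSet keeps the output order deterministic for this example.
        final Set<String> firstViewers = new TreeSet<>();
    }

    static Accum createAccumulator() {
        return new Accum();
    }

    static Accum addInput(Accum acc, String userId, String pageId) {
        // Set.add returns true only when the page was not seen before.
        if (acc.seenPages.add(pageId)) {
            acc.firstViewers.add(userId);
        }
        return acc;
    }

    static List<String> extractOutput(Accum acc) {
        return new ArrayList<>(acc.firstViewers);
    }

    // The proposed hook (an assumption, not part of Beam): runs after a pane
    // fires and decides which part of the state survives.
    static Accum updateAfterFiring(Accum acc) {
        acc.firstViewers.clear(); // drop part 2, keep part 1
        return acc;
    }

    /** Simulates one early firing: emit the output, then update the state. */
    static List<String> fire(Accum acc) {
        List<String> out = extractOutput(acc);
        updateAfterFiring(acc);
        return out;
    }

    public static void main(String[] args) {
        Accum acc = createAccumulator();
        addInput(acc, "alice", "p1");
        addInput(acc, "bob", "p1");   // p1 already seen: bob is not a first viewer
        addInput(acc, "carol", "p2");
        System.out.println(fire(acc)); // [alice, carol]

        addInput(acc, "dave", "p1");  // p1 is still remembered after the firing
        addInput(acc, "erin", "p3");
        System.out.println(fire(acc)); // [erin]
    }
}
```

With discarding mode the second firing would also report dave, because the set of seen pages would be lost; with accumulating mode it would repeat alice and carol. The partial update is what sits between the two.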
