So, after some additional digging, it appears that Beam does not consistently check for timer expiry before calling process(). As a result, the watermark may have moved beyond your timer's expiry, and if you're counting on the timer callback firing at the time you set it for, that simply may NOT have happened by the time you are in DoFn.process(). You can “fix” the behavior by checking the watermark manually in process() and doing what you would normally do on timer expiry before proceeding. See my latest updated code reproducing the issue and showing the fix at https://github.com/randomsamples/pardo_repro.
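To make the manual check concrete, here is a minimal sketch in plain Python. It is not the code from the repro and it does not use any Beam API: `SessionBuffer`, `on_timer`, and `GAP` are hypothetical names standing in for the per-key state, the timer callback, and the gap duration. It also assumes events arrive in timestamp order, so the incoming event's timestamp can serve as a stand-in for the watermark.

```python
from datetime import datetime, timedelta, timezone

GAP = timedelta(minutes=15)  # assumed gap, inferred from the repro output


class SessionBuffer:
    """Plain-Python stand-in for the per-key state of a stateful DoFn (hypothetical)."""

    def __init__(self):
        self.events = []    # event timestamps buffered for the current session
        self.expiry = None  # time the pending timer is set for, if any
        self.sessions = []  # completed sessions, in output order

    def on_timer(self):
        # What the timer callback would normally do: emit the session, clear state.
        if self.events:
            self.sessions.append(self.events)
        self.events = []
        self.expiry = None

    def process(self, event_time):
        # Manual expiry check: if time has already reached the pending timer,
        # run the expiry logic ourselves before touching the state.
        if self.expiry is not None and event_time >= self.expiry:
            self.on_timer()
        self.events.append(event_time)
        self.expiry = event_time + GAP  # reset the timer


def t(hour, minute):
    return datetime(2000, 1, 1, hour, minute, tzinfo=timezone.utc)


buf = SessionBuffer()
for ts in [t(0, 0), t(0, 5), t(0, 40), t(0, 45), t(0, 50)]:
    buf.process(ts)
buf.on_timer()  # stands in for the final timer firing at 01:05
```

With the timestamps from the repro output, the check splits the stream at the 00:40 event, which is the split the timer alone did not produce.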
I would argue that users of this API will naturally expect timer callback semantics to guarantee that, when they are in process(), if the current watermark is past a timer’s expiry, the timer callback in question will already have been called. Is there any reason why this isn’t happening? Am I misunderstanding something?

From: "[email protected]" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, August 3, 2020 at 10:51 AM
To: "[email protected]" <[email protected]>
Subject: Re: Stateful Pardo Question

Yeah, unless I am misunderstanding something. The output from my repro code shows the event timestamp and the context timestamp every time we process an event.

Receiving event at: 2000-01-01T00:00:00.000Z
Resetting timer to : 2000-01-01T00:15:00.000Z
Receiving event at: 2000-01-01T00:05:00.000Z
Resetting timer to : 2000-01-01T00:20:00.000Z <-- Shouldn’t the timer have fired before we processed the next event?
Receiving event at: 2000-01-01T00:40:00.000Z <-- Why didn’t the timer fire?
Resetting timer to : 2000-01-01T00:55:00.000Z
Receiving event at: 2000-01-01T00:45:00.000Z
Resetting timer to : 2000-01-01T01:00:00.000Z
Receiving event at: 2000-01-01T00:50:00.000Z
Resetting timer to : 2000-01-01T01:05:00.000Z
Timer firing at: 2000-01-01T01:05:00.000Z

From: Reuven Lax <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, August 3, 2020 at 10:02 AM
To: dev <[email protected]>
Subject: Re: Stateful Pardo Question

Are you sure that there is a 15 minute gap in your data?

On Mon, Aug 3, 2020 at 6:20 AM [email protected] <[email protected]> wrote:

I am confused about the behavior of timers on a simple stateful ParDo.
I have put together a little repro here: https://github.com/randomsamples/pardo_repro

I basically want to build something like a session window: accumulate events until the stream for a given key has been quiet for a given gap time, then output the results. But it appears that the timer is not firing when the watermark has passed its expiration time, so the event stream is not being split as I would have expected. I would love some help getting this to work; I need this behavior for a project I’m working on.
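For reference, the split I would expect can be written down as a small batch function. This is only a sketch of the intended result in plain Python, not Beam code and not the repro itself: `split_sessions` and the key `"k"` are hypothetical, and the 15 minute gap is assumed from the timer values in the repro output.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def split_sessions(events, gap):
    """Group (key, timestamp) pairs into per-key sessions.

    A new session starts whenever at least `gap` has elapsed since the
    previous event for the same key.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] >= gap:
                groups.append([ts])    # quiet period reached: start a new session
            else:
                groups[-1].append(ts)  # still inside the current session
        sessions[key] = groups
    return sessions


def at(hour, minute):
    return datetime(2000, 1, 1, hour, minute, tzinfo=timezone.utc)


events = [("k", at(0, 0)), ("k", at(0, 5)), ("k", at(0, 40)),
          ("k", at(0, 45)), ("k", at(0, 50))]
expected = split_sessions(events, timedelta(minutes=15))
```

For these timestamps the expected result is two sessions for the key: the events at 00:00 and 00:05, then the events at 00:40, 00:45, and 00:50.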
