Bundles would still be immutable pieces of work. E.g. in this case, T2 would never be sent to the runner.
On Wed, Jun 26, 2019 at 1:02 PM Jan Lukavský <je...@seznam.cz> wrote: > > I think that this approach breaks the assumption that bundles are > executed as immutable pieces of work. This way, runners would have to > update the runner while executing it. It is another possible option, but > seems to have issues of its own. > > On 6/26/19 12:28 PM, Robert Bradshaw wrote: > > Another option, that is nice from an API perspective but places a > > burden on SDK implementers (and possibly runners), is to maintain the > > ordering of timers by requiring timers to be fired in order, and if > > any timers are set to fire them immediately before processing later > > timers. In other words, if T1 sets T2 and modifies T3, these would > > take effect (locally, the runner may not even know about T2) before T3 > > was processed. > > > > On Wed, Jun 26, 2019 at 11:13 AM Jan Lukavský <je...@seznam.cz> wrote: > >> Hi, > >> > >> I have mentioned an issue I have come across [1] on several other > >> threads, but it probably didn't attract the attention that it would desire. > >> > >> I will try to restate the problem here for clarity: > >> > >> - on runners that use concept of bundles (the original issue mentions > >> DirectRunner, but it will probably apply for other runners, which use > >> bundles, as well), the workflow is as follows: > >> > >> a) process elements in bundle > >> > >> b) advance watermark > >> > >> c) process timers > >> > >> d) continue to next bundle > >> > >> - the issue with this is that when we are initially at time T0, set > >> two timers for T1 and T3, then advance watermark to T3 (or beyond), the > >> timers will fire (correctly) in order T1, T3, but if timer at T1 sets > >> another timer for T2, then this timer will be fired in next bundle (and > >> therefore after T3) > >> > >> - this causes issues mostly with race conditions in window GC timers > >> and user timers (and users don't have any way to solve that!) > >> > >> - note that the same applies when one timer tries to reset timer that > >> is already in the current bundle > >> > >> I have investigated a way of solving this by running timers only for > >> single timestamp (instant) at each bundle, but as Reuven pointed out, > >> that could regress performance (mostly by delaying firing of timers, > >> that could have fired). Options I see: > >> > >> 1) either set the OnTimerContext#timestamp() to current input > >> watermark (not the time that user actually set the timer), or > >> > >> 2) add OnTimerContext#getCurrentInputWatermark() and disallow setting > >> (or resetting) timers for time between OnProcessContext#timestamp and > >> OnProcessContext#getCurrentInputWatermark(), by throwing an exception > >> > >> 3) any other option? > >> > >> Option 1) seems to be broken by design, as it can result in corrupt data > >> (emitted with wrong timestamp, which is even somewhat arbitrary), I'm > >> including it just for completeness. Option 2) is breaking change, that > >> can result in PIpeline failures (although the failures will happen on > >> Pipelines, that are probably already broken). > >> > >> Although I have come with a workaround in the work where I originally > >> come across this issue, I think that this is generally serious and > >> should be dealt with. Mostly because when using user-facing APIs, there > >> are no workarounds possible, today. > >> > >> Thanks for discussion! > >> > >> Jan > >> > >> [1] https://issues.apache.org/jira/browse/BEAM-7520 > >>