On Tue, Apr 27, 2021 at 12:51 PM Jan Lukavský <[email protected]> wrote:
>
> On 4/27/21 9:26 PM, Robert Bradshaw wrote:
>
> > On Tue, Apr 27, 2021 at 12:05 PM Jan Lukavský <[email protected]> wrote:
> >> On 4/27/21 8:51 PM, Robert Bradshaw wrote:
> >>> On Tue, Apr 27, 2021 at 11:25 AM Jan Lukavský <[email protected]> wrote:
> >>>>> Are you asking for a way to ignore early triggers on side input
> >>>>> mapping, and only map to on-time triggered values for the window?
> >>>> No, that could for sure be done before applying the View transform. I'd
> >>>> like a know if it would be possible to create mode of the matching which
> >>>> would be deterministic. One possibility to make it deterministic seems
> >>>> to be, that main input elements would be pushed back until side input
> >>>> watermark 'catches up' with main input. Whenever the side input
> >>>> watermark would be delayed after the main input watermark, elements
> >>>> would start to be pushed back again. Not sure if I'm explaining it using
> >>>> the right words. The side input watermark can be controlled using timer
> >>>> in an upstream transform, so this defines which elements in main input
> >>>> would be matched onto which pane of the side input.
> >>> Perhaps I'm not following the request correctly, but this is exactly
> >>> how side inputs work by default. It is only when one explicitly
> >>> requests a non-deterministic trigger upstream of the side input (e.g.
> >>> one that may fire multiple times or ahead of the watermark) that one
> >>> sees a side input with multiple variations or data in the side input
> >>> before the watermark of the side input is caught up to the main input.
> >> Yes, exactly. But take the example of side input in global windows (on
> >> both the main input and side input). Then there has to be multiple
> >> firings per window, because otherwise the side input would be available
> >> at the end of time, which is not practical. The trigger doesn't have to
> >> be non-deterministic, the data might come from a stateful ParDo, using a
> >> timer with output timestamp, which would make the downstream watermark
> >> progress quite well defined. The matching would still be
> >> nondeterministic in this case.
> > If everything is in the global window, things get non-deterministic
> > across PCollections. For example, say the main input has element m20
> > and the side input has elements s10 and s30 (with the obvious
> > timestamps). Suppose we have a total ordering of events as follows.
> >
> > s10 arrives
> > side input watermark advances to 25
> > s30 arrives
> > side input watermark advances to 100
> > m20 arrives
> >
> > In this case, while processing m20, one would see both s10 and s30.
> > Alternatively we could have had
> >
> > s10 arrives
> > side input watermark advances to 25
> > m20 arrives
> > s30 arrives
> > side input watermark advances to 100
> >
> > in which case m20 would only see s10.
> Yes, this is absolutely true, I didn't want to make things too
> complicated, so I ignored this fact, that absolutely correct solution
> would require the PCollectioView to be timestamp indexed so that if it
> would receive both s10 and s30 it would correctly return s10 (as s30
> didn't exist at m20, right?). I simplified this at this moment for this
> discussion, thanks for clarifying.
> >
> > Windowing is exactly the mechanism that gives us a cross-PCollection
> > barrier with which to line things up.
> Should not watermark do this?
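The watermark gives only a one-sided bound: it says that elements with timestamps behind it are no longer expected, but arbitrary out-of-orderness can still occur on the other side, where elements with timestamps well ahead of the watermark (such as s30 while the side input watermark sits at 25) may already have arrived.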
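
To make the two orderings from the thread concrete, here is a small plain-Python sketch. It is not Beam code and does not use any Beam API; the function name side_values_seen, the event tuples, and the timestamp_indexed flag are all made up for illustration. It replays each total ordering, checks that the side input watermark has already passed m20 when m20 is processed, and shows that m20 still sees different side input contents unless the view is filtered by timestamp, as suggested earlier in the thread.

```python
def side_values_seen(events, timestamp_indexed=False):
    """Replay a totally ordered event list and return what the main element sees.

    Hypothetical model: the side input lives in the global window and simply
    accumulates every element that has arrived so far.
    """
    side = []                    # (timestamp, name) side elements seen so far
    side_wm = float("-inf")      # current side input watermark
    for kind, payload in events:
        if kind == "side":
            side.append(payload)
        elif kind == "side_wm":
            side_wm = payload
        elif kind == "main":
            ts, _name = payload
            # One-sided bound: nothing with a timestamp before side_wm is
            # still missing. In this model the element would otherwise be
            # pushed back until the side watermark catches up.
            assert side_wm >= ts, "main element would be pushed back here"
            if timestamp_indexed:
                # Assumed timestamp-indexed view: hide side elements that are
                # "in the future" relative to the main element.
                return sorted(n for t, n in side if t <= ts)
            return sorted(n for _t, n in side)
    return None


# The two orderings described in the thread.
ORDER_A = [
    ("side", (10, "s10")),
    ("side_wm", 25),
    ("side", (30, "s30")),
    ("side_wm", 100),
    ("main", (20, "m20")),
]
ORDER_B = [
    ("side", (10, "s10")),
    ("side_wm", 25),
    ("main", (20, "m20")),
    ("side", (30, "s30")),
    ("side_wm", 100),
]

print(side_values_seen(ORDER_A))                          # ['s10', 's30']
print(side_values_seen(ORDER_B))                          # ['s10']
print(side_values_seen(ORDER_A, timestamp_indexed=True))  # ['s10']
print(side_values_seen(ORDER_B, timestamp_indexed=True))  # ['s10']
```

In both orderings the watermark condition holds when m20 is processed, so waiting on the watermark alone does not remove the difference; only the timestamp filter (or a window boundary shared by both inputs) makes the result independent of arrival order in this toy model.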
