On Tue, Apr 27, 2021 at 12:51 PM Jan Lukavský <[email protected]> wrote:
>
> On 4/27/21 9:26 PM, Robert Bradshaw wrote:
>
> > On Tue, Apr 27, 2021 at 12:05 PM Jan Lukavský <[email protected]> wrote:
> >> On 4/27/21 8:51 PM, Robert Bradshaw wrote:
> >>> On Tue, Apr 27, 2021 at 11:25 AM Jan Lukavský <[email protected]> wrote:
> >>>>> Are you asking for a way to ignore early triggers on side input 
> >>>>> mapping, and only map to on-time triggered values for the window?
> >>>> No, that could for sure be done before applying the View transform. I'd
> >>>> like to know if it would be possible to create a mode of matching that
> >>>> would be deterministic. One way to make it deterministic seems to be
> >>>> that main input elements would be pushed back until the side input
> >>>> watermark 'catches up' with the main input. Whenever the side input
> >>>> watermark fell behind the main input watermark, elements
> >>>> would start to be pushed back again. Not sure if I'm explaining it using
> >>>> the right words. The side input watermark can be controlled using a timer
> >>>> in an upstream transform, so this defines which elements in the main input
> >>>> would be matched onto which pane of the side input.
> >>> Perhaps I'm not following the request correctly, but this is exactly
> >>> how side inputs work by default. It is only when one explicitly
> >>> requests a non-deterministic trigger upstream of the side input (e.g.
> >>> one that may fire multiple times or ahead of the watermark) that one
> >>> sees a side input with multiple variations or data in the side input
> >>> before the watermark of the side input is caught up to the main input.
> >> Yes, exactly. But take the example of side input in global windows (on
> >> both the main input and side input). Then there have to be multiple
> >> firings per window, because otherwise the side input would only become
> >> available at the end of time, which is not practical. The trigger doesn't have to
> >> be non-deterministic, the data might come from a stateful ParDo, using a
> >> timer with output timestamp, which would make the downstream watermark
> >> progress quite well defined. The matching would still be
> >> nondeterministic in this case.
> > If everything is in the global window, things get non-deterministic
> > across PCollections. For example, say the main input has element m20
> > and the side input has elements s10 and s30 (with the obvious
> > timestamps). Suppose we have a total ordering of events as follows.
> >
> >      s10 arrives
> >      side input watermark advances to 25
> >      s30 arrives
> >      side input watermark advances to 100
> >      m20 arrives
> >
> > In this case, while processing m20, one would see both s10 and s30.
> > Alternatively we could have had
> >
> >      s10 arrives
> >      side input watermark advances to 25
> >      m20 arrives
> >      s30 arrives
> >      side input watermark advances to 100
> >
> > in which case m20 would only see s10.
> Yes, this is absolutely true. I didn't want to make things too
> complicated, so I ignored the fact that an absolutely correct solution
> would require the PCollectionView to be timestamp-indexed, so that if it
> received both s10 and s30 it would correctly return s10 (as s30
> didn't exist at m20, right?). I simplified this for the purposes of this
> discussion, thanks for clarifying.
> >
> > Windowing is exactly the mechanism that gives us a cross-PCollection
> > barrier with which to line things up.
> Shouldn't the watermark do this?

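The watermark gives only a one-sided bound: it says that no more elements
with timestamps before it are expected, but elements with timestamps
arbitrarily far ahead of it (like s30 above) may already have arrived.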
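
To make the two orderings quoted above concrete, here is a toy simulation
in plain Python. It does not use any Beam API; the event tuples and the
function names match_by_arrival and match_by_timestamp are made up for
illustration, and late data and windowing are ignored. The first function
matches a main input element against whatever side input has arrived so
far. The second is a sketch of the push-back plus timestamp-indexed view
idea discussed in the thread, under those assumptions.

```python
# Toy model, NOT Beam API. Events are (kind, value) tuples:
#   ("side", ts)       a side input element with timestamp ts
#   ("main", ts)       a main input element with timestamp ts
#   ("watermark", ts)  the side input watermark advances to ts

def match_by_arrival(events):
    """Each main element sees every side element that arrived before it."""
    side, out = [], {}
    for kind, value in events:
        if kind == "side":
            side.append(value)
        elif kind == "main":
            out[value] = sorted(side)
    return out

def match_by_timestamp(events):
    """Hypothetical deterministic mode: hold a main element back until the
    side input watermark reaches its timestamp, then look up only the side
    elements whose timestamps are not after it."""
    side, pending, out = [], [], {}
    watermark = float("-inf")
    for kind, value in events:
        if kind == "side":
            side.append(value)
        elif kind == "watermark":
            watermark = value
        elif kind == "main":
            pending.append(value)
        # Release every pushed-back main element the watermark has reached.
        ready = [m for m in pending if m <= watermark]
        pending = [m for m in pending if m > watermark]
        for m in ready:
            out[m] = sorted(s for s in side if s <= m)
    return out

# The two total orderings from the example above.
ordering_a = [("side", 10), ("watermark", 25), ("side", 30),
              ("watermark", 100), ("main", 20)]
ordering_b = [("side", 10), ("watermark", 25), ("main", 20),
              ("side", 30), ("watermark", 100)]

for name, events in (("a", ordering_a), ("b", ordering_b)):
    print(name, match_by_arrival(events), match_by_timestamp(events))
```

With arrival-order matching, m20 sees s10 and s30 in the first ordering
and only s10 in the second. In this toy model the timestamp-indexed
variant returns only s10 in both, because the watermark handles elements
that have not arrived yet and the timestamp filter handles elements that
arrived early.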
