Side inputs are matched based on event time, not processing time.
On Tue, Apr 27, 2021 at 12:43 AM Jan Lukavský <[email protected]> wrote:
> Hi,
>
> I have a question about matching side inputs to main input. First a
> recap, to make sure I understand the current state correctly:
>
> a) elements from main input are pushed back (stored in state) until a
> first side input pane arrives (that might be on time, or early)
>
> b) after that, elements from the main input are matched to the current
> side input view - the view is updated as new data arrives, but is
> matched to the main input elements in processing time
>
> If this is the current state, my question is, would it be possible to
> add a new mode of matching of side inputs? Something like
>
> ParDo.of(new MyDoFn()).withSideInput("name", myView,
> TimeDomain.EVENT_TIME)
>
> the semantics would be that the elements from the main PCollection would
> be stored into state as pairs with the value of the current main input
> watermark and on update of side-input watermark only main input elements
> with associated input watermark less than that of the side input would
> be matched with the side input and sent downstream. Although this
> approach is necessarily more expensive and introducing additional
> latency than processing time matching, there are situations when
> processing time matching is inapropriate for correctness reasons.
>
> WDYT?
>
> Jan
>
>