Hi,

I have a question about matching side inputs to main input. First a recap, to make sure I understand the current state correctly:

 a) elements from main input are pushed back (stored in state) until a first side input pane arrives (that might be on time, or early)

 b) after that, elements from the main input are matched to the current side input view - the view is updated as new data arrives, but is matched to the main input elements in processing time

If this is the current state, my question is, would it be possible to add a new mode of matching of side inputs? Something like

 ParDo.of(new MyDoFn()).withSideInput("name", myView, TimeDomain.EVENT_TIME)

the semantics would be that the elements from the main PCollection would be stored into state as pairs with the value of the current main input watermark and on update of side-input watermark only main input elements with associated input watermark less than that of the side input would be matched with the side input and sent downstream. Although this approach is necessarily more expensive and introducing additional latency than processing time matching, there are situations when processing time matching is inapropriate for correctness reasons.

WDYT?

 Jan

Reply via email to