Hi Reuven,
the code in FlinkRunner really does not seem to work with watermarks.
Moreover, what I was able to find, is this quote in [1]:
"The global window side input triggers on processing time, so the main
pipeline nondeterministically matches the side input to elements in
event time."
Jan
[1] https://beam.apache.org/documentation/patterns/side-inputs/
On 4/27/21 6:31 PM, Reuven Lax wrote:
Side inputs are matched based on event time, not processing time.
On Tue, Apr 27, 2021 at 12:43 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi,
I have a question about matching side inputs to main input. First a
recap, to make sure I understand the current state correctly:
a) elements from main input are pushed back (stored in state)
until a
first side input pane arrives (that might be on time, or early)
b) after that, elements from the main input are matched to the
current
side input view - the view is updated as new data arrives, but is
matched to the main input elements in processing time
If this is the current state, my question is, would it be possible to
add a new mode of matching of side inputs? Something like
ParDo.of(new MyDoFn()).withSideInput("name", myView,
TimeDomain.EVENT_TIME)
the semantics would be that the elements from the main PCollection
would
be stored into state as pairs with the value of the current main
input
watermark and on update of side-input watermark only main input
elements
with associated input watermark less than that of the side input
would
be matched with the side input and sent downstream. Although this
approach is necessarily more expensive and introducing additional
latency than processing time matching, there are situations when
processing time matching is inapropriate for correctness reasons.
WDYT?
Jan