Hey!

A couple of weeks ago me and Arvid Heise played around with an idea to
address a long standing issue of Flink: lack of watermark/event time
alignment between different parallel instances of sources, that can lead to
ever growing state size for downstream operators like WindowOperator.

We had an impression that this is relatively low hanging fruit that can be
quite easily implemented - at least partially (the first part mentioned in
the FLIP document). I have written down our proposal [1] and you can also
check out our PoC that we have implemented [2].

We think that this is a quite easy proposal, that has been in large part
already implemented. There is one obvious limitation of our PoC. Namely we
can only easily block individual SourceOperators. This works perfectly fine
as long as there is at most one split per SourceOperator. However it
doesn't work with multiple splits. In that case, if a single
`SourceOperator` is responsible for processing both the least and the most
advanced splits, we won't be able to block this most advanced split for
generating new records. I'm proposing to solve this problem in the future
in another follow up FLIP, as a solution that works with a single split per
operator is easier and already valuable for some of the users.

What do you think about this proposal?
Best, Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
[2] https://github.com/pnowojski/flink/commits/aligned-sources

Reply via email to