On a PR (https://github.com/apache/beam/pull/13927) we got into a
discussion of a very old and strange feature of Beam that I think we should
revisit.

The WindowFn has the ability to shift timestamps forward in order to
unblock downstream watermarks. Why? Specifically in this situation:

 - aggregation/GBK with overlapping windows like SlidingWindows
 - timestamp combiner of the aggregated outputs is EARLIEST of the inputs
 - there is another downstream aggregation/GBK

The output watermark of the upstream aggregation is held to the minimum of
the inputs. When an output is emitted, we desire the output to flow through
the rest of the pipeline without delay. However, the downstream aggregation
can (and often will) be delayed by the window size because of watermark
holds in other later windows that are not released until those windows
output.

To avoid this problem, element x in window w will have its timestamp
shifted to not overlap with any earlier windows. It is a weird behavior. It
fixes the watermark hold problem but introduces a strange output with a
mysterious timestamp that is hard to justify.

Any other ideas?

Kenn

Reply via email to