On Wed, Feb 10, 2021 at 12:14 PM Kenneth Knowles <k...@apache.org> wrote:

> On a PR (https://github.com/apache/beam/pull/13927) we got into a
> discussion of a very old and strange feature of Beam that I think we should
> revisit.
>
> The WindowFn has the ability to shift timestamps forward in order to
> unblock downstream watermarks. Why? Specifically in this situation:
>
>  - aggregation/GBK with overlapping windows like SlidingWindows
>  - timestamp combiner of the aggregated outputs is EARLIEST of the inputs
>  - there is another downstream aggregation/GBK
>
> The output watermark of the upstream aggregation is held to the minimum of
> the inputs. When an output is emitted, we desire the output to flow through
> the rest of the pipeline without delay. However, the downstream aggregation
> can (and often will) be delayed by the window size because of *watermark
> holds in other later windows that are not released until those windows
> output.*
>
Could you describe this a bit more? Why would later windows hold up the
watermark for upstream steps. (Is it due to some subtlety? Such as tracking
the watermark for each stage, rather than for each step?)


>
> To avoid this problem, element x in window w will have its timestamp
> shifted to not overlap with any earlier windows. It is a weird behavior. It
> fixes the watermark hold problem but introduces a strange output with a
> mysterious timestamp that is hard to justify.
>
> Any other ideas?
>
> Kenn
>

Reply via email to