We ran into problems setting event-time timers per element in the Python SDK: pipeline progress would stall.

It turns out that, although the Python SDK does not expose the timer output timestamp feature to the user, it sets the timer output timestamp to the timestamp of the current input element.

This holds back the output watermark until the timer fires (the Flink Runner respects the timer output timestamp when advancing the output watermark). We had set the fire timestamp so far in the future that the held-back watermark completely stalled progress for downstream transforms.

Considering that this feature is not even exposed to the user in the Python SDK, I think we should set the default output timestamp to the fire timestamp, not to the input timestamp. This is also how timers work in the Java SDK.
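
To make the difference concrete, here is a toy model in plain Python. It is not Beam code: the function name and the numbers are made up for illustration, and it assumes the simplification that the output watermark is the input watermark capped by the earliest output timestamp of any pending timer.

```python
# Toy model only (hypothetical names, not the Beam API).
# Assumption: output watermark = min(input watermark, earliest pending timer hold).

def output_watermark(input_watermark, timer_output_timestamps):
    """Return the input watermark capped by the earliest pending timer hold."""
    return min([input_watermark, *timer_output_timestamps])

element_ts = 100       # event timestamp of the element that set the timer
fire_ts = 1_000_000    # timer fire timestamp, far in the future
input_wm = 5_000       # input watermark has already moved past the element

# Hold placed at the element's input timestamp (the behaviour described above).
held_at_input = output_watermark(input_wm, [element_ts])

# Hold placed at the fire timestamp (the proposed default).
held_at_fire = output_watermark(input_wm, [fire_ts])

print(held_at_input)  # 100: downstream is stuck until the timer fires
print(held_at_fire)   # 5000: downstream follows the input watermark
```

In this model the first case keeps the output watermark at 100 until the input watermark reaches 1,000,000 and the timer fires, which matches the stall we saw.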

Let me know what you think.

-Max

PR: https://github.com/apache/beam/pull/12531
