Hi Maximilian,

It makes sense to set  hold_timestamp as fire_timestamp when the
fire_timestamp is in the event time domain. Otherwise, the system may
advance the watermark incorrectly.
I think we can do something similar to Java FnApiRunner[1]:

   - Expose set_output_timestamp API to python timer as well
   - If set_output_timestamp is not specified and timer is in event domain,
   we can use fire_timestamp as hold_timestamp
   - Otherwise, use input_timestamp as hold_timestamp.

What do you think?

[1]
https://github.com/apache/beam/blob/edb42952f6b0aa99477f5c7baca6d6a0d93deb4f/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java#L1433-L1493




On Tue, Aug 11, 2020 at 9:00 AM Maximilian Michels <m...@apache.org> wrote:

> We ran into problems setting event time timers per-element in the Python
> SDK. Pipeline progress would stall.
>
> Turns out, although the Python SDK does not expose the timer output
> timestamp feature to the user, it sets the timer output timestamp to the
> current input timestamp of an element.
>
> This will lead to holding back the watermark until the timer fires (the
> Flink Runner respects the timer output timestamp when advancing the
> output watermark). We had set the fire timestamp to a timestamp so far
> in the future, that pipeline progress would completely stall for
> downstream transforms, due to the held back watermark.
>
> Considering that this feature is not even exposed to the user in the
> Python SDK, I think we should set the default output timestamp to the
> fire timestamp, and not to the input timestamp. This is also how timer
> work in the Java SDK.
>
> Let me know what you think.
>
> -Max
>
> PR: https://github.com/apache/beam/pull/12531
>

Reply via email to