Thanks for your suggestions!

It makes sense to complete the work on this feature by exposing it in the Python API. We can do this as a next step. (There might be questions on how to do that exactly)

For now, I'm concerned with getting the semantics right and unblocking users from stalling pipelines.

I wasn't aware that processing timers used the input timestamp as the timer output timestamp. I've updated the PR accordingly. Please take a look: https://github.com/apache/beam/pull/12531

-Max

On 12.08.20 05:03, Luke Cwik wrote:
+1 on what Boyuan said. It is important that the defaults for processing time domain differ from the defaults for the event time domain.

On Tue, Aug 11, 2020 at 12:36 PM Yichi Zhang <zyi...@google.com <mailto:zyi...@google.com>> wrote:

    +1 to expose set_output_timestamp and enrich python set timer api.

    On Tue, Aug 11, 2020 at 12:01 PM Boyuan Zhang <boyu...@google.com
    <mailto:boyu...@google.com>> wrote:

        Hi Maximilian,

        It makes sense to set  hold_timestamp as fire_timestamp when the
        fire_timestamp is in the event time domain. Otherwise, the
        system may advance the watermark incorrectly.
        I think we can do something similar to Java FnApiRunner[1]:

          * Expose set_output_timestamp API to python timer as well
          * If set_output_timestamp is not specified and timer is in
            event domain, we can use fire_timestamp as hold_timestamp
          * Otherwise, use input_timestamp as hold_timestamp.

        What do you think?

        [1]
        
https://github.com/apache/beam/blob/edb42952f6b0aa99477f5c7baca6d6a0d93deb4f/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java#L1433-L1493




        On Tue, Aug 11, 2020 at 9:00 AM Maximilian Michels
        <m...@apache.org <mailto:m...@apache.org>> wrote:

            We ran into problems setting event time timers per-element
            in the Python
            SDK. Pipeline progress would stall.

            Turns out, although the Python SDK does not expose the timer
            output
            timestamp feature to the user, it sets the timer output
            timestamp to the
            current input timestamp of an element.

            This will lead to holding back the watermark until the timer
            fires (the
            Flink Runner respects the timer output timestamp when
            advancing the
            output watermark). We had set the fire timestamp to a
            timestamp so far
            in the future, that pipeline progress would completely stall
            for
            downstream transforms, due to the held back watermark.

            Considering that this feature is not even exposed to the
            user in the
            Python SDK, I think we should set the default output
            timestamp to the
            fire timestamp, and not to the input timestamp. This is also
            how timer
            work in the Java SDK.

            Let me know what you think.

            -Max

            PR: https://github.com/apache/beam/pull/12531

Reply via email to