Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

Jan Lukavský Wed, 05 May 2021 09:37:25 -0700

Yes, what I meant was the distinguishing of cleanup timers from plainuser other timers - that seems to be due to the fact that they fire baseon different watermark (input/output). And firing timers based on outputwatermark might be actually a good user-facing feature, because thatmight help tracking output watermark in transforms that want to dealwith potentially droppable data downstream (the input would have to bere-windowed to global window, of course). I don't know if there areother use-cases, if not maybe it might be sufficient to create aDroppableDataSplit transform, that would create a PCollectionTuple withdroppable and other data. But that was just an idea when Kenn mentionedthat the cleanup timers "fire differently" - I generally think that whenthere is a need for a different behavior, than it might signal there issomething possibly fundamental.

Jan


On 5/5/21 5:11 PM, Reuven Lax wrote:

This is one way to put it. I think in practice Beam guarantees thattimers fire in order for a given key (though there is still a bit of abug around looping timers - the fix for that got rolled back). Thismeans that as long as the runner sets the cleanup timer to be 1mspassed the end of the window (plus allowed lateness), it'sguaranteed to be the last timer that fires for that window.

On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz<mailto:je...@seznam.cz>> wrote:


    Hm, one thing in the last paragraph seems there might be some
    logical gap.

    > For a runner author to implement a "cleanup timer" requires a
    different mechanism. A window expires when *both* the input
    element watermark *and* the timer watermark are past the expiry
    time. In other words, the cleanup timer fires according to the
    minimum of these watermarks, combined. It *cannot* fire according
    to the input element watermark. If you naively try to implement it
    as a user timer, it will be incorrect. Incidentally this is
    why @OnWindowExpiration is a meaningful feature.

    The description describes a timer that fires not according to
    input watermark, but according to the output watermark (once the
    output watermark reaches certain point in time). That logically
    implies, that such a timer cannot have non-droppable output (at
    least if its output timestamp belongs to the respective window)
    and cannot create a watermark hold (because that would block the
    progress of the output watermark and might cause the timer to not
    fire ever). This maybe might be a useful user-feature as well,
    probably again mostly related to how user-code might want to deal
    with droppable data.

     Jan

    On 5/4/21 6:41 PM, Kenneth Knowles wrote:

    Mean to also add +Reuven Lax <mailto:re...@google.com>

    On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org
    <mailto:k...@apache.org>> wrote:

        Explicitly pinging a couple folks who were involved in the
        original change which yours essentially reverts. There's a
        model question here that I want to clarify on-list:

        When you have a ParDo setting timers, you have an additional
        watermark that must be considered:

         - input element watermark
         - output watermark
         - *(user) timer watermark*

        The timer watermark is an input to the ParDo. Sometimes you
        might think of the "timer channel" as a self loop, where each
        timer is an element. Each timer has a timestamp (the output
        timestamp) and separately some instructions on when to
        deliver that timer. This is the same as the usual difference
        between event time and processing time.

        The instruction on when to deliver a timer can have two forms:

         - wait a certain amount of processing time
         - deliver the timer when the *input element watermark*
        reaches a time X

        Here is an important point: "cleanup timers" are *not* user
        timers. They are an implementation detail. They are not part
        of the model. The runner's job is to reclaim resources as
        windows expire. A user should never be reasoning about how
        their timers relate to cleanup timers (except for resource
        consumption). Because there is no relationship except that
        the cleanup should happen "eventually" and invisibly.

        For a runner author to implement a "cleanup timer" requires a
        different mechanism. A window expires when *both* the input
        element watermark *and* the timer watermark are past the
        expiry time. In other words, the cleanup timer fires
        according to the minimum of these watermarks, combined. It
        *cannot* fire according to the input element watermark. If
        you naively try to implement it as a user timer, it will be
        incorrect. Incidentally this is why @OnWindowExpiration is a
        meaningful feature.

        Kenn

        On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz
        <mailto:je...@seznam.cz>> wrote:

            Hi Kenn,

            I created BEAM-12276 [1] with PR [2].

             Jan

            [1] https://issues.apache.org/jira/browse/BEAM-12276
            <https://issues.apache.org/jira/browse/BEAM-12276>

            [2] https://github.com/apache/beam/pull/14718
            <https://github.com/apache/beam/pull/14718>

            On 5/3/21 7:46 PM, Kenneth Knowles wrote:

            This seems like just a bug. If you set a timer for X and
            have output timestamp Y where X < Y this should be fine.
            Is the problem the current input watermark? Are you
            trying to set a timer with output timestamp that is
            already past? I think that should be allowed, too, as
            long as the window is not expired, but I may be missing
            something.

            Some greater detail would be useful - maybe the full
            stack trace and/or a failing unit test in a PR?

            Kenn

            On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský
            <je...@seznam.cz <mailto:je...@seznam.cz>> wrote:

                Hi,

                I have come across a bug with timer output timestamp
                - when using event
                time and relative timers, setting the timer can
                arbitrarily throw
                IllegalArgumentException if the firing timestamp
                (input watermark) is
                ahead of the output timestamp (like
                .java.lang.IllegalArgumentException:
                Attempted to set an event-time timer with an output
                timestamp of
                2021-04-29T07:16:19.369Z that is after the timer
                firing timestamp
                -290308-12-21T19:59:05.225Z). But there is no way to
                access the firing
                timestamp from user code. This means that the use
                has to either catch
                the IllegalArgumentException, or not use this
                construct at all.

                Catching the exception should probably not be part
                of a contract, so we
                should do one of the following:

                  a) either throw the exception right away and
                disable using relative
                timers with output timestamp completely, or

                  b) support it correctly

                What is the actual reason not to support output
                timestamps, that are
                ahead of firing timestamp? From my understanding,
                that should not be an
                issue, because there is TimestampCombiner.EARLIEST
                on the
                watermarkholdstate that corresponds to the output
                timestamp. If that is
                correct can we simply remove the check?

                  Jan

Re: Timer.withOutputTimestamp().offset().setRelative seems unusable with event time

Reply via email to