This is one way to put it. I think in practice Beam guarantees that timers
fire in order for a given key (though there is still a bit of a bug around
looping timers - the fix for that got rolled back). This means that as long
as the runner sets the cleanup timer to be 1ms passed the end of the window
(plus allowed lateness), it's guaranteed to be the last timer that fires
for that window.



On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hm, one thing in the last paragraph seems there might be some logical gap.
>
> > For a runner author to implement a "cleanup timer" requires a different
> mechanism. A window expires when *both* the input element watermark *and*
> the timer watermark are past the expiry time. In other words, the cleanup
> timer fires according to the minimum of these watermarks, combined. It
> *cannot* fire according to the input element watermark. If you naively try
> to implement it as a user timer, it will be incorrect. Incidentally this is
> why @OnWindowExpiration is a meaningful feature.
> The description describes a timer that fires not according to input
> watermark, but according to the output watermark (once the output watermark
> reaches certain point in time). That logically implies, that such a timer
> cannot have non-droppable output (at least if its output timestamp belongs
> to the respective window) and cannot create a watermark hold (because that
> would block the progress of the output watermark and might cause the timer
> to not fire ever). This maybe might be a useful user-feature as well,
> probably again mostly related to how user-code might want to deal with
> droppable data.
>
>  Jan
>
> On 5/4/21 6:41 PM, Kenneth Knowles wrote:
>
> Mean to also add +Reuven Lax <re...@google.com>
>
> On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> Explicitly pinging a couple folks who were involved in the original
>> change which yours essentially reverts. There's a model question here that
>> I want to clarify on-list:
>>
>> When you have a ParDo setting timers, you have an additional watermark
>> that must be considered:
>>
>>  - input element watermark
>>  - output watermark
>>  - *(user) timer watermark*
>>
>> The timer watermark is an input to the ParDo. Sometimes you might think
>> of the "timer channel" as a self loop, where each timer is an element. Each
>> timer has a timestamp (the output timestamp) and separately some
>> instructions on when to deliver that timer. This is the same as the usual
>> difference between event time and processing time.
>>
>> The instruction on when to deliver a timer can have two forms:
>>
>>  - wait a certain amount of processing time
>>  - deliver the timer when the *input element watermark* reaches a time X
>>
>> Here is an important point: "cleanup timers" are *not* user timers. They
>> are an implementation detail. They are not part of the model. The runner's
>> job is to reclaim resources as windows expire. A user should never be
>> reasoning about how their timers relate to cleanup timers (except for
>> resource consumption). Because there is no relationship except that the
>> cleanup should happen "eventually" and invisibly.
>>
>> For a runner author to implement a "cleanup timer" requires a different
>> mechanism. A window expires when *both* the input element watermark *and*
>> the timer watermark are past the expiry time. In other words, the cleanup
>> timer fires according to the minimum of these watermarks, combined. It
>> *cannot* fire according to the input element watermark. If you naively try
>> to implement it as a user timer, it will be incorrect. Incidentally this is
>> why @OnWindowExpiration is a meaningful feature.
>>
>> Kenn
>>
>> On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi Kenn,
>>>
>>> I created BEAM-12276 [1] with PR [2].
>>>
>>>  Jan
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-12276
>>>
>>> [2] https://github.com/apache/beam/pull/14718
>>> On 5/3/21 7:46 PM, Kenneth Knowles wrote:
>>>
>>> This seems like just a bug. If you set a timer for X and have output
>>> timestamp Y where X < Y this should be fine. Is the problem the current
>>> input watermark? Are you trying to set a timer with output timestamp that
>>> is already past? I think that should be allowed, too, as long as the window
>>> is not expired, but I may be missing something.
>>>
>>> Some greater detail would be useful - maybe the full stack trace and/or
>>> a failing unit test in a PR?
>>>
>>> Kenn
>>>
>>> On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have come across a bug with timer output timestamp - when using event
>>>> time and relative timers, setting the timer can arbitrarily throw
>>>> IllegalArgumentException if the firing timestamp (input watermark) is
>>>> ahead of the output timestamp (like
>>>> .java.lang.IllegalArgumentException:
>>>> Attempted to set an event-time timer with an output timestamp of
>>>> 2021-04-29T07:16:19.369Z that is after the timer firing timestamp
>>>> -290308-12-21T19:59:05.225Z). But there is no way to access the firing
>>>> timestamp from user code. This means that the use has to either catch
>>>> the IllegalArgumentException, or not use this construct at all.
>>>>
>>>> Catching the exception should probably not be part of a contract, so we
>>>> should do one of the following:
>>>>
>>>>   a) either throw the exception right away and disable using relative
>>>> timers with output timestamp completely, or
>>>>
>>>>   b) support it correctly
>>>>
>>>> What is the actual reason not to support output timestamps, that are
>>>> ahead of firing timestamp? From my understanding, that should not be an
>>>> issue, because there is TimestampCombiner.EARLIEST on the
>>>> watermarkholdstate that corresponds to the output timestamp. If that is
>>>> correct can we simply remove the check?
>>>>
>>>>   Jan
>>>>
>>>>

Reply via email to