Yes, what I meant was the distinguishing of cleanup timers from plain
user other timers - that seems to be due to the fact that they fire base
on different watermark (input/output). And firing timers based on output
watermark might be actually a good user-facing feature, because that
might help tracking output watermark in transforms that want to deal
with potentially droppable data downstream (the input would have to be
re-windowed to global window, of course). I don't know if there are
other use-cases, if not maybe it might be sufficient to create a
DroppableDataSplit transform, that would create a PCollectionTuple with
droppable and other data. But that was just an idea when Kenn mentioned
that the cleanup timers "fire differently" - I generally think that when
there is a need for a different behavior, than it might signal there is
something possibly fundamental.
Jan
On 5/5/21 5:11 PM, Reuven Lax wrote:
This is one way to put it. I think in practice Beam guarantees that
timers fire in order for a given key (though there is still a bit of a
bug around looping timers - the fix for that got rolled back). This
means that as long as the runner sets the cleanup timer to be 1ms
passed the end of the window (plus allowed lateness), it's
guaranteed to be the last timer that fires for that window.
On Wed, May 5, 2021 at 2:41 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hm, one thing in the last paragraph seems there might be some
logical gap.
> For a runner author to implement a "cleanup timer" requires a
different mechanism. A window expires when *both* the input
element watermark *and* the timer watermark are past the expiry
time. In other words, the cleanup timer fires according to the
minimum of these watermarks, combined. It *cannot* fire according
to the input element watermark. If you naively try to implement it
as a user timer, it will be incorrect. Incidentally this is
why @OnWindowExpiration is a meaningful feature.
The description describes a timer that fires not according to
input watermark, but according to the output watermark (once the
output watermark reaches certain point in time). That logically
implies, that such a timer cannot have non-droppable output (at
least if its output timestamp belongs to the respective window)
and cannot create a watermark hold (because that would block the
progress of the output watermark and might cause the timer to not
fire ever). This maybe might be a useful user-feature as well,
probably again mostly related to how user-code might want to deal
with droppable data.
Jan
On 5/4/21 6:41 PM, Kenneth Knowles wrote:
Mean to also add +Reuven Lax <mailto:re...@google.com>
On Tue, May 4, 2021 at 9:41 AM Kenneth Knowles <k...@apache.org
<mailto:k...@apache.org>> wrote:
Explicitly pinging a couple folks who were involved in the
original change which yours essentially reverts. There's a
model question here that I want to clarify on-list:
When you have a ParDo setting timers, you have an additional
watermark that must be considered:
- input element watermark
- output watermark
- *(user) timer watermark*
The timer watermark is an input to the ParDo. Sometimes you
might think of the "timer channel" as a self loop, where each
timer is an element. Each timer has a timestamp (the output
timestamp) and separately some instructions on when to
deliver that timer. This is the same as the usual difference
between event time and processing time.
The instruction on when to deliver a timer can have two forms:
- wait a certain amount of processing time
- deliver the timer when the *input element watermark*
reaches a time X
Here is an important point: "cleanup timers" are *not* user
timers. They are an implementation detail. They are not part
of the model. The runner's job is to reclaim resources as
windows expire. A user should never be reasoning about how
their timers relate to cleanup timers (except for resource
consumption). Because there is no relationship except that
the cleanup should happen "eventually" and invisibly.
For a runner author to implement a "cleanup timer" requires a
different mechanism. A window expires when *both* the input
element watermark *and* the timer watermark are past the
expiry time. In other words, the cleanup timer fires
according to the minimum of these watermarks, combined. It
*cannot* fire according to the input element watermark. If
you naively try to implement it as a user timer, it will be
incorrect. Incidentally this is why @OnWindowExpiration is a
meaningful feature.
Kenn
On Tue, May 4, 2021 at 4:29 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi Kenn,
I created BEAM-12276 [1] with PR [2].
Jan
[1] https://issues.apache.org/jira/browse/BEAM-12276
<https://issues.apache.org/jira/browse/BEAM-12276>
[2] https://github.com/apache/beam/pull/14718
<https://github.com/apache/beam/pull/14718>
On 5/3/21 7:46 PM, Kenneth Knowles wrote:
This seems like just a bug. If you set a timer for X and
have output timestamp Y where X < Y this should be fine.
Is the problem the current input watermark? Are you
trying to set a timer with output timestamp that is
already past? I think that should be allowed, too, as
long as the window is not expired, but I may be missing
something.
Some greater detail would be useful - maybe the full
stack trace and/or a failing unit test in a PR?
Kenn
On Thu, Apr 29, 2021 at 12:51 AM Jan Lukavský
<je...@seznam.cz <mailto:je...@seznam.cz>> wrote:
Hi,
I have come across a bug with timer output timestamp
- when using event
time and relative timers, setting the timer can
arbitrarily throw
IllegalArgumentException if the firing timestamp
(input watermark) is
ahead of the output timestamp (like
.java.lang.IllegalArgumentException:
Attempted to set an event-time timer with an output
timestamp of
2021-04-29T07:16:19.369Z that is after the timer
firing timestamp
-290308-12-21T19:59:05.225Z). But there is no way to
access the firing
timestamp from user code. This means that the use
has to either catch
the IllegalArgumentException, or not use this
construct at all.
Catching the exception should probably not be part
of a contract, so we
should do one of the following:
a) either throw the exception right away and
disable using relative
timers with output timestamp completely, or
b) support it correctly
What is the actual reason not to support output
timestamps, that are
ahead of firing timestamp? From my understanding,
that should not be an
issue, because there is TimestampCombiner.EARLIEST
on the
watermarkholdstate that corresponds to the output
timestamp. If that is
correct can we simply remove the check?
Jan