[
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297570#comment-16297570
]
Guozhang Wang commented on KAFKA-6323:
--------------------------------------
[~frederica] I'd suggest the following:
1) STREAM_TIME punctuation: I agree with [~mjsax] for aligning on scheduled
timestamp; to be more specific, suppose user provided interval is {{T}}, we
would first schedule the next timstamp as {{T}} exactly; then at any point
suppose our next scheduled the next timestamp {{T1}}, and stream time has
advanced to {{T2}} because of received data where {{T2 >= T1}}, then we just
punctuate with parameter {{floor(T2, T)}} and schedule the next punctuation at
{{floor(T2, T) + T}}.
2) WALL_CLOCK_TIME: we do not try to align on interval, i.e. with user provided
interval {{T}}, next scheduled time is {{now + T}}, and at the time we did the
check with scheduled timestamp {{T1}}, if the current system time is {{T2 (T2
>= T1)}} we punctuate at {{T2}} and schedule the next punctuation at timestamp
{{T2 + T}}. The argument is that with long GC /
single-record-taking-long-time-to-process / etc scenarios, we can never have a
precise or predictable punctuation based on system wall-clock time, so instead
we'd just try to expose the exact current system time when punctuation is
triggered.
> punctuate with WALL_CLOCK_TIME triggered immediately
> ----------------------------------------------------
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 1.0.0
> Reporter: Frederic Arno
> Assignee: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> When working on a custom Processor from which I am scheduling a punctuation
> using WALL_CLOCK_TIME. I've noticed that whatever the punctuation interval I
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all
> PunctuationSchedule's timestamps are matched against the current time in
> order to decide whether or not to trigger the punctuator
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate).
> However, I've only seen code that initializes PunctuationSchedule's timestamp
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's
> timestamp be initialized to current time + interval?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)