[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283901#comment-16283901
 ] 

Guozhang Wang commented on KAFKA-6323:
--------------------------------------

Here are my thoughts on punctuation semantics (on both KAFKA-6323 and 
KAFKA-6092):

*First trigger*: we should only punctuate the first time after the specified 
period has elapsed. And here is a slight difference with wall-clock time and 
stream time:

1. WALL_CLOCK_TIME: when the stream application starts at t0 (system wall clock 
time), punctuate first-time on t0 + t_scheduled.
2. STREAM_TIME: when the stream application starts, we do no schedule the first 
punctuation until the stream time is known (i.e. we have received at least one 
record from each input topic), say it is T01, punctuate first-time on T0 + 
T_scheduled.

*Interval*: again I think there is a slight difference with wall-clock time and 
stream time:

1. WALL_CLOCK_TIME: when the stream application last punctuation at t1, 
punctuate next-time on t1 + t_scheduled, even if there is no data arrived 
during this period of time.
2. STREAM_TIME: this is data driven, and hence: when the stream application 
last punctuation at T1, and then stream time is updated and advanced to T2, 
where (T2 - T1) > t_scheduled, punctuate at T2 once even if (T2 - T1) >= 
t_scheduled * 2.

WDYT? cc [~stephane.maa...@gmail.com] [~mih...@wp.pl]

> punctuate with WALL_CLOCK_TIME triggered immediately
> ----------------------------------------------------
>
>                 Key: KAFKA-6323
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6323
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Frederic Arno
>            Assignee: Frederic Arno
>             Fix For: 1.1.0, 1.0.1
>
>
> When working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME. I've noticed that whatever the punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all 
> PunctuationSchedule's timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp 
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to