[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294539#comment-16294539
 ] 

Frederic Arno commented on KAFKA-6323:
--------------------------------------

I'd like to discuss about alignment as requested by [~mjsax] in my updated PR, 
especially here: https://github.com/apache/kafka/pull/4301#discussion_r157378045

With the code I pushed, the punctuations stay aligned (best effort), until 
there's a gap bigger than 2*interval. After a big gap, my code triggers a 
punctuation as early as possible and then aligns on that punctuation time, not 
matching the alignment we had before the gap.

In the discussion below, I'm only talking about alignment over a big gap (> 
2*interval), and I don't see the value of staying aligned (but as stated before 
I have very little experience with all of that), here's why:

1. The current API doesn't suggest alignment is important, it could be 
different if we had schedule(final long start, final long interval, final 
PunctuationType type, final Punctuator punctuator), where the start parameter 
would indicate the desired time of the first punctuation. In that case I could 
agree to only punctuate at start+x*interval. But that's not what we currently 
have, and it would come with further complications: we would need to decide 
what to do for incomplete intervals, either skip a punctuation, or have 2 
punctuations spaced by less than interval.

2. Because we don't have a start parameter (see previous point), the first 
punctuation can happen at anytime. Does it make sense to align on that time 
which is an almost random time? Indeed with STREAM_TIME, the first punctuation 
happens as soon as we have a stream time and that time comes from the data's 
timestamps which I don't expect to be aligned to anything meaningful. With 
WALL_CLOCK_TIME, the first punctuation happens roughly at now()+interval, now() 
being the time at which the schedule call was made, that can't be aligned to 
anything.

Am I missing something?

> punctuate with WALL_CLOCK_TIME triggered immediately
> ----------------------------------------------------
>
>                 Key: KAFKA-6323
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6323
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Frederic Arno
>            Assignee: Frederic Arno
>             Fix For: 1.1.0, 1.0.1
>
>
> When working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME. I've noticed that whatever the punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all 
> PunctuationSchedule's timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp 
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to