[
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang updated KAFKA-3514:
---------------------------------
Labels: time (was: )
> Stream timestamp computation needs some further thoughts
> --------------------------------------------------------
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
> Issue Type: Bug
> Components: kafka streams
> Reporter: Guozhang Wang
> Labels: time
> Fix For: 0.10.1.0
>
>
> Our current stream task's timestamp is used for the punctuate function as
> well as for selecting which stream to process next (i.e. best-effort stream
> synchronization). It is defined as the smallest timestamp over all
> partitions in the task's partition group. This results in two unintuitive
> corner cases:
> 1) Observing a late-arriving record keeps that stream's timestamp low for
> a period of time, so that stream keeps being processed until the late record
> is reached. For example, take two partitions within the same task, annotated
> by their timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late-arriving record with timestamp "1" will cause stream A to be
> selected continuously in the thread loop, i.e. the messages with timestamps
> 5, 6, 7, 8, 9 are processed first, until the late record itself is dequeued
> and processed; only then will stream B be selected, starting with timestamp 2.
> 2) An empty buffered partition will cause its timestamp not to advance,
> and hence the task timestamp will not advance either, since it is the
> smallest among all partitions. This may not be as severe a problem as 1)
> above, though.
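The selection behavior in case 1) can be reproduced with a small standalone simulation. This is only a sketch and not the Kafka Streams code: the class `MinTimestampSelection` and its methods are hypothetical names, and it assumes that a partition's timestamp is the smallest timestamp among its buffered records and that the loop always processes the head record of the partition with the smallest partition timestamp. Empty partitions are simply skipped here, so case 2) is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of min-timestamp stream selection.
public class MinTimestampSelection {

    // Assumed definition: a partition's timestamp is the smallest
    // timestamp among the records it currently has buffered.
    static long partitionTimestamp(Deque<Long> buffer) {
        return Collections.min(buffer);
    }

    // Repeatedly pick the non-empty partition with the smallest partition
    // timestamp and process (dequeue) the record at the head of its buffer.
    static List<String> processingOrder(List<String> names, List<Deque<Long>> buffers) {
        List<String> order = new ArrayList<>();
        while (true) {
            int chosen = -1;
            for (int i = 0; i < buffers.size(); i++) {
                if (buffers.get(i).isEmpty()) {
                    continue; // simplification: ignore empty partitions
                }
                if (chosen < 0
                        || partitionTimestamp(buffers.get(i)) < partitionTimestamp(buffers.get(chosen))) {
                    chosen = i;
                }
            }
            if (chosen < 0) {
                return order; // all buffers drained
            }
            order.add(names.get(chosen) + ":" + buffers.get(chosen).pollFirst());
        }
    }

    public static void main(String[] args) {
        Deque<Long> a = new ArrayDeque<>(Arrays.asList(5L, 6L, 7L, 8L, 9L, 1L, 10L));
        Deque<Long> b = new ArrayDeque<>(Arrays.asList(2L, 3L, 4L, 5L));
        // Prints [A:5, A:6, A:7, A:8, A:9, A:1, B:2, B:3, B:4, B:5, A:10]
        System.out.println(processingOrder(Arrays.asList("A", "B"), Arrays.asList(a, b)));
    }
}
```

Under these assumptions, stream A is drained up to and including the late record "1" before any record of stream B is processed, which matches the ordering described above.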