[ https://issues.apache.org/jira/browse/KAFKA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237655#comment-17237655 ]
Matthias J. Sax edited comment on KAFKA-10755 at 11/23/20, 7:33 PM:
--------------------------------------------------------------------

After reconsideration, I think we should get this into the 2.7.0 release, too.

\cc [~bbejeck]

was (Author: mjsax):
After reconsideration, we should get this into the 2.7.0 release, too.

> Should consider commit latency when computing next commit timestamp
> -------------------------------------------------------------------
>
>                 Key: KAFKA-10755
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10755
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Blocker
>             Fix For: 2.6.1, 2.8.0
>
>
> In 2.6, we reworked the main processing/commit loop in `StreamThread` and
> introduced a regression by _not_ updating the current time after committing.
> This implies that we compute the next commit timestamp too low (i.e., too
> early).
>
> For small commit intervals and high commit latency (as with EOS), this bug
> may lead to an increased commit frequency and fewer processed records between
> two commits, and thus to reduced throughput.
>
> For example, assume that the commit interval is 100ms and the commit latency
> is 50ms, and we start the commit at timestamp 10000. The commit finishes at
> 10050, and the next commit should happen at 10150. However, if we don't
> update the current timestamp, we incorrectly compute the next commit time as
> 10100, i.e., 50ms too early, and we have only 50ms to process data instead of
> the intended 100ms.
>
> In the worst case, if the commit latency is larger than the commit interval,
> we would commit after processing a single record per task.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
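The timing regression described in the ticket can be sketched as follows. This is a minimal illustration, not the actual `StreamThread` code; the class and method names (`CommitTimer`, `nextCommitBuggy`, `nextCommitFixed`) are hypothetical, chosen only to contrast the two scheduling computations with the ticket's worked example (interval 100ms, commit starting at 10000, latency 50ms):

```java
// Hypothetical sketch of the commit-scheduling logic described in
// KAFKA-10755; names are illustrative, not real Kafka Streams APIs.
public final class CommitTimer {

    // Regression: the next commit is scheduled from the timestamp taken
    // *before* the commit, so the commit latency eats into the interval.
    public static long nextCommitBuggy(long commitStartMs, long commitIntervalMs) {
        return commitStartMs + commitIntervalMs;
    }

    // Fix: re-read the clock after the commit completes, so the full
    // commit interval remains available for processing records.
    public static long nextCommitFixed(long timeAfterCommitMs, long commitIntervalMs) {
        return timeAfterCommitMs + commitIntervalMs;
    }

    public static void main(String[] args) {
        long commitIntervalMs = 100L;
        long commitStartMs = 10_000L;
        long commitLatencyMs = 50L;
        long commitEndMs = commitStartMs + commitLatencyMs; // 10050

        // Buggy: 10100 -> only 50ms left for processing after the commit.
        System.out.println(nextCommitBuggy(commitStartMs, commitIntervalMs));
        // Fixed: 10150 -> the intended full 100ms of processing time.
        System.out.println(nextCommitFixed(commitEndMs, commitIntervalMs));
    }
}
```

With a commit latency at or above the commit interval, the buggy computation yields a next-commit time that has already passed by the time the commit finishes, which matches the ticket's worst case of committing after every record.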