cadonna commented on code in PR #15361:
URL: https://github.com/apache/kafka/pull/15361#discussion_r1504075515
##########
streams/src/test/java/org/apache/kafka/streams/processor/internals/GlobalStateTaskTest.java:
##########
@@ -217,21 +215,53 @@ public void shouldFlushStateManagerWithOffsets() {
         final Map<TopicPartition, Long> expectedOffsets = new HashMap<>();
         expectedOffsets.put(t1, 52L);
         expectedOffsets.put(t2, 100L);
+        globalStateTask.initialize();
         globalStateTask.update(record(topic1, 1, 51, "foo".getBytes(), "foo".getBytes()));
         globalStateTask.flushState();
+
+        assertEquals(expectedOffsets, stateMgr.changelogOffsets());
+        assertTrue(stateMgr.flushed);
     }

     @Test
     public void shouldCheckpointOffsetsWhenStateIsFlushed() {
         final Map<TopicPartition, Long> expectedOffsets = new HashMap<>();
         expectedOffsets.put(t1, 102L);
         expectedOffsets.put(t2, 100L);
+        globalStateTask.initialize();
         globalStateTask.update(record(topic1, 1, 101, "foo".getBytes(), "foo".getBytes()));
         globalStateTask.flushState();
-        assertThat(stateMgr.changelogOffsets(), equalTo(expectedOffsets));
+
+        assertEquals(expectedOffsets, stateMgr.changelogOffsets());
+        assertTrue(stateMgr.checkpointWritten);
+    }
+
+    @Test
+    public void shouldNotCheckpointIfNotReceivedEnoughRecords() {
+        globalStateTask.initialize();
+        globalStateTask.update(record(topic1, 1, 9000L, "foo".getBytes(), "foo".getBytes()));
+        globalStateTask.maybeCheckpoint();
+
+        assertEquals(offsets, stateMgr.changelogOffsets());
+        assertFalse(stateMgr.flushed);
+        assertFalse(stateMgr.checkpointWritten);
+    }
+
+    @Test
+    public void shouldCheckpointIfReceivedEnoughRecords() {
+        final Map<TopicPartition, Long> expectedOffsets = new HashMap<>();
+        expectedOffsets.put(t1, 10051L); // t1 advanced with 10001 records
+        expectedOffsets.put(t2, 100L);
+
+        globalStateTask.initialize();
+        globalStateTask.update(record(topic1, 1, 10050L, "foo".getBytes(), "foo".getBytes()));

Review Comment:
   Could you please also do an update here just before the threshold (i.e. at 10049) and verify that no flush was performed and no checkpoint was written?
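A quick sanity check of the boundary the reviewer is asking about: the earlier tests suggest the fixture's checkpointed offset for `t1` starts at 50 and that the stored offset is the record offset plus one (a record at offset 51 yields a stored offset of 52). Under those assumptions, a record at offset 10049 gives an offset delta of exactly 10 000, which is not strictly greater than the threshold, while 10050 gives 10 001. A minimal, hypothetical sketch of that arithmetic (the class, method names, and initial offset are illustrative assumptions, not Kafka's code):

```java
public class OffsetDeltaBoundary {
    // Threshold value inferred from the tests (9000 records: no checkpoint;
    // 10 001 records: checkpoint); assumed, not read from Kafka's source.
    static final long THRESHOLD = 10_000L;
    // Assumed checkpointed offset for t1 at startup, taken from the test fixture.
    static final long INITIAL_OFFSET = 50L;

    // The stored changelog offset is the next offset to read: record offset + 1.
    static boolean exceedsThreshold(long recordOffset) {
        final long delta = (recordOffset + 1) - INITIAL_OFFSET;
        return delta > THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(exceedsThreshold(10_049L)); // delta == 10 000, not strictly greater
        System.out.println(exceedsThreshold(10_050L)); // delta == 10 001, exceeds the threshold
    }
}
```

This is why an update at 10049 should leave both `stateMgr.flushed` and `stateMgr.checkpointWritten` false.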
##########
streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamTaskTest.java:
##########

Review Comment:
   Why were these changes needed? You did not change anything in `StreamTask`.

##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/GlobalStreamThread.java:
##########
@@ -259,19 +251,14 @@ void initialize() {
         for (final Map.Entry<TopicPartition, Long> entry : partitionOffsets.entrySet()) {
             globalConsumer.seek(entry.getKey(), entry.getValue());
         }
-        lastFlush = time.milliseconds();
     }

     void pollAndUpdate() {
         final ConsumerRecords<byte[], byte[]> received = globalConsumer.poll(pollTime);
         for (final ConsumerRecord<byte[], byte[]> record : received) {
             stateMaintainer.update(record);
         }
-        final long now = time.milliseconds();
-        if (now - flushInterval >= lastFlush) {
-            stateMaintainer.flushState();
-            lastFlush = now;
-        }
+        stateMaintainer.maybeCheckpoint();

Review Comment:
   I think we should still keep an interval-based flush. However, we should not flush if the offset delta is not larger than `OFFSET_DELTA_THRESHOLD_FOR_CHECKPOINT`. Removing the interval-based flush would be quite a change in behavior: users that set the commit interval to a large value would not expect to see flushes even though the offset delta is exceeded.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
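The combined policy the reviewer suggests (flush on the commit interval, but only when enough records have accumulated) could be sketched roughly as below. This is a hypothetical, self-contained illustration, not the PR's actual implementation: the class, method names, and the threshold value of 10 000 are assumptions inferred from the diff and the tests.

```java
// Sketch of an interval-based flush gated on the offset delta, as the
// reviewer proposes. All names and values here are illustrative.
public class CheckpointPolicy {
    // Assumed to mirror OFFSET_DELTA_THRESHOLD_FOR_CHECKPOINT from the PR.
    static final long OFFSET_DELTA_THRESHOLD_FOR_CHECKPOINT = 10_000L;

    private final long flushIntervalMs;
    private long lastFlushMs;
    private long offsetDeltaSinceLastFlush = 0L;

    public CheckpointPolicy(final long flushIntervalMs, final long nowMs) {
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushMs = nowMs;
    }

    // Called once per processed record batch with the number of offsets advanced.
    public void update(final long offsetsAdvanced) {
        offsetDeltaSinceLastFlush += offsetsAdvanced;
    }

    // Flush only when the commit interval has elapsed AND the offset delta
    // strictly exceeds the threshold; returns whether a flush happened.
    public boolean maybeCheckpoint(final long nowMs) {
        if (nowMs - lastFlushMs >= flushIntervalMs
                && offsetDeltaSinceLastFlush > OFFSET_DELTA_THRESHOLD_FOR_CHECKPOINT) {
            lastFlushMs = nowMs;
            offsetDeltaSinceLastFlush = 0L;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        final CheckpointPolicy policy = new CheckpointPolicy(1_000L, 0L);
        policy.update(9_000L);
        // Interval elapsed, but delta (9000) is below the threshold: no flush.
        System.out.println(policy.maybeCheckpoint(2_000L));
        policy.update(1_001L);
        // Interval elapsed and delta (10 001) exceeds the threshold: flush.
        System.out.println(policy.maybeCheckpoint(2_100L));
        // Interval not yet elapsed again: no flush.
        System.out.println(policy.maybeCheckpoint(2_200L));
    }
}
```

With this shape, a large commit interval still suppresses frequent flushes, while the delta check avoids flushing when little progress has been made, which matches the behavior the reviewer wants to preserve.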