soarez commented on code in PR #14836: URL: https://github.com/apache/kafka/pull/14836#discussion_r1406906907
########## core/src/test/scala/unit/kafka/server/BrokerLifecycleManagerTest.scala: ########## @@ -201,7 +201,7 @@ class BrokerLifecycleManagerTest { while (!future.isDone || context.mockClient.hasInFlightRequests) { context.poll() manager.eventQueue.wakeup() - context.time.sleep(100) + context.time.sleep(5) Review Comment: It's true in theory, but in this unit test, it is not. I just tried with 6 heartbeats, and in 1,000 repetitions of the test, one of the runs was missing the last directory. On my laptop, with 7 heartbeats, 10,000 test runs had no failures. I think this happens because of how `poll()` works: it's rapidly advancing the clock and notifying three separate systems - BrokerLifecycleManager, MockClient and MockChannelManager — signaling them using `Object.notify()`, which does not guarantee each of those threads will run straightaway, it's still up to the OS to schedule them onto the CPU. In practice, outside of unit tests, the delays in scheduling the BrokerLifecycleManager thread should be insignificant compared to the heartbeat interval. So I don't expect failures to be delayed for more than a heartbeat. If the test proves to still be flaky even with 10 heartbeats, I think we can just increase the number. ########## core/src/test/scala/unit/kafka/server/BrokerLifecycleManagerTest.scala: ########## @@ -201,7 +201,7 @@ class BrokerLifecycleManagerTest { while (!future.isDone || context.mockClient.hasInFlightRequests) { context.poll() manager.eventQueue.wakeup() - context.time.sleep(100) + context.time.sleep(5) Review Comment: It's true in theory, but in this unit test, it is not. I just tried with 6 heartbeats, and in 1,000 repetitions of the test, one of the runs was missing the last directory. On my laptop, with 7 heartbeats, 10,000 test runs had no failures. I think this happens because of how `poll()` works: it's rapidly advancing the clock and notifying three separate systems - BrokerLifecycleManager, MockClient and MockChannelManager — signaling them using `Object.notify()`, which does not guarantee each of those threads will run straightaway, it's still up to the OS to schedule them onto the CPU. In practice, outside of unit tests, the delays in scheduling the BrokerLifecycleManager thread should be insignificant compared to the heartbeat interval. So I don't expect failures to be delayed for more than a heartbeat. If the test proves to still be flaky even with 10 heartbeats, I think we can just increase the number. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org