soarez commented on code in PR #14836:
URL: https://github.com/apache/kafka/pull/14836#discussion_r1406906907
##########
core/src/test/scala/unit/kafka/server/BrokerLifecycleManagerTest.scala:
##########
@@ -201,7 +201,7 @@ class BrokerLifecycleManagerTest {
while (!future.isDone || context.mockClient.hasInFlightRequests) {
context.poll()
manager.eventQueue.wakeup()
- context.time.sleep(100)
+ context.time.sleep(5)
Review Comment:
It's true in theory, but in this unit test, it is not.
I just tried with 6 heartbeats, and in 1,000 repetitions of the test, one of
the runs was missing the last directory. On my laptop, with 7 heartbeats,
10,000 test runs had no failures.
I think this happens because of how `poll()` works: it's rapidly advancing
the clock and notifying three separate systems - BrokerLifecycleManager,
MockClient and MockChannelManager — signaling them using `Object.notify()`,
which does not guarantee each of those threads will run straightaway, it's
still up to the OS to schedule them onto the CPU.
In practice, outside of unit tests, the delays in scheduling the
BrokerLifecycleManager thread should be insignificant compared to the heartbeat
interval. So I don't expect failures to be delayed for more than a heartbeat.
If the test proves to still be flaky even with 10 heartbeats, I think we can
just increase the number.
##########
core/src/test/scala/unit/kafka/server/BrokerLifecycleManagerTest.scala:
##########
@@ -201,7 +201,7 @@ class BrokerLifecycleManagerTest {
while (!future.isDone || context.mockClient.hasInFlightRequests) {
context.poll()
manager.eventQueue.wakeup()
- context.time.sleep(100)
+ context.time.sleep(5)
Review Comment:
It's true in theory, but in this unit test, it is not.
I just tried with 6 heartbeats, and in 1,000 repetitions of the test, one of
the runs was missing the last directory. On my laptop, with 7 heartbeats,
10,000 test runs had no failures.
I think this happens because of how `poll()` works: it's rapidly advancing
the clock and notifying three separate systems - BrokerLifecycleManager,
MockClient and MockChannelManager — signaling them using `Object.notify()`,
which does not guarantee each of those threads will run straightaway, it's
still up to the OS to schedule them onto the CPU.
In practice, outside of unit tests, the delays in scheduling the
BrokerLifecycleManager thread should be insignificant compared to the heartbeat
interval. So I don't expect failures to be delayed for more than a heartbeat.
If the test proves to still be flaky even with 10 heartbeats, I think we can
just increase the number.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]