Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 merged PR #12174: URL: https://github.com/apache/kafka/pull/12174 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on PR #12174: URL: https://github.com/apache/kafka/pull/12174#issuecomment-2044780100 @soarez I believe those failed tests are unrelated, but could you rebase code to trigger QA again due to incompletion of `JDK 17 and Scala 2.13`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on PR #12174: URL: https://github.com/apache/kafka/pull/12174#issuecomment-2044731959 @chia7712 done. It seems to me that the current test failures are unrelated. Could you confirm? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on PR #12174: URL: https://github.com/apache/kafka/pull/12174#issuecomment-2041478299 @soarez Could you please fix the conflicts? thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1554541084 ## core/src/test/scala/unit/kafka/integration/KafkaServerTestHarness.scala: ## @@ -260,9 +274,12 @@ abstract class KafkaServerTestHarness extends QuorumTestHarness { } def killBroker(index: Int): Unit = { Review Comment: Oh I see. It was a mistake to remove `awaitShutdown`. I think we should leave the behavior as is, I don't see any reason why the new `killBroker(idx, timeout)` alternative shouldn't also call `awaitShutdown`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1551687551 ## core/src/test/scala/unit/kafka/integration/KafkaServerTestHarness.scala: ## @@ -260,9 +274,12 @@ abstract class KafkaServerTestHarness extends QuorumTestHarness { } def killBroker(index: Int): Unit = { Review Comment: My point was "should we keep `_brokers(index).awaitShutdown()`"? It seems the latch used by `awaitShutdown` should be notified after `shutdown` get completed, so maybe it is fine to remove it. It would be great to add comments if you prefer to remove `awaitShutdown` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1551371510 ## core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala: ## @@ -197,12 +198,11 @@ class ServerShutdownTest extends KafkaServerTestHarness { verifyNonDaemonThreadsStatus() } - @Disabled @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumName) @ValueSource(strings = Array("kraft")) def testCleanShutdownWithKRaftControllerUnavailable(quorum: String): Unit = { Review Comment: That is good to me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1551132979 ## core/src/test/java/kafka/test/junit/ZkClusterInvocationContext.java: ## @@ -323,7 +324,7 @@ public void rollingBrokerRestart() { throw new IllegalStateException("Tried to restart brokers but the cluster has not been started!"); } for (int i = 0; i < clusterReference.get().brokerCount(); i++) { -clusterReference.get().killBroker(i); +clusterReference.get().killBroker(i, Duration.ofSeconds(5)); Review Comment: I think this is unnecessary. Removing -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1551132385 ## core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala: ## @@ -197,12 +198,11 @@ class ServerShutdownTest extends KafkaServerTestHarness { verifyNonDaemonThreadsStatus() } - @Disabled @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumName) @ValueSource(strings = Array("kraft")) def testCleanShutdownWithKRaftControllerUnavailable(quorum: String): Unit = { Review Comment: That's a good point. If the controller is unavailable, the shutdown can't really be considered clean. However, I think it's better to just remove the `Clean` word from the method, rather than replacing it with `Dirty`. It seems cleaner and less confusing, as it might not be clear what "dirty" is referring to. Please let me know if you disagree. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1551129021 ## core/src/test/scala/unit/kafka/integration/KafkaServerTestHarness.scala: ## @@ -260,9 +274,12 @@ abstract class KafkaServerTestHarness extends QuorumTestHarness { } def killBroker(index: Int): Unit = { Review Comment: The original implementation has a timeout of 5 minutes, so I'll keep the same behavior here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1549916567 ## core/src/test/scala/unit/kafka/integration/KafkaServerTestHarness.scala: ## @@ -260,9 +274,12 @@ abstract class KafkaServerTestHarness extends QuorumTestHarness { } def killBroker(index: Int): Unit = { Review Comment: maybe we should keep origin implementation since it expect to await shutdown. ## core/src/test/java/kafka/test/junit/ZkClusterInvocationContext.java: ## @@ -323,7 +324,7 @@ public void rollingBrokerRestart() { throw new IllegalStateException("Tried to restart brokers but the cluster has not been started!"); } for (int i = 0; i < clusterReference.get().brokerCount(); i++) { -clusterReference.get().killBroker(i); +clusterReference.get().killBroker(i, Duration.ofSeconds(5)); Review Comment: what is the purpose of this change? ## core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala: ## @@ -197,12 +198,11 @@ class ServerShutdownTest extends KafkaServerTestHarness { verifyNonDaemonThreadsStatus() } - @Disabled @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumName) @ValueSource(strings = Array("kraft")) def testCleanShutdownWithKRaftControllerUnavailable(quorum: String): Unit = { Review Comment: the shutdown with timeout is a kind of dirty shutdown so we should rename the test to `testDirtyShutdownWithKRaftControllerUnavailable` ## core/src/test/scala/unit/kafka/integration/KafkaServerTestHarness.scala: ## @@ -260,9 +274,12 @@ abstract class KafkaServerTestHarness extends QuorumTestHarness { } def killBroker(index: Int): Unit = { -if (alive(index)) { - _brokers(index).shutdown() - _brokers(index).awaitShutdown() +killBroker(index, Duration.ofSeconds(5)) + } + + def killBroker(index: Int, timeout: Duration): Unit = { Review Comment: we need to document the difference of this variety. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1549909706 ## core/src/main/scala/kafka/server/BrokerServer.scala: ## @@ -623,9 +623,12 @@ class BrokerServer( } } - override def shutdown(): Unit = { + override def shutdown(): Unit = shutdown(TimeUnit.MINUTES.toMillis(5)) Review Comment: Yup, that makes sense. Changed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1549909351 ## core/src/main/scala/kafka/server/KafkaBroker.scala: ## @@ -93,6 +93,7 @@ trait KafkaBroker extends Logging { def startup(): Unit def awaitShutdown(): Unit def shutdown(): Unit + def shutdown(timeoutMs: Long): Unit Review Comment: Good suggestion. Applied -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
chia7712 commented on code in PR #12174: URL: https://github.com/apache/kafka/pull/12174#discussion_r1549747480 ## core/src/main/scala/kafka/server/KafkaBroker.scala: ## @@ -93,6 +93,7 @@ trait KafkaBroker extends Logging { def startup(): Unit def awaitShutdown(): Unit def shutdown(): Unit + def shutdown(timeoutMs: Long): Unit Review Comment: How about using `Duration` instead of long type? Also, we can rename it from `timeoutMs` to `timeout` ## core/src/main/scala/kafka/server/BrokerServer.scala: ## @@ -623,9 +623,12 @@ class BrokerServer( } } - override def shutdown(): Unit = { + override def shutdown(): Unit = shutdown(TimeUnit.MINUTES.toMillis(5)) Review Comment: How about adding default implementation to parent class? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez opened a new pull request, #12174: URL: https://github.com/apache/kafka/pull/12174 When a controlled shutdown is requested the broker tries to communicate the state change to the controller via a heartbeat request. [1] In this test, the controller is not available so the request will fail. The current timeout behavior in a heartbeat request is to just keep retrying — which generally makes sense, just not in the context of a controlled shutdown. When a heartbeat request times out, if we are in the middle of a controlled shutdown, we shouldn't just retry forever but rather just give up on trying to contact the controller and proceed with the controlled shutdown. [1] https://github.com/apache/kafka/blob/f2d6282668a31b9a554563338f9178e2bba2833f/core/src/main/scala/kafka/server/BrokerLifecycleManager.scala#L217 *Summary of testing strategy* The test no longer fails ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable [kafka]
soarez closed pull request #12174: KAFKA-13907: Fix hanging ServerShutdownTest.testCleanShutdownWithKRaftControllerUnavailable URL: https://github.com/apache/kafka/pull/12174 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org