GitHub user original-brownbear opened a pull request: https://github.com/apache/kafka/pull/2568
Kafka 4198: Fix Race Condition in KafkaServer Shutdown

Fixes the initially reported issue in https://issues.apache.org/jira/browse/KAFKA-4198.

The relevant part of the fix for the initially reported issue is the change to `kafka.server.KafkaServer#shutdown`. It contained this step:

```scala
val canShutdown = isShuttingDown.compareAndSet(false, true)
if (canShutdown && shutdownLatch.getCount > 0) {
```

without any fallback for the case of `shutdownLatch.getCount == 0`. In that case (when a previous call to the shutdown method was just about to finish), `isShuttingDown` is set to true again with no possibility of ever getting the server started, since `startup` checks `isShuttingDown` before setting up a new latch with count 1.

Long story short: concurrent calls to shutdown can leave the server locked in a broken state.

This fixes the reported error:

```sh
java.lang.IllegalStateException: Kafka server is still shutting down, cannot re-start!
    at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
    at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
    at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
    at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.Range.foreach(Range.scala:160)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
    at kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
    at kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
```

That said, this error (reported in a comment on the JIRA) still occurs even with this fix:

```sh
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
    java.lang.IllegalArgumentException: You can only check the position for partitions assigned to this consumer.
        at org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
        at kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
        at kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
```

... I think this one should get a separate JIRA though. It seems to me that the behaviour of the call to `position` when a broker has just died is a separate issue from the one initially reported.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/original-brownbear/kafka KAFKA-4198

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2568.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2568

----
commit 08460a669b4c737c20793129b01c7a3676452dac
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T08:11:10Z

    KAFKA-4198: Cleaner ExecutorService Handling

commit 128db5ea5afeb940f108e95c9bcaad84c9889b10
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T09:12:15Z

    KAFKA-4198: Ensure Fresh MetaData

commit 847d001e17c2baaba18936cc5c497756154c931a
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T10:27:37Z

    KAFKA-4198: Revert Test Change

commit 005ff8f4a180a7c2c45313accff6627e90e9983a
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T11:39:47Z

    KAFKA-4198: Fix RunCondition in KafkaServer#shutdown

commit 9559ad387bba6d24a0ba5f244aae5f6d32a897f1
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T11:41:00Z

    KAFKA-4198: Revert Experimental Change to KafkaConsumer

commit d2f138c9f01800219fcd02a625a9f89b9315fd73
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T12:04:14Z

    KAFKA-4198: Revert Experimental Change to KafkaServerTestHarness

commit 8cfb45240eda64cc358303c2533aef6c50f69225
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T12:06:28Z

    KAFKA-4198: Revert Experimental Change to ConsumerBounceTest

----
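
For readers who want to see the `KafkaServer#shutdown` problem from this pull request in isolation, here is a minimal sketch. It is not Kafka's code: the class `ShutdownRaceSketch`, the `checkLatchFirst` switch and the `isStuck` helper are hypothetical, and it is written in plain Java rather than Scala so that it runs standalone. It assumes the shutdown block resets `isShuttingDown` and counts the latch down before returning. The "latch first" variant is only one possible reordering and is not necessarily the change made in this pull request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, stripped-down model of the server lifecycle; not Kafka's actual class.
class ShutdownRaceSketch {
    private final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    // Count 0 means "not running"; startup() installs a fresh latch with count 1.
    private volatile CountDownLatch shutdownLatch = new CountDownLatch(0);
    private final boolean checkLatchFirst;

    ShutdownRaceSketch(boolean checkLatchFirst) {
        this.checkLatchFirst = checkLatchFirst;
    }

    void startup() {
        if (isShuttingDown.get()) {
            throw new IllegalStateException("Kafka server is still shutting down, cannot re-start!");
        }
        shutdownLatch = new CountDownLatch(1);
    }

    void shutdown() {
        boolean canShutdown = checkLatchFirst
            // Assumed alternative: only touch the flag while the latch is still open.
            ? shutdownLatch.getCount() > 0 && isShuttingDown.compareAndSet(false, true)
            // Check quoted in the description: the flag is flipped before the latch is examined.
            : isShuttingDown.compareAndSet(false, true) && shutdownLatch.getCount() > 0;
        if (canShutdown) {
            // The real shutdown work is elided; the order of these two resets is an assumption.
            isShuttingDown.set(false);
            shutdownLatch.countDown();
        }
    }

    // True when the flag was left set although no shutdown is in progress.
    boolean isStuck() {
        return isShuttingDown.get();
    }

    public static void main(String[] args) {
        ShutdownRaceSketch server = new ShutdownRaceSketch(false);
        server.startup();
        server.shutdown(); // completes normally: flag reset, latch at 0
        server.shutdown(); // arrives with latch at 0: flag set, never reset
        System.out.println("stuck after second shutdown: " + server.isStuck());
        try {
            server.startup();
        } catch (IllegalStateException e) {
            System.out.println("restart failed: " + e.getMessage());
        }
    }
}
```

In this model the second `shutdown()` call flips `isShuttingDown` to true, sees a latch count of 0, and skips the block that would reset the flag, so the following `startup()` throws the same `IllegalStateException` as in the reported stack trace. Constructing the sketch with `checkLatchFirst = true` avoids the stuck flag because the flag is never touched once the latch has reached 0.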