[ https://issues.apache.org/jira/browse/KAFKA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Eisele updated KAFKA-6714:
------------------------------
Priority: Critical (was: Major)
> KafkaController marks all Brokers as "Shutting down", though only one broker has been shut down
> -----------------------------------------------------------------------------------------------
>
> Key: KAFKA-6714
> URL: https://issues.apache.org/jira/browse/KAFKA-6714
> Project: Kafka
> Issue Type: Bug
> Components: controller, core
> Affects Versions: 0.11.0.2
> Environment: 5-node Kafka cluster on Amazon EC2 r4.2xlarge instances and a
> 3-node ZooKeeper ensemble, also on r4.2xlarge instances.
> The cluster is distributed across 2 availability zones.
> Reporter: Uwe Eisele
> Priority: Critical
>
> In our Kafka cluster we experienced a situation in which the Kafka controller
> had marked all brokers as "Shutting down", although only one broker had
> actually been shut down.
> The last log entry about broker state before the one reporting that all
> brokers are shutting down says that no brokers are shutting down.
> As a consequence of this state, the Kafka controller is unable to elect any
> partition leader.
> {code:java}
> [2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
> ...
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster: Set(1, 2, 3, 4) (kafka.controller.KafkaController)
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers: (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] besides shutting down brokers 1,5,2,3,4. (state.change.logger)
> {code}
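Inline note: the last two entries of this log follow directly from the contents of the shutting-down set. The Java sketch here is a simplified, hypothetical model, not Kafka's implementation; the names `ShuttingDownModel`, `markShuttingDown`, `liveBrokerIds` and `selectLeader` are invented for illustration. It assumes that the "Live brokers" list is the set of registered brokers minus the shutting-down set, and that leader election during controlled shutdown only accepts ISR members outside that set, as the "besides shutting down brokers" error suggests.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the controller state involved in this issue.
// Assumption: live brokers = registered brokers minus shutting-down brokers.
class ShuttingDownModel {
    private final Set<Integer> registeredBrokerIds = new TreeSet<>();
    private final Set<Integer> shuttingDownBrokerIds = new TreeSet<>();

    ShuttingDownModel(Set<Integer> registered) {
        registeredBrokerIds.addAll(registered);
    }

    void markShuttingDown(int id) {
        shuttingDownBrokerIds.add(id);
    }

    // Analogous to the "Live brokers:" debug line
    Set<Integer> liveBrokerIds() {
        Set<Integer> live = new TreeSet<>(registeredBrokerIds);
        live.removeAll(shuttingDownBrokerIds);
        return live;
    }

    // Picks the first ISR member that is not shutting down; an empty result
    // corresponds to "No other replicas in ISR ... besides shutting down brokers"
    Optional<Integer> selectLeader(List<Integer> isr) {
        Set<Integer> live = liveBrokerIds();
        return isr.stream().filter(live::contains).findFirst();
    }

    public static void main(String[] args) {
        ShuttingDownModel model = new ShuttingDownModel(Set.of(1, 2, 3, 4, 5));
        model.markShuttingDown(5);
        System.out.println("Live brokers: " + model.liveBrokerIds());          // Live brokers: [1, 2, 3, 4]
        System.out.println("Leader: " + model.selectLeader(List.of(1, 3, 5))); // Leader: Optional[1]

        // State observed at 17:57:22: every broker is in the shutting-down set
        for (int id : List.of(1, 2, 3, 4)) {
            model.markShuttingDown(id);
        }
        System.out.println("Live brokers: " + model.liveBrokerIds());          // Live brokers: []
        System.out.println("Leader: " + model.selectLeader(List.of(1, 3, 5))); // Leader: Optional.empty
    }
}
```

Under these assumptions, once every broker ID is in the shutting-down set, no ISR member qualifies, so election fails for every partition regardless of which brokers are actually running.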
> The question is: why does the Kafka controller assume that all brokers are
> shutting down?
> The only place we found in the Kafka 0.11.0.2 code where the set of
> shutting-down brokers is modified is in the class
> _kafka.controller.KafkaController_, at line 1407 in the method _doControlledShutdown_:
>
> {code:java}
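> info("Shutting down broker " + id)
> // Reject the request if the broker is neither live nor already shutting down
> if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
>   throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
> // Record the broker as shutting down; it is then excluded as a leader candidate
> controllerContext.shuttingDownBrokerIds.add(id)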
> {code}
> However, if this were the only such place, the log file should contain a
> "Shutting down broker n" entry for every one of these brokers, but those
> entries are not there.
>
>
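The reporter's argument can be stated as an invariant: if `doControlledShutdown` is the only writer, every ID in the shutting-down set has a matching "Shutting down broker n" log line. The Java sketch here is a hypothetical illustration of that check, not Kafka code. `ControlledShutdownTrace`, `addWithoutLogging` and `idsWithoutLogEntry` are invented names, the in-memory log list stands in for the controller log, and `IllegalArgumentException` substitutes for Kafka's `BrokerNotAvailableException`. `addWithoutLogging` merely models "some other path that does not log"; it does not claim such a path exists in Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the reporter's reasoning, not Kafka's implementation.
class ControlledShutdownTrace {
    private final Set<Integer> liveOrShuttingDownBrokerIds;
    private final Set<Integer> shuttingDownBrokerIds = new TreeSet<>();
    private final List<String> log = new ArrayList<>(); // stands in for the controller log

    ControlledShutdownTrace(Set<Integer> registered) {
        this.liveOrShuttingDownBrokerIds = new TreeSet<>(registered);
    }

    // Mirrors the quoted fragment: log first, validate, then add to the set
    void doControlledShutdown(int id) {
        log.add("Shutting down broker " + id);
        if (!liveOrShuttingDownBrokerIds.contains(id)) {
            throw new IllegalArgumentException(String.format("Broker id %d does not exist.", id));
        }
        shuttingDownBrokerIds.add(id);
    }

    // Hypothetical: models an unidentified mutation that leaves no log entry
    void addWithoutLogging(int id) {
        shuttingDownBrokerIds.add(id);
    }

    // IDs in the shutting-down set that have no matching log entry
    Set<Integer> idsWithoutLogEntry() {
        Set<Integer> missing = new TreeSet<>();
        for (int id : shuttingDownBrokerIds) {
            if (!log.contains("Shutting down broker " + id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ControlledShutdownTrace trace = new ControlledShutdownTrace(Set.of(1, 2, 3, 4, 5));
        trace.doControlledShutdown(1);
        System.out.println(trace.idsWithoutLogEntry()); // []
        trace.addWithoutLogging(2);
        System.out.println(trace.idsWithoutLogEntry()); // [2]
    }
}
```

A non-empty result corresponds to the situation in this report: IDs such as 2, 3 and 4 appear in "All shutting down brokers" without a preceding "Shutting down broker n" entry.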
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)