[ https://issues.apache.org/jira/browse/KAFKA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Eisele updated KAFKA-6714:
------------------------------
    Priority: Critical  (was: Major)

> KafkaController marks all Brokers as "Shutting down", though only one broker has been shut down
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6714
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6714
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 0.11.0.2
>         Environment: Kafka Cluster on Amazon AWS EC2 r4.2xlarge instances with 5 nodes and a Zookeeper Cluster on r4.2xlarge instances with 3 nodes. The Cluster is distributed across 2 availability zones.
>            Reporter: Uwe Eisele
>            Priority: Critical
>
> In our Kafka Cluster we experienced a situation in which the Kafka controller has all Brokers marked as "Shutting down", though in fact only one Broker has been shut down.
> The last broker-state log entry before the one that lists all brokers as shutting down reports that no brokers are shutting down.
> The consequence of this state is that the Kafka controller is not able to elect any partition leader.
> {code:java}
> [2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
> ...
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster: Set(1, 2, 3, 4) (kafka.controller.KafkaController)
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers: (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] besides shutting down brokers 1,5,2,3,4. (state.change.logger)
> {code}
> The question is: why does the Kafka controller assume that all brokers are shutting down?
> The only place in the Kafka code (0.11.0.2) we found in which the set of shutting-down brokers is changed is in the class _kafka.controller.KafkaController_, at line 1407 in the method _doControlledShutdown_.
>
> {code:java}
> info("Shutting down broker " + id)
> if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
>   throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
> controllerContext.shuttingDownBrokerIds.add(id)
> {code}
> However, if this were the only place, the log file should contain a "Shutting down broker n" entry for every Broker in the set, and it does not.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
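The check described in the report (is there a "Shutting down broker n" entry for every broker that later appears in an "All shutting down brokers" list?) can be automated over a controller log. The following is only a minimal sketch, not part of Kafka: the class name `ShutdownLogCheck` and the method `unexplainedIds` are hypothetical, and the two regular expressions assume the message wording shown in the log excerpt above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka: scans controller log lines for
// broker ids that are listed as shutting down without ever being announced.
public class ShutdownLogCheck {
    // Assumed wording, taken from the log excerpt in the report.
    private static final Pattern ANNOUNCED =
            Pattern.compile("Shutting down broker (\\d+)");
    private static final Pattern LISTED =
            Pattern.compile("All shutting down brokers: ([\\d,]+)");

    // Returns ids that appear in an "All shutting down brokers" list
    // but have no "Shutting down broker n" entry anywhere in the input.
    public static Set<Integer> unexplainedIds(List<String> lines) {
        Set<Integer> announced = new LinkedHashSet<>();
        Set<Integer> listed = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher a = ANNOUNCED.matcher(line);
            if (a.find()) {
                announced.add(Integer.parseInt(a.group(1)));
            }
            Matcher l = LISTED.matcher(line);
            if (l.find()) {
                for (String id : l.group(1).split(",")) {
                    if (!id.isEmpty()) {
                        listed.add(Integer.parseInt(id));
                    }
                }
            }
        }
        listed.removeAll(announced);
        return listed;
    }

    public static void main(String[] args) {
        // The four relevant lines from the excerpt in the report.
        List<String> sample = Arrays.asList(
            "[2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)",
            "[2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)",
            "[2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)",
            "[2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)");
        System.out.println(unexplainedIds(sample)); // prints [2, 3, 4]
    }
}
```

The sketch deliberately ignores which controller wrote each line, so broker 5 counts as announced even though the announcement came from Controller 5 rather than Controller 3. Grouping by the `[Controller n]` prefix would be a natural refinement.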