[ https://issues.apache.org/jira/browse/KAFKA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Eisele updated KAFKA-6714:
------------------------------
    Priority: Critical  (was: Major)

> KafkaController marks all Brokers as "Shutting down", though only one broker 
> has been shut down
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6714
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6714
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 0.11.0.2
>         Environment: Kafka Cluster on Amazon AWS EC2 r4.2xlarge instances 
> with 5 nodes and a Zookeeper Cluster on r4.2xlarge instances with 3 nodes. 
> The Cluster is distributed across 2 availability zones.
>            Reporter: Uwe Eisele
>            Priority: Critical
>
> In our Kafka cluster we experienced a situation in which the Kafka controller 
> has all brokers marked as "Shutting down", though in fact only one broker has 
> been shut down.
> The last log entry about the broker state before the one reporting all 
> brokers as shutting down says that no brokers are shutting down.
> As a consequence of this inconsistent state, the Kafka controller is not able 
> to elect any partition leader.
> {code:java}
> [2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
> [2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
> ...
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster: Set(1, 2, 3, 4) (kafka.controller.KafkaController)
> [2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)
> [2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers:  (kafka.controller.KafkaController)
> ...
> [2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] besides shutting down brokers 1,5,2,3,4. (state.change.logger)
> {code}
> The question is: why does the Kafka controller assume that all brokers are 
> shutting down?
> The only place in the Kafka code (0.11.0.2) we found in which the shutting 
> down broker set is changed is in the class _kafka.controller.KafkaController_ 
> in line 1407 in the method _doControlledShutdown_.
>  
> {code:java}
> info("Shutting down broker " + id)
> if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
>   throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
> controllerContext.shuttingDownBrokerIds.add(id)
> {code}
> However, in that case we would expect to see the log entry "Shutting down 
> broker n" for every broker in the log file, but these entries are not there.
>  
>  
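
To make the reported symptom easier to reason about, here is a minimal sketch
of the bookkeeping described above. It is a simplified, hypothetical model and
not the Kafka source: the class ControllerContextSketch, the method
electLeader and the use of IllegalArgumentException are illustrative
assumptions, and the derivation of the live set as "registered brokers minus
shutting-down brokers" is assumed from the log output quoted in the ticket.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified, hypothetical model of the controller's broker bookkeeping.
// NOT the Kafka source; names follow the ticket only for readability.
class ControllerContextSketch {
    // Brokers currently registered (live or in controlled shutdown).
    final Set<Integer> liveOrShuttingDownBrokerIds = new LinkedHashSet<>();
    // Brokers that have requested a controlled shutdown.
    final Set<Integer> shuttingDownBrokerIds = new LinkedHashSet<>();

    // Assumption: live brokers are registered brokers minus shutting-down ones.
    Set<Integer> liveBrokerIds() {
        Set<Integer> live = new LinkedHashSet<>(liveOrShuttingDownBrokerIds);
        live.removeAll(shuttingDownBrokerIds);
        return live;
    }

    // Mirrors the guard quoted in the ticket: unknown ids are rejected,
    // known ids are added to the shutting-down set.
    void doControlledShutdown(int id) {
        if (!liveOrShuttingDownBrokerIds.contains(id)) {
            throw new IllegalArgumentException("Broker id " + id + " does not exist.");
        }
        shuttingDownBrokerIds.add(id);
    }

    // Hypothetical election rule: first ISR member that is live, if any.
    Optional<Integer> electLeader(List<Integer> isr) {
        Set<Integer> live = liveBrokerIds();
        return isr.stream().filter(live::contains).findFirst();
    }
}
```

In this model, once every registered broker id is in shuttingDownBrokerIds,
liveBrokerIds() is empty and electLeader returns an empty result for any ISR,
which is consistent with the "Live brokers:" line and the election error in
the log above. It does not explain how the set came to contain all brokers.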



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
