Andrey Falko created KAFKA-8702:
-----------------------------------

             Summary: Kafka leader election doesn't happen when leader broker port is partitioned off the network
                 Key: KAFKA-8702
                 URL: https://issues.apache.org/jira/browse/KAFKA-8702
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.1.0
            Reporter: Andrey Falko


We first started seeing this with Kafka 2.1.1. We are currently on 2.3.0.

We were able to reproduce this today in one of our staging environments. The
reproduction steps are as follows:
1) Push some traffic to a topic that looks like this: 
$ bin/kafka-topics.sh --describe --zookeeper $(grep zookeeper.connect= /kafka/config/server.properties | awk -F= '{print $2}') --topic test
Topic:test      PartitionCount:6        ReplicationFactor:3     Configs:cleanup.policy=delete,retention.ms=86400000
       Topic: test     Partition: 0    Leader: 0       Replicas: 2,0,1 Isr: 1,0 
       Topic: test     Partition: 1    Leader: 0       Replicas: 0,1,2 Isr: 1,0 
       Topic: test     Partition: 2    Leader: 1       Replicas: 1,2,0 Isr: 1,0 
       Topic: test     Partition: 3    Leader: 1       Replicas: 2,1,0 Isr: 1,0 
       Topic: test     Partition: 4    Leader: 0       Replicas: 0,2,1 Isr: 1,0 
       Topic: test     Partition: 5    Leader: 1       Replicas: 1,0,2 Isr: 1,0

2) We then run the following on broker 0 to drop all TCP traffic to port 9093:
iptables -A INPUT -j DROP -p tcp --destination-port 9093 && iptables -A OUTPUT -j DROP -p tcp --destination-port 9093
Note: both our replication traffic and our client traffic use only the
TLS-protected port 9093.

3) Leadership doesn't change because broker 0's ZooKeeper connection is
unaffected. However, we start seeing under-replicated partitions (URPs).

4) We reboot broker 0 and see offline partitions. Leadership never changes, and
the cluster recovers only when broker 0 comes back online.

My colleague Kailash helped me reproduce this today, and I have added him to
the CC list. Should we post this behavior on the public Kafka channel to see
whether it is worth filing a bug for? We don't mind the URP state, but as soon
as broker 0 gets killed, leader election should ideally occur so that the
partitions do not go offline.

Best regards,
Andrey Falko



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
