Ivan Babrou created KAFKA-5633:
----------------------------------

             Summary: Clarify another scenario of unclean leader election
                 Key: KAFKA-5633
                 URL: https://issues.apache.org/jira/browse/KAFKA-5633
             Project: Kafka
          Issue Type: Bug
            Reporter: Ivan Babrou


With unclean leader election disabled, you don't need to lose all replicas of a 
partition for it to become unavailable; losing just one can be enough. The 
leading replica can get into a state where it kicks everything else out of the 
ISR because it has a network issue, and then it can simply die, leaving the 
partition leaderless.

This is what we saw:

{noformat}
Jul 24 18:05:53 broker-10029 kafka[4104]: INFO Partition [requests,9] on broker 10029: Shrinking ISR for partition [requests,9] from 10029,10016,10072 to 10029 (kafka.cluster.Partition)
{noformat}

{noformat}
        Topic: requests Partition: 9    Leader: -1      Replicas: 10029,10072,10016     Isr: 10029
{noformat}

This is the default behavior in 0.11.0.0+, but I don't think the docs are 
completely clear about the implications. Before the change you could silently 
lose data if the scenario described above happened; now you can grind your 
whole pipeline to a halt when just one node has issues. My understanding is 
that to avoid this you'd want min.insync.replicas > 1 and acks=all on 
producers.
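
For example, something like this should work (a sketch only; the ZooKeeper 
address is a placeholder, and min.insync.replicas can also be set broker-wide 
in server.properties instead of per topic):

{noformat}
# Topic-level override: require at least 2 in-sync replicas to accept writes
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name requests \
  --add-config min.insync.replicas=2

# Producer side (producer.properties or client config):
# wait for acknowledgement from all in-sync replicas
acks=all
{noformat}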

It's also worth documenting how to force leader election when unclean leader 
election is disabled. I assume it can be accomplished by switching 
unclean.leader.election.enable on and off again for the problematic topic, but 
being crystal clear about this in the docs would be tremendously helpful.
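
Something like this is what I have in mind (an untested sketch; I haven't 
verified that the controller re-runs leader election on a topic config change 
without further prodding):

{noformat}
# Temporarily allow unclean election for the affected topic only
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name requests \
  --add-config unclean.leader.election.enable=true

# ...once a leader is elected again, remove the override
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name requests \
  --delete-config unclean.leader.election.enable
{noformat}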



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
