Ivan Babrou created KAFKA-5633:
----------------------------------
Summary: Clarify another scenario of unclean leader election
Key: KAFKA-5633
URL: https://issues.apache.org/jira/browse/KAFKA-5633
Project: Kafka
Issue Type: Bug
Reporter: Ivan Babrou
You don't need to lose all replicas of a partition to end up in an unclean leader election scenario; losing just one is enough. The leader replica can get into a state where it kicks every other replica out of the ISR because of its own network issues, and then die, leaving the partition leaderless.
This is what we saw:
{noformat}
Jul 24 18:05:53 broker-10029 kafka[4104]: INFO Partition [requests,9] on broker 10029: Shrinking ISR for partition [requests,9] from 10029,10016,10072 to 10029 (kafka.cluster.Partition)
{noformat}
{noformat}
Topic: requests	Partition: 9	Leader: -1	Replicas: 10029,10072,10016	Isr: 10029
{noformat}
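For reference, the output above presumably comes from describing the topic, which on a 0.11-era broker would be something like this (the ZooKeeper address is a placeholder):
{noformat}
bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic requests
{noformat}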
This is the default behavior in 0.11.0.0+, but I don't think the docs are completely clear about the implications. Before the change you could silently lose data if the scenario described above happened; now a single node having issues can grind your whole pipeline to a halt. My understanding is that to avoid this you'd want min.insync.replicas > 1 and producer acks=all.
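For example, that safeguard could look roughly like this (a sketch, not verified on this cluster; the topic name and ZooKeeper address are placeholders):
{noformat}
# Require at least 2 in-sync replicas for acknowledged writes to this topic
bin/kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name requests --add-config min.insync.replicas=2

# In the producer config: wait for all in-sync replicas to acknowledge
acks=all
{noformat}
With this combination, a write is only acknowledged once it reaches at least two replicas, so a single flaky leader can no longer acknowledge data that exists nowhere else.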
It's also worth documenting how to force leader election when unclean leader election is disabled. I assume it can be accomplished by switching unclean.leader.election.enable on and off again for the problematic topic, but being crystal clear about this in the docs would be tremendously helpful.
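If that assumption holds, the toggle would look roughly like this (an untested sketch; ZooKeeper address and topic name are placeholders):
{noformat}
# Temporarily allow an out-of-sync replica to take over leadership
bin/kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name requests --add-config unclean.leader.election.enable=true

# Once a leader has been elected, revert to the safe default
bin/kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name requests --delete-config unclean.leader.election.enable
{noformat}
Note that doing this accepts the data loss the disabled default is protecting against: anything that existed only on the dead leader is gone.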