[
https://issues.apache.org/jira/browse/KAFKA-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Lin updated KAFKA-6880:
----------------------------
Fix Version/s: 2.1.0
> Zombie replicas must be fenced
> ------------------------------
>
> Key: KAFKA-6880
> URL: https://issues.apache.org/jira/browse/KAFKA-6880
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
> Labels: needs-kip
> Fix For: 2.1.0
>
>
> Let's say we have three replicas for a partition: 1, 2 ,and 3.
> In epoch 0, broker 1 is the leader and writes up to offset 50. Broker 2
> replicates up to offset 50, but broker 3 is a little behind at offset 40. The
> high watermark is 40.
> Suppose that broker 2 has a zk session expiration event, but fails to detect
> it or fails to reestablish a session (e.g. due to a bug like KAFKA-6879), and
> it continues fetching from broker 1.
> For whatever reason, broker 3 is elected the leader for epoch 1 beginning at
> offset 40. Broker 1 detects the leader change and truncates its log to offset
> 40. Some new data is appended up to offset 60, which is fully replicated to
> broker 1. Broker 2 continues fetching from broker 1 at offset 50, but gets
> NOT_LEADER_FOR_PARTITION errors, which is retriable and hence broker 2 will
> retry.
> After some time, broker 1 becomes the leader again for epoch 2. Broker 1
> begins at offset 60. Broker 2 has not exhausted retries and is now able to
> fetch at offset 50 and append the last 10 records in order to catch up.
> However, because it did not observed the leader changes, it never saw the
> need to truncate its log. Hence offsets 40-49 still reflect the uncommitted
> changes from epoch 0. Neither KIP-101 nor KIP-279 can fix this because the
> tail of the log is correct.
> The basic problem is that zombie replicas are not fenced properly by the
> leader epoch. We actually observed a sequence roughly like this after a
> broker had partially deadlocked from KAFKA-6879. We should add the leader
> epoch to fetch requests so that the leader can fence the zombie replicas.
> A related problem is that we currently allow such zombie replicas to be added
> to the ISR even if they are in an offline state. The problem is that the
> controller will never elect them, so being part of the ISR does not give the
> availability guarantee that is intended. This would also be fixed by
> verifying replica leader epoch in fetch requests.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)