[ https://issues.apache.org/jira/browse/KAFKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212602#comment-14212602 ]
Scott Hunt commented on KAFKA-948:
----------------------------------
I think I just ran into this same issue on our cluster yesterday. Kafka
version 2.8.0-0.8.0+46.
I first noticed there was a real problem when we had a leader that wasn't in
the replica list. (Step 5 below.)
Here's what (I think) happened:
1. One broker in our cluster (id = 5) failed, presumably due to hardware issues.
2. A couple of days into the failure, I lost faith in ever seeing that machine
resurrected and used kafka-reassign-topic.sh to remove broker 5 from all the
replica sets (replacing it with other nodes) so that we were back to full
replication (3 replicas). There were 2 topics with 24 partitions each that had
replicas on broker 5 and needed to be moved. One of the topics is *really* low
traffic (most partitions get fewer than 1 message per day).
3. After moving broker 5 out of the replica sets for all partitions, I noticed
that broker 5 was still listed in the ISR for some of the partitions in the
low-traffic topic.
4. Later that night, our Technical Operations staff miraculously brought broker
5 back online. I assumed everything was fine and went back to sleep.
5. The next day I checked back and, probably due to a network hiccup, a
couple of the partitions listed the no-longer-dead broker as their leader, even
though it wasn't in their replica list.
That is, the listing showed something like:
topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3
6. I was somewhat alarmed.
7. So I shut down broker 5 (just stopping the Kafka process) so that the
controller would elect new leaders for those partitions.
8. I now have 14 partitions that still list broker 5 in their ISR even though
it is not in their replica list.
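Both symptoms above (a leader outside the replica list, and ISR members outside the replica list) can be spotted mechanically. A minimal sketch, assuming the topic listing prints one partition per line in the format shown in step 5; `parse_line` and `find_anomalies` are hypothetical helpers, not part of Kafka's tooling.

```python
import re
from dataclasses import dataclass

# Assumed line format (taken from step 5):
# "topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3"
LINE_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


@dataclass
class PartitionState:
    topic: str
    partition: int
    leader: int
    replicas: list
    isr: list


def parse_ids(text):
    """Turn a comma-separated broker list such as "8,4,3" into [8, 4, 3]."""
    return [int(x) for x in text.split(",") if x]


def parse_line(line):
    """Parse one listing line; return None if it does not match the assumed format."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return PartitionState(
        topic=m.group("topic"),
        partition=int(m.group("partition")),
        leader=int(m.group("leader")),
        replicas=parse_ids(m.group("replicas")),
        isr=parse_ids(m.group("isr")),
    )


def find_anomalies(state):
    """List inconsistencies between leader, ISR and the assigned replicas."""
    problems = []
    if state.leader not in state.replicas:
        problems.append(f"leader {state.leader} not in replicas")
    stale = sorted(set(state.isr) - set(state.replicas))
    if stale:
        problems.append(f"ISR brokers {stale} not in replicas")
    return problems


if __name__ == "__main__":
    s = parse_line("topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3")
    print(find_anomalies(s))
```

Running it against the line from step 5 reports both the out-of-replica leader and the stale ISR entry for broker 5; a healthy partition yields an empty list.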
> ISR list in LeaderAndISR path not updated for partitions when Broker (which
> is not leader) is down
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-948
> URL: https://issues.apache.org/jira/browse/KAFKA-948
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8.0
> Reporter: Dibyendu Bhattacharya
> Assignee: Neha Narkhede
>
> When the broker that is the leader for a partition goes down, the ISR list in
> the LeaderAndISR path is updated. But if a broker that is not the leader of
> the partition goes down, the ISR list is not updated. This is an issue
> because the ISR list then contains a stale entry.
> I found this issue in kafka-0.8.0-beta1-candidate1.
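The behaviour the report expects can be illustrated with a small in-memory model: when any broker fails, it should be dropped from the ISR of every partition that lists it, not only from the partitions it leads. This is a hypothetical sketch; `Partition` and `on_broker_failure` are illustrative names, not Kafka's controller API. The real controller keeps this state in ZooKeeper and handles cases this model ignores, and the leader choice here (first surviving ISR member in assigned-replica order) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    replicas: list  # assigned replicas, in preference order
    leader: int     # -1 means "no leader" in this model
    isr: list       # in-sync replicas


def on_broker_failure(partitions, failed_broker):
    """Drop failed_broker from every ISR; return the keys of partitions that changed."""
    changed = []
    for key, p in partitions.items():
        if failed_broker not in p.isr:
            continue
        # Shrink the ISR whether or not the failed broker is the leader.
        p.isr = [b for b in p.isr if b != failed_broker]
        if p.leader == failed_broker:
            # Simplified election: first assigned replica still in the ISR.
            candidates = [b for b in p.replicas if b in p.isr]
            p.leader = candidates[0] if candidates else -1
        changed.append(key)
    return changed
```

Usage: with partitions keyed by `(topic, partition)`, calling `on_broker_failure(parts, 4)` removes broker 4 from every ISR that contains it while leaving leaders other than broker 4 untouched, which is exactly the follower case the report says is missed.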
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)