[ https://issues.apache.org/jira/browse/KAFKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212602#comment-14212602 ]
Scott Hunt commented on KAFKA-948:
----------------------------------

I think I just ran into this same issue on our cluster yesterday. Kafka version 2.8.0-0.8.0+46. I first noticed there was a real problem when we had a leader that wasn't in the replica list. (Step 5 below.)

Here's what (I think) happened:

1. We had one broker in our cluster fail due to assumed hardware issues (id = 5)
2. A couple days into the failure, I lost faith in ever seeing that machine resurrected and used kafka-reassign-topic.sh to remove broker 5 from all the replica sets (replacing them with other nodes) so that we were back to full (3) replication. There were 2 topics with 24 partitions each that were on broker 5 and needed to be moved. One of the topics is *really* low traffic (most partitions get less than 1 message per day).
3. After moving broker 5 out of the replica sets for all partitions, I noticed that broker 5 was still listed in the ISR for some of the partitions in the low-traffic topic.
4. Later that night, our Technical Operations staff miraculously brought broker 5 back online. I assumed everything was fine and went back to sleep.
5. The next day I checked back and, due probably to some network hiccup, a couple of the partitions listed the no-longer-dead broker as their leader, even though it wasn't in the replica list. i.e. it showed something like:
     topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3
6. I was somewhat alarmed.
7. So I shut down broker 5 (just stopping kafka), so that it would pick new leaders for those partitions.
8. I now have 14 partitions that have broker 5 still in isr and not in replicas.
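
The inconsistency in steps 5 and 8 (a broker in the ISR, or even acting as leader, without being in the replica list) can be spotted mechanically. Below is a minimal sketch, assuming each partition is described by a single line shaped like the one in step 5; the regular expression and the function names are hypothetical and not part of any Kafka tool.

```python
import re

# Assumed line shape, taken from step 5 of the comment:
#   topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3
# Matching is case-insensitive and tolerant of extra whitespace.
LINE_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)",
    re.IGNORECASE,
)


def parse_ids(csv):
    """Turn a comma-separated broker list such as '8,4,3' into [8, 4, 3]."""
    return [int(x) for x in csv.split(",") if x]


def check_partition(line):
    """Return a small report for one partition line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    replicas = parse_ids(m.group("replicas"))
    isr = parse_ids(m.group("isr"))
    leader = int(m.group("leader"))
    return {
        "topic": m.group("topic"),
        "partition": int(m.group("partition")),
        # Brokers that are in the ISR but are not assigned replicas.
        "stale_isr": [b for b in isr if b not in replicas],
        # True when the current leader is not an assigned replica.
        "leader_outside_replicas": leader not in replicas,
    }


report = check_partition(
    "topic: xxx partition: 8 leader: 5 replicas: 8,4,3 isr: 8,5,4,3"
)
print(report)
```

Running this over every partition line and keeping the entries with a non-empty "stale_isr" would give the list of affected partitions from step 8. It only reports the mismatch; it does not change anything in the cluster.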

> ISR list in LeaderAndISR path not updated for partitions when Broker (which
> is not leader) is down
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-948
>                 URL: https://issues.apache.org/jira/browse/KAFKA-948
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.0
>            Reporter: Dibyendu Bhattacharya
>            Assignee: Neha Narkhede
>
> When the broker that is the leader for a partition goes down, the ISR list in
> the LeaderAndISR path is updated. But if a broker that is not the leader
> of the partition goes down, the ISR list is not updated. This is an
> issue because the ISR list then contains a stale entry.
> I found this issue in kafka-0.8.0-beta1-candidate1.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)