[ https://issues.apache.org/jira/browse/KAFKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212602#comment-14212602 ]

Scott Hunt commented on KAFKA-948:
----------------------------------

I think I just ran into this same issue on our cluster yesterday.  Kafka 
version 2.8.0-0.8.0+46.
I first noticed there was a real problem when we had a leader that wasn't in 
the replica list.  (Step 5 below.)

Here's what (I think) happened:
1. We had one broker in our cluster fail due to assumed hardware issues (id = 5)
2. A couple of days into the failure, I lost faith in ever seeing that machine 
resurrected and used kafka-reassign-topic.sh to remove broker 5 from all the 
replica sets (replacing them with other nodes) so that we were back to full (3) 
replication.  There were 2 topics with 24 partitions each that were on broker 5 
and needed to be moved.  One of the topics is *really* low traffic (most 
partitions get less than 1 message per day).
3. After moving broker 5 out of the replica sets for all partitions, I noticed 
that broker 5 was still listed in the ISR for some of the partitions in the 
low-traffic topic.
4. Later that night, our Technical Operations staff miraculously brought broker 
5 back online.  I assumed everything was fine and went back to sleep.
5. The next day I checked back and, due probably to some network hiccup, a 
couple of the partitions listed the no-longer-dead broker as their leader, even 
though it wasn't in the replica list.
    e.g. it showed something like:
        topic: xxx    partition: 8    leader: 5    replicas: 8,4,3    isr: 8,5,4,3
6. I was somewhat alarmed.
7. So I shut down broker 5 (just stopped Kafka) so that new leaders would be 
elected for those partitions.
8. I now have 14 partitions that still list broker 5 in the ISR even though it 
is not in their replica list.
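A quick way to spot partitions in this state is to check two invariants on each line of the topic listing: the leader should be one of the assigned replicas, and every ISR member should be an assigned replica. The sketch below assumes the output format matches the line quoted in step 5; the regex and the function name find_inconsistencies are hypothetical, not part of any Kafka tool.

```python
import re

# Assumed line format, modelled on the example in step 5:
#   topic: xxx    partition: 8    leader: 5    replicas: 8,4,3    isr: 8,5,4,3
LINE_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(field):
    """Turn a comma-separated broker list such as '8,4,3' into a set of ints."""
    return {int(x) for x in field.split(",") if x}


def find_inconsistencies(listing):
    """Return (topic, partition, problem) tuples for partitions whose leader
    or ISR members are not in the assigned replica list."""
    problems = []
    for line in listing.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not look like partition entries
        topic, partition = m["topic"], int(m["partition"])
        leader = int(m["leader"])
        replicas = parse_ids(m["replicas"])
        isr = parse_ids(m["isr"])
        # A negative leader id is treated as "no leader", not as a violation.
        if leader >= 0 and leader not in replicas:
            problems.append((topic, partition, f"leader {leader} not in replicas"))
        stale = sorted(isr - replicas)
        if stale:
            problems.append((topic, partition, f"ISR contains non-replicas {stale}"))
    return problems
```

Run against the partition 8 line above, this reports both the out-of-replica leader and the stale ISR entry for broker 5.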


> ISR list in LeaderAndISR path not updated for partitions when Broker (which is not leader) is down
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-948
>                 URL: https://issues.apache.org/jira/browse/KAFKA-948
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.0
>            Reporter: Dibyendu Bhattacharya
>            Assignee: Neha Narkhede
>
> When the broker that is the leader for a partition goes down, the ISR list in 
> the LeaderAndISR path is updated. But if a broker that is not the leader 
> of the partition goes down, the ISR list is not updated. This is an 
> issue because the ISR list then contains a stale entry.
> I found this issue in kafka-0.8.0-beta1-candidate1.
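The behaviour the report expects can be modelled roughly as follows: when any broker fails, it should be removed from the ISR of every partition it belongs to, and a new leader should be chosen only when the failed broker was the leader. This is a simplified, hypothetical model (PartitionState and on_broker_failure are illustrative names), not Kafka's controller code, which also has to persist the new state to the LeaderAndISR path in ZooKeeper.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PartitionState:
    replicas: List[int]    # assigned replicas, in preference order
    leader: Optional[int]  # None when no ISR member is left to lead
    isr: List[int]         # current in-sync replicas


def on_broker_failure(
    dead: int, states: Dict[Tuple[str, int], PartitionState]
) -> Dict[Tuple[str, int], PartitionState]:
    """Return updated partition states after broker `dead` fails."""
    updated = {}
    for key, st in states.items():
        if dead not in st.isr:
            updated[key] = st
            continue
        # Shrink the ISR whether or not the dead broker was the leader;
        # skipping this step for non-leaders is the bug described above.
        new_isr = [b for b in st.isr if b != dead]
        leader = st.leader
        if leader == dead:
            # Elect the first surviving ISR member in replica preference order.
            leader = next((b for b in st.replicas if b in new_isr), None)
        updated[key] = replace(st, leader=leader, isr=new_isr)
    return updated
```

For example, if broker 4 fails, a partition with replicas 8,4,3 keeps leader 8 but its ISR shrinks to 8,3.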



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)