[ https://issues.apache.org/jira/browse/KAFKA-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson updated KAFKA-13173:
------------------------------------
    Description:
In `ReplicationControlManager.fenceStaleBrokers`, we find all of the current stale replicas and attempt to remove them from the ISR. However, when multiple expirations occur at once, we do not properly accumulate the ISR changes.

For example, I ran a test where the ISR of a partition was initialized to [1, 2, 3]. Then I triggered a timeout of replicas 2 and 3 at the same time. The records that were generated by `fenceStaleBrokers` were the following:

{code}
ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 3], leader=1, replicas=null, removingReplicas=null, addingReplicas=null) at version 0),
ApiMessageAndVersion(FenceBrokerRecord(id=2, epoch=102) at version 0),
ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 2], leader=1, replicas=null, removingReplicas=null, addingReplicas=null) at version 0),
ApiMessageAndVersion(FenceBrokerRecord(id=3, epoch=103) at version 0)]
{code}

First the ISR is shrunk to [1, 3] as broker 2 is fenced, and we see the record to fence broker 2. Then the ISR is modified to [1, 2] as the fencing of broker 3 is handled, which puts the already-fenced broker 2 back into the ISR. So the second change did not account for the fact that we had already fenced broker 2 in the same request.

A simple solution for now is to change the logic to handle fencing only one broker at a time.

> KRaft controller does not handle simultaneous broker expirations correctly
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-13173
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13173
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Niket Goel
>            Priority: Blocker
>             Fix For: 3.0.0
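
The ISR sequence reported in this issue, [1, 3] followed by [1, 2], is what one would expect if each change were derived from the original ISR rather than from the result of the previous change. The following is a minimal Java sketch of that pattern and of an accumulating alternative. It is not the actual `ReplicationControlManager` code: the class `IsrFencingSketch` and the methods `fenceIndependently` and `fenceAccumulating` are hypothetical names introduced only for illustration, and the ISR is modeled as a plain list of broker ids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Kafka code base.
final class IsrFencingSketch {

    // Pattern matching the reported symptom: every ISR change is computed from
    // the same starting ISR, so removals made earlier in the batch are lost.
    static List<List<Integer>> fenceIndependently(List<Integer> isr, List<Integer> staleBrokers) {
        List<List<Integer>> changes = new ArrayList<>();
        for (Integer broker : staleBrokers) {
            List<Integer> next = new ArrayList<>(isr);
            next.remove(broker); // remove(Object): drops the broker id, not an index
            changes.add(next);
        }
        return changes;
    }

    // Accumulating pattern: each change starts from the ISR left by the
    // previous change, so a fenced broker never reappears.
    static List<List<Integer>> fenceAccumulating(List<Integer> isr, List<Integer> staleBrokers) {
        List<List<Integer>> changes = new ArrayList<>();
        List<Integer> current = new ArrayList<>(isr);
        for (Integer broker : staleBrokers) {
            current.remove(broker);
            changes.add(new ArrayList<>(current)); // snapshot of the ISR after this fence
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Integer> isr = List.of(1, 2, 3);
        List<Integer> stale = List.of(2, 3);
        System.out.println(fenceIndependently(isr, stale)); // [[1, 3], [1, 2]]
        System.out.println(fenceAccumulating(isr, stale));  // [[1, 3], [1]]
    }
}
```

With ISR [1, 2, 3] and stale brokers 2 and 3, the first method reproduces the [1, 3] then [1, 2] sequence from the records above, while the second ends with an ISR of [1]. The solution suggested in the issue itself is different: fence only one broker at a time, which sidesteps the accumulation problem rather than tracking intermediate state within a batch.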