[jira] [Updated] (KAFKA-13173) KRaft controller does not handle simultaneous broker expirations correctly

Jason Gustafson (Jira) Thu, 05 Aug 2021 13:46:03 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Gustafson updated KAFKA-13173:
------------------------------------
    Fix Version/s: 3.0.0

> KRaft controller does not handle simultaneous broker expirations correctly
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-13173
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13173
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Priority: Blocker
>             Fix For: 3.0.0
>
>
> In `ReplicationControlManager.fenceStaleBrokers`, we find all of the current 
> stale replicas and attempt to remove them from the ISR. However, when 
> multiple expirations occur at once, we do not properly accumulate the ISR 
> changes. For example, I ran a test where the ISR of a partition was 
> initialized to [1, 2, 3]. Then I triggered a timeout of replicas 2 and 3 at 
> the same time. The records that were generated by `fenceStaleBrokers` were 
> the following:
> {code}
> ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, 
> topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 3], leader=1, replicas=null, 
> removingReplicas=null, addingReplicas=null) at version 0), 
> ApiMessageAndVersion(FenceBrokerRecord(id=2, epoch=102) at version 0), 
> ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, 
> topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 2], leader=1, replicas=null, 
> removingReplicas=null, addingReplicas=null) at version 0), 
> ApiMessageAndVersion(FenceBrokerRecord(id=3, epoch=103) at version 0)]
> {code}
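> First the ISR is shrunk to [1, 3] as broker 2 is fenced, followed by the 
> record to fence broker 2. Then the ISR is set to [1, 2] as the fencing of 
> broker 3 is handled. The second change was computed from the original ISR 
> [1, 2, 3] rather than from the already-shrunk [1, 3], so it did not account 
> for broker 2 having been fenced earlier in the same request, and broker 2 
> ends up back in the ISR. 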
> A simple solution for now is to change the logic to handle fencing only one 
> broker at a time. 
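
To make the accumulation problem in the description above concrete, here is a minimal stand-alone sketch in Java. It is not the actual `ReplicationControlManager` code: the class `IsrShrinkSketch` and its methods are hypothetical names invented for illustration, and the ISR is modelled as a plain list of broker ids. It contrasts computing every ISR change from the same starting ISR (which reproduces the [1, 3] then [1, 2] sequence from the report) with carrying each change forward into the next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model for illustration only; this is NOT the
// actual ReplicationControlManager code from Kafka.
final class IsrShrinkSketch {

    // Returns a copy of the ISR with the given broker removed.
    static List<Integer> without(List<Integer> isr, int brokerId) {
        List<Integer> result = new ArrayList<>(isr);
        result.remove(Integer.valueOf(brokerId)); // remove by value, not by index
        return result;
    }

    // Behaviour matching the report: every change is computed from the same
    // starting ISR, so earlier removals are lost. Returns the ISR carried by
    // each emitted change, in order.
    static List<List<Integer>> isrsFromSnapshot(List<Integer> initialIsr,
                                                List<Integer> staleBrokers) {
        List<List<Integer>> emitted = new ArrayList<>();
        for (int brokerId : staleBrokers) {
            emitted.add(without(initialIsr, brokerId));
        }
        return emitted;
    }

    // Accumulating variant: each change starts from the result of the
    // previous one, so the removals add up.
    static List<List<Integer>> isrsAccumulated(List<Integer> initialIsr,
                                               List<Integer> staleBrokers) {
        List<List<Integer>> emitted = new ArrayList<>();
        List<Integer> current = initialIsr;
        for (int brokerId : staleBrokers) {
            current = without(current, brokerId);
            emitted.add(current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> isr = List.of(1, 2, 3);
        List<Integer> stale = List.of(2, 3);
        System.out.println(isrsFromSnapshot(isr, stale)); // [[1, 3], [1, 2]]
        System.out.println(isrsAccumulated(isr, stale));  // [[1, 3], [1]]
    }
}
```

In the accumulating variant the last emitted ISR is [1], which is what one would expect once both brokers 2 and 3 are fenced. Handling only one broker per call, as proposed above, sidesteps the issue in a different way: each call sees exactly one stale broker, so there is no earlier change within the call to lose.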



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
