[ https://issues.apache.org/jira/browse/KAFKA-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Jacot resolved KAFKA-13790.
---------------------------------
    Fix Version/s: 3.3.0
         Reviewer: Jason Gustafson
       Resolution: Fixed

> ReplicaManager should be robust to all partition updates from kraft metadata 
> log
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-13790
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13790
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.3.0
>
>
> There are two ways that partition state can be updated in the zk world: 
> through `LeaderAndIsr` requests and through `AlterPartition` responses. 
> Every change to partition state results in a new LeaderAndIsr request, but 
> replicas ignore the request if its leader epoch is less than or equal to 
> the current known leader epoch. Concretely, it works like this:
>  * Changes made by the leader are done through AlterPartition requests. 
> These changes bump the partition epoch (or zk version) but leave the leader 
> epoch unchanged. The controller still sends LeaderAndIsr requests for them, 
> but replicas ignore those; partition state is instead updated only when the 
> AlterPartition response is received.
>  * Changes initiated by the controller always result in a leader epoch 
> bump. These changes are sent to replicas through LeaderAndIsr requests and 
> are applied by replicas (see the sketch after this list).
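> To make the rule concrete, here is a minimal sketch in Scala (hypothetical 
> names, not the actual broker code) of the check a replica applies to an 
> incoming LeaderAndIsr request:
> ```scala
> // A replica applies a LeaderAndIsr update only if its leader epoch is
> // strictly greater than the one the replica already knows.
> case class PartitionState(leaderEpoch: Int, partitionEpoch: Int)
>
> object EpochCheck {
>   def shouldApplyLeaderAndIsr(current: PartitionState,
>                               incoming: PartitionState): Boolean =
>     // Controller-initiated changes bump the leader epoch, so they pass
>     // this check; AlterPartition-initiated changes leave it unchanged, so
>     // the replica ignores the LeaderAndIsr request and applies the state
>     // from the AlterPartition response instead.
>     incoming.leaderEpoch > current.leaderEpoch
> }
> ```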
> The code in `kafka.server.ReplicaManager` and `kafka.cluster.Partition` is 
> built on top of these assumptions. The logic in `makeLeader`, for example, 
> assumes that the leader epoch has indeed been bumped: follower state gets 
> reset and a new entry is written to the leader epoch cache.
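> The stubs below (again hypothetical and heavily simplified) show that 
> assumption in code: if `makeLeader` runs without a real epoch bump, 
> followers are reset for no reason and a duplicate entry lands in the epoch 
> cache:
> ```scala
> import scala.collection.mutable
>
> class Replica {
>   var fetchOffset: Long = 0L
>   def resetFollowerState(): Unit = fetchOffset = 0L
> }
>
> class LeaderEpochCache {
>   // One (leader epoch, start offset) entry per leader epoch.
>   val entries = mutable.ListBuffer.empty[(Int, Long)]
>   def assign(epoch: Int, startOffset: Long): Unit =
>     entries += (epoch -> startOffset)
> }
>
> class Partition(var leaderEpoch: Int,
>                 followers: Seq[Replica],
>                 val epochCache: LeaderEpochCache,
>                 var logEndOffset: Long) {
>   def makeLeader(newLeaderEpoch: Int): Unit = {
>     leaderEpoch = newLeaderEpoch
>     // Only safe if the epoch really changed:
>     followers.foreach(_.resetFollowerState())
>     // Appends a duplicate entry if the epoch did not change:
>     epochCache.assign(newLeaderEpoch, logEndOffset)
>   }
> }
> ```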
> In KRaft, we also have two paths for updating partition state. One is 
> AlterPartition, just as in the zk world. The second is updates received 
> from the metadata log. These mostly follow the same path as LeaderAndIsr 
> requests, but with one big difference: all changes are sent down to 
> `kafka.cluster.Partition`, even those without a bumped leader epoch. This 
> breaks the assumptions in `makeLeader` described above, which could leave 
> the leader epoch cache inconsistent. On the follower side, it also causes 
> replica fetchers for updated partitions to be restarted unnecessarily. 
> There may be other side effects as well.
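> Reusing the stub types from the sketches above, the metadata log path looks 
> roughly like this (hypothetical names; the real plumbing lives in 
> `kafka.server.ReplicaManager`):
> ```scala
> object MetadataLogPath {
>   def applyDelta(changes: Seq[PartitionState], partition: Partition): Unit =
>     changes.foreach { state =>
>       // Unlike the LeaderAndIsr path, there is no
>       // `incoming.leaderEpoch > current.leaderEpoch` filter here, so
>       // `makeLeader` can re-run with an unchanged leader epoch: followers
>       // are reset and a duplicate epoch cache entry is written.
>       partition.makeLeader(state.leaderEpoch)
>     }
> }
> ```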
> We need to either replicate the same filtering logic used on the zookeeper 
> side (that is, ignore updates without a leader epoch bump before they reach 
> `kafka.cluster.Partition`) or make the logic robust to all updates, 
> including those without a leader epoch bump.
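> One way the second option could look (a sketch under the stub types above, 
> not the committed patch) is to scope the epoch-bound work to actual epoch 
> changes:
> ```scala
> class RobustPartition(var leaderEpoch: Int,
>                       followers: Seq[Replica],
>                       val epochCache: LeaderEpochCache,
>                       var logEndOffset: Long) {
>   def makeLeader(newLeaderEpoch: Int): Unit = {
>     val epochBumped = newLeaderEpoch > leaderEpoch
>     leaderEpoch = newLeaderEpoch
>     if (epochBumped) {
>       followers.foreach(_.resetFollowerState())
>       epochCache.assign(newLeaderEpoch, logEndOffset)
>     }
>     // Updates without a leader epoch bump (e.g. ISR-only changes from the
>     // metadata log) fall through without resetting followers or touching
>     // the epoch cache.
>   }
> }
> ```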



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
