[ https://issues.apache.org/jira/browse/KAFKA-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viktor Somogyi updated KAFKA-5453:
----------------------------------
    Fix Version/s:     (was: 2.2.0)
                   2.3.0

> Controller may miss requests sent to the broker when zk session timeout happens.
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-5453
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5453
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Viktor Somogyi
>            Priority: Major
>             Fix For: 2.3.0
>
>
> The issue I encountered was the following:
> 1. Partition reassignment was in progress; one replica of a partition was
> being reassigned from broker 1 to broker 2.
> 2. The controller received an ISR change notification indicating that broker 2
> had caught up.
> 3. The controller was sending a StopReplicaRequest to broker 1.
> 4. Broker 1's zk session timed out. The controller removed broker 1 from the
> cluster and cleaned up the queue, i.e. the StopReplicaRequest was removed
> from the ControllerChannelManager.
> 5. Broker 1 reconnected to zk and acted as if it were still a follower replica
> of the partition.
> 6. Broker 1 will always receive an exception from the leader because it is not
> in the replica list.
> Not sure what the correct fix is here. It seems that broker 1 in this case
> should ask the controller for the latest replica assignment.
> There are two related bugs:
> 1. When a {{NotAssignedReplicaException}} is thrown from
> {{Partition.updateReplicaLogReadResult()}}, the other partitions in the same
> request will fail to update the fetch timestamp and offset and thus also
> drop out of the ISR.
> 2. The {{NotAssignedReplicaException}} was not properly returned to the
> replicas; instead, an {{UnknownServerException}} was returned.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
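
The first related bug above (one failing partition preventing the fetch state of the other partitions in the same request from being updated) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's actual code: the names NotAssignedReplicaError, update_replica, update_all_or_abort and update_each_isolated are hypothetical, and the "fetch state" is reduced to a plain dict of timestamps. Catching the error per partition is shown as one possible way to keep partitions independent, not as the fix chosen for this ticket.

```python
class NotAssignedReplicaError(Exception):
    """Hypothetical stand-in for Kafka's NotAssignedReplicaException."""


def update_replica(assigned, last_fetch, partition, replica_id, now):
    # Reject a follower that is not in the partition's replica list.
    if replica_id not in assigned[partition]:
        raise NotAssignedReplicaError(
            f"replica {replica_id} is not assigned to {partition}"
        )
    last_fetch[(partition, replica_id)] = now


def update_all_or_abort(assigned, last_fetch, partitions, replica_id, now):
    # The first failing partition aborts the loop, so every later
    # partition in the same request is never updated.
    for p in partitions:
        update_replica(assigned, last_fetch, p, replica_id, now)


def update_each_isolated(assigned, last_fetch, partitions, replica_id, now):
    # Catch the error per partition so the remaining partitions are
    # still updated; return the per-partition errors to the caller.
    errors = {}
    for p in partitions:
        try:
            update_replica(assigned, last_fetch, p, replica_id, now)
        except NotAssignedReplicaError as e:
            errors[p] = e
    return errors


# Broker 1 was removed from t-0's replica list but still follows t-1.
assigned = {"t-0": {2, 3}, "t-1": {1, 3}}
request = ["t-0", "t-1"]

aborted = {}
try:
    update_all_or_abort(assigned, aborted, request, replica_id=1, now=100)
except NotAssignedReplicaError:
    pass  # the whole request fails; t-1 was never touched

isolated = {}
errors = update_each_isolated(assigned, isolated, request, replica_id=1, now=100)
```

In the aborting variant the timestamp for t-1 is never recorded, which in this model corresponds to broker 1 falling out of the ISR for a partition it is still legitimately following. In the isolated variant only t-0 reports an error.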