[jira] [Created] (KAFKA-15240) BrokerToControllerChannelManager cache activeController error cause DefaultAlterPartitionManager send AlterPartition request failed

shilin Lu (Jira) Mon, 24 Jul 2023 01:51:05 -0700

shilin Lu created KAFKA-15240:
---------------------------------

             Summary: BrokerToControllerChannelManager cache activeController 
error cause DefaultAlterPartitionManager send AlterPartition request failed
                 Key: KAFKA-15240
                 URL: https://issues.apache.org/jira/browse/KAFKA-15240
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 3.5.0, 2.8.2, 2.8.1, 2.8.0
         Environment: 2.8.1 kafka version
            Reporter: shilin Lu
            Assignee: shilin Lu
         Attachments: image-2023-07-24-16-35-56-589.png


After KIP-497, the partition leader no longer uses ZooKeeper to propagate ISR 
changes; it sends an AlterPartitionRequest to the controller instead. The broker 
caches the active controller node info obtained through the 
controllerNodeProvider interface.

On 2023-07-12, in a Kafka production environment, we saw a large number of 
`Broker had a stale broker epoch` errors when sending AlterPartition requests to 
the controller. In this cluster many replicas were missing from the ISR even 
though replica fetching was working correctly, so only the ISR change 
propagation was failing.

!https://iwiki.woa.com/tencent/api/attachments/s3/url?attachmentid=3165506!

Something was strange, though: when a broker's AlterPartition request fails, the 
controller prints a log like the one below, but no such log was found on the 
active controller node.

!image-2023-07-24-16-35-56-589.png!

I therefore suspected that this broker was connected to the wrong active 
controller. A network packet capture showed the broker sending requests to an 
unfamiliar broker on port 9092. According to the operation history of this Kafka 
cluster, the unfamiliar broker is an old node of this cluster that is now a 
controller node in a new Kafka cluster.

Currently BrokerToControllerChannelManager updates the active controller only 
on a disconnect or when the response code is NOT_CONTROLLER. So as long as the 
connection stays up and the wrong node is the controller of another Kafka 
cluster (and therefore does not answer NOT_CONTROLLER), the cached controller is 
never refreshed and the failure keeps repeating.
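
The refresh rule described above can be illustrated with a small standalone 
sketch. This is not Kafka's actual code: the class, enum, and method names are 
hypothetical, and the node lookup is reduced to a plain supplier standing in 
for controllerNodeProvider. It only shows why a response such as a stale broker 
epoch error leaves the cached node in place.

```java
import java.util.function.Supplier;

// Hypothetical, simplified model of the caching behavior described above.
// None of these names come from Kafka's source.
final class CachedControllerSketch {
    enum Outcome { OK, STALE_BROKER_EPOCH, NOT_CONTROLLER, DISCONNECTED }

    private final Supplier<String> controllerLookup; // stands in for controllerNodeProvider
    private String cachedController;                 // null means "not resolved yet"
    private int lookups;

    CachedControllerSketch(Supplier<String> controllerLookup) {
        this.controllerLookup = controllerLookup;
    }

    // Returns the node the next request would go to, resolving it only if nothing is cached.
    String target() {
        if (cachedController == null) {
            cachedController = controllerLookup.get();
            lookups++;
        }
        return cachedController;
    }

    // Only a disconnect or NOT_CONTROLLER drops the cached node; any other outcome keeps it.
    void onResult(Outcome outcome) {
        if (outcome == Outcome.DISCONNECTED || outcome == Outcome.NOT_CONTROLLER) {
            cachedController = null;
        }
    }

    // Number of times the lookup was actually consulted.
    int lookups() {
        return lookups;
    }
}
```

In this model, a node that keeps the connection open and answers with 
STALE_BROKER_EPOCH is never evicted, so target() keeps returning it no matter 
how often the lookup's answer changes in the meantime.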



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
