[ https://issues.apache.org/jira/browse/KAFKA-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752604#comment-17752604 ]
shilin Lu commented on KAFKA-15240:
-----------------------------------

[~cmccabe] [~hachikuji] Please take a look at this issue, thanks.

> BrokerToControllerChannelManager cache activeController error cause
> DefaultAlterPartitionManager send AlterPartition request failed
> -------------------------------------------------------------------
>
>                 Key: KAFKA-15240
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15240
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.8.0, 2.8.1, 2.8.2, 3.5.0
>         Environment: 2.8.1 kafka version
>            Reporter: shilin Lu
>            Assignee: shilin Lu
>            Priority: Major
>         Attachments: image-2023-07-24-16-35-56-589.png
>
>
> After KIP-497, the partition leader no longer uses ZooKeeper to propagate
> ISR changes; instead it sends an AlterPartitionRequest to the controller.
> The broker caches the active controller node info obtained through the
> controllerNodeProvider interface.
> On 2023-07-12, in a production Kafka environment, we saw many
> `Broker had a stale broker epoch` errors when sending
> AlterPartitionRequest to the controller. Many replicas in this cluster
> were missing from the ISR even though replica fetching was working
> correctly, so only the propagation of ISR changes was failing.
> !https://iwiki.woa.com/tencent/api/attachments/s3/url?attachmentid=3165506!
> Something was strange, though: when a broker's AlterPartitionRequest
> fails, the controller normally prints a log message like the one below,
> but no such log entry was found on the active controller node.
> !image-2023-07-24-16-35-56-589.png!
> I therefore suspected that this broker was connected to the wrong active
> controller. A network packet capture showed that the broker was sending
> requests to an unfamiliar broker on port 9092. According to the cluster's
> operation history, that unfamiliar broker was formerly a node in this
> cluster and is now a controller node in a new Kafka cluster.
> Currently, BrokerToControllerChannelManager updates the active controller
> only on disconnect or when the response code is NOT_CONTROLLER. So when no
> request is sent and the wrongly cached node is the controller of another
> Kafka cluster, this failure keeps repeating.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)