[jira] [Commented] (KAFKA-15240) BrokerToControllerChannelManager cache activeController error cause DefaultAlterPartitionManager send AlterPartition request failed

Guozhang Wang (Jira) Thu, 27 Jul 2023 10:56:04 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748281#comment-17748281
 ]


Guozhang Wang commented on KAFKA-15240:
---------------------------------------

[~lushilin] Thanks for reporting this. I think [~cmccabe] and [~hachikuji] would
have the most context to help investigate.

> BrokerToControllerChannelManager cache activeController error cause 
> DefaultAlterPartitionManager send AlterPartition request failed
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15240
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15240
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.8.0, 2.8.1, 2.8.2, 3.5.0
>         Environment: 2.8.1 kafka version
>            Reporter: shilin Lu
>            Assignee: shilin Lu
>            Priority: Major
>         Attachments: image-2023-07-24-16-35-56-589.png
>
>
> After KIP-497, the partition leader no longer uses ZooKeeper to propagate ISR
> changes (propagateIsrChanges). Instead, it sends an AlterPartition request to
> the controller. The broker caches the active controller node info through the
> controllerNodeProvider interface.
> On 2023-07-12, in a Kafka production environment, we saw many `Broker had a
> stale broker epoch` errors when sending AlterPartition requests to the
> controller. This cluster also had many replicas missing from the ISR even
> though replica fetching was working correctly, so only ISR change propagation
> was failing.
> !https://iwiki.woa.com/tencent/api/attachments/s3/url?attachmentid=3165506!
> But something was strange: when a broker's AlterPartition request fails, the
> controller prints a log line like the one below, yet we could not find this
> log on the active controller node.
> !image-2023-07-24-16-35-56-589.png!
> We then suspected that this broker was connected to the wrong active
> controller. A network packet capture showed the broker sending requests to an
> unfamiliar broker on port 9092. According to this cluster's operation history,
> the unfamiliar broker was a former node of this cluster that is now a
> controller node in a new Kafka cluster.
> Currently, BrokerToControllerChannelManager updates the cached active
> controller only on a disconnect or when the response error code is
> NOT_CONTROLLER. So when neither happens and the wrongly cached node is the
> controller of another Kafka cluster, the failure keeps repeating.
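The cache-invalidation rule described in the report can be illustrated with a minimal Java sketch. It is hypothetical: CachedControllerProvider, ErrorCode, the lookup supplier and the node names are invented for illustration and are not Kafka's actual BrokerToControllerChannelManager or controllerNodeProvider API. It only models the stated behaviour, that the cached controller is dropped on a disconnect or a NOT_CONTROLLER response, and shows that a stale node answering with a different error stays cached.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of the caching rule described in the report.
// None of these names are Kafka APIs.
public class ControllerCacheSketch {

    // Simplified stand-in for the response error codes that matter here.
    enum ErrorCode { NONE, NOT_CONTROLLER, STALE_BROKER_EPOCH }

    static final class CachedControllerProvider {
        // Where a fresh controller address comes from (e.g. metadata); an assumption.
        private final Supplier<Optional<String>> lookup;
        private Optional<String> cached = Optional.empty();

        CachedControllerProvider(Supplier<Optional<String>> lookup) {
            this.lookup = lookup;
        }

        // Returns the cached controller; performs a lookup only when nothing is cached.
        Optional<String> activeController() {
            if (cached.isEmpty()) {
                cached = lookup.get();
            }
            return cached;
        }

        // Invalidation trigger 1: the connection to the cached node drops.
        void onDisconnect() {
            cached = Optional.empty();
        }

        // Invalidation trigger 2: the node says it is not the controller.
        // Any other error (e.g. STALE_BROKER_EPOCH) leaves the cache untouched.
        void onResponse(ErrorCode error) {
            if (error == ErrorCode.NOT_CONTROLLER) {
                cached = Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        String[] actual = {"old-broker:9092"};
        CachedControllerProvider provider =
                new CachedControllerProvider(() -> Optional.of(actual[0]));

        provider.activeController();      // caches old-broker:9092
        actual[0] = "new-broker:9092";    // the real controller is now elsewhere

        // The stale node keeps replying, but never with NOT_CONTROLLER.
        for (int i = 0; i < 3; i++) {
            provider.onResponse(ErrorCode.STALE_BROKER_EPOCH);
        }
        System.out.println(provider.activeController().orElse("none")); // old-broker:9092

        // Only a disconnect (or NOT_CONTROLLER) forces a fresh lookup.
        provider.onDisconnect();
        System.out.println(provider.activeController().orElse("none")); // new-broker:9092
    }
}
```

In this model the repeated STALE_BROKER_EPOCH replies never clear the cache, which matches the symptom described above.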



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
