
[jira] [Commented] (KAFKA-13944) Shutting down broker can be elected as partition leader in KRaft

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)



[ 
https://issues.apache.org/jira/browse/KAFKA-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547157#comment-17547157
 ] 

Jose Armando Garcia Sancio commented on KAFKA-13944:


Looks like this issue is addressed by 
https://issues.apache.org/jira/browse/KAFKA-13916

 

> Shutting down broker can be elected as partition leader in KRaft
> 
>
> Key: KAFKA-13944
> URL: https://issues.apache.org/jira/browse/KAFKA-13944
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jose Armando Garcia Sancio
> Priority: Major
> Labels: kip-500
>
> When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN 
> state in the controller. It is possible for the broker to remain unfenced in 
> this state until the controlled shutdown completes. When doing an election, 
> the only thing we generally check is that the broker is unfenced, so this 
> means we can elect a broker that is in controlled shutdown. 
> Here are a few snippets from a recent system test in which this occurred:
> {code:java}
> // broker 2 starts controlled shutdown
> [2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has 
> requested and been granted a controlled shutdown. 
> (org.apache.kafka.controller.BrokerHeartbeatManager)
>  
> // there is only one replica, so we set leader to -1
> [2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 
> with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, 
> partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
> // controlled shutdown cannot complete immediately
> [2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 
> to shut down can not yet be granted because the lowest active offset 177 is 
> not greater than the broker's shutdown offset 244. 
> (org.apache.kafka.controller.BrokerHeartbeatManager)
> [2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled 
> shutdown offset for broker 2 to 244. 
> (org.apache.kafka.controller.BrokerHeartbeatManager)
> // later on we elect leader 2 again
> [2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 
> with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, 
> partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)
> // now controlled shutdown is stuck because of the newly elected leader
> [2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled 
> shutdown state, but can not shut down because more leaders still need to be 
> moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
> {code}
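
The eligibility gap described in the issue can be illustrated with a minimal sketch. It assumes a simplified, hypothetical model (`BrokerInfo`, `ElectionSketch`, and their methods are invented for illustration and are not Kafka's actual controller classes), and the stricter check shown is only one possible approach, not necessarily how KAFKA-13916 resolves the problem.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the controller's view of a broker.
// These names are illustrative and are not Kafka's actual classes.
final class BrokerInfo {
    final boolean fenced;
    final boolean inControlledShutdown;

    BrokerInfo(boolean fenced, boolean inControlledShutdown) {
        this.fenced = fenced;
        this.inControlledShutdown = inControlledShutdown;
    }
}

final class ElectionSketch {
    static final int NO_LEADER = -1;

    // The check as the issue describes it: only fencing is considered.
    static boolean eligibleUnfencedOnly(BrokerInfo b) {
        return b != null && !b.fenced;
    }

    // An assumed stricter check that also excludes brokers in controlled shutdown.
    static boolean eligibleActiveOnly(BrokerInfo b) {
        return b != null && !b.fenced && !b.inControlledShutdown;
    }

    // Returns the first replica that passes the chosen check, or NO_LEADER if none does.
    static int electLeader(List<Integer> replicas,
                           Map<Integer, BrokerInfo> brokers,
                           boolean excludeShuttingDown) {
        for (int id : replicas) {
            BrokerInfo b = brokers.get(id);
            boolean ok = excludeShuttingDown
                ? eligibleActiveOnly(b)
                : eligibleUnfencedOnly(b);
            if (ok) {
                return id;
            }
        }
        return NO_LEADER;
    }

    public static void main(String[] args) {
        // Broker 2 is unfenced but has requested controlled shutdown,
        // mirroring the single-replica partition in the logs.
        Map<Integer, BrokerInfo> brokers = Map.of(2, new BrokerInfo(false, true));
        List<Integer> replicas = List.of(2);

        System.out.println(electLeader(replicas, brokers, false)); // prints 2
        System.out.println(electLeader(replicas, brokers, true));  // prints -1
    }
}
```

With the unfenced-only check, broker 2 is elected again even though it is shutting down, which matches the `leader: -1 -> 2` transition in the logs. With the stricter check, the partition stays leaderless until another eligible replica exists.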



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Commented] (KAFKA-13944) Shutting down broker can be elected as partition leader in KRaft

2022-06-01 Thread Jose Armando Garcia Sancio (Jira)



[ 
https://issues.apache.org/jira/browse/KAFKA-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545042#comment-17545042
 ] 

Jose Armando Garcia Sancio commented on KAFKA-13944:


When fixing this, let's improve the logging so that the replication control 
manager logs the reason that triggered the election.




