[ 
https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124325#comment-16124325
 ] 

Pierre Mage commented on KAFKA-5074:
------------------------------------

Running 0.11.0 and observing similar behaviour.

Sequence of events recorded in logs:
1. ZooKeeper session expires
2. Kafka controller stops broker 0
3. Kafka re-registers broker 0 in ZooKeeper
4. Leader cache \[mytopic,29\] -> 
(Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1)
5. Invoking state change to OfflineReplica for replicas 
\[Topic=mytopic,Partition=29,Replica=0\]
6. Retaining last ISR 0 of partition \[mytopic,29\] since unclean leader 
election is disabled
7. New leader and ISR for partition \[mytopic,29\] is 
{"leader":-1,"leader_epoch":4,"isr":[0]}
8. Not sending request (type=StopReplicaRequest...) to broker 0, since it is 
offline
9. Invoking state change to OnlineReplica for replicas 
\[Topic=mytopic,Partition=29,Replica=0\]
10. Cycle of failing preferred leader elections starts

OfflinePartitionLeaderSelector is never invoked because the partition's state 
is still OnlinePartition, so the controller only keeps retrying the preferred 
replica election:
{code}
ERROR Controller 2 epoch 4 encountered error while electing leader for 
partition [mytopic,29] due to: Preferred replica 2 for partition [mytopic,29] 
is either not alive or not in the isr. Current leader and ISR 
[{"leader":-1,"leader_epoch":4,"isr":[0]}].
ERROR Controller 2 epoch 4 initiated state change for partition [mytopic,29] 
from OnlinePartition to OnlinePartition failed
{code}
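
To spell out why this never recovers, here is a toy Python model of the check 
that the error message implies. It is only a sketch, not the controller code: 
the names (PartitionState, elect_preferred_leader, ElectionError) are made up 
for illustration, and the replica assignment [2, 0] and the set of live brokers 
are assumptions inferred from the log lines above. The preferred replica is 
taken to be the first replica in the assignment.

```python
from dataclasses import dataclass
from typing import List, Set

NO_LEADER = -1  # the logs show "leader":-1 for a partition without a leader


@dataclass
class PartitionState:
    leader: int
    isr: List[int]  # in-sync replica ids


class ElectionError(Exception):
    """Hypothetical error type standing in for the controller's failure."""


def elect_preferred_leader(assigned: List[int], state: PartitionState,
                           live_brokers: Set[int]) -> PartitionState:
    """Toy version of the preferred replica check (illustrative only)."""
    preferred = assigned[0]  # preferred replica = first assigned replica
    if state.leader == preferred:
        return state  # nothing to do
    if preferred in live_brokers and preferred in state.isr:
        return PartitionState(leader=preferred, isr=list(state.isr))
    raise ElectionError(
        f"Preferred replica {preferred} is either not alive or not in the isr"
    )


# State after step 7 above: no leader, ISR retained as [0].
assigned_replicas = [2, 0]   # assumption: broker 2 is the preferred replica
live = {0, 2}                # assumption: both brokers are registered again
state = PartitionState(leader=NO_LEADER, isr=[0])

failures = 0
for _ in range(3):           # each retry sees the same state and fails again
    try:
        state = elect_preferred_leader(assigned_replicas, state, live)
    except ElectionError:
        failures += 1

print(failures, state)
```

In this model every retry fails the same way because nothing ever changes the 
ISR: broker 2 cannot rejoin the ISR without a leader to fetch from, and broker 
0 is never elected because it is not the preferred replica.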

> Transition to OnlinePartition without preferred leader in ISR fails
> -------------------------------------------------------------------
>
>                 Key: KAFKA-5074
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5074
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0
>            Reporter: Dustin Cote
>
> Running 0.9.0.0, the controller can get into a state where it no longer is 
> able to elect a leader for an Offline partition. It's unclear how this state 
> is first achieved but in the steady state, this happens:
> -There are partitions with a leader of -1
> -The Controller repeatedly attempts a preferred leader election for these 
> partitions
> -The preferred leader election fails because the only replica in the ISR is 
> not the preferred leader
> The log cycle looks like this:
> {code}
> [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica 
> leader election for partitions topic,1
> [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: 
> Invoking state change to OnlinePartition for partitions topic,1
> [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: 
> Current leader -1 for partition [topic,1] is not the preferred replica. 
> Trigerring preferred replica leader election 
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to 
> complete preferred replica leader election. Leader is -1 
> (kafka.controller.KafkaController)
> {code}
> It's not clear if this would happen on versions later than 0.9.0.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
