AS created KAFKA-6178:
-------------------------

             Summary: Broker is listed as only ISR for all partitions it is leader of
                 Key: KAFKA-6178
                 URL: https://issues.apache.org/jira/browse/KAFKA-6178
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.10.1.0
         Environment: Windows
            Reporter: AS
         Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log

We're running a 15-broker cluster on Windows machines, and one of the brokers, 
10, is the only ISR member on all partitions that it is the leader of. On 
partitions where it isn't the leader, it seems to follow the leader fine. This 
is an excerpt from 'describe':

{{Topic: ClientQosCombined      Partition: 458  Leader: 10      Replicas: 
10,6,7,8,9,0,1   Isr: 10
Topic: ClientQosCombined      Partition: 459  Leader: 11      Replicas: 
11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8}}
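
To see how many partitions are affected, the describe output can be scanned for entries whose ISR contains only the leader. Below is a rough Python sketch of that check, assuming the output has been saved as text. The function name `leader_only_isr` and the regular expression are illustrative assumptions, not part of any Kafka tooling, and the pattern is only written against the field layout shown in the excerpt above.

```python
import re

# One partition entry from the 'describe' output. \s* and \s+ tolerate the
# line wrapping seen in the excerpt above. The layout is assumed from that
# excerpt only; other Kafka versions may print fields differently.
ENTRY = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leader_only_isr(describe_text):
    """Return (topic, partition, leader) for every replicated partition
    whose ISR consists of the leader alone."""
    hits = []
    for m in ENTRY.finditer(describe_text):
        replicas = m.group("replicas").split(",")
        isr = [b for b in m.group("isr").split(",") if b]
        # Skip single-replica partitions, where an ISR of one is expected.
        if len(replicas) > 1 and isr == [m.group("leader")]:
            hits.append(
                (m.group("topic"), int(m.group("partition")), int(m.group("leader")))
            )
    return hits


# The two entries quoted above, including the email line wrapping.
SAMPLE = """Topic: ClientQosCombined      Partition: 458  Leader: 10      Replicas:
10,6,7,8,9,0,1   Isr: 10
Topic: ClientQosCombined      Partition: 459  Leader: 11      Replicas:
11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8
"""

print(leader_only_isr(SAMPLE))  # [('ClientQosCombined', 458, 10)]
```

Grouping the returned tuples by leader would show whether every affected partition is led by the same broker.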

The server.log files all seem to be pretty standard, and the only indication of 
this issue is the following pattern that often repeats:

{{2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition 
[kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: 
Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - 
Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition 
[ClientQosCombined,398] from 5,10 to 10}}

This pattern repeats for each of the partitions that broker 10 leads. This is 
the only topic that we currently have in our cluster. The __consumer_offsets 
topic seems completely normal in terms of ISR counts. The controller is broker 
5, which is cycling through attempting and failing to trigger leader elections 
on the partitions led by broker 10. From the controller log on broker 5:

{{2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController 
[kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader 
election for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine 
[kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking 
state change to OnlinePartition for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] 
kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] - 
[PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition 
[ClientQosCombined,375] is not the preferred replica. Trigerring preferred 
replica leader election
2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController 
[kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] failed 
to complete preferred replica leader election. Leader is 10}}

I've also attached the logs and output from broker 10. Any idea what's wrong 
here? 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
