Dhirendra Singh created KAFKA-13720:
---------------------------------------

             Summary: A few topic partitions remain under-replicated after brokers lose connectivity to ZooKeeper
                 Key: KAFKA-13720
                 URL: https://issues.apache.org/jira/browse/KAFKA-13720
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 2.7.1
            Reporter: Dhirendra Singh


A few topic partitions remain under-replicated after brokers lose connectivity to ZooKeeper.
This happens only when the loss of ZooKeeper connectivity results in a change of the active controller, and even then the issue occurs randomly, not on every occurrence. It has never occurred when brokers lose ZooKeeper connectivity without a change in the active controller.
I found the following error messages in the log file.


[2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1] Controller failed to update ISR to PendingExpandIsr(isr=Set(1), newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying. (kafka.cluster.Partition)
[2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2), zkVersion=4719) for partition __consumer_offsets-4
    at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
    at kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
    at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
    at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
    at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
    at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
    at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
    at kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
    at kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
    at kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
    at kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
    at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
    at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
    at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
    at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
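
For context, here is a minimal, self-contained Scala sketch of the failure mode the exception above suggests. The names here are hypothetical, not Kafka's actual internals: the broker keeps at most one pending AlterIsr item per partition, and a retry that runs while the previous entry is still registered fails the enqueue and throws, which would leave the pending ISR state stuck until the controller is restarted.

import java.util.concurrent.ConcurrentHashMap

// Simplified model (illustrative only) of the "at most one pending AlterIsr
// item per partition" invariant implied by the stack trace above.
object AlterIsrEnqueueModel {
  final case class AlterIsrItem(partition: String, state: String)

  // Pending items keyed by partition; a second enqueue for the same key fails.
  private val unsent = new ConcurrentHashMap[String, AlterIsrItem]()

  // Returns false when an item for this partition is already pending.
  def enqueue(item: AlterIsrItem): Boolean =
    unsent.putIfAbsent(item.partition, item) == null

  // A failed enqueue is treated as fatal, matching the logged exception text.
  def sendAlterIsrRequest(item: AlterIsrItem): Unit =
    if (!enqueue(item))
      throw new IllegalStateException(
        s"Failed to enqueue `AlterIsr` request with state ${item.state} " +
          s"for partition ${item.partition}")

  def main(args: Array[String]): Unit = {
    val item = AlterIsrItem("__consumer_offsets-4",
      "LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2), zkVersion=4719)")
    sendAlterIsrRequest(item) // first attempt: enqueued
    // A retry that fires before the pending entry is cleared (for example
    // after an UNKNOWN_SERVER_ERROR during a controller change) hits the
    // same key:
    sendAlterIsrRequest(item) // throws IllegalStateException, as in the log
  }
}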
The under-replicated partition count returns to zero after the controller broker is restarted, but that requires manual intervention.
The expectation is that once the brokers reconnect to ZooKeeper, the cluster should return to a stable state on its own, with the under-replicated partition count at zero and no manual intervention.
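
Until this is fixed, affected partitions can at least be detected automatically. Below is a small Scala sketch using the public Admin client to list partitions whose ISR is smaller than the replica set; the bootstrap address is a placeholder.

import java.util.Properties
import org.apache.kafka.clients.admin.{Admin, AdminClientConfig}
import scala.jdk.CollectionConverters._

object UnderReplicatedCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder bootstrap address; replace with a real broker.
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    val admin = Admin.create(props)
    try {
      val names = admin.listTopics().names().get()
      val descriptions = admin.describeTopics(names).all().get()
      for {
        desc <- descriptions.values().asScala
        p    <- desc.partitions().asScala
        if p.isr().size() < p.replicas().size()  // ISR smaller than replica set
      } println(s"${desc.name()}-${p.partition()} under-replicated: " +
          s"isr=${p.isr().asScala.map(_.id())} replicas=${p.replicas().asScala.map(_.id())}")
    } finally admin.close()
  }
}

The same check is also available from the stock tooling via kafka-topics.sh --describe --under-replicated-partitions.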



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
