[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933991#comment-15933991 ]
Ronghua Lin commented on KAFKA-2729:
------------------------------------

[~junrao], we also see this problem in a small cluster of 3 brokers running Kafka 0.10.1.1. When it happened, the logs of each broker looked like this:

{code:title=broker 2 | borderStyle=solid}
[2017-03-20 01:03:48,903] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,283] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,293] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,294] INFO 2 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2017-03-20 01:13:28,203] INFO re-registering broker info in ZK for broker 2 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,205] INFO Creating /brokers/ids/2 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,218] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,219] INFO Registered broker 2 at path /brokers/ids/2 with addresses: PLAINTEXT -> EndPoint(xxxxx, xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:28,219] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,220] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,224] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,227] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,812] INFO Partition [topic1,1] on broker 2: Shrinking ISR for partition [topic1,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic1,1] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic2,1] on broker 2: Shrinking ISR for partition [topic2,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic2,1] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic3,0] on broker 2: Shrinking ISR for partition [topic3,0] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,847] INFO Partition [topic3,0] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
....
{code}

{code:title=broker 1 | borderStyle=solid}
[2017-03-20 01:03:38,255] INFO [Group Metadata Manager on Broker 1]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,451] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,490] INFO re-registering broker info in ZK for broker 1 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,491] INFO Creating /brokers/ids/1 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Registered broker 1 at path /brokers/ids/1 with addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,504] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,504] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,508] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,134] INFO Partition [__consumer_offsets,40] on broker 1: Shrinking ISR for partition [__consumer_offsets,40] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,155] INFO Partition [__consumer_offsets,40] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,156] INFO Partition [__consumer_offsets,0] on broker 1: Shrinking ISR for partition [__consumer_offsets,0] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,161] INFO Partition [__consumer_offsets,0] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,162] INFO Partition [__consumer_offsets,12] on broker 1: Shrinking ISR for partition [__consumer_offsets,12] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,170] INFO Partition [__consumer_offsets,12] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,171] INFO Partition [__consumer_offsets,14] on broker 1: Shrinking ISR for partition [__consumer_offsets,14] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,14] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,24] on broker 1: Shrinking ISR for partition [__consumer_offsets,24] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,24] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,48] on broker 1: Shrinking ISR for partition [__consumer_offsets,48] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,48] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,2] on broker 1: Shrinking ISR for partition [__consumer_offsets,2] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,215] INFO Partition [__consumer_offsets,2] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,216] INFO Partition [__consumer_offsets,32] on broker 1: Shrinking ISR for partition [__consumer_offsets,32] from 1,0 to 1 (kafka.cluster.Partition)
{code}

{code:title=broker 0 | borderStyle=solid}
[2017-03-20 01:03:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,317] INFO re-registering broker info in ZK for broker 0 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,320] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Registered broker 0 at path /brokers/ids/0 with addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,333] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,334] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,342] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,362] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,128] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions xxxxx,xxxxx(all topics) (kafka.server.ReplicaFetcherManager)
[2017-03-20 01:13:28,142] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,481] INFO [ReplicaFetcherThread-0-1], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Stopped (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Shutdown completed (kafka.server.ReplicaFetcherThread)
{code}

Broker 0 worked fine, but broker 1 and broker 2 (the leader) both had the same problem. Note that the topics on broker 1 and broker 2 that refused to update their ISRs are different, and that not all topics in the Kafka cluster were refusing to update their ISRs.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other,
> we started seeing a large number of under-replicated partitions. The zookeeper
> cluster recovered, however we continued to see a large number of
> under-replicated partitions. Two brokers in the kafka cluster were showing this
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the affected brokers. Both brokers only recovered
> after a restart. Our own investigation yielded nothing, I was hoping you
> could shed some light on this issue. Possibly if it's related to:
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using
> 0.8.2.1.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)