[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933991#comment-15933991 ]
Ronghua Lin commented on KAFKA-2729:
------------------------------------
[~junrao], we also hit this problem in a small cluster of 3 brokers running
Kafka 0.10.1.1. When it happened, the logs on each broker looked like this:
{code:title=broker 2 | borderStyle=solid}
[2017-03-20 01:03:48,903] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,283] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,293] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,294] INFO 2 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2017-03-20 01:13:28,203] INFO re-registering broker info in ZK for broker 2 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,205] INFO Creating /brokers/ids/2 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,218] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,219] INFO Registered broker 2 at path /brokers/ids/2 with addresses: PLAINTEXT -> EndPoint(xxxxx, xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:28,219] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,220] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,224] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,227] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,812] INFO Partition [topic1,1] on broker 2: Shrinking ISR for partition [topic1,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic1,1] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic2,1] on broker 2: Shrinking ISR for partition [topic2,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic2,1] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic3,0] on broker 2: Shrinking ISR for partition [topic3,0] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,847] INFO Partition [topic3,0] on broker 2: Cached zkVersion [6] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
....
{code}
{code:title=broker 1 | borderStyle=solid}
[2017-03-20 01:03:38,255] INFO [Group Metadata Manager on Broker 1]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,451] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,490] INFO re-registering broker info in ZK for broker 1 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,491] INFO Creating /brokers/ids/1 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Registered broker 1 at path /brokers/ids/1 with addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,504] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,504] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,508] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,134] INFO Partition [__consumer_offsets,40] on broker 1: Shrinking ISR for partition [__consumer_offsets,40] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,155] INFO Partition [__consumer_offsets,40] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,156] INFO Partition [__consumer_offsets,0] on broker 1: Shrinking ISR for partition [__consumer_offsets,0] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,161] INFO Partition [__consumer_offsets,0] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,162] INFO Partition [__consumer_offsets,12] on broker 1: Shrinking ISR for partition [__consumer_offsets,12] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,170] INFO Partition [__consumer_offsets,12] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,171] INFO Partition [__consumer_offsets,14] on broker 1: Shrinking ISR for partition [__consumer_offsets,14] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,14] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,24] on broker 1: Shrinking ISR for partition [__consumer_offsets,24] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,24] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,48] on broker 1: Shrinking ISR for partition [__consumer_offsets,48] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,48] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,2] on broker 1: Shrinking ISR for partition [__consumer_offsets,2] from 1,0 to 1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,215] INFO Partition [__consumer_offsets,2] on broker 1: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-03-20 01:13:38,216] INFO Partition [__consumer_offsets,32] on broker 1: Shrinking ISR for partition [__consumer_offsets,32] from 1,0 to 1 (kafka.cluster.Partition)
{code}
{code:title=broker 0 | borderStyle=solid}
[2017-03-20 01:03:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,317] INFO re-registering broker info in ZK for broker 0 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,320] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Registered broker 0 at path /brokers/ids/0 with addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,333] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,334] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,342] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,362] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,128] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions xxxxx,xxxxx(all topics) (kafka.server.ReplicaFetcherManager)
[2017-03-20 01:13:28,142] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,481] INFO [ReplicaFetcherThread-0-1], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Stopped (kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Shutdown completed (kafka.server.ReplicaFetcherThread)
{code}
Broker 0 worked fine, but broker 1 and broker 2 (the newly elected controller)
had the same problem. Note that the topics whose ISRs broker 1 and broker 2
refused to update are different, and not every topic in the Kafka cluster was
refusing to update its ISR.
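To compare which partitions each broker gave up on, one option is to scan each broker's log for the "Cached zkVersion ... skip updating ISR" entry and group the partitions by broker. This is a minimal sketch, not a Kafka tool: `stuck_partitions` and `SKIPPED_ISR_RE` are hypothetical names, and the regex assumes one unwrapped 0.10.x log entry per line (as in the broker's log file, not the hard-wrapped copy in an email).

```python
import re
from collections import defaultdict

# Assumed format: one unwrapped Kafka 0.10.x log entry per line, e.g.
# "... INFO Partition [topic1,1] on broker 2: Cached zkVersion [6] not equal to
#  that in zookeeper, skip updating ISR (kafka.cluster.Partition)"
SKIPPED_ISR_RE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker (?P<broker>\d+): "
    r"Cached zkVersion \[\d+\] not equal to that in zookeeper"
)


def stuck_partitions(lines):
    """Hypothetical helper: map broker id -> set of (topic, partition) pairs
    for which that broker logged a skipped ISR update."""
    stuck = defaultdict(set)
    for line in lines:
        match = SKIPPED_ISR_RE.search(line)
        if match:
            broker = int(match["broker"])
            stuck[broker].add((match["topic"], int(match["partition"])))
    return dict(stuck)


if __name__ == "__main__":
    sample = [
        "[2017-03-20 01:13:38,825] INFO Partition [topic1,1] on broker 2: Cached zkVersion [6] "
        "not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)",
        "[2017-03-20 01:13:38,155] INFO Partition [__consumer_offsets,40] on broker 1: Cached "
        "zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)",
    ]
    by_broker = stuck_partitions(sample)
    # Empty intersection: the two brokers are stuck on different partitions.
    print(by_broker[1] & by_broker[2])
```

Matching on the message text keeps the sketch independent of the log4j layout, but the pattern would need adjusting if another Kafka version words the message differently.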
> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Reporter: Danil Serdyuchenko
>
> After a small network wobble where ZooKeeper nodes couldn't reach each other,
> we started seeing a large number of under-replicated partitions. The ZooKeeper
> cluster recovered; however, we continued to see a large number of
> under-replicated partitions. Two brokers in the Kafka cluster were showing this
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This happened for all of the topics on the affected brokers. Both brokers
> recovered only after a restart. Our own investigation yielded nothing; I was
> hoping you could shed some light on this issue, possibly whether it's related
> to https://issues.apache.org/jira/browse/KAFKA-1382, although we're using
> 0.8.2.1.
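The "Cached zkVersion ... skip updating ISR" message reads like a version-conditioned write: the broker writes the new ISR only if the znode is still at the version it has cached, in the same spirit as ZooKeeper's setData with an expected version. The following is a hypothetical in-memory model of that pattern, not Kafka's code (`VersionedStore`, `PartitionView` and `BadVersionError` are illustrative names). It assumes the cached version is never refreshed after a failed write, which would match brokers staying stuck until restarted; it does not model when Kafka actually refreshes that cache.

```python
class BadVersionError(Exception):
    """Raised when a conditional write names a stale version
    (modelled loosely on ZooKeeper's BADVERSION error)."""


class VersionedStore:
    """Hypothetical in-memory stand-in for one znode: data plus a version
    that increases by one on every successful write."""

    def __init__(self, data):
        self.data = list(data)
        self.version = 0

    def set_data(self, data, expected_version):
        # Conditional write: succeeds only if the caller's version is current.
        if expected_version != self.version:
            raise BadVersionError(f"expected {expected_version}, found {self.version}")
        self.data = list(data)
        self.version += 1
        return self.version


class PartitionView:
    """Hypothetical broker-side cache of one partition's ISR and zkVersion."""

    def __init__(self, store):
        self.store = store
        self.isr = list(store.data)
        self.cached_zk_version = store.version  # snapshot taken once, at "startup"

    def shrink_isr(self, new_isr):
        try:
            self.cached_zk_version = self.store.set_data(new_isr, self.cached_zk_version)
        except BadVersionError:
            # Stale cache: the update is skipped and nothing refreshes
            # cached_zk_version here, so every retry fails the same way.
            return "skipped"
        self.isr = list(new_isr)
        return "updated"


if __name__ == "__main__":
    znode = VersionedStore([0, 2, 1])
    broker_view = PartitionView(znode)         # caches version 0
    znode.set_data([0, 2, 1], znode.version)   # another writer bumps the version to 1
    print(broker_view.shrink_isr([2, 1]))      # skipped
    print(broker_view.shrink_isr([2, 1]))      # skipped (cache still says 0)
    restarted = PartitionView(znode)           # re-reads current state
    print(restarted.shrink_isr([2, 1]))        # updated
```

Re-creating `PartitionView` from the store's current state is only a rough analogue of a broker restart re-reading partition state.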