[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933991#comment-15933991
 ] 

Ronghua Lin commented on KAFKA-2729:
------------------------------------

[~junrao], we also hit this problem in a small cluster of 3 brokers 
running Kafka 0.10.1.1. When it happened, the logs on each broker looked like 
this:
{code:title=broker 2 | borderStyle=solid}
[2017-03-20 01:03:48,903] INFO [Group Metadata Manager on Broker 2]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,283] INFO Creating /controller (is it secure? false) 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,293] INFO Result of znode creation is: OK 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,294] INFO 2 successfully elected as leader 
(kafka.server.ZookeeperLeaderElector)
[2017-03-20 01:13:28,203] INFO re-registering broker info in ZK for broker 2 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,205] INFO Creating /brokers/ids/2 (is it secure? false) 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,218] INFO Result of znode creation is: OK 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:28,219] INFO Registered broker 2 at path /brokers/ids/2 with 
addresses: PLAINTEXT -> EndPoint(xxxxx, xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:28,219] INFO done re-registering broker 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,220] INFO Subscribing to /brokers/topics path to watch for 
new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:28,224] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,227] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,812] INFO Partition [topic1,1] on broker 2: Shrinking ISR 
for partition [topic1,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic1,1] on broker 2: Cached 
zkVersion [6] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic2,1] on broker 2: Shrinking ISR 
for partition [topic2,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic2,1] on broker 2: Cached 
zkVersion [6] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic3,0] on broker 2: Shrinking ISR 
for partition [topic3,0] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,847] INFO Partition [topic3,0] on broker 2: Cached 
zkVersion [6] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
....
{code}

{code:title=broker 1 | borderStyle=solid}
[2017-03-20 01:03:38,255] INFO [Group Metadata Manager on Broker 1]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,451] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,490] INFO re-registering broker info in ZK for broker 1 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,491] INFO Creating /brokers/ids/1 (is it secure? false) 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Result of znode creation is: OK 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,503] INFO Registered broker 1 at path /brokers/ids/1 with 
addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,504] INFO done re-registering broker 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,504] INFO Subscribing to /brokers/topics path to watch for 
new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,508] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:38,134] INFO Partition [__consumer_offsets,40] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,40] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,155] INFO Partition [__consumer_offsets,40] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,156] INFO Partition [__consumer_offsets,0] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,0] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,161] INFO Partition [__consumer_offsets,0] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,162] INFO Partition [__consumer_offsets,12] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,12] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,170] INFO Partition [__consumer_offsets,12] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,171] INFO Partition [__consumer_offsets,14] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,14] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,14] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,191] INFO Partition [__consumer_offsets,24] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,24] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,24] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,200] INFO Partition [__consumer_offsets,48] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,48] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,48] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,209] INFO Partition [__consumer_offsets,2] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,2] from 1,0 to 1 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,215] INFO Partition [__consumer_offsets,2] on broker 1: 
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-03-20 01:13:38,216] INFO Partition [__consumer_offsets,32] on broker 1: 
Shrinking ISR for partition [__consumer_offsets,32] from 1,0 to 1 
(kafka.cluster.Partition)
{code}

{code:title=broker 0 | borderStyle=solid}
[2017-03-20 01:03:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:09,479] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-03-20 01:13:27,317] INFO re-registering broker info in ZK for broker 0 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,320] INFO Creating /brokers/ids/0 (is it secure? false) 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Result of znode creation is: OK 
(kafka.utils.ZKCheckedEphemeral)
[2017-03-20 01:13:27,333] INFO Registered broker 0 at path /brokers/ids/0 with 
addresses: PLAINTEXT -> EndPoint(xxxx,xxxx,PLAINTEXT) (kafka.utils.ZkUtils)
[2017-03-20 01:13:27,333] INFO done re-registering broker 
(kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,334] INFO Subscribing to /brokers/topics path to watch for 
new topics (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-03-20 01:13:27,342] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:27,362] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-03-20 01:13:28,128] INFO [ReplicaFetcherManager on broker 0] Removed 
fetcher for partitions xxxxx,xxxxx(all topics) 
(kafka.server.ReplicaFetcherManager)
[2017-03-20 01:13:28,142] INFO [ReplicaFetcherThread-0-2], Shutting down 
(kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Stopped  
(kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,465] INFO [ReplicaFetcherThread-0-2], Shutdown completed 
(kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,481] INFO [ReplicaFetcherThread-0-1], Shutting down 
(kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Stopped  
(kafka.server.ReplicaFetcherThread)
[2017-03-20 01:13:28,597] INFO [ReplicaFetcherThread-0-1], Shutdown completed 
(kafka.server.ReplicaFetcherThread)
{code}
Broker 0 worked fine, but broker 1 and broker 2 (the leader) had the same 
problem. Note that the topics refusing to update their ISRs differ between 
broker 1 and broker 2. Not every topic in the Kafka cluster refused to update 
its ISR.
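
To compare which partitions are affected on each broker, excerpts like the ones above can be tallied with a short script. This is only a sketch: the function names are made up for illustration, the regular expression is written against the message text shown in these logs, and it assumes log entries start with a bracketed timestamp and may have been wrapped across lines by the mail client.

```python
import re
from collections import defaultdict

# A log entry starts with a bracketed timestamp; anything else is a wrapped continuation.
TIMESTAMP_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]")

# Matches the "skip updating ISR" message once wrapped lines are re-joined.
SKIP_RE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker (?P<broker>\d+): "
    r"Cached zkVersion \[(?P<version>\d+)\] not equal to that in zookeeper"
)


def join_wrapped(lines):
    """Yield complete log entries, re-joining lines that were wrapped."""
    entry = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line) and entry:
            yield " ".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield " ".join(entry)


def affected_partitions(lines):
    """Return {broker id: set of (topic, partition)} that skipped an ISR update."""
    result = defaultdict(set)
    for entry in join_wrapped(lines):
        match = SKIP_RE.search(entry)
        if match:
            broker = int(match.group("broker"))
            result[broker].add((match.group("topic"), int(match.group("partition"))))
    return dict(result)


# Sample lines copied from the broker 2 and broker 1 excerpts above.
SAMPLE = """\
[2017-03-20 01:13:38,812] INFO Partition [topic1,1] on broker 2: Shrinking ISR
for partition [topic1,1] from 0,2,1 to 2,1 (kafka.cluster.Partition)
[2017-03-20 01:13:38,825] INFO Partition [topic1,1] on broker 2: Cached
zkVersion [6] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
[2017-03-20 01:13:38,835] INFO Partition [topic2,1] on broker 2: Cached
zkVersion [6] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
[2017-03-20 01:13:38,155] INFO Partition [__consumer_offsets,40] on broker 1:
Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
"""

if __name__ == "__main__":
    for broker, partitions in sorted(affected_partitions(SAMPLE.splitlines()).items()):
        print(broker, sorted(partitions))
```

Running the same function over the full server.log of each broker would show whether the affected sets overlap at all.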

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2729
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of under-replicated partitions. The zookeeper 
> cluster recovered, but we continued to see a large number of 
> under-replicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This was logged for all of the topics on the affected brokers. Both brokers 
> only recovered after a restart. Our own investigation yielded nothing; I was 
> hoping you could shed some light on this issue. It is possibly related to 
> https://issues.apache.org/jira/browse/KAFKA-1382 , although we're using 
> 0.8.2.1.
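
The "Cached zkVersion ... not equal to that in zookeeper" message in the report above is what a failed version-checked write looks like: ZooKeeper's setData takes an expected version and rejects the write if the znode has since been modified. The sketch below is a toy in-memory model of that pattern, not Kafka's or ZooKeeper's actual code. `VersionedNode`, `PartitionLeader`, and `refresh` are hypothetical names; `refresh` stands in for whatever event reloads the broker's cached state (a restart, in the report).

```python
class VersionedNode:
    """In-memory stand-in for a znode: data plus a version bumped on every write."""

    def __init__(self, data):
        self.data = list(data)
        self.version = 0

    def conditional_set(self, data, expected_version):
        """Write only if expected_version matches; return (ok, current version)."""
        if expected_version != self.version:
            return False, self.version
        self.data = list(data)
        self.version += 1
        return True, self.version


class PartitionLeader:
    """Hypothetical broker-side view that caches the ISR and its version."""

    def __init__(self, node):
        self.node = node
        self.refresh()

    def refresh(self):
        # Reload the cached state from the node (assumed to happen on restart).
        self.isr = list(self.node.data)
        self.cached_version = self.node.version

    def try_shrink_isr(self, new_isr):
        ok, version = self.node.conditional_set(new_isr, self.cached_version)
        if ok:
            self.isr = list(new_isr)
            self.cached_version = version
        # On a mismatch the cache is left untouched in this model,
        # so every later attempt fails the same way ("skip updating ISR").
        return ok


if __name__ == "__main__":
    node = VersionedNode([0, 2, 1])
    leader = PartitionLeader(node)

    # Some other writer updates the node, so the leader's cached version is stale.
    node.conditional_set([0, 2, 1], expected_version=0)

    print(leader.try_shrink_isr([2, 1]))  # stale cache: write is rejected
    print(leader.try_shrink_isr([2, 1]))  # still rejected on retry
    leader.refresh()
    print(leader.try_shrink_isr([2, 1]))  # succeeds once the cache is reloaded
```

In this model nothing but `refresh` ever repairs the stale cache, which is one way to picture why the brokers in the report kept logging the message until they were restarted.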



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
