[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232645#comment-15232645 ]
Jun Rao commented on KAFKA-3042:
--------------------------------

[~delbaeth], [~wushujames], a few things.

1. Supposedly, after step 5), controller 3 should send the latest ZK version of the ISR path to broker 2 through a LeaderAndIsrRequest. That should stop the "Cached zkVersion..." warning. It seems that somehow didn't happen. Could you send the state-change log from broker 2 around that time? You probably want to include the log from 5 minutes before to 5 minutes after the very first "Cached zkVersion..." message. Could you also do that for the controller logs on controller 1 and controller 3?

2. The controller log shows that controller 3 stopped at 01:05:23. Was broker 3 still up at that time?

3. We have discovered a few issues caused by ZK session expiration, not all of which have been fixed. So, in the short term, it would be good to avoid ZK session expiration in the first place. You mentioned this may be due to a network issue. How long did the network issue last? Another common cause of ZK session expiration is long GC pauses on the broker. Do you have the GC logs for the brokers whose sessions expired?

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Environment: jdk 1.7, centos 6.4
> Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it is a follower.
> So after several failed tries, it needs to find out who the leader is.
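The mechanism discussed above (a version-checked ISR write that fails while the broker's cached zkVersion is stale, and a controller message that refreshes that version) can be illustrated with a toy model. This is a minimal sketch, not Kafka's code: `FakeZk` is an in-memory stand-in for ZooKeeper, and `PartitionState`, `update_isr`, `on_leader_and_isr` and the `max_failures` cutoff are all hypothetical names chosen for illustration. The cutoff models the reporter's proposal to stop retrying after several failures; `on_leader_and_isr` models the controller pushing the latest ISR zkVersion as described in point 1.

```python
from dataclasses import dataclass, field


class BadVersion(Exception):
    """Raised when a conditional write uses a stale version (similar in spirit to ZooKeeper's BadVersion error)."""


class FakeZk:
    """Hypothetical in-memory stand-in for a ZooKeeper node store: each path holds (data, version)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        self.nodes[path] = (data, 0)

    def get(self, path):
        return self.nodes[path]  # (data, version)

    def conditional_set(self, path, data, expected_version):
        # Write only if the caller's expected version matches; bump the version on success.
        _, version = self.nodes[path]
        if version != expected_version:
            raise BadVersion(f"expected {expected_version}, found {version}")
        self.nodes[path] = (data, version + 1)
        return version + 1


@dataclass
class PartitionState:
    """Hypothetical per-partition state kept by a broker that believes it is the leader."""
    zk: FakeZk
    path: str
    cached_zk_version: int
    max_failures: int = 3  # hypothetical cutoff, modelling the reporter's proposal
    consecutive_failures: int = 0
    needs_leader_refresh: bool = False
    log: list = field(default_factory=list)

    def update_isr(self, new_isr):
        if self.needs_leader_refresh:
            return False  # stop retrying until the controller sends fresh state
        try:
            self.cached_zk_version = self.zk.conditional_set(
                self.path, new_isr, self.cached_zk_version)
            self.consecutive_failures = 0
            return True
        except BadVersion:
            # Mimics the warning quoted in the issue description.
            self.log.append(f"Cached zkVersion {self.cached_zk_version} not equal to "
                            "that in zookeeper, skip updating ISR")
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.needs_leader_refresh = True
            return False

    def on_leader_and_isr(self, zk_version):
        """Controller pushes the latest ISR zkVersion (modelled on LeaderAndIsrRequest)."""
        self.cached_zk_version = zk_version
        self.consecutive_failures = 0
        self.needs_leader_refresh = False


if __name__ == "__main__":
    zk = FakeZk()
    zk.create("/isr/topic-0", [1, 2, 3])
    state = PartitionState(zk, "/isr/topic-0", cached_zk_version=0)
    zk.conditional_set("/isr/topic-0", [1, 3], 0)  # another writer bumps the version
    for _ in range(5):
        state.update_isr([1, 2])
    print(len(state.log), state.needs_leader_refresh)  # 3 True
    state.on_leader_and_isr(zk.get("/isr/topic-0")[1])
    print(state.update_isr([1, 2]))  # True
```

In this model the cutoff only silences the retry loop; the broker still depends on the controller to deliver the current zkVersion before ISR updates can succeed again, which is why a LeaderAndIsrRequest that never arrives leaves the warning repeating.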