[ 
https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Santilli updated KAFKA-7165:
-------------------------------------
    Description: 
Kafka version: 1.1.0

Zookeeper version: 3.4.12

4 Kafka Brokers

4 Zookeeper servers

 

In one of the 4 brokers of the cluster, we detect the following error:

[2018-07-14 04:38:23,784] INFO Unable to read additional data from server 
sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:24,509] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:24,510] INFO Socket connection established to 
*ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:24,513] INFO Unable to read additional data from server 
sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:25,287] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:25,287] INFO Socket connection established to 
*ZOOKEEPER_SERVER_2:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
 [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, 
session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, 
session 0x3000c2420cb458d has expired, closing socket connection 
(org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 
0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. 
(kafka.zookeeper.ZooKeeperClient)
 [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session to 
*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*.
 (kafka.zookeeper.ZooKeeperClient)
 [2018-07-14 04:38:26,459] INFO Initiating client connection, 
connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*
 sessionTimeout=6000 
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 
(org.apache.zookeeper.ZooKeeper)
 [2018-07-14 04:38:26,465] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,477] INFO Socket connection established to 
*ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,484] INFO Session establishment complete on server 
*ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a0000, negotiated timeout = 
6000 (org.apache.zookeeper.ClientCnxn)
 [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? false) 
(kafka.zk.KafkaZkClient)
 [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes 
(kafka.common.ZkNodeChangeNotificationListener)
 *[2018-07-14 04:38:26,547] ERROR Error while creating ephemeral at 
/brokers/ids/1, node already exists and owner '216186131422332301' does not 
match current session '288330817911521280' 
(kafka.zk.KafkaZkClient$CheckedEphemeral)*
 [2018-07-14 04:38:26,547] *INFO Result of znode creation at /brokers/ids/1 is: 
NODEEXISTS* (kafka.zk.KafkaZkClient)
 [2018-07-14 04:38:26,559] ERROR Uncaught exception in scheduled task 
'isr-expiration' (kafka.utils.KafkaScheduler)

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /brokers/topics/*TOPIC_NAME*/partitions/*PARTITION-#*/state
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:465)
 at kafka.zk.KafkaZkClient.conditionalUpdatePath(KafkaZkClient.scala:621)
 at kafka.utils.ReplicationUtils$.updateLeaderAndIsr(ReplicationUtils.scala:33)
 at 
kafka.cluster.Partition.kafka$cluster$Partition$$updateIsr(Partition.scala:669)
 at kafka.cluster.Partition$$anonfun$4.apply$mcZ$sp(Partition.scala:513)
 at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
 at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
 at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
 at kafka.cluster.Partition.maybeShrinkIsr(Partition.scala:503)
 at 
kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
 at 
kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
 at 
kafka.server.ReplicaManager.kafka$server$ReplicaManager$$maybeShrinkIsr(ReplicaManager.scala:1335)
 at 
kafka.server.ReplicaManager$$anonfun$2.apply$mcV$sp(ReplicaManager.scala:322)
 at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
 at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

[2018-07-14 04:38:45,938] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
broker=1] Shrinking ISR from 1,3,4,2 to 1 (kafka.cluster.Partition)
 [2018-07-14 04:38:45,952] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
broker=1] Cached zkVersion [0] not equal to that in zookeeper, skip updating 
ISR (kafka.cluster.Partition)
 [2018-07-14 04:38:45,952] INFO [Partition __consumer_offsets-# broker=1|#* 
broker=1] Shrinking ISR from 1,2,3,4 to 1 (kafka.cluster.Partition)
 [2018-07-14 04:38:45,992] INFO [Partition __consumer_offsets-# broker=1|#* 
broker=1] Cached zkVersion [139] not equal to that in zookeeper, skip updating 
ISR (kafka.cluster.Partition)

...

...

 

The previous exception continues showing constantly without stopping. Since the 
cluster was still up, I have left the error showing up for about 6 hours to 
check if the broker recovers itself.

The only solution, to put back the broker into the cluster, was restarting.

After the restart, the replicas sync each other and the broker was able to 
serve again.

I imagine this is not a normal behavior.

  was:
Kafka version: 1.1.0

Zookeeper version: 3.4.12

4 Kafka Brokers

4 Zookeeper servers

 

In one of the 4 brokers of the cluster, we detect the following error:

[2018-07-14 04:38:23,784] INFO Unable to read additional data from server 
sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:24,509] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:24,510] INFO Socket connection established to 
*ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:24,513] INFO Unable to read additional data from server 
sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:25,287] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:25,287] INFO Socket connection established to 
*ZOOKEEPER_SERVER_2:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:25,954] INFO [Partition *TOPIC_NAME-PARTITION-#* broker=1] 
Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
[2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, 
session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, 
session 0x3000c2420cb458d has expired, closing socket connection 
(org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,445] INFO EventThread shut down for session: 
0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. 
(kafka.zookeeper.ZooKeeperClient)
[2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session to 
*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*.
 (kafka.zookeeper.ZooKeeperClient)
[2018-07-14 04:38:26,459] INFO Initiating client connection, 
connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*
 sessionTimeout=6000 
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 
(org.apache.zookeeper.ZooKeeper)
[2018-07-14 04:38:26,465] INFO Opening socket connection to server 
*ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown 
error) (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,477] INFO Socket connection established to 
*ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,484] INFO Session establishment complete on server 
*ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a0000, negotiated timeout = 
6000 (org.apache.zookeeper.ClientCnxn)
[2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? false) 
(kafka.zk.KafkaZkClient)
[2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes 
(kafka.common.ZkNodeChangeNotificationListener)
*[2018-07-14 04:38:26,547] ERROR Error while creating ephemeral at 
/brokers/ids/1, node already exists and owner '216186131422332301' does not 
match current session '288330817911521280' 
(kafka.zk.KafkaZkClient$CheckedEphemeral)*
[2018-07-14 04:38:26,547] *INFO Result of znode creation at /brokers/ids/1 is: 
NODEEXISTS* (kafka.zk.KafkaZkClient)
[2018-07-14 04:38:26,559] ERROR Uncaught exception in scheduled task 
'isr-expiration' (kafka.utils.KafkaScheduler)

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /brokers/topics/*TOPIC_NAME*/partitions/*PARTITION-#*/state
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:465)
at kafka.zk.KafkaZkClient.conditionalUpdatePath(KafkaZkClient.scala:621)
at kafka.utils.ReplicationUtils$.updateLeaderAndIsr(ReplicationUtils.scala:33)
at 
kafka.cluster.Partition.kafka$cluster$Partition$$updateIsr(Partition.scala:669)
at kafka.cluster.Partition$$anonfun$4.apply$mcZ$sp(Partition.scala:513)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
at kafka.cluster.Partition.maybeShrinkIsr(Partition.scala:503)
at 
kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
at 
kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at 
kafka.server.ReplicaManager.kafka$server$ReplicaManager$$maybeShrinkIsr(ReplicaManager.scala:1335)
at kafka.server.ReplicaManager$$anonfun$2.apply$mcV$sp(ReplicaManager.scala:322)
at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

[2018-07-14 04:38:45,938] INFO [Partition *TOPIC_NAME-PARTITION-#* broker=1] 
Shrinking ISR from 1,3,4,2 to 1 (kafka.cluster.Partition)
[2018-07-14 04:38:45,952] INFO [Partition *TOPIC_NAME-PARTITION-#* broker=1] 
Cached zkVersion [0] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2018-07-14 04:38:45,952] INFO [Partition __consumer_offsets-*#* broker=1] 
Shrinking ISR from 1,2,3,4 to 1 (kafka.cluster.Partition)
[2018-07-14 04:38:45,992] INFO [Partition __consumer_offsets-*#* broker=1] 
Cached zkVersion [139] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)

...

...

 

The previous exception continues showing constantly without stopping. Since the 
cluster was still up, I have left the error showing up for about 6 hours to 
check if the broker recovers itself.

The only solution, to put back the broker into the cluster, was restarting.

After the restart, the replicas sync each other and the broker was able to 
serve again.

I imagine this is not a normal behavior.


> Error while creating ephemeral at /brokers/ids/BROKER_ID
> --------------------------------------------------------
>
>                 Key: KAFKA-7165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7165
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, zkclient
>    Affects Versions: 1.1.0
>            Reporter: Jonathan Santilli
>            Priority: Major
>
> Kafka version: 1.1.0
> Zookeeper version: 3.4.12
> 4 Kafka Brokers
> 4 Zookeeper servers
>  
> In one of the 4 brokers of the cluster, we detect the following error:
> [2018-07-14 04:38:23,784] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,509] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,510] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,513] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_2:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
> broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
>  [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 
> 0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. 
> (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session 
> to 
> *ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*.
>  (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO Initiating client connection, 
> connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*
>  sessionTimeout=6000 
> watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 
> (org.apache.zookeeper.ZooKeeper)
>  [2018-07-14 04:38:26,465] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,477] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,484] INFO Session establishment complete on server 
> *ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a0000, negotiated timeout 
> = 6000 (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? 
> false) (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes 
> (kafka.common.ZkNodeChangeNotificationListener)
>  *[2018-07-14 04:38:26,547] ERROR Error while creating ephemeral at 
> /brokers/ids/1, node already exists and owner '216186131422332301' does not 
> match current session '288330817911521280' 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)*
>  [2018-07-14 04:38:26,547] *INFO Result of znode creation at /brokers/ids/1 
> is: NODEEXISTS* (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,559] ERROR Uncaught exception in scheduled task 
> 'isr-expiration' (kafka.utils.KafkaScheduler)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for 
> /brokers/topics/*TOPIC_NAME*/partitions/*PARTITION-#*/state
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:465)
>  at kafka.zk.KafkaZkClient.conditionalUpdatePath(KafkaZkClient.scala:621)
>  at 
> kafka.utils.ReplicationUtils$.updateLeaderAndIsr(ReplicationUtils.scala:33)
>  at 
> kafka.cluster.Partition.kafka$cluster$Partition$$updateIsr(Partition.scala:669)
>  at kafka.cluster.Partition$$anonfun$4.apply$mcZ$sp(Partition.scala:513)
>  at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
>  at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
>  at kafka.cluster.Partition.maybeShrinkIsr(Partition.scala:503)
>  at 
> kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
>  at 
> kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at 
> kafka.server.ReplicaManager.kafka$server$ReplicaManager$$maybeShrinkIsr(ReplicaManager.scala:1335)
>  at 
> kafka.server.ReplicaManager$$anonfun$2.apply$mcV$sp(ReplicaManager.scala:322)
>  at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> [2018-07-14 04:38:45,938] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
> broker=1] Shrinking ISR from 1,3,4,2 to 1 (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,952] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
> broker=1] Cached zkVersion [0] not equal to that in zookeeper, skip updating 
> ISR (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,952] INFO [Partition __consumer_offsets-# broker=1|#* 
> broker=1] Shrinking ISR from 1,2,3,4 to 1 (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,992] INFO [Partition __consumer_offsets-# broker=1|#* 
> broker=1] Cached zkVersion [139] not equal to that in zookeeper, skip 
> updating ISR (kafka.cluster.Partition)
> ...
> ...
>  
> The previous exception continues showing constantly without stopping. Since 
> the cluster was still up, I have left the error showing up for about 6 hours 
> to check if the broker recovers itself.
> The only solution, to put back the broker into the cluster, was restarting.
> After the restart, the replicas sync each other and the broker was able to 
> serve again.
> I imagine this is not a normal behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to