Re: Strange ZK Error precedes frequent rebalances
Yes. The rebalance is on the consumers in the group and does not take topics into account.

On Wed, Oct 14, 2015 at 1:59 PM, noah wrote:
> Thanks Gwen.
>
> So am I right in deducing that any consumer in the same group dropping will
> cause a rebalance, regardless of which topics they are subscribed to?
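[Editorial note, not from the thread: this behavior follows from how the 0.8 high-level consumer lays out its ZooKeeper registry. Every member of a group registers an ephemeral node under a single per-group path, with no topic in the path, so a watch on that path fires for the whole group whenever any member appears or disappears. Roughly:]

```
/consumers/<group>/ids/<consumer-id>              (ephemeral; one per member, regardless of topic)
/consumers/<group>/owners/<topic>/<partition>     (current partition ownership)
/consumers/<group>/offsets/<topic>/<partition>    (committed offsets)
```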
Re: Strange ZK Error precedes frequent rebalances
Thanks Gwen.

So am I right in deducing that any consumer in the same group dropping will cause a rebalance, regardless of which topics they are subscribed to?

On Wed, Oct 14, 2015 at 3:52 PM Gwen Shapira wrote:
> It is not strange, it means that one of the consumers lost connectivity to
> Zookeeper, its session timed out, and this caused ephemeral ZK nodes (like
> /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af)
> to be removed and ultimately caused the rebalance.
>
> What you need is to make sure your consumers don't lose connectivity to
> Zookeeper or that sessions don't time out. You do this by:
> 1. Tuning garbage collection on the consumer apps (G1 is recommended) to
>    avoid long GC pauses - the leading cause of timeouts
> 2. Increasing the Zookeeper session timeout on the consumer
>
> Gwen
Re: Strange ZK Error precedes frequent rebalances
It is not strange: it means that one of the consumers lost connectivity to Zookeeper, its session timed out, and this caused ephemeral ZK nodes (like /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af) to be removed and ultimately caused the rebalance.

What you need is to make sure your consumers don't lose connectivity to Zookeeper, or that their sessions don't time out. You do this by:

1. Tuning garbage collection on the consumer apps (G1 is recommended) to avoid long GC pauses - the leading cause of timeouts
2. Increasing the Zookeeper session timeout on the consumer

Gwen

On Wed, Oct 14, 2015 at 1:47 PM, noah wrote:
> A number of our developers are seeing errors like the one below in their
> console when running a consumer on their laptop. The error is always
> followed by logging indicating that the local consumer is rebalancing,
> and in the meantime we are not making much progress.
>
> In our dev environment, we have a lot (hundreds) of consumers coming and
> going from the same consumer group, but they are mostly subscribed to
> different topics. Is this setup (sharing a consumer group across topics)
> potentially causing more rebalances than we would otherwise need? Or is
> something else entirely going on?
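[Editorial note, not from the thread: Gwen's two suggestions can be sketched concretely. The property names below are from the 0.8.x high-level consumer configuration; the G1 flags and timeout values are illustrative starting points only, not tuned recommendations.]

```shell
# 1. Run the consumer JVM with G1 and a pause-time target, so a
#    stop-the-world collection is less likely to outlast the ZK session.
export KAFKA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50"

# 2. Give the ZooKeeper session more headroom than the 0.8.x default
#    (6000 ms) in the consumer's properties file.
cat >> consumer.properties <<'EOF'
zookeeper.session.timeout.ms=15000
zookeeper.connection.timeout.ms=15000
EOF
```

Raising the session timeout trades faster failure detection for fewer spurious rebalances; GC tuning attacks the root cause, so try it first.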
Strange ZK Error precedes frequent rebalances
A number of our developers are seeing errors like the one below in their console when running a consumer on their laptop. The error is always followed by logging indicating that the local consumer is rebalancing, and in the meantime we are not making much progress.

I'm reading this as the consumer trying to read a ZK node for another consumer in the same group (running on a different machine), but the node is no longer there. I can't tell if that is triggering a rebalance, or if it's just coincidental.

In our dev environment, we have a lot (hundreds) of consumers coming and going from the same consumer group, but they are mostly subscribed to different topics. Is this setup (sharing a consumer group across topics) potentially causing more rebalances than we would otherwise need? Or is something else entirely going on?

LOG:

INFO [2015-10-14 20:32:49,138] kafka.consumer.ZookeeperConsumerConnector: [real-time-updates_Noahs-MacBook-Pro.local-1444853969114-7b52ecb5], exception during rebalance
! org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af
! at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) ~[zkclient-0.3.jar:0.3]
! at kafka.utils.ZkUtils$.readData(ZkUtils.scala:443) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:665) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:664) ~[kafka_2.10-0.8.2.1.jar:na]
! at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.4.jar:na]
! at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.4.jar:na]
! at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[scala-library-2.10.4.jar:na]
! at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library-2.10.4.jar:na]
! at kafka.utils.ZkUtils$.getConsumersPerTopic(ZkUtils.scala:664) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.AssignmentContext.<init>(PartitionAssignor.scala:52) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:659) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608) ~[kafka_2.10-0.8.2.1.jar:na]
! at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) [scala-library-2.10.4.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:551) [kafka_2.10-0.8.2.1.jar:na]
Caused by: ! org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af
! at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[zkclient-0.3.jar:0.3]
!... 21 common frames omitted