[ https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kyle Ambroff-Kao updated KAFKA-6469:
------------------------------------
Description:
When writes to /isr_change_notification in ZooKeeper (which is effectively a queue of ISR change events for the controller) happen faster than the controller, the node holding the watch, can dequeue them, the trouble starts.

The watcher kafka.controller.IsrChangeNotificationListener fires in the controller whenever a new entry is written to /isr_change_notification. The zkclient library then sends a GetChildrenRequest to ZooKeeper to fetch all child znodes.

We've seen this happen in one of our test clusters once the partition count climbed past 60k per broker. Brokers wrote child nodes under /isr_change_notification that were larger than ZooKeeper's jute.maxbuffer size (1MB). This caused the ZooKeeper server to drop the controller's session, effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications, which increases the maximum number of partitions a broker can host.
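The chunking mitigation mentioned above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Kafka's implementation. The JSON payload shape is modeled loosely on the ISR change notification format, and the names `encode`, `chunk_isr_changes`, `JUTE_MAXBUFFER` and `HEADROOM` are all hypothetical. The sketch splits a list of changed (topic, partition) pairs into several payloads. Each payload stays under a byte budget below jute.maxbuffer, so no single child znode exceeds the limit.

```python
import json
from typing import Iterable, List, Tuple

TopicPartition = Tuple[str, int]

# Hypothetical limits (assumptions): jute.maxbuffer is about 1MB by default;
# leave some headroom for request framing and the znode path.
JUTE_MAXBUFFER = 1024 * 1024
HEADROOM = 64 * 1024
_SEPS = (",", ":")  # compact JSON, no whitespace


def encode(partitions: List[TopicPartition]) -> bytes:
    """Serialize one notification payload (hypothetical format)."""
    body = {"version": 1,
            "partitions": [{"topic": t, "partition": p} for t, p in partitions]}
    return json.dumps(body, separators=_SEPS).encode("utf-8")


def chunk_isr_changes(partitions: Iterable[TopicPartition],
                      max_bytes: int = JUTE_MAXBUFFER - HEADROOM) -> List[bytes]:
    """Split ISR changes into payloads that each fit within max_bytes."""
    envelope = len(encode([]))  # byte size of a payload with no partitions
    chunks: List[bytes] = []
    current: List[TopicPartition] = []
    size = envelope
    for t, p in partitions:
        # ensure_ascii (the json default) makes character count == byte count
        entry = len(json.dumps({"topic": t, "partition": p}, separators=_SEPS))
        if envelope + entry > max_bytes:
            raise ValueError(f"single entry {t}-{p} exceeds {max_bytes} bytes")
        extra = entry + (1 if current else 0)  # +1 for the separating comma
        if size + extra > max_bytes:
            chunks.append(encode(current))  # flush the full chunk
            current, size = [], envelope
            extra = entry  # no comma before the first entry of a new chunk
        current.append((t, p))
        size += extra
    if current:
        chunks.append(encode(current))
    return chunks
```

Tracking the payload size incrementally avoids re-serializing the growing list for every partition, which matters at the 60k-partition scale described above. Each returned payload would then be written as its own sequential child znode. That write step is omitted here, since it depends on the ZooKeeper client in use.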
> ISR change notification queue can prevent controller from making progress
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6469
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6469
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kyle Ambroff-Kao
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)