[ https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335351#comment-16335351 ]
Kyle Ambroff-Kao commented on KAFKA-6469: ----------------------------------------- I'm going to work on a patch for this. > ISR change notification queue has a maximum size > ------------------------------------------------ > > Key: KAFKA-6469 > URL: https://issues.apache.org/jira/browse/KAFKA-6469 > Project: Kafka > Issue Type: Bug > Reporter: Kyle Ambroff-Kao > Priority: Major > > When the writes /isr_change_notification in ZooKeeper (which is effectively a > queue of ISR change events for the controller) happen at a rate high enough > that the node with a watch can't keep up dequeuing them, the trouble starts. > The watcher kafka.controller.IsrChangeNotificationListener is fired in the > controller when a new entry is written to /isr_change_notification, and the > zkclient library sends a GetChildrenRequest to zookeeper to fetch all child > znodes. The size of the GetChildrenResponse returned by ZooKeeper is the > problem. Reading through the code and running some tests to confirm shows > that an empty GetChildrenResponse is 4 bytes on the wire, and every child > node name minimum 4 bytes as well. Since these znodes are length 21, that > means every child znode will account for 25 bytes in the response. > A GetChildrenResponse with 42k child nodes of the same length will be just > about 1.001MB, which is larger than the 1MB data frame that ZooKeeper uses. > This causes the ZooKeeper server to drop the broker's session. > So if 42k ISR changes happen at once, and the controller pauses at just the > right time, you'll end up with a queue that can no longer be drained. > We've seen this happen in one of our test clusters as the partition count > started to climb north of 60k per broker. We had a hardware failure that lead > to the cluster writing so many child nodes to /isr_change_notification that > the controller could no longer list its children, effectively bricking the > cluster. > This can be partially mitigated by chunking ISR notifications to increase the > maximum number of partitions a broker can host. -- This message was sent by Atlassian JIRA (v7.6.3#76005)