[ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Ambroff-Kao updated KAFKA-6469:
------------------------------------
    Summary: ISR change notification queue can prevent controller from making 
progress  (was: ISR change notification queue has a maximum size)

> ISR change notification queue can prevent controller from making progress
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6469
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6469
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kyle Ambroff-Kao
>            Priority: Major
>
> When the writes /isr_change_notification in ZooKeeper (which is effectively a 
> queue of ISR change events for the controller) happen at a rate high enough 
> that the node with a watch can't keep up dequeuing them, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the 
> controller when a new entry is written to /isr_change_notification, and the 
> zkclient library sends a GetChildrenRequest to zookeeper to fetch all child 
> znodes. The size of the GetChildrenResponse returned by ZooKeeper is the 
> problem. Reading through the code and running some tests to confirm shows 
> that an empty GetChildrenResponse is 4 bytes on the wire, and every child 
> node name minimum 4 bytes as well. Since these znodes are length 21, that 
> means every child znode will account for 25 bytes in the response.
> A GetChildrenResponse with 42k child nodes of the same length will be just 
> about 1.001MB, which is larger than the 1MB data frame that ZooKeeper uses. 
> This causes the ZooKeeper server to drop the broker's session.
> So if 42k ISR changes happen at once, and the controller pauses at just the 
> right time, you'll end up with a queue that can no longer be drained.
> We've seen this happen in one of our test clusters as the partition count 
> started to climb north of 60k per broker. We had a hardware failure that lead 
> to the cluster writing so many child nodes to /isr_change_notification that 
> the controller could no longer list its children, effectively bricking the 
> cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to