[jira] [Updated] (KAFKA-6469) ISR change notification queue can prevent controller from making progress

2018-01-22 Thread Kyle Ambroff-Kao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Ambroff-Kao updated KAFKA-6469:

Summary: ISR change notification queue can prevent controller from making 
progress  (was: ISR change notification queue has a maximum size)

> ISR change notification queue can prevent controller from making progress
> -
>
> Key: KAFKA-6469
> URL: https://issues.apache.org/jira/browse/KAFKA-6469
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kyle Ambroff-Kao
>Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a
> queue of ISR change events for the controller) happen at a rate high enough
> that the node with a watch can't keep up with dequeuing them, the trouble
> starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the
> controller when a new entry is written to /isr_change_notification, and the
> zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
> znodes. The size of the GetChildrenResponse returned by ZooKeeper is the
> problem. Reading through the code and running some tests to confirm shows
> that an empty GetChildrenResponse is 4 bytes on the wire, and each child node
> name carries a minimum of 4 bytes of overhead as well. Since these znode
> names are 21 characters long, every child znode accounts for 25 bytes in the
> response.
> A GetChildrenResponse with 42k child nodes of that length comes to just about
> 1.001MB, which is larger than the 1MB data frame that ZooKeeper uses. This
> causes the ZooKeeper server to drop the broker's session.
> So if 42k ISR changes happen at once, and the controller pauses at just the
> right time, you'll end up with a queue that can no longer be drained.
> We've seen this happen in one of our test clusters as the partition count
> started to climb north of 60k per broker. We had a hardware failure that led
> to the cluster writing so many child nodes to /isr_change_notification that
> the controller could no longer list its children, effectively bricking the
> cluster.
> This can be partially mitigated by chunking ISR notifications to increase the
> maximum number of partitions a broker can host.
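
To make the size arithmetic above concrete, here is a rough back-of-the-envelope
sketch. The 4-byte figures are the ones reported above; the object and value
names are purely illustrative, and the 21-character name length corresponds to
sequential children named along the lines of isr_change_0000000001.

{code:scala}
// Back-of-the-envelope check of the GetChildrenResponse size described above.
object IsrNotificationSizeCheck {
  val EmptyResponseBytes = 4       // empty GetChildrenResponse on the wire
  val PerNameOverheadBytes = 4     // minimum overhead per child node name
  val ZnodeNameLength = 21         // e.g. "isr_change_0000000001"
  val JuteMaxBuffer = 1024 * 1024  // ZooKeeper's default 1MB frame limit

  def responseSize(childCount: Long): Long =
    EmptyResponseBytes + childCount * (PerNameOverheadBytes + ZnodeNameLength)

  def main(args: Array[String]): Unit = {
    val children = 42000L
    val bytes = responseSize(children)
    // 42,000 * 25 + 4 = 1,050,004 bytes, just over the 1,048,576-byte frame,
    // so the controller can never successfully list the queue again.
    println(s"$children children -> $bytes bytes (frame limit: $JuteMaxBuffer)")
  }
}
{code}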



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6469) ISR change notification queue can prevent controller from making progress

2018-01-22 Thread Kyle Ambroff-Kao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Ambroff-Kao updated KAFKA-6469:

Description: 
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't dequeue them fast enough, the trouble starts.

 

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes.

We've seen this happen in one of our test clusters as the partition count
started to climb north of 60k per broker. We had brokers writing child nodes
under /isr_change_notification that were larger than the jute.maxbuffer size in
ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session,
effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the
maximum number of partitions a broker can host.

 

  was:
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't keep up with dequeuing them, the trouble starts.

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes. The size of the GetChildrenResponse returned by ZooKeeper is the
problem. Reading through the code and running some tests to confirm shows that
an empty GetChildrenResponse is 4 bytes on the wire, and each child node name
carries a minimum of 4 bytes of overhead as well. Since these znode names are
21 characters long, every child znode accounts for 25 bytes in the response.

A GetChildrenResponse with 42k child nodes of that length comes to just about
1.001MB, which is larger than the 1MB data frame that ZooKeeper uses. This
causes the ZooKeeper server to drop the broker's session.

So if 42k ISR changes happen at once, and the controller pauses at just the
right time, you'll end up with a queue that can no longer be drained.

We've seen this happen in one of our test clusters as the partition count
started to climb north of 60k per broker. We had a hardware failure that led to
the cluster writing so many child nodes to /isr_change_notification that the
controller could no longer list its children, effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the 
maximum number of partitions a broker can host.


> ISR change notification queue can prevent controller from making progress
> -
>
> Key: KAFKA-6469
> URL: https://issues.apache.org/jira/browse/KAFKA-6469
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kyle Ambroff-Kao
>Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a
> queue of ISR change events for the controller) happen at a rate high enough
> that the node with a watch can't dequeue them fast enough, the trouble starts.
>  
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the
> controller when a new entry is written to /isr_change_notification, and the
> zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
> znodes.
> We've seen this happen in one of our test clusters as the partition count 
> started to climb north of 60k per broker. We had brokers writing child nodes 
> under /isr_change_notification that were larger than the jute.maxbuffer size 
> in ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's 
> session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6469) ISR change notification queue can prevent controller from making progress

2018-01-22 Thread Kyle Ambroff-Kao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Ambroff-Kao updated KAFKA-6469:

Description: 
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't dequeue them fast enough, the trouble starts.

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes.

We've seen this happen in one of our test clusters as the partition count 
started to climb north of 60k per broker. We had brokers writing child nodes 
under /isr_change_notification that were larger than the jute.maxbuffer size in 
ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session, 
effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the 
maximum number of partitions a broker can host.

 

  was:
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't dequeue them fast enough, the trouble starts.

 

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes.

We've seen this happen in one of our test clusters as the partition count 
started to climb north of 60k per broker. We had brokers writing child nodes 
under /isr_change_notification that were larger than the jute.maxbuffer size in 
ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session, 
effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the 
maximum number of partitions a broker can host.

 


> ISR change notification queue can prevent controller from making progress
> -
>
> Key: KAFKA-6469
> URL: https://issues.apache.org/jira/browse/KAFKA-6469
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kyle Ambroff-Kao
>Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a
> queue of ISR change events for the controller) happen at a rate high enough
> that the node with a watch can't dequeue them fast enough, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the
> controller when a new entry is written to /isr_change_notification, and the
> zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
> znodes.
> We've seen this happen in one of our test clusters as the partition count 
> started to climb north of 60k per broker. We had brokers writing child nodes 
> under /isr_change_notification that were larger than the jute.maxbuffer size 
> in ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's 
> session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.
>  
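
As a minimal sketch of the watch-and-relist pattern described above: every
notification event triggers a full listing of /isr_change_notification, so the
response grows with the backlog. This uses the raw org.apache.zookeeper client
for brevity (the controller actually goes through the zkclient wrapper named
above), and the class and method names are illustrative.

{code:scala}
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import scala.jdk.CollectionConverters._

// Every NodeChildrenChanged event re-lists the whole queue; once the
// GetChildrenResponse exceeds jute.maxbuffer the listing fails and the
// session is dropped, which is the failure mode described in this issue.
class IsrNotificationWatcher(zk: ZooKeeper,
                             path: String = "/isr_change_notification") extends Watcher {

  def registerAndList(): Unit = {
    // getChildren both returns the pending notifications and re-arms the watch.
    val children = zk.getChildren(path, this).asScala
    println(s"${children.size} pending ISR change notifications")
    // ... process and delete each child znode here ...
  }

  override def process(event: WatchedEvent): Unit =
    if (event.getType == Watcher.Event.EventType.NodeChildrenChanged)
      registerAndList()
}
{code}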



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6469) ISR change notification queue can prevent controller from making progress

2018-01-22 Thread Kyle Ambroff-Kao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Ambroff-Kao updated KAFKA-6469:

Description: 
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't dequeue them fast enough, the trouble starts.

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes.

We've seen failures in one of our test clusters as the partition count started
to climb north of 60k per broker. We had brokers writing child nodes under
/isr_change_notification that were larger than the jute.maxbuffer size in
ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session,
effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the 
maximum number of partitions a broker can host.

 

  was:
When writes to /isr_change_notification in ZooKeeper (which is effectively a
queue of ISR change events for the controller) happen at a rate high enough
that the node with a watch can't dequeue them fast enough, the trouble starts.

The watcher kafka.controller.IsrChangeNotificationListener is fired in the
controller when a new entry is written to /isr_change_notification, and the
zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
znodes.

We've seen this happen in one of our test clusters as the partition count 
started to climb north of 60k per broker. We had brokers writing child nodes 
under /isr_change_notification that were larger than the jute.maxbuffer size in 
ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session, 
effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the 
maximum number of partitions a broker can host.

 


> ISR change notification queue can prevent controller from making progress
> -
>
> Key: KAFKA-6469
> URL: https://issues.apache.org/jira/browse/KAFKA-6469
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kyle Ambroff-Kao
>Assignee: Kyle Ambroff-Kao
>Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a
> queue of ISR change events for the controller) happen at a rate high enough
> that the node with a watch can't dequeue them fast enough, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the
> controller when a new entry is written to /isr_change_notification, and the
> zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child
> znodes.
> We've seen failures in one of our test clusters as the partition count started
> to climb north of 60k per broker. We had brokers writing child nodes under
> /isr_change_notification that were larger than the jute.maxbuffer size
> in ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's
> session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.
>  
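
For illustration only, a rough sketch of the chunking mitigation mentioned
above: split the batched ISR-change payload so that no single child znode
written under /isr_change_notification approaches jute.maxbuffer. The object
name and the 512KB budget are assumptions, not Kafka's actual implementation.

{code:scala}
// Illustrative sketch: greedily pack serialized ISR-change entries into chunks
// that each stay well under ZooKeeper's 1MB frame limit, so every child znode
// written under /isr_change_notification carries a bounded payload.
object IsrChangeChunking {
  val MaxChunkBytes: Int = 512 * 1024 // conservative headroom under jute.maxbuffer

  def chunk(entries: Seq[String]): List[List[String]] = {
    val chunks = scala.collection.mutable.ListBuffer(
      scala.collection.mutable.ListBuffer.empty[String])
    var currentBytes = 0
    for (entry <- entries) {
      val size = entry.getBytes("UTF-8").length
      if (currentBytes + size > MaxChunkBytes && chunks.last.nonEmpty) {
        chunks += scala.collection.mutable.ListBuffer.empty[String]
        currentBytes = 0
      }
      chunks.last += entry
      currentBytes += size
    }
    chunks.map(_.toList).toList
  }
}
{code}

Each chunk would then be written as its own sequential child znode, so the
controller drains the queue in bounded pieces instead of hitting the frame
limit on a single oversized node.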



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)