[ 
https://issues.apache.org/jira/browse/KAFKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658781#comment-14658781
 ] 

Jiangjie Qin commented on KAFKA-2406:
-------------------------------------

[~junrao], I'm still testing; here is what I see so far:
1. While the controller was on the old version, controlled shutdown during the 
rolling bounce appeared normal.
2. Once the controller is running the new version, controlled shutdown of 
brokers becomes very slow.
3. Many zk paths are still left in /isr_change_notification after the 
cluster bounce.
4. The first broker shuts down a little slower than before, but after that 
the subsequent shutdowns take very long, which is expected.

Because of [1], I was thinking that throttling UpdateMetadataRequest might be 
the minimal solution for now, but I share your concern about the number of zk 
writes and watcher fires.
[3] probably indicates that the zk watcher cannot keep up with the changes 
reported by brokers.

Having the broker side batch the changes makes sense. I am thinking about doing 
the following:
1. Each broker reports ISR changes only periodically, in a batch, by writing the 
changed partitions to the /isr_change_notification/brokerId_IsrChangeEpoch path, 
so the zk write rate is bounded by #brokers/update_interval.
2. Instead of using a zk watcher, the controller simply queries zookeeper 
periodically and propagates the ISR changes.
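
To make the two steps above concrete, here is a rough sketch of the idea, not the actual patch. It assumes an in-memory stand-in for zookeeper (FakeZk), and every class and method name in it (BrokerIsrBatcher, ControllerPoller, maybe_flush, poll_once) is hypothetical and does not exist in Kafka. It only illustrates that each broker writes at most one node per update interval and that the controller collapses all pending nodes into a single propagation round.

```python
# Sketch only: FakeZk stands in for ZooKeeper, and all names here are
# hypothetical illustrations, not Kafka or ZooKeeper APIs.

ISR_CHANGE_PATH = "/isr_change_notification"


class FakeZk:
    """Minimal in-memory substitute for the zk paths used in the proposal."""

    def __init__(self):
        self.nodes = {}   # path -> list of partition names
        self.writes = 0   # number of create() calls, to show the write bound

    def create(self, path, data):
        self.nodes[path] = list(data)
        self.writes += 1

    def children(self, parent):
        prefix = parent + "/"
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def read_and_delete(self, path):
        return self.nodes.pop(path)


class BrokerIsrBatcher:
    """Step 1: accumulate ISR changes and write them as one batch per interval."""

    def __init__(self, zk, broker_id, update_interval_s):
        self.zk = zk
        self.broker_id = broker_id
        self.update_interval_s = update_interval_s
        self.pending = set()    # partitions whose ISR changed since last flush
        self.epoch = 0          # plays the role of IsrChangeEpoch in the path
        self.last_flush = None

    def record_isr_change(self, partition):
        self.pending.add(partition)

    def maybe_flush(self, now):
        """Write one batch node if there is pending data and the interval elapsed."""
        if not self.pending:
            return False
        if (self.last_flush is not None
                and now - self.last_flush < self.update_interval_s):
            return False
        path = f"{ISR_CHANGE_PATH}/{self.broker_id}_{self.epoch}"
        self.zk.create(path, sorted(self.pending))
        self.pending.clear()
        self.epoch += 1
        self.last_flush = now
        return True


class ControllerPoller:
    """Step 2: poll zk periodically instead of relying on a watcher."""

    def __init__(self, zk):
        self.zk = zk
        self.propagated = []  # one entry per simulated propagation round

    def poll_once(self):
        changed = set()
        for path in self.zk.children(ISR_CHANGE_PATH):
            changed.update(self.zk.read_and_delete(path))
        if changed:
            # A real controller would send UpdateMetadataRequest here.
            self.propagated.append(sorted(changed))
        return sorted(changed)
```

With this shape, the number of zk writes depends on the number of brokers and the interval rather than on how often individual ISRs change, and the controller sends at most one round of updates per poll.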

> ISR propagation should be throttled to avoid overwhelming controller.
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-2406
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2406
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>            Priority: Blocker
>
> This is a follow-up patch for KAFKA-1367.
> We need to throttle the ISR propagation rate to avoid flooding 
> controller-to-broker traffic. Such flooding might significantly increase the 
> time of controlled shutdown or cluster startup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
