[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671922#comment-13671922 ]
Sriram Subramanian commented on KAFKA-927: ------------------------------------------ Thank you for the review guys. Jun - 1. I did not add an explicit check since the state manager anyway throw. Added one explicitly. 3. I have thought different approaches to do it exactly right but all of them seems ugly. I will think a little bit more but it should not be a blocker since most of the active topic would realize the isr change soon due to the number of messages. 4. I don't like to overload config value with multiple meanings. The config name should do exactly what it is meant to in my opinion. 5. It works perfectly fine. Infact there is a unit test for the old tool that works fine. The functionality of the tool has not changed. 6. Added the missing files Neha - 1. 1.1 Done 1.2 Done 1.3 Done 1.4 Refactoring the method is actually pretty tricky. There are actually multiple dependencies between the methods (previousController, channels). The individual functions (if we manage to refactor), dont really do a single operation to provide a good name to it. I have added comments to make things clear. 1.6 Added a warn 1.7 renamed the method name 2. Done 3. If the leader did realize that the follower has fallen off the isr soon then there are no issues. However that does not seem to be the case. We use the number of messages or time period to decide when to remove the follower from the isr. The default time period used is 10sec. So it could take as long as 10 seconds to realize the isr change. However the controller may do it much sooner and hence we get the better of the two. Having said that, there is an ugly fix to make the leader realize the isr change immediately after the replica thread stops but I need more time to think about that. 4. updated the state change log 5. added the missing files > Integrate controlled shutdown into kafka shutdown hook > ------------------------------------------------------ > > Key: KAFKA-927 > URL: https://issues.apache.org/jira/browse/KAFKA-927 > Project: Kafka > Issue Type: Bug > Reporter: Sriram Subramanian > Assignee: Sriram Subramanian > Attachments: KAFKA-927.patch, KAFKA-927-v2.patch > > > The controlled shutdown mechanism should be integrated into the software for > better operational benefits. Also few optimizations can be done to reduce > unnecessary rpc and zk calls. This patch has been tested on a prod like > environment by doing rolling bounces continuously for a day. The average time > of doing a rolling bounce with controlled shutdown for a cluster with 7 nodes > without this patch is 340 seconds. With this patch it reduces to 220 seconds. > Also it ensures correctness in scenarios where the controller shrinks the isr > and the new leader could place the broker to be shutdown back into the isr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira