[ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671922#comment-13671922
 ] 

Sriram Subramanian commented on KAFKA-927:
------------------------------------------

Thank you for the review guys.

Jun - 

1. I did not add an explicit check since the state manager anyway throw. Added 
one explicitly.
3. I have thought different approaches to do it exactly right but all of them 
seems ugly. I will think a little bit more but it should not be a blocker since 
most of the active topic would realize the isr change soon due to the number of 
messages.
4. I don't like to overload config value with multiple meanings. The config 
name should do exactly what it is meant to in my opinion.
5. It works perfectly fine. Infact there is a unit test for the old tool that 
works fine. The functionality of the tool has not changed.
6. Added the missing files

Neha -
1.
1.1 Done
1.2 Done
1.3 Done
1.4 Refactoring the method is actually pretty tricky. There are actually 
multiple dependencies between the methods (previousController, channels). The 
individual functions (if we manage to refactor), dont really do a single 
operation to provide a good name to it. I have added comments to make things 
clear. 
1.6 Added a warn
1.7 renamed the method name

2. Done
3. If the leader did realize that the follower has fallen off the isr soon then 
there are no issues. However that does not seem to be the case. We use the 
number of messages or time period to decide when to remove the follower from 
the isr. The default time period used is 10sec. So it could take as long as 10 
seconds to realize the isr change. However the controller may do it much sooner 
and hence we get the better of the two. Having said that, there is an ugly fix 
to make the leader realize the isr change immediately after the replica thread 
stops but I need more time to think about that.
4. updated the state change log
5. added the missing files


                
> Integrate controlled shutdown into kafka shutdown hook
> ------------------------------------------------------
>
>                 Key: KAFKA-927
>                 URL: https://issues.apache.org/jira/browse/KAFKA-927
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriram Subramanian
>            Assignee: Sriram Subramanian
>         Attachments: KAFKA-927.patch, KAFKA-927-v2.patch
>
>
> The controlled shutdown mechanism should be integrated into the software for 
> better operational benefits. Also few optimizations can be done to reduce 
> unnecessary rpc and zk calls. This patch has been tested on a prod like 
> environment by doing rolling bounces continuously for a day. The average time 
> of doing a rolling bounce with controlled shutdown for a cluster with 7 nodes 
> without this patch is 340 seconds. With this patch it reduces to 220 seconds. 
> Also it ensures correctness in scenarios where the controller shrinks the isr 
> and the new leader could place the broker to be shutdown back into the isr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to