[jira] [Commented] (KAFKA-6029) Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest

Jiangjie Qin (JIRA) Tue, 10 Oct 2017 19:23:03 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199706#comment-16199706
 ]


Jiangjie Qin commented on KAFKA-6029:
-------------------------------------

[~junrao] Good point. That seems more likely to happen. Just to check that I 
understand correctly: are you suggesting the following solution?
1. Let each broker have an epoch which changes on restart.
2. During controlled shutdown, the controller will send a LeaderAndIsrRequest 
with the new ISR plus the shutting-down broker and its epoch.
3. Add the broker epoch to the FetchRequest so that each follower sends its 
FetchRequests with its own broker epoch.
4. If the leader sees a FetchRequest whose broker and epoch match the 
shutting-down broker and epoch, it will not add that broker back to the ISR.
5. After the broker restarts, the leaders will see a new broker epoch and add 
the restarted broker back to the ISR.
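
To make steps 2, 4 and 5 concrete, here is a rough sketch of the leader-side 
check as I understand it. All names (BrokerEpochIsrSketch, onLeaderAndIsr, 
mayAddToIsr) are made up for illustration and are not from the Kafka code 
base, and it assumes that a broker epoch never repeats across restarts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the leader-side check; not actual Kafka code. */
public class BrokerEpochIsrSketch {
    // Broker id -> epoch that the controller reported as shutting down (step 2).
    private final Map<Integer, Long> shuttingDownEpochs = new HashMap<>();

    /** Record the shutting-down broker and epoch carried by a LeaderAndIsrRequest. */
    public void onLeaderAndIsr(int shuttingDownBrokerId, long brokerEpoch) {
        shuttingDownEpochs.put(shuttingDownBrokerId, brokerEpoch);
    }

    /** Steps 4 and 5: may a caught-up follower be added back to the ISR? */
    public boolean mayAddToIsr(int followerId, long fetchBrokerEpoch) {
        Long shuttingDownEpoch = shuttingDownEpochs.get(followerId);
        if (shuttingDownEpoch == null) {
            return true; // this follower was never marked as shutting down
        }
        if (shuttingDownEpoch.longValue() == fetchBrokerEpoch) {
            return false; // same incarnation that is shutting down: keep it out
        }
        // Assumption: a different epoch means the broker has restarted,
        // so the old marker can be dropped.
        shuttingDownEpochs.remove(followerId);
        return true;
    }
}
```

With this, a fetch from broker 2 at epoch 7 is rejected once (2, 7) has been 
marked as shutting down, while a fetch from broker 2 at epoch 8 is accepted 
and clears the marker.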



> Controller should wait for the leader migration to finish before ack a 
> ControlledShutdownRequest
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6029
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6029
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 1.0.0
>            Reporter: Jiangjie Qin
>             Fix For: 1.1.0
>
>
> In the controlled shutdown process, the controller will return the 
> ControlledShutdownResponse immediately after the state machine is updated. 
> Because the LeaderAndIsrRequests and UpdateMetadataRequests may not have been 
> successfully processed by the brokers, the leader migration and active ISR 
> shrink may not have completed when the shutting-down broker proceeds to shut down. 
> This will cause some of the leaders to take up to replica.lag.time.max.ms to 
> kick the broker out of ISR. Meanwhile the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and 
> UpdateMetadataRequests have been acked before sending back the 
> ControlledShutdownResponse.
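
The waiting behaviour described above could look roughly like the sketch 
below. It is only an illustration: ControlledShutdownAckSketch and 
awaitAllAcks are hypothetical names, each outstanding LeaderAndIsrRequest or 
UpdateMetadataRequest is modelled as a CompletableFuture that completes when 
the broker acks it, and the timeout is an assumption so that the controller 
does not block forever on an unresponsive broker.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; not the actual controller implementation. */
public class ControlledShutdownAckSketch {
    /**
     * Blocks until every pending request has been acked or the timeout expires.
     * Returns true only if all acks arrived in time.
     */
    public static boolean awaitAllAcks(List<CompletableFuture<Void>> pendingAcks,
                                       long timeoutMs) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(pendingAcks.toArray(new CompletableFuture[0]));
        try {
            all.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true; // safe to send the ControlledShutdownResponse
        } catch (TimeoutException | ExecutionException e) {
            return false; // some request timed out or failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

What the controller should do when this returns false (fail the shutdown 
request, or respond anyway) is a separate decision that the sketch does not 
cover.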



