Joel Koshy updated KAFKA-340:
-----------------------------

    Attachment: KAFKA-340-v1.patch

Short summary of this implementation of clean shutdown:

- Shutdown is triggered through a JMX operation on the controller.
- Steps during shutdown (see the sketch at the end of this comment):
  - Record the broker as shutting down in the controller context. This set
    contains all shutting-down brokers until they are actually taken down;
    the liveBrokers set masks them (through the custom getter/setter).
  - Send a "wildcard" StopReplica request to the broker to stop its replica
    fetchers. This causes it to fall out of ISR sooner (as explained in the
    previous comment).
  - Identify the partitions led by the broker that have replication factor
    greater than 1.
  - Transition leadership of each such partition to another broker in ISR.
  - Return the number of partitions still led by the broker.

In practice, a clean shutdown would proceed as follows:

- Run the admin tool:
  ./bin/kafka-run-class.sh kafka.admin.ShutdownBroker --broker <bid> --zookeeper <zkconnect>
- If the shutdown status it prints is "complete", broker <bid> has stopped
  its replica fetchers and no longer leads any partitions. In this case,
  send a SIGTERM to the Kafka process to actually take down the broker.
- If the shutdown status it prints is "incomplete", you may want to wait a
  bit before retrying - which would typically make sense in a rolling
  bounce. (A sketch of the JMX invocation with retries also appears at the
  end of this comment.)
- If you are bringing down the entire cluster, you will eventually hit the
  "incomplete" status, since there will be insufficient brokers left to
  move partition leadership to. In this case the operator presumably knows
  the situation and will proceed to do an "unclean" shutdown of the
  remaining brokers.
- If the JMX operation itself fails (say, due to a controller failover),
  simply retry.

Other comments:

- I initially thought of using a boolean return value for handleStateChange,
  but needed to query the actual count of moved partitions, so I did away
  with that.
- I also considered using a zkpath (instead of JMX), but did not, because we
  would effectively lock the zkclient event thread until all partition
  leadership moves had been attempted. In this implementation, the
  controller context's lock is relinquished after moving each partition.
  Another benefit of JMX over a zkpath is that it is convenient to return
  the shutdown status, so there is no need for a follow-up status check.
- For stopping the replica fetchers, I simply used a "wildcard" StopReplica
  request - i.e., one without any partitions listed. The broker will not
  receive any more LeaderAndIsr requests (since it is no longer exposed
  under liveBrokers), so the fetchers will not restart.
- I added a slightly dumb unit test (in addition to local stand-alone
  testing), but we will need a more rigorous system test for this.
- Please let me know if you can think of corner cases to test for.

> Implement clean shutdown in 0.8
> -------------------------------
>
>                 Key: KAFKA-340
>                 URL: https://issues.apache.org/jira/browse/KAFKA-340
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Joel Koshy
>            Priority: Blocker
>              Labels: bugs
>         Attachments: KAFKA-340-v1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If we are shutting down a broker when the ISR of a partition includes only
> that broker, we could lose some messages that have been previously
> committed. For a clean shutdown, we need to guarantee that there is at
> least one other broker in ISR after the broker is shut down.
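A minimal sketch of the controller-side sequence described above. This is
for exposition only - the names below (controllerLock, shuttingDownBrokerIds,
the stubbed helpers) are placeholders I am using to illustrate the flow, not
the actual classes and methods in the patch:

    import scala.collection.mutable

    object CleanShutdownSketch {
      case class TopicAndPartition(topic: String, partition: Int)

      // Stand-ins for real controller state and channel-manager calls.
      val controllerLock = new Object
      val shuttingDownBrokerIds = mutable.Set[Int]()
      def partitionsLedBy(brokerId: Int): Set[TopicAndPartition] = Set.empty
      def replicationFactor(p: TopicAndPartition): Int = 1
      def sendStopReplicaRequest(brokerId: Int, partitions: Set[TopicAndPartition]) {}
      def electNewLeaderFromIsr(p: TopicAndPartition, excludedBroker: Int) {}

      // Returns the number of partitions still led by the broker; zero means
      // shutdown is "complete" and the process can safely receive a SIGTERM.
      def shutdownBroker(id: Int): Int = {
        // 1. Record the broker as shutting down; the liveBrokers getter masks
        //    this set, so the broker receives no further LeaderAndIsr requests.
        controllerLock.synchronized { shuttingDownBrokerIds += id }

        // 2. "Wildcard" StopReplica request (no partitions listed) stops all of
        //    the broker's replica fetchers, so it falls out of ISR sooner.
        sendStopReplicaRequest(id, Set.empty)

        // 3. Move leadership of every partition this broker leads that has
        //    replication factor > 1 to another broker in ISR. The lock is
        //    acquired and released per partition, so the controller is not
        //    held for the duration of the whole shutdown.
        for (p <- partitionsLedBy(id) if replicationFactor(p) > 1)
          controllerLock.synchronized { electNewLeaderFromIsr(p, id) }

        // 4. Report how many partitions the broker still leads.
        partitionsLedBy(id).size
      }
    }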
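And a sketch of invoking the operation over JMX with retries, roughly what
the ShutdownBroker tool would wrap. The javax.management calls are standard
JDK API, but the MBean name, operation name, port, and signature below are
assumptions for illustration, not the actual names in the patch:

    import javax.management.ObjectName
    import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

    object ShutdownBrokerViaJmx {
      def main(args: Array[String]) {
        val brokerId = 1  // assumed id of the broker being bounced
        // Assumed JMX endpoint of the controller broker.
        val url = new JMXServiceURL(
          "service:jmx:rmi:///jndi/rmi://controller-host:9999/jmxrmi")
        val connector = JMXConnectorFactory.connect(url)
        try {
          val mbsc = connector.getMBeanServerConnection
          // Assumed MBean and operation names exposed by the controller.
          val controllerMBean = new ObjectName("kafka.controller:type=ControllerOps")
          var remaining = Int.MaxValue
          while (remaining > 0) {
            remaining = mbsc.invoke(controllerMBean, "shutdownBroker",
              Array[AnyRef](Int.box(brokerId)), Array("int")).asInstanceOf[Int]
            if (remaining > 0) {
              // "incomplete": the broker still leads some partitions; wait
              // and retry, as one would in a rolling bounce.
              Thread.sleep(5000)
            }
          }
          // "complete": safe to SIGTERM the broker process now.
          println("Shutdown of broker " + brokerId + " is complete")
        } finally {
          connector.close()
        }
      }
    }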