[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
[ https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699500#comment-13699500 ] Sriram Subramanian commented on KAFKA-911: -- This has been fixed. Bug in controlled shutdown logic in controller leads to controller not sending out some state change request - Key: KAFKA-911 URL: https://issues.apache.org/jira/browse/KAFKA-911 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Blocker Labels: kafka-0.8, p1 Attachments: kafka-911-v1.patch, kafka-911-v2.patch The controlled shutdown logic in the controller first tries to move the leaders from the broker being shutdown. Then it tries to remove the broker from the isr list. During that operation, it does not synchronize on the controllerLock. This causes a race condition while dispatching data using the controller's channel manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
[ https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666445#comment-13666445 ] Neha Narkhede commented on KAFKA-911: - You are right that we can send the reduced ISR request to the leader, but that is independent of removing the shutting down broker from the ISR in zookeeper. I'm arguing that the zookeeper write is unnecessary. To handle the issue you described, we can send a leader and isr request just to the leader with the reduced isr. Bug in controlled shutdown logic in controller leads to controller not sending out some state change request - Key: KAFKA-911 URL: https://issues.apache.org/jira/browse/KAFKA-911 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Blocker Labels: kafka-0.8, p1 Attachments: kafka-911-v1.patch The controlled shutdown logic in the controller first tries to move the leaders from the broker being shutdown. Then it tries to remove the broker from the isr list. During that operation, it does not synchronize on the controllerLock. This causes a race condition while dispatching data using the controller's channel manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request
[ https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666532#comment-13666532 ] Joel Koshy commented on KAFKA-911: -- I had to revisit the notes from KAFKA-340. I think this was touched upon. i.e., the fact that the current implementation's attempt to shrink ISR may be ineffective for partitions whose leadership has been moved from the current broker - https://issues.apache.org/jira/browse/KAFKA-340?focusedCommentId=13483478page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13483478 quote 3.4 What is the point of sending leader and isr request at the end of shutdownBroker, since the OfflineReplica state change would've taken care of that anyway. It seems like you just need to send the stop replica request with the delete partitions flag turned off, no ? I still need (as an optimization) to send the leader and isr request to the leaders of all partitions that are present on the shutting down broker so it can remove the shutting down broker from its inSyncReplicas cache (in Partition.scala) so it no longer waits for acks from the shutting down broker if a producer request's num-acks is set to -1. Otherwise, we have to wait for the leader to organically shrink the ISR. This also applies to partitions which are moved (i.e., partitions for which the shutting down broker was the leader): the ControlledShutdownLeaderSelector needs to send the updated leaderAndIsr request to the shutting down broker as well (to tell it that it is no longer the leader) at which point it will start up a replica fetcher and re-enter the ISR. So in fact, there is actually not much point in removing the current leader from the ISR in the ControlledShutdownLeaderSelector.selectLeader. /quote and https://issues.apache.org/jira/browse/KAFKA-340?focusedCommentId=13484727page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13484727 (I don't think I actually filed that jira though.) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request - Key: KAFKA-911 URL: https://issues.apache.org/jira/browse/KAFKA-911 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Blocker Labels: kafka-0.8, p1 Attachments: kafka-911-v1.patch The controlled shutdown logic in the controller first tries to move the leaders from the broker being shutdown. Then it tries to remove the broker from the isr list. During that operation, it does not synchronize on the controllerLock. This causes a race condition while dispatching data using the controller's channel manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira