[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331093#comment-16331093 ]

Jeff Widman commented on KAFKA-1120:

The issue description says "the broker will be in this weird state until it is restarted." Couldn't this also be fixed by simply forcing a controller re-election, since the new controller would re-identify the leaders?

> Controller could miss a broker state change
>
>                 Key: KAFKA-1120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1120
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jun Rao
>            Assignee: Mickael Maison
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
> When the controller is in the middle of processing a task (e.g., preferred leader election, broker change), it holds a controller lock. During this time, a broker could have de-registered and re-registered itself in ZK. After the controller finishes processing the current task, it will start processing the logic in the broker change listener. However, it will see no broker change and therefore won't do anything to the restarted broker. This broker will be in a weird state since the controller doesn't inform it to become the leader of any partition. Yet, the cached metadata in other brokers could still list that broker as the leader for some partitions. Client requests routed to that broker will then get a TopicOrPartitionNotExistException. This broker will continue to be in this bad state until it's restarted again.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277309#comment-16277309 ]

Ramnatthan Alagappan commented on KAFKA-1120:

My way of consistently reproducing this issue may not be the cleanest, but here are the steps. The key problem is that the broker registers with ZK before two events happen on the controller: 1. ZkClient re-registering the callback for /brokers/ids (the callback is initially fired for the deletion of the broker), and 2. ZkClient reading the children of /brokers/ids to check what has changed. This is done in the fireChildChangedEvents function in the ZkClient library. To consistently trigger this issue, I patched ZkClient with a sleep inside this function:

{code}
Thread.sleep(5000);
exists(path);
List<String> children = getChildren(path);
{code}

With this added delay, the deletion handler sleeps before re-registering the callback. If I restart the shut-down broker during this sleep, the restarted broker never appears in newBrokerIds.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277179#comment-16277179 ]

Mickael Maison commented on KAFKA-1120:

[~ramanala] Can you share your steps?
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277146#comment-16277146 ]

Ramnatthan Alagappan commented on KAFKA-1120:

I ran into this issue and have a reproducible setup, irrespective of the number of partitions or nodes. [~onurkaraman]'s analysis in the comment at [#comment-16113645] is correct. The root cause is that the shut-down broker restarts and registers with ZK within a short interval of time. During this time, ZK delivers a callback for the deletion of the broker. Before ZkClient can re-establish the callback (by issuing a stat call), the broker registers with ZK. By the time ZkClient reads the /brokers/ids node from ZK, the shut-down broker also appears in /brokers/ids. As a result, the shut-down broker appears in both curBrokerIds and liveOrShuttingDownBrokerIds, causing newBrokerIds to be empty, which causes this problem.
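The set arithmetic described above can be sketched as a minimal model (Python, for illustration only; the real logic is Scala, inside the controller's BrokerChange handling):

```python
# Minimal model (illustration only) of the controller's broker-change set
# arithmetic. If the bounced broker has already re-registered in ZK by the
# time the event is processed, it shows up on both sides of the difference,
# so it is reported neither as newly added nor as dead.

def broker_change(cur_broker_ids: set, live_or_shutting_down: set):
    new_broker_ids = cur_broker_ids - live_or_shutting_down
    dead_broker_ids = live_or_shutting_down - cur_broker_ids
    return new_broker_ids, dead_broker_ids

# Broker 2 was killed and restarted while the controller was busy:
new, dead = broker_change({1, 2}, {1, 2})
assert new == set() and dead == set()  # the restart is invisible
```

Only when the controller observes ZK while the broker is still absent does the restart produce a non-empty diff, which is why widening the window (the sleep patch above) makes the bug deterministic.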
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268731#comment-16268731 ]

Edoardo Comar commented on KAFKA-1120:

Interestingly, using [~wushujames]'s script from [#comment-16110002] on a development laptop running trunk code:
* with the suggested 2x5000 partitions, 2x replicated, the cluster is unstable: after resting idle in a steady state for some 5-10 minutes, one or two of the brokers get disconnected from zookeeper, then reconnect and start a bounce where one or the other gets out of sync
* with a lower number of partitions (e.g. 2500, 3500) the above instability doesn't show, and either a controlled shutdown with a short timeout or an ungraceful kill, followed by a broker restart, gets the cluster back in sync without issues
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267123#comment-16267123 ]

Jun Rao commented on KAFKA-1120:

[~mimaison], thanks for helping out!
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267107#comment-16267107 ]

Mickael Maison commented on KAFKA-1120:

[~junrao] Thanks for the confirmation. In that case I'll grab it and start working on a fix!
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266652#comment-16266652 ]

Mickael Maison commented on KAFKA-1120:

We've hit this issue ([KAFKA-3944]) a number of times in our 0.10.2 clusters: when a broker is restarted too fast, it can receive a bunch of StopReplica requests from the controller, which is still processing the ControlledShutdown request. As [~wushujames] has mentioned, pre-0.11, apart from waiting "long enough" there's unfortunately not really a way to be sure the controller is done processing the ControlledShutdown request. As far as I can tell this issue should still be present in 1.0.0 (I haven't had a chance to try reproducing it yet). If so, the solution suggested by [~onurkaraman] of using the broker's session id to prevent brokers from processing requests meant for a previous generation should work.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142409#comment-16142409 ]

James Cheng commented on KAFKA-1120:

I'd actually be more comfortable with simply delaying the startup of the broker until I am certain that the controller is done processing the ControlledShutdown events. I can understand and explain that better than the intricacies of the controller. However, there is the question of how to actually detect that the controller is done processing the shutdown event. KAFKA-5135 would help me figure out whether the controller is still processing the shutdown event, but we aren't on 0.11 yet. So maybe it wouldn't be that easy.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142407#comment-16142407 ]

James Cheng commented on KAFKA-1120:

It's not so rare in a cloud environment, where nodes may fail. What will happen when a broker goes down hard, during the controller.socket.timeout.ms timeframe? Will the controller keep attempting to do things using that channel (send LeaderAndIsr requests, etc.) and simply wait until they time out?
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142400#comment-16142400 ]

Jun Rao commented on KAFKA-1120:

Yes, that's the right metric. The only downside of increasing the timeout is the latency to detect a TCP connection issue when a broker goes down hard (e.g. power outage) while the controller is communicating with the brokers. This should be rare though.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142355#comment-16142355 ]

James Cheng commented on KAFKA-1120:

Do you mean this mbean?

{code}
kafka.network:type=RequestMetrics,name=TotalTimeMs,request=ControlledShutdown
{code}

[~junrao], I think you've mentioned before that controller.socket.timeout.ms applies to *all* broker-controller communication, so not just ControlledShutdown requests but also LeaderAndIsr updates and the like. I'm hesitant to touch that setting. Although, with high partition counts, would it be recommended? Most of those other requests are fairly quick, so I don't think they would benefit from increased socket timeouts. But then again, the increased timeout wouldn't hurt them either. I think ControlledShutdown is one of the few synchronous operations, right?
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142304#comment-16142304 ]

James Cheng commented on KAFKA-1120:

Onur, while you are figuring out a fix, can you recommend anything we can do to avoid triggering the scenario? We have high partition counts and so are encountering long ControlledShutdown events, which means we are hitting this quite often. It seems like the bug gets triggered when the broker gets back into zookeeper before the controlled shutdown is finished being processed. So I either make ControlledShutdown get processed faster, or I make the broker get into zookeeper slower. Would that be right? I can make ControlledShutdown get processed faster by reducing partition count, for example. I can make the broker get into zookeeper slower either by making sure it takes longer to shut down (increasing controlled.shutdown.retry.backoff.ms?) or by delaying startup ("sleep 60 && ./bin/kafka-server-start.sh").
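The trade-off in that workaround can be stated as a back-of-envelope check (Python, with made-up illustrative numbers; neither figure is a measured bound):

```python
# Toy timing check for the delayed-restart workaround (numbers are
# hypothetical). The race window exists whenever the broker re-registers
# in ZK before the controller has finished processing the
# ControlledShutdown event.

def race_possible(controlled_shutdown_secs: float, restart_delay_secs: float) -> bool:
    return restart_delay_secs < controlled_shutdown_secs

# With high partition counts, shutdown processing can outlast a fixed delay:
assert race_possible(90.0, 60.0)       # "sleep 60" is not enough here
assert not race_possible(30.0, 60.0)   # fewer partitions, or a longer delay
```

This is why a fixed sleep is fragile: the safe delay grows with partition count, which is the quantity the TotalTimeMs mbean discussed below would let you measure.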
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114911#comment-16114911 ]

Onur Karaman commented on KAFKA-1120:

[~wushujames] I think Jun's comments and the redesign doc in KAFKA-5027 are sort of saying the same thing. The broker-generation concept has two implied use cases:
1. the controller using broker generations to distinguish events from a broker across generations.
2. controller-to-broker requests including the broker generation, so that brokers can ignore requests that applied to a former generation.

While I think czxids will work for the 1st use case, I don't think we can naively reuse the czxid for the 2nd use case. The reason is a bit silly: zookeeper's CreateResponse only provides the path. It doesn't provide the created znode's Stat, so you have to do a later lookup to find out the znode's czxid. If we want to solve both use cases with the same approach, I think we have a couple of options:
1. maybe we can get away with using czxids by doing a multi-op when registering brokers, to transactionally create the znode and read that same znode back for its czxid.
2. we can instead use the session id as the broker generation. The controller can infer the broker's generation by observing the broker znode's ephemeralOwner property. Brokers can determine their generation id by looking up the underlying zookeeper client's session id, which is just ZooKeeper.getSessionId(). The ephemeralOwner of an ephemeral znode is the client's session id, which is why this would work.
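Option 2 can be sketched like this (Python, illustration only; the request stamping is an assumption about how the fix might look, not actual Kafka code):

```python
# Sketch (illustration only) of using the ZK session id as a broker
# generation. The controller stamps each request with the ephemeralOwner it
# observed on the broker's /brokers/ids znode; a broker that has since
# re-registered under a new session drops requests for the old generation.

OLD_SESSION = 0x15DAB3F00000001  # hypothetical pre-restart session id
NEW_SESSION = 0x15DAB3F00000042  # hypothetical post-restart session id

def should_process(request_generation: int, my_session_id: int) -> bool:
    # Ignore requests addressed to a previous incarnation of this broker.
    return request_generation == my_session_id

# A StopReplica aimed at the old incarnation arrives after the restart:
assert not should_process(OLD_SESSION, NEW_SESSION)
# Requests stamped with the current generation are still applied:
assert should_process(NEW_SESSION, NEW_SESSION)
```

The appeal of the session id over the czxid is that both sides already know it without an extra lookup: the broker from its own client handle, the controller from the znode's Stat.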
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114678#comment-16114678 ]

James Cheng commented on KAFKA-1120:

[~onurkaraman], what do you think of Jun's suggestions in his [10/Jun/16 16:28|#comment-15325496] and [15/Dec/16 02:10|#comment-15750982] comments above?
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113645#comment-16113645 ]

Onur Karaman commented on KAFKA-1120:

Alright, I might know what's happening. Here's the red flag:

{code}
> grep -r "Newly added brokers" .
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:40:09,121] INFO [Controller 1]: Newly added brokers: 1, deleted brokers: , all live brokers: 1 (kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:40:27,172] INFO [Controller 1]: Newly added brokers: 2, deleted brokers: , all live brokers: 1,2 (kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:47:15,215] INFO [Controller 1]: Newly added brokers: , deleted brokers: , all live brokers: 1,2 (kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:47:17,927] INFO [Controller 1]: Newly added brokers: , deleted brokers: , all live brokers: 1,2 (kafka.controller.KafkaController)
{code}

Here's the relevant code in BrokerChange.process:

{code}
val curBrokers = zkUtils.getAllBrokersInCluster().toSet
val curBrokerIds = curBrokers.map(_.id)
val liveOrShuttingDownBrokerIds = controllerContext.liveOrShuttingDownBrokerIds
val newBrokerIds = curBrokerIds -- liveOrShuttingDownBrokerIds
val deadBrokerIds = liveOrShuttingDownBrokerIds -- curBrokerIds
{code}

Basically, the ControlledShutdown event took so long to process that the BrokerChange corresponding to the killed broker (the 3rd BrokerChange in the log snippet above) and the BrokerChange corresponding to the restarted broker (the 4th BrokerChange) were queued up waiting for the ControlledShutdown's completion. By the time these BrokerChange events get processed, the restarted broker is already registered in zookeeper, causing the broker to appear both in controllerContext.liveOrShuttingDownBrokerIds and in the brokers listed in zookeeper. This means the controller will not execute onBrokerFailure for the 3rd BrokerChange and will also not execute onBrokerJoin for the 4th BrokerChange. I'm not sure of the fix. Broker generations as defined in the redesign doc in KAFKA-5027 would work, but I'm not sure if they're strictly required.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110205#comment-16110205 ] Onur Karaman commented on KAFKA-1120: - Thanks [~wushujames] that is perfect. I can reproduce the problem.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110002#comment-16110002 ] James Cheng commented on KAFKA-1120: [~onurkaraman], try this? I made it as copy/pasteable as possible.
{code}
wget http://apache.ip-guide.com/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz
tar xvfz kafka_2.11-0.11.0.0.tgz
cp -r kafka_2.11-0.11.0.0 kafka_2.11-0.11.0.0_2

KAFKA_DIR=kafka_2.11-0.11.0.0
(
  cd ${KAFKA_DIR}
  cp config/server.properties config/server1.properties
  echo >> config/server1.properties
  echo broker.id=1 >> config/server1.properties
  echo delete.topic.enable=true >> config/server1.properties
  echo listeners=PLAINTEXT://:9092 >> config/server1.properties
  echo log.dirs=/tmp/kafka-logs1 >> config/server1.properties
  echo log.index.size.max.bytes=10 >> config/server1.properties
  echo controlled.shutdown.max.retries=1 >> config/server1.properties
)

KAFKA_DIR=kafka_2.11-0.11.0.0_2
(
  cd ${KAFKA_DIR}
  cp config/server.properties config/server2.properties
  echo >> config/server2.properties
  echo broker.id=2 >> config/server2.properties
  echo delete.topic.enable=true >> config/server2.properties
  echo listeners=PLAINTEXT://:9093 >> config/server2.properties
  echo log.dirs=/tmp/kafka-logs2 >> config/server2.properties
  echo log.index.size.max.bytes=10 >> config/server2.properties
  echo controlled.shutdown.max.retries=1 >> config/server2.properties
)

# Start zookeeper and kafka brokers

# In terminal 1, zookeeper
cd kafka_2.11-0.11.0.0
./bin/zookeeper-server-start.sh config/zookeeper.properties

# In terminal 2, broker 1
cd kafka_2.11-0.11.0.0
./bin/kafka-server-start.sh config/server1.properties

# In terminal 3, broker 2
cd kafka_2.11-0.11.0.0_2
./bin/kafka-server-start.sh config/server2.properties

# Create topics
cd kafka_2.11-0.11.0.0

# create 5000 partitions, all leaders on broker 1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic leader1 \
  --replica-assignment `echo -n 1:2; for i in \`seq 4999\`; do echo -n ,1:2; done`

# create 5000 partitions, all leaders on broker 2
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic leader2 \
  --replica-assignment `echo -n 2:1; for i in \`seq 4999\`; do echo -n ,2:1; done`

# Wait until topics are fully created

# Stop and immediately start a broker:
# 1. Verify that broker1 is the controller.
# 2. Ctrl-C on broker2. Broker 2 will attempt controlled shutdown and will
#    give up after 30 seconds.
# 3. The instant it exits, restart broker2.
# 4. Wait until both brokers have settled down and the controller has settled
#    down. Notice that broker2 is missing from lots of ISRs.
{code}
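For step 4, rather than eyeballing 5000 partitions, a small filter can count the partitions whose ISR is missing broker 2. This is a sketch, not part of the original repro: the sample lines below stand in for real `kafka-topics.sh --describe` output (the exact column layout may vary by version), and in practice you would pipe the describe output into the awk filter instead.

```shell
# Sample --describe lines standing in for real output; in practice pipe
# `./bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic leader2`
# into the awk filter instead.
sample='Topic: leader2  Partition: 0  Leader: 1  Replicas: 2,1  Isr: 1
Topic: leader2  Partition: 1  Leader: 2  Replicas: 2,1  Isr: 2,1'

missing=$(echo "$sample" | awk '
  /Partition:/ {
    split($NF, isr, ",")            # last field is the comma-separated ISR
    found = 0
    for (i in isr) if (isr[i] == "2") found = 1
    if (!found) n++
  }
  END { print n+0 }')
echo "$missing of the sampled leader2 partitions are missing broker 2 from the ISR"
```

A healthy cluster should report zero; after the bounce described above, the count stays high until the broker is restarted yet again.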
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108543#comment-16108543 ] Ismael Juma commented on KAFKA-1120: I tentatively assigned this to 1.0.0 so that we investigate the root cause. cc [~onurkaraman]
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107857#comment-16107857 ] James Cheng commented on KAFKA-1120: Hi, I retested this with Kafka 0.11. The problem still exists. I followed the steps from my 24/Feb/17 22:57 comment. I ran it maybe 10 times in a row. Every single time, the broker that I restarted came back up and did not take leadership for any partitions. In addition, it only became a follower for about half the partitions. The fact that it became a follower for half the partitions shows that the controller is at least aware that the broker exists (that is, the controller successfully saw the broker come back online). But the controller didn't tell the broker to follow all the partitions that it should have.
[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
[ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072663#comment-16072663 ] Andrew Olson commented on KAFKA-1120: - [~wushujames] Can you retest with Kafka 0.11 to see if KAFKA-1211 resolves this problem?