Re: Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2
Thanks Divij, I will check further. --- Thanks & Regards, Kunal Jadhav On Fri, Apr 14, 2023 at 4:25 PM Divij Vaidya wrote: > Hey Kunal > > We would need more information to debug your scenario since there are no > known bugs (AFAIK) in 3.3.2 associated with leader election. > > At a very high level, the ideal sequence of events should be as follows: > 1. When the existing leader shuts down, it will stop sending requests for > heartbeat/metadata to the controller. > 2. Controller will detect that it hasn't received a heartbeat from a broker > for > broker.heartbeat.interval.ms (defaults to 2s). > 3. Controller will elect a new leader and send LeadershipAndISR requests to > other brokers in the ISR, one of them will be elected as a leader. > > You should be able to look at the state change logs and verify the sequence > of events. In case your controller resides on the same machines as the > leader in step 1, there will be a controller failover first followed by the > sequence of events described above. > > Could you please tell us the sequence of events by looking at your state > change logs? I would also look at controller logs to ensure that it is > actually performing a leader failover. > > Also, how are you checking that a leader is not elected? Could it be that > the partition is under-replicated or below ISR and that is why you aren't > able to produce/consume from it but it still has a leader? > > -- > Divij Vaidya > > > > On Fri, Apr 14, 2023 at 12:32 PM Kunal Jadhav > wrote: > > > Hello All, > > > > We have implemented 3 brokers cluster on a single node server in the > > kubernetes environment, which is a zookeeper-less cluster having kafka > > version 3.32. And facing one issue like when the existing leader broker > > gets down then the new leader is not elected. We have faced this issue > > several times and always need to restart the cluster. So please help me > to > > solve this problem. Thanks in advance. > > > > --- > > Thanks & Regards, > > Kunal Jadhav > > >
Re: Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2
Hey Kunal We would need more information to debug your scenario since there are no known bugs (AFAIK) in 3.3.2 associated with leader election. At a very high level, the ideal sequence of events should be as follows: 1. When the existing leader shuts down, it will stop sending requests for heartbeat/metadata to the controller. 2. Controller will detect that it hasn't received a heartbeat from a broker for > broker.heartbeat.interval.ms (defaults to 2s). 3. Controller will elect a new leader and send LeadershipAndISR requests to other brokers in the ISR, one of them will be elected as a leader. You should be able to look at the state change logs and verify the sequence of events. In case your controller resides on the same machines as the leader in step 1, there will be a controller failover first followed by the sequence of events described above. Could you please tell us the sequence of events by looking at your state change logs? I would also look at controller logs to ensure that it is actually performing a leader failover. Also, how are you checking that a leader is not elected? Could it be that the partition is under-replicated or below ISR and that is why you aren't able to produce/consume from it but it still has a leader? -- Divij Vaidya On Fri, Apr 14, 2023 at 12:32 PM Kunal Jadhav wrote: > Hello All, > > We have implemented 3 brokers cluster on a single node server in the > kubernetes environment, which is a zookeeper-less cluster having kafka > version 3.32. And facing one issue like when the existing leader broker > gets down then the new leader is not elected. We have faced this issue > several times and always need to restart the cluster. So please help me to > solve this problem. Thanks in advance. > > --- > Thanks & Regards, > Kunal Jadhav >
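As a rough illustration of the checks described above, a sketch like the following can confirm whether a partition truly has no leader or is merely below min ISR. The broker address, topic name, and log locations are placeholders and depend on your installation and log4j setup.

```
# Show partitions that currently have no leader at all
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions

# Show partitions whose ISR has dropped below min.insync.replicas
# (these still have a leader, but acks=all producers will fail)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-min-isr-partitions

# Follow the leader-change sequence for one partition in the broker logs
grep "my-topic-0" /var/log/kafka/state-change.log
grep -i "leader" /var/log/kafka/controller.log | tail -n 50
```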
Facing new leader election issues in zookeeper-less kafka cluster having version 3.3.2
Hello All, We have deployed a 3-broker cluster on a single-node server in a Kubernetes environment. It is a ZooKeeper-less (KRaft) cluster running Kafka version 3.3.2. We are facing an issue where, when the existing leader broker goes down, a new leader is not elected. We have faced this issue several times and always need to restart the cluster to recover. Please help me solve this problem. Thanks in advance. --- Thanks & Regards, Kunal Jadhav
Re: Leader election strategy
Hi Pierre, You may try to use cruise control: https://github.com/linkedin/cruise-control I didn't try it yet but it has task which may help you to auto-balance partitions in the cluster. BR, Mikhail On Tue, Nov 15, 2022 at 3:17 PM sunil chaudhari wrote: > Hi, > Use confluent. It has auto balancing feature. > You dont need to do these manual things. > > > On Tue, 15 Nov 2022 at 7:22 PM, Pierre Coquentin < > pierre.coquen...@gmail.com> > wrote: > > > Hello Luke, and thank you for your answer. > > What I would have hoped for is something more automatic, something that > > will spread the load when a Kafka broker goes down without any human > > intervention. The reassign script is a bit complicated, you need to > > generate the topics and partitions list, then get the current assignment > > and rework it to force a new leader. > > > > On Tue, Nov 15, 2022 at 5:18 AM Luke Chen wrote: > > > > > Hi Pierre, > > > > > > Try using kafka-reassign-partitions.sh to reassign partitions to > > different > > > replicas you like. > > > ref: https://kafka.apache.org/documentation/#basic_ops_automigrate > > > > > > Luke > > > > > > On Mon, Nov 14, 2022 at 3:55 PM Pierre Coquentin < > > > pierre.coquen...@gmail.com> > > > wrote: > > > > > > > Hello, > > > > We have a Kafka cluster (2.4.1) with a replication factor of 3. I > > notice > > > > when we stop a broker that only one broker takes all the load from > the > > > > missing broker and becomes the leader to all partitions. > > > > I would have thought that Kafka would split the load evenly among the > > > > remaining brokers. > > > > > > > > So if I have this kind of configuration > > > > Topic: test > > > > Partition 0 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > > > Partition 3 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > > > > > > > If I stop broker 1, I want something like this (load is split evenly > > > among > > > > broker 2 and 3): > > > > Topic: test > > > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > Partition 3 - Leader: 3 - Replicas: 1,2,3 - Isr: 2,3 > > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > > > > > What I observe is currently this (broker 2 takes all the load from > > broker > > > > 1): > > > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > Partition 3 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > > > > > My concern here is that at all times, a broker should not exceed 50% > of > > > its > > > > network bandwidth which could be a problem in my case. > > > > Is there a way to change this behavior (manually by forcing a leader, > > > > programmatically, or by configuration)? > > > > From my understanding, the script kafka-leader-election.sh allows > only > > to > > > > set the preferred (the first in the list of replicas) or uncleaned > > > > (replicas not in sync can become a leader). 
> > > > Regards, > > > > > > > > Pierre > > > > > > > > > >
Re: Leader election strategy
Hi, Use confluent. It has auto balancing feature. You dont need to do these manual things. On Tue, 15 Nov 2022 at 7:22 PM, Pierre Coquentin wrote: > Hello Luke, and thank you for your answer. > What I would have hoped for is something more automatic, something that > will spread the load when a Kafka broker goes down without any human > intervention. The reassign script is a bit complicated, you need to > generate the topics and partitions list, then get the current assignment > and rework it to force a new leader. > > On Tue, Nov 15, 2022 at 5:18 AM Luke Chen wrote: > > > Hi Pierre, > > > > Try using kafka-reassign-partitions.sh to reassign partitions to > different > > replicas you like. > > ref: https://kafka.apache.org/documentation/#basic_ops_automigrate > > > > Luke > > > > On Mon, Nov 14, 2022 at 3:55 PM Pierre Coquentin < > > pierre.coquen...@gmail.com> > > wrote: > > > > > Hello, > > > We have a Kafka cluster (2.4.1) with a replication factor of 3. I > notice > > > when we stop a broker that only one broker takes all the load from the > > > missing broker and becomes the leader to all partitions. > > > I would have thought that Kafka would split the load evenly among the > > > remaining brokers. > > > > > > So if I have this kind of configuration > > > Topic: test > > > Partition 0 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > > Partition 3 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > > > > > If I stop broker 1, I want something like this (load is split evenly > > among > > > broker 2 and 3): > > > Topic: test > > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > Partition 3 - Leader: 3 - Replicas: 1,2,3 - Isr: 2,3 > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > > > What I observe is currently this (broker 2 takes all the load from > broker > > > 1): > > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > Partition 3 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > > > My concern here is that at all times, a broker should not exceed 50% of > > its > > > network bandwidth which could be a problem in my case. > > > Is there a way to change this behavior (manually by forcing a leader, > > > programmatically, or by configuration)? > > > From my understanding, the script kafka-leader-election.sh allows only > to > > > set the preferred (the first in the list of replicas) or uncleaned > > > (replicas not in sync can become a leader). > > > Regards, > > > > > > Pierre > > > > > >
Re: Leader election strategy
Hello Luke, and thank you for your answer. What I would have hoped for is something more automatic, something that will spread the load when a Kafka broker goes down without any human intervention. The reassign script is a bit complicated, you need to generate the topics and partitions list, then get the current assignment and rework it to force a new leader. On Tue, Nov 15, 2022 at 5:18 AM Luke Chen wrote: > Hi Pierre, > > Try using kafka-reassign-partitions.sh to reassign partitions to different > replicas you like. > ref: https://kafka.apache.org/documentation/#basic_ops_automigrate > > Luke > > On Mon, Nov 14, 2022 at 3:55 PM Pierre Coquentin < > pierre.coquen...@gmail.com> > wrote: > > > Hello, > > We have a Kafka cluster (2.4.1) with a replication factor of 3. I notice > > when we stop a broker that only one broker takes all the load from the > > missing broker and becomes the leader to all partitions. > > I would have thought that Kafka would split the load evenly among the > > remaining brokers. > > > > So if I have this kind of configuration > > Topic: test > > Partition 0 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > Partition 3 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > > > If I stop broker 1, I want something like this (load is split evenly > among > > broker 2 and 3): > > Topic: test > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > Partition 3 - Leader: 3 - Replicas: 1,2,3 - Isr: 2,3 > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > What I observe is currently this (broker 2 takes all the load from broker > > 1): > > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > Partition 3 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > > > My concern here is that at all times, a broker should not exceed 50% of > its > > network bandwidth which could be a problem in my case. > > Is there a way to change this behavior (manually by forcing a leader, > > programmatically, or by configuration)? > > From my understanding, the script kafka-leader-election.sh allows only to > > set the preferred (the first in the list of replicas) or uncleaned > > (replicas not in sync can become a leader). > > Regards, > > > > Pierre > > >
Re: Leader election strategy
Hi Pierre, Try using kafka-reassign-partitions.sh to reassign partitions to different replicas you like. ref: https://kafka.apache.org/documentation/#basic_ops_automigrate Luke On Mon, Nov 14, 2022 at 3:55 PM Pierre Coquentin wrote: > Hello, > We have a Kafka cluster (2.4.1) with a replication factor of 3. I notice > when we stop a broker that only one broker takes all the load from the > missing broker and becomes the leader to all partitions. > I would have thought that Kafka would split the load evenly among the > remaining brokers. > > So if I have this kind of configuration > Topic: test > Partition 0 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > Partition 3 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3 > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3 > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3 > > If I stop broker 1, I want something like this (load is split evenly among > broker 2 and 3): > Topic: test > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > Partition 3 - Leader: 3 - Replicas: 1,2,3 - Isr: 2,3 > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > What I observe is currently this (broker 2 takes all the load from broker > 1): > Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > Partition 3 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3 > Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3 > Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3 > > My concern here is that at all times, a broker should not exceed 50% of its > network bandwidth which could be a problem in my case. > Is there a way to change this behavior (manually by forcing a leader, > programmatically, or by configuration)? > From my understanding, the script kafka-leader-election.sh allows only to > set the preferred (the first in the list of replicas) or uncleaned > (replicas not in sync can become a leader). > Regards, > > Pierre >
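For reference, a minimal sketch of what this reassignment might look like; the topic name, partition numbers, and broker ids are placeholders, and the exact flags can differ slightly between Kafka versions.

```
# reassignment.json - put the broker you want as preferred leader first in each replica list
cat > reassignment.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "test", "partition": 0, "replicas": [2, 3, 1] },
    { "topic": "test", "partition": 3, "replicas": [3, 1, 2] }
  ]
}
EOF

bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json --execute

# Check progress / completion
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json --verify
```

The broker listed first in each replica set is the preferred leader, so reordering the lists spreads preferred leadership; it does not by itself change how Kafka redistributes leadership at the moment a broker fails.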
Leader election strategy
Hello,

We have a Kafka cluster (2.4.1) with a replication factor of 3. I notice when we stop a broker that only one broker takes all the load from the missing broker and becomes the leader to all partitions. I would have thought that Kafka would split the load evenly among the remaining brokers.

So if I have this kind of configuration
Topic: test
Partition 0 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3
Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3
Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3
Partition 3 - Leader: 1 - Replicas: 1,2,3 - Isr: 1,2,3
Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 1,2,3
Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 1,2,3

If I stop broker 1, I want something like this (load is split evenly among broker 2 and 3):
Topic: test
Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3
Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3
Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3
Partition 3 - Leader: 3 - Replicas: 1,2,3 - Isr: 2,3
Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3
Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3

What I observe is currently this (broker 2 takes all the load from broker 1):
Partition 0 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3
Partition 1 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3
Partition 2 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3
Partition 3 - Leader: 2 - Replicas: 1,2,3 - Isr: 2,3
Partition 4 - Leader: 2 - Replicas: 2,3,1 - Isr: 2,3
Partition 5 - Leader: 3 - Replicas: 3,1,2 - Isr: 2,3

My concern here is that at all times, a broker should not exceed 50% of its network bandwidth, which could be a problem in my case. Is there a way to change this behavior (manually by forcing a leader, programmatically, or by configuration)? From my understanding, the script kafka-leader-election.sh allows only to set the preferred (the first in the list of replicas) or unclean (replicas not in sync can become a leader).

Regards,

Pierre
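For completeness, this is roughly how the script mentioned above is invoked (available since Kafka 2.4); host, topic, and partition are placeholders.

```
# Move leadership back to the preferred (first-listed) replica for one partition
bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --topic test --partition 0

# ...or for every partition in the cluster
bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --all-topic-partitions
```

Note that with auto.leader.rebalance.enable=true (the default) the controller will periodically move leadership back to the preferred replicas on its own once broker 1 rejoins the ISR; this helps after the broker returns, but not while it is down.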
Replica leader election stops when zk connection is re-established
Hi all!

We have a problem with 6 of our Kafka clusters since we upgraded to 2.8.0 from 2.3.1 a few months back. A seventh cluster is still on 2.3.1 and never had this problem.

The cluster runs fine for a random period, days or weeks. Suddenly, when creating new topics, they never get assigned partitions. They get no ISR, and the leader is "none". When using the zkCli to browse the topic it has no partitions. When this happens, we have been forced to restart the Kafka service on the "controller" host, which causes a new controller to be elected and solves the problem.

I've found out that after the Zookeeper leader host rebooted, the Kafka "Controller" host stopped with "Processing automatic preferred replica leader election" messages in the log, even though it reconnected fine. This seems related.

When trying to run kafka-leader-election.sh (using --bootstrap-server) for all topics, it fails saying that none of the partitions/topics exist:

Oct 26 20:17:06 ip-10-227-143-9 kafka[598]: [2021-10-26 20:17:06,099] INFO [Controller id=1002] Skipping replica leader election (PREFERRED) for partition example-topic-51 by AdminClientTriggered since it doesn't exist. (kafka.controller.KafkaController)
Oct 26 20:17:06 ip-10-227-143-9 kafka[598]: [2021-10-26 20:17:06,099] INFO [Controller id=1002] Skipping replica leader election (PREFERRED) for partition another-example-topic-7 by AdminClientTriggered since it doesn't exist. (kafka.controller.KafkaController)
etc..

However, it is possible to consume messages from the same bootstrap server from an old topic. So, it looks like the Kafka controller ends up in a limbo state where it is connected and registered with Zookeeper, but it doesn't get any data from Zookeeper. I have still not been able to find a good way to reproduce this. No errors or warnings in the logs on either Zookeeper or Kafka.

Zookeeper is running 3.5.9 with 3 nodes. The Kafka clusters are 3, 5 or 7 nodes in size.

Does anybody have an idea what happens? What is triggering the automatic replica leader elections? Is that Zookeeper?

Thanks!
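As a rough sketch of how to locate the active controller before bouncing it (the ZooKeeper host and port are placeholders):

```
# Ask ZooKeeper which broker currently holds the controller role
bin/zookeeper-shell.sh zk1:2181 get /controller
# -> {"version":1,"brokerid":1002,"timestamp":"..."}

# The ActiveControllerCount metric (kafka.controller:type=KafkaController,name=ActiveControllerCount)
# is 1 only on the active controller, so it can also be checked per broker
# if a metrics agent is in place.
```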
Re: Replica selection in unclean leader election and min.insync.replicas=2
On Tue, Jun 29, 2021 at 5:45 PM Péter Sinóros-Szabó wrote:

> Hey,
>
> we had the same issue as you.
>
> I checked the code and it chooses the first live replica from the
> assignment list. So if you describe a topic with kafka-topics, you will see
> the brokers list that has the replica of each partition. For example:
> [1001, 1002, 1003]. If that is the list, Kafka will choose the first
> replica that is available (is online) in that list.

That was our understanding of the relevant code as well (unless the assignment sequence is ordered in a way that most-in-sync replica goes first, which is doubtful):

def offlinePartitionLeaderElection(assignment: Seq[Int], isr: Seq[Int], liveReplicas: Set[Int],
                                   uncleanLeaderElectionEnabled: Boolean,
                                   controllerContext: ControllerContext): Option[Int] = {
  assignment.find(id => liveReplicas.contains(id) && isr.contains(id)).orElse {
    if (uncleanLeaderElectionEnabled) {
      val leaderOpt = assignment.find(liveReplicas.contains)
      if (leaderOpt.isDefined)
        controllerContext.stats.uncleanLeaderElectionRate.mark()
      leaderOpt
    } else {
      None
    }
  }
}

https://github.com/apache/kafka/blob/99b9b3e84f4e98c3f07714e1de6a139a004cbc5b/core/src/main/scala/kafka/controller/PartitionStateMachine.scala#L516-L527

> We use "acks=all" and "min.insync.replicas=2", so that should mean that
> even if the leader is down and the rest of the replicas fall out of the
> ISR, one of the follower replicas should have up to date data.

I'm thinking that the problem here is that Kafka allows the ISR list to shrink to 1 with the above settings in the first place: this way the information about the most-in-sync replica is effectively lost.

I'm wondering now if there is a chance to adjust this behavior without the need to change the client-server protocol. The decision to stop publishing when the min.insync.replicas requirement isn't met, is it made on the client or on the server side?

Regards,
--
Alex
Re: Replica selection in unclean leader election and min.insync.replicas=2
Hey, we had the same issue as you. I checked the code and it chooses the first live replica from the assignment list. So if you describe a topic with kafka-topics, you will see the brokers list that has the replica of each partition. For example: [1001, 1002, 1003]. If that is the list, Kafka will choose the first replica that is available (is online) in that list. We use "acks=all" and "min.insync.replicas=2", so that should mean that even if the leader is down and the rest of the replicas fall out of the ISR, one of the follower replicas should have up to date data. You can compare the two follower replicas with kafka-dump-tool to see which are more up-to-date. If you run a partition reassignment, you can change the order of the followers in the assignment list and then trigger an unclean leader election for the reassigned partitions. So it seems that this way, assuming the use of "acks=all" and "min.insync.replicas=2", we can recover without data loss. But only if my above assumption is correct. And please test this before using on live data. Peter On Mon, 28 Jun 2021 at 09:53, Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Mon, Jun 21, 2021 at 12:33 PM Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > > > > In summary: is there a risk of data loss in such a scenario? Is this > risk avoidable and if so, what are > > the prerequisites? > > Apologies if I messed up line breaks and that made reading harder. O:-) > > The question boils down to: is replica selection completely random in case > of unclean leader election or not? > > > Regards, > -- > Alex >
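The tool referred to above is presumably kafka-dump-log.sh; a hedged sketch of comparing the surviving followers' segments might look like this (the segment path is a placeholder and depends on your log.dirs and partition name):

```
# On each surviving follower, inspect the newest segment of the partition
bin/kafka-dump-log.sh --files /var/kafka-logs/my-topic-0/00000000000000000000.log \
  --print-data-log | tail -n 20

# The replica whose last batch shows the highest offset (and latest partition
# leader epoch) is the better candidate to list first in the reassignment
# before triggering an unclean election.
```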
Re: Replica selection in unclean leader election and min.insync.replicas=2
On Mon, Jun 21, 2021 at 12:33 PM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > > In summary: is there a risk of data loss in such a scenario? Is this risk avoidable and if so, what are > the prerequisites? Apologies if I messed up line breaks and that made reading harder. O:-) The question boils down to: is replica selection completely random in case of unclean leader election or not? Regards, -- Alex
Replica selection in unclean leader election and min.insync.replicas=2
Hi,

We are running Apache Kafka v2.7.0 in production in a 3-rack setup (3 AZs in a single AWS region) with the per-topic replication factor of 3 and the following global settings:

unclean.leader.election.enable=false
min.insync.replicas=2
replica.lag.time.max.ms=1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

The Kafka producer is configured with acks=all on the client side.

Recently we have experienced network performance degradation and partitioning in one of the AZs. For some of the hosted partitions this has resulted in their ISR lists shrinking down to just the leader running in that problematic zone and the partitions going offline. The brokers in that zone were ultimately shut down by the administrators. Nothing unexpected so far, but we would like to have a better understanding of the overall situation.

First, because of the combination of our minimum insync-replica requirement and our client config, we expect that at least one of the remaining two brokers has all the data for this partition that was acknowledged by the leader before the ISR shrunk down to just the leader itself. Is this understanding correct?

Second, once the leader is completely down a clean leader election is not possible. If we enabled unclean leader election for the affected topic, should we expect Kafka to select one of the remaining brokers in a *completely random* fashion, or does it try to take into account how far they have fallen behind the former leader? If it's the former, we face a 50% risk of losing some data, even though we know all acknowledged writes were replicated to one of the brokers that are still available.

In summary: is there a risk of data loss in such a scenario? Is this risk avoidable and if so, what are the prerequisites?

Cheers,
--
Alex
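For anyone reproducing this, the effective topic-level settings can be double-checked with something like the following sketch (topic name and host are placeholders; the --all flag needs reasonably recent tooling):

```
# Show the topic's effective configuration, including inherited broker defaults
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --describe --all

# Confirm replica placement and current ISR for each partition
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic
```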
Re: Enabling Unclean leader election
You should be able to use the command `kafka-leader-election` to accomplish this. This command has an option called "--election-type" that you can use to specify whether the election is preferred or unclean. -- Ricardo On 7/22/20 11:31 AM, nitin agarwal wrote: Hi, Is there a way to enable Unclean leader election in Kafka without restarting the broker? We have a use case where we want to enable the Unclean leader election conditionally. Thanks, Nitin
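A minimal sketch of the command described above (Kafka 2.4 or newer; host, topic, and partition are placeholders):

```
# Explicitly allow an out-of-sync replica to take leadership for one partition
bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type unclean --topic my-topic --partition 0
```

Keep in mind this can lose records that existed only on the old leader; it is a last resort for partitions that are offline because no in-sync replica is alive.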
Enabling Unclean leader election
Hi, Is there a way to enable Unclean leader election in Kafka without restarting the broker? We have a use case where we want to enable the Unclean leader election conditionally. Thanks, Nitin
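On recent broker versions this can also be toggled as a dynamic topic config, without a broker restart; a hedged sketch (topic name and host are placeholders):

```
# Enable unclean leader election for a single topic at runtime
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config unclean.leader.election.enable=true

# Later, remove the override again
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --delete-config unclean.leader.election.enable
```

In some versions the controller may not act on the new value until an election is actually triggered, so pairing it with kafka-leader-election.sh (see the reply above) is the more direct route.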
hangup of kafka consumer in case of unclean leader election
It seems that we discovered a bug: if an unclean leader election happens, KafkaConsumer may hang indefinitely.

Full version:

According to the documentation, if `auto.offset.reset` is set to none or not set, an exception is thrown to the client code, allowing the client to handle it however it wants. If you take a closer look at this mechanism, it turns out that it is not working.

Starting with Kafka 2.3, a new offset reset negotiation algorithm was added (org.apache.kafka.clients.consumer.internals.Fetcher#validateOffsetsAsync). During this validation, the Fetcher's `org.apache.kafka.clients.consumer.internals.SubscriptionState` is held in the `AWAIT_VALIDATION` fetch state. This effectively means that fetch requests are not issued and consumption stops.

If an unclean leader election happens during this time, a `LogTruncationException` is thrown from the future listener in the method `validateOffsetsAsync`. The main problem is that this exception (thrown from the listener of the future) is effectively swallowed by `org.apache.kafka.clients.consumer.internals.AsyncClient#sendAsyncRequest` by this part of code:

```
} catch (RuntimeException e) {
  if (!future.isDone()) {
    future.raise(e);
  }
}
```

In the end the result is: the only way to get out of AWAIT_VALIDATION and continue consumption is to successfully finish validation, but it can not be finished. However, the consumer is alive but consuming nothing. The only way to resume consumption is to terminate the consumer and start another one.

We discovered this situation by means of a Kafka Streams application, where the valid value of `auto.offset.reset` provided by our code is replaced by a `None` value for the purpose of position reset (org.apache.kafka.streams.processor.internals.StreamThread#create). With Kafka Streams it is even worse, as the application may keep working, logging warn messages of the format `Truncation detected for partition ...,` but data is not generated for a long time and in the end is lost, making the Kafka Streams application unreliable.

*Did someone see this already? Maybe there are some ways to reconfigure this behavior?*

--
Dmitry Sorokin
mailto://dmitry.soro...@gmail.com
Logs for overridden records, when unclean leader election happens
Hi, According to Kafka's replication mechanism, when an unclean leader election happens (suppose unclean.leader.election=true), replicas of the same partition may become inconsistent, i.e. records at a certain offset store different contents. IIUC, Kafka will override one version with another to resolve the inconsistency. So my question is: are there logs describing such record overriding, so that we can get information about how many records were lost, etc.? Another question I have is: are there logs describing the details of the unclean leader election event (e.g. topic/partition name, leader election result)? Thanks, Liu
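Not a complete answer, but brokers do log the truncation point when a follower rewinds its log to match a new leader, so grepping for it gives a rough upper bound on what was discarded (exact message wording varies by version; log paths below are placeholders):

```
# Followers log the offset they truncate to after a leader change
grep -i "truncat" /var/log/kafka/server.log

# Leader and epoch changes for the partition are recorded in the state-change log
grep "my-topic-0" /var/log/kafka/state-change.log
```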
Re: Optimising Time For Leader Election
Hi Mark, Reducing the zookeeper session timeouts would enable the broker change zookeeper listener on the controller to fire earlier than later. This will enable the controller to detect that the broker is down earlier. Increasing the network threads (processor threads) would not help here, unless you are seeing that the brokers are not able to pickup new connections fast enough. Increasing the api handler threads (KafkaRequestHandler) threads might help, if you are seeing that you request queue is filling up faster than the requests that can be processed. There is a metric called RequestHandlerAvgIdlePercent that you can use to tell if the Request Handlers are always busy and cannot keep up. Thanks, Mayuresh On Mon, Dec 10, 2018 at 7:38 AM Mark Anderson wrote: > Mayuresh, > > Thanks for the details. I'll need to do some more tests to get back with > specific numbers re delay and check for timeouts. > > For now (pre KIP-291 being implemented), the only parameters that will tune > leader election will be the zookeeper timeout and increasing the number of > network threads (To try and work through the queued requests faster)? > > Thanks, > Mark > > On Thu, 6 Dec 2018 at 23:43 Mayuresh Gharat > wrote: > > > Hi Mark, > > > > The leader election of a new topic partition happens once the controller > > detects that the Leader has crashed. > > This happens asynchronously via a zookeeper listener. Once a zookeeper > > listener is fired, the corresponding object indicating the event happened > > is put in to a controller queue. > > The controller has a single thread that pulls data out of this queue and > > handles each event one after another. > > I can't remember of a config to tune this, on top of my head. > > How much delay are you seeing in leadership change? Are there any > > controller socket timeouts in the log? > > Also might want to take a look at KIP-291 (KAFKA-4453), which is meant > for > > shortening this time period for handling controller events. > > > > Thanks, > > > > Mayuresh > > > > On Thu, Dec 6, 2018 at 9:50 AM Harper Henn > wrote: > > > > > Hi Mark, > > > > > > If a broker fails and you want to elect a new leader as quickly as > > > possible, you could tweak zookeeper.session.timeout.ms in the kafka > > broker > > > configuration. According to the documentation: "If the consumer fails > to > > > heartbeat to ZooKeeper for this period of time it is considered dead > and > > a > > > rebalance will occur." > > > > > > https://kafka.apache.org/0101/documentation.html > > > > > > I think making zookeeper.session.timeout.ms smaller will result in > > faster > > > detection of a dead node, but the downside is that a leader election > > might > > > get triggered by network blips or other cases where your broker is not > > > actually dead. > > > > > > Harper > > > > > > On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson > > > wrote: > > > > > > > Hi, > > > > > > > > I'm currently testing how Kafka reacts in cases of broker failure due > > to > > > > process failure or network timeout. > > > > > > > > I'd like to have the election of a new leader for a topic partition > > > happen > > > > as quickly as possible but it is unclear from the documentation or > > broker > > > > configuration what the key parameters are to tune to make this > > possible. > > > > > > > > Does anyone have any pointers? Or are there any guides online? > > > > > > > > Thanks, > > > > Mark > > > > > > > > > > > > > -- > > -Regards, > > Mayuresh R. Gharat > > (862) 250-7125 > > > -- -Regards, Mayuresh R. 
Gharat (862) 250-7125
Re: Optimising Time For Leader Election
Mayuresh, Thanks for the details. I'll need to do some more tests to get back with specific numbers re delay and check for timeouts. For now (pre KIP-291 being implemented), the only parameters that will tune leader election will be the zookeeper timeout and increasing the number of network threads (To try and work through the queued requests faster)? Thanks, Mark On Thu, 6 Dec 2018 at 23:43 Mayuresh Gharat wrote: > Hi Mark, > > The leader election of a new topic partition happens once the controller > detects that the Leader has crashed. > This happens asynchronously via a zookeeper listener. Once a zookeeper > listener is fired, the corresponding object indicating the event happened > is put in to a controller queue. > The controller has a single thread that pulls data out of this queue and > handles each event one after another. > I can't remember of a config to tune this, on top of my head. > How much delay are you seeing in leadership change? Are there any > controller socket timeouts in the log? > Also might want to take a look at KIP-291 (KAFKA-4453), which is meant for > shortening this time period for handling controller events. > > Thanks, > > Mayuresh > > On Thu, Dec 6, 2018 at 9:50 AM Harper Henn wrote: > > > Hi Mark, > > > > If a broker fails and you want to elect a new leader as quickly as > > possible, you could tweak zookeeper.session.timeout.ms in the kafka > broker > > configuration. According to the documentation: "If the consumer fails to > > heartbeat to ZooKeeper for this period of time it is considered dead and > a > > rebalance will occur." > > > > https://kafka.apache.org/0101/documentation.html > > > > I think making zookeeper.session.timeout.ms smaller will result in > faster > > detection of a dead node, but the downside is that a leader election > might > > get triggered by network blips or other cases where your broker is not > > actually dead. > > > > Harper > > > > On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson > > wrote: > > > > > Hi, > > > > > > I'm currently testing how Kafka reacts in cases of broker failure due > to > > > process failure or network timeout. > > > > > > I'd like to have the election of a new leader for a topic partition > > happen > > > as quickly as possible but it is unclear from the documentation or > broker > > > configuration what the key parameters are to tune to make this > possible. > > > > > > Does anyone have any pointers? Or are there any guides online? > > > > > > Thanks, > > > Mark > > > > > > > > -- > -Regards, > Mayuresh R. Gharat > (862) 250-7125 >
Re: Optimising Time For Leader Election
Hi Mark, The leader election of a new topic partition happens once the controller detects that the Leader has crashed. This happens asynchronously via a zookeeper listener. Once a zookeeper listener is fired, the corresponding object indicating the event happened is put in to a controller queue. The controller has a single thread that pulls data out of this queue and handles each event one after another. I can't remember of a config to tune this, on top of my head. How much delay are you seeing in leadership change? Are there any controller socket timeouts in the log? Also might want to take a look at KIP-291 (KAFKA-4453), which is meant for shortening this time period for handling controller events. Thanks, Mayuresh On Thu, Dec 6, 2018 at 9:50 AM Harper Henn wrote: > Hi Mark, > > If a broker fails and you want to elect a new leader as quickly as > possible, you could tweak zookeeper.session.timeout.ms in the kafka broker > configuration. According to the documentation: "If the consumer fails to > heartbeat to ZooKeeper for this period of time it is considered dead and a > rebalance will occur." > > https://kafka.apache.org/0101/documentation.html > > I think making zookeeper.session.timeout.ms smaller will result in faster > detection of a dead node, but the downside is that a leader election might > get triggered by network blips or other cases where your broker is not > actually dead. > > Harper > > On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson > wrote: > > > Hi, > > > > I'm currently testing how Kafka reacts in cases of broker failure due to > > process failure or network timeout. > > > > I'd like to have the election of a new leader for a topic partition > happen > > as quickly as possible but it is unclear from the documentation or broker > > configuration what the key parameters are to tune to make this possible. > > > > Does anyone have any pointers? Or are there any guides online? > > > > Thanks, > > Mark > > > -- -Regards, Mayuresh R. Gharat (862) 250-7125
Re: Optimising Time For Leader Election
Hi Mark, If a broker fails and you want to elect a new leader as quickly as possible, you could tweak zookeeper.session.timeout.ms in the kafka broker configuration. According to the documentation: "If the consumer fails to heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur." https://kafka.apache.org/0101/documentation.html I think making zookeeper.session.timeout.ms smaller will result in faster detection of a dead node, but the downside is that a leader election might get triggered by network blips or other cases where your broker is not actually dead. Harper On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson wrote: > Hi, > > I'm currently testing how Kafka reacts in cases of broker failure due to > process failure or network timeout. > > I'd like to have the election of a new leader for a topic partition happen > as quickly as possible but it is unclear from the documentation or broker > configuration what the key parameters are to tune to make this possible. > > Does anyone have any pointers? Or are there any guides online? > > Thanks, > Mark >
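For illustration, a hedged sketch of the broker-side setting described above; the value is an example only, and lowering it too far risks spurious elections during GC pauses or short network blips:

```
# Append to config/server.properties on each broker (rolling restart required).
# Lower values let the controller notice a dead broker sooner, at the cost of
# more spurious leader elections.
cat >> config/server.properties <<'EOF'
zookeeper.session.timeout.ms=6000
EOF
```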
Optimising Time For Leader Election
Hi, I'm currently testing how Kafka reacts in cases of broker failure due to process failure or network timeout. I'd like to have the election of a new leader for a topic partition happen as quickly as possible but it is unclear from the documentation or broker configuration what the key parameters are to tune to make this possible. Does anyone have any pointers? Or are there any guides online? Thanks, Mark
Re: Unclean Leader Election Expected Behavior
In your case, you need to restart B2 with unclean.leader.election=true. This will enable B2 to become leader with 90 messages. On Thu, Jun 28, 2018 at 11:51 PM Jordan Pilat wrote: > If I restart the broker, won't that cause all 100 messages to be lost? > > On 2018/06/28 02:59:15, Manikumar wrote: > > You can enable unclean.leader.election temporarily for specific topic by > > using kafka-topics.sh command. > > This requires broker restart to take effect. > > > > http://kafka.apache.org/documentation/#topicconfigs > > > > On Thu, Jun 28, 2018 at 2:27 AM Jordan Pilat wrote: > > > > > Heya, > > > > > > I had a question about what behavior to expect from a particular > > > scenario. Given: > > > A. Unclean leader elections are disabled > > > B. A partition is led by Broker1 and followed by Broker2 > > > C. Broker1 is on offset 100 > > > D. Broker2 is on offset 90 > > > E. Broker2 has fallen out of the ISR, leaving only Broker1 in the ISR > > > F. Broker1 has a hard drive failure and goes down. All messages for > the > > > partition in question are permanently lost. > > > > > > _As I understand it_, the only way for the partition to come back > online > > > is to bring Broker1 back online, and suffer the loss of 100 messages > (as > > > Broker2's log will be truncated to Broker1's offset, which will start > from > > > scratch due to the hard drive loss) > > > > > > Is there a procedure in such a case, to force-elect Broker2 the leader, > > > and thus only lose 10 messages? > > > > > > Thanks! > > > - Jordan Pilat > > > > > >
Re: Unclean Leader Election Expected Behavior
If I restart the broker, won't that cause all 100 messages to be lost? On 2018/06/28 02:59:15, Manikumar wrote: > You can enable unclean.leader.election temporarily for specific topic by > using kafka-topics.sh command. > This requires broker restart to take effect. > > http://kafka.apache.org/documentation/#topicconfigs > > On Thu, Jun 28, 2018 at 2:27 AM Jordan Pilat wrote: > > > Heya, > > > > I had a question about what behavior to expect from a particular > > scenario. Given: > > A. Unclean leader elections are disabled > > B. A partition is led by Broker1 and followed by Broker2 > > C. Broker1 is on offset 100 > > D. Broker2 is on offset 90 > > E. Broker2 has fallen out of the ISR, leaving only Broker1 in the ISR > > F. Broker1 has a hard drive failure and goes down. All messages for the > > partition in question are permanently lost. > > > > _As I understand it_, the only way for the partition to come back online > > is to bring Broker1 back online, and suffer the loss of 100 messages (as > > Broker2's log will be truncated to Broker1's offset, which will start from > > scratch due to the hard drive loss) > > > > Is there a procedure in such a case, to force-elect Broker2 the leader, > > and thus only lose 10 messages? > > > > Thanks! > > - Jordan Pilat > > >
Re: Unclean Leader Election Expected Behavior
You can enable unclean.leader.election temporarily for specific topic by using kafka-topics.sh command. This requires broker restart to take effect. http://kafka.apache.org/documentation/#topicconfigs On Thu, Jun 28, 2018 at 2:27 AM Jordan Pilat wrote: > Heya, > > I had a question about what behavior to expect from a particular > scenario. Given: > A. Unclean leader elections are disabled > B. A partition is led by Broker1 and followed by Broker2 > C. Broker1 is on offset 100 > D. Broker2 is on offset 90 > E. Broker2 has fallen out of the ISR, leaving only Broker1 in the ISR > F. Broker1 has a hard drive failure and goes down. All messages for the > partition in question are permanently lost. > > _As I understand it_, the only way for the partition to come back online > is to bring Broker1 back online, and suffer the loss of 100 messages (as > Broker2's log will be truncated to Broker1's offset, which will start from > scratch due to the hard drive loss) > > Is there a procedure in such a case, to force-elect Broker2 the leader, > and thus only lose 10 messages? > > Thanks! > - Jordan Pilat >
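A sketch of the command described above for older releases (ZooKeeper host and topic name are placeholders; on newer brokers kafka-configs.sh with --bootstrap-server is the preferred route):

```
# Set the topic-level override (older, ZooKeeper-based tooling)
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
  --config unclean.leader.election.enable=true

# Verify the override
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic
```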
Unclean Leader Election Expected Behavior
Heya,

I had a question about what behavior to expect from a particular scenario. Given:

A. Unclean leader elections are disabled
B. A partition is led by Broker1 and followed by Broker2
C. Broker1 is on offset 100
D. Broker2 is on offset 90
E. Broker2 has fallen out of the ISR, leaving only Broker1 in the ISR
F. Broker1 has a hard drive failure and goes down. All messages for the partition in question are permanently lost.

_As I understand it_, the only way for the partition to come back online is to bring Broker1 back online, and suffer the loss of 100 messages (as Broker2's log will be truncated to Broker1's offset, which will start from scratch due to the hard drive loss)

Is there a procedure in such a case, to force-elect Broker2 the leader, and thus only lose 10 messages?

Thanks!
- Jordan Pilat
Re: clean leader election on kafka 0.10.2.1
Henry, I am not sure what you mean by "waits for the leader of partition to start up"? Leader election should not affect the leader/follower startup process. Guozhang On Thu, Nov 2, 2017 at 4:56 PM, Henry Cai <h...@pinterest.com.invalid> wrote: > We were on kafka 0.10.2.1. We tried to switch from unclean leader election > to clean leader election and found it became very difficult to start up the > whole cluster. > > It seems the hosts went into a deadlock situation during startup > - broker A was a follower on partition 1 and waits for the leader of > partition 1 (which is broker B) to start up > - broker B was a follower on partition 2 and waits for the leader of > partition 2 (which is broker A) to start up > > We found there are quite a few deadlock related bugs fixed in 0.11 or > later, do we have to upgrade our kafka version to use clean leader > election? > -- -- Guozhang
clean leader election on kafka 0.10.2.1
We were on kafka 0.10.2.1. We tried to switch from unclean leader election to clean leader election and found it became very difficult to start up the whole cluster.

It seems the hosts went into a deadlock situation during startup:
- broker A was a follower on partition 1 and waits for the leader of partition 1 (which is broker B) to start up
- broker B was a follower on partition 2 and waits for the leader of partition 2 (which is broker A) to start up

We found there are quite a few deadlock related bugs fixed in 0.11 or later, do we have to upgrade our kafka version to use clean leader election?
issues with leader election
Hi,

I ran into an issue with kafka where the leader was set to -1:

Topic: AP-100  PartitionCount: 5  ReplicationFactor: 3  Configs: retention.ms=1579929985
  Topic: AP-100  Partition: 0  Leader: 26  Replicas: 26,24,25  Isr: 24,26,25
  Topic: AP-100  Partition: 1  Leader: 22  Replicas: 22,25,26  Isr: 22,26,25
  Topic: AP-100  Partition: 2  Leader: 23  Replicas: 23,26,22  Isr: 26,22,23
  Topic: AP-100  Partition: 3  Leader: 24  Replicas: 24,22,23  Isr: 24,22,23
  Topic: AP-100  Partition: 4  Leader: -1  Replicas: 25,23,24  Isr: 24

What is the best way to fix this?

Thank you,
Tyler Scoville
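Depending on the Kafka version, one hedged option is to explicitly trigger an election for the stuck partition so a surviving replica (e.g. broker 24, the only ISR member above) can take over; the topic and partition below match the describe output, everything else is a placeholder, and the tool requires Kafka 2.4 or newer.

```
# Target just the offline partition
cat > election.json <<'EOF'
{ "partitions": [ { "topic": "AP-100", "partition": 4 } ] }
EOF

bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type unclean --path-to-json-file election.json
```

On older brokers the rough equivalent is enabling unclean.leader.election.enable for the topic and letting the controller elect a leader; either way, records that existed only on the former leader may be lost.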
Re: Kafka 0.9.0.1 failing on new leader election
This looks correct. Sorry, not sure what else it could be. On Sat, Jul 30, 2016 at 4:24 AM, Sean Morris (semorris)wrote: > Kafka 0.9.0.1 > Zookeeper 3.4.6 > Zkclient 0.7 > > I have verified I only have one zkclient.jar in my class path. > > Thanks, > Sean > > > > > On 7/29/16, 9:35 PM, "Gwen Shapira" wrote: > >>you know, I ran into those null pointer exceptions when I accidentally >>tested Kafka with mismatching version of zkclient. >> >>Can you share the versions of both? And make sure you have only one >>zkclient on your classpath? >> >>On Tue, Jul 26, 2016 at 6:40 AM, Sean Morris (semorris) >> wrote: >>> I have a setup with 2 brokers and it is going through leader re-election >>> but seems to fail to complete. The behavior I start to see is that some >>> published succeed but others will fail with NotLeader exceptions like this >>> >>> >>> java.util.concurrent.ExecutionException: >>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server >>> is not the leader for that topic-partition. >>> >>> at >>> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) >>> >>> at >>> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) >>> >>> at >>> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) >>> >>> >>> My Kafka and zookeeper log file has errors like this >>> >>> >>> [2016-07-26 02:01:12,842] ERROR >>> [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send >>> metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> notify-eportal-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> psirts-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> notify-pushNotif-low-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), >>> 1 -> Map(eox-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> notify-eportal-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> psirts-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >>> notify-pushNotif-low-1 -> >>> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) >>> >>> [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] >>> [Controller 1]: Forcing the controller to resign >>> >>> >>> Which is then followed by a null pointer exception >>> >>> >>> [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error >>> handling event ZkEvent[Children of /isr_change_notification changed sent to >>> kafka.controller.IsrChangeNotificationListener@55ca3750] >>> >>> java.lang.IllegalStateException: java.lang.NullPointerException >>> >>> at >>> kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) >>> >>> at >>> kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) >>> >>> at >>> 
kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) >>> >>> at >>> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) >>> >>> at >>> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) >>> >>> at >>> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) >>> >>> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) >>> >>> at >>> kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) >>> >>> at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) >>> >>> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) >>> >>> Caused by: java.lang.NullPointerException >>> >>> at >>> kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) >>> >>> at >>> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) >>> >>> at >>> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) >>> >>> at >>>
Re: Kafka 0.9.0.1 failing on new leader election
Kafka 0.9.0.1 Zookeeper 3.4.6 Zkclient 0.7 I have verified I only have one zkclient.jar in my class path. Thanks, Sean On 7/29/16, 9:35 PM, "Gwen Shapira"wrote: >you know, I ran into those null pointer exceptions when I accidentally >tested Kafka with mismatching version of zkclient. > >Can you share the versions of both? And make sure you have only one >zkclient on your classpath? > >On Tue, Jul 26, 2016 at 6:40 AM, Sean Morris (semorris) > wrote: >> I have a setup with 2 brokers and it is going through leader re-election but >> seems to fail to complete. The behavior I start to see is that some >> published succeed but others will fail with NotLeader exceptions like this >> >> >> java.util.concurrent.ExecutionException: >> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server >> is not the leader for that topic-partition. >> >> at >> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) >> >> at >> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) >> >> at >> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) >> >> >> My Kafka and zookeeper log file has errors like this >> >> >> [2016-07-26 02:01:12,842] ERROR >> [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send >> metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> notify-eportal-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> psirts-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> notify-pushNotif-low-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), >> 1 -> Map(eox-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> notify-eportal-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> psirts-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), >> notify-pushNotif-low-1 -> >> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) >> >> [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] >> [Controller 1]: Forcing the controller to resign >> >> >> Which is then followed by a null pointer exception >> >> >> [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error >> handling event ZkEvent[Children of /isr_change_notification changed sent to >> kafka.controller.IsrChangeNotificationListener@55ca3750] >> >> java.lang.IllegalStateException: java.lang.NullPointerException >> >> at >> kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) >> >> at >> kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) >> >> at >> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) >> >> at >> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) >> >> at >> 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) >> >> at >> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) >> >> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) >> >> at >> kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) >> >> at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) >> >> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) >> >> Caused by: java.lang.NullPointerException >> >> at >> kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) >> >> at >> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) >> >> at >> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) >> >> at >> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) >> >> at >> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) >> >> at >> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) >> >> at
Re: Kafka 0.9.0.1 failing on new leader election
you know, I ran into those null pointer exceptions when I accidentally tested Kafka with mismatching version of zkclient. Can you share the versions of both? And make sure you have only one zkclient on your classpath? On Tue, Jul 26, 2016 at 6:40 AM, Sean Morris (semorris)wrote: > I have a setup with 2 brokers and it is going through leader re-election but > seems to fail to complete. The behavior I start to see is that some published > succeed but others will fail with NotLeader exceptions like this > > > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is > not the leader for that topic-partition. > > at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) > > at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) > > at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) > > > My Kafka and zookeeper log file has errors like this > > > [2016-07-26 02:01:12,842] ERROR > [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send > metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-eportal-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > psirts-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-pushNotif-low-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), > 1 -> Map(eox-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-eportal-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > psirts-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-pushNotif-low-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) > > [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] > [Controller 1]: Forcing the controller to resign > > > Which is then followed by a null pointer exception > > > [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error > handling event ZkEvent[Children of /isr_change_notification changed sent to > kafka.controller.IsrChangeNotificationListener@55ca3750] > > java.lang.IllegalStateException: java.lang.NullPointerException > > at > kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) > > at > kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) > > at > kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) > > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) > > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) > > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) > > at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > > at > kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) > > at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) > > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > > Caused by: java.lang.NullPointerException > > at > kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) > > at > kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) > > at > kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) > > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) > > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > > at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) > > at > kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:369) > > ... 9 more > > > I eventually restarted zookeeper and my
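A quick way to act on that suggestion is to ask the JVM which jar actually provided ZkClient, and to scan the classpath for more than one zkclient artifact. The following is a minimal Java sketch; the only name taken from the thread is the org.I0Itec.zkclient.ZkClient class that appears in the stack trace, everything else is illustrative:

// Minimal sketch: print where ZkClient was loaded from, to spot duplicate or
// mismatched zkclient jars on the broker's classpath.
public class ZkClientVersionCheck {
    public static void main(String[] args) throws Exception {
        Class<?> zkClient = Class.forName("org.I0Itec.zkclient.ZkClient");
        // The code source points at the exact jar that provided the class.
        System.out.println("ZkClient loaded from: "
                + zkClient.getProtectionDomain().getCodeSource().getLocation());

        // Also print any classpath entries mentioning zkclient, to confirm only
        // one version is present.
        for (String entry : System.getProperty("java.class.path").split(java.io.File.pathSeparator)) {
            if (entry.toLowerCase().contains("zkclient")) {
                System.out.println("classpath entry: " + entry);
            }
        }
    }
}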
Re: Kafka 0.9.0.1 failing on new leader election
Yes. This is happening after several days of running data, not on initial startup. Thanks. On 7/29/16, 11:54 AM, "David Garcia" <dav...@spiceworks.com> wrote: >Well, just a dumb question, but did you include all the brokers in your client >connection properties? > >On 7/29/16, 10:48 AM, "Sean Morris (semorris)" <semor...@cisco.com> wrote: > >Anyone have any ideas? > >From: semorris <semor...@cisco.com<mailto:semor...@cisco.com>> >Date: Tuesday, July 26, 2016 at 9:40 AM >To: "users@kafka.apache.org<mailto:users@kafka.apache.org>" > <users@kafka.apache.org<mailto:users@kafka.apache.org>> >Subject: Kafka 0.9.0.1 failing on new leader election > >I have a setup with 2 brokers and it is going through leader re-election > but seems to fail to complete. The behavior I start to see is that some > published succeed but others will fail with NotLeader exceptions like this > > >java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is > not the leader for that topic-partition. > >at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) > >at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) > >at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) > > >My Kafka and zookeeper log file has errors like this > > >[2016-07-26 02:01:12,842] ERROR > [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send > metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-eportal-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > psirts-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-pushNotif-low-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), > 1 -> Map(eox-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-eportal-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > psirts-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), > notify-pushNotif-low-1 -> > (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) > >[2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] > [Controller 1]: Forcing the controller to resign > > >Which is then followed by a null pointer exception > > >[2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error > handling event ZkEvent[Children of /isr_change_notification changed sent to > kafka.controller.IsrChangeNotificationListener@55ca3750] > >java.lang.IllegalStateException: java.lang.NullPointerException > >at > kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) > >at > kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) > >at > kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) > >at > 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) > >at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) > >at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) > >at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > >at > kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) > >at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) > >at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > >Caused by: java.lang.NullPointerException
Re: Kafka 0.9.0.1 failing on new leader election
Well, just a dumb question, but did you include all the brokers in your client connection properties? On 7/29/16, 10:48 AM, "Sean Morris (semorris)" <semor...@cisco.com> wrote: Anyone have any ideas? From: semorris <semor...@cisco.com<mailto:semor...@cisco.com>> Date: Tuesday, July 26, 2016 at 9:40 AM To: "users@kafka.apache.org<mailto:users@kafka.apache.org>" <users@kafka.apache.org<mailto:users@kafka.apache.org>> Subject: Kafka 0.9.0.1 failing on new leader election I have a setup with 2 brokers and it is going through leader re-election but seems to fail to complete. The behavior I start to see is that some published succeed but others will fail with NotLeader exceptions like this java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) My Kafka and zookeeper log file has errors like this [2016-07-26 02:01:12,842] ERROR [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), 1 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] [Controller 1]: Forcing the controller to resign Which is then followed by a null pointer exception [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error handling event ZkEvent[Children of /isr_change_notification changed sent to kafka.controller.IsrChangeNotificationListener@55ca3750] java.lang.IllegalStateException: java.lang.NullPointerException at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) at kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) 
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) at kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) Caused by: java.lang.NullPointerException at kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashTable$class.foreachEntry(
Re: Kafka 0.9.0.1 failing on new leader election
Anyone have any ideas? From: semorris <semor...@cisco.com<mailto:semor...@cisco.com>> Date: Tuesday, July 26, 2016 at 9:40 AM To: "users@kafka.apache.org<mailto:users@kafka.apache.org>" <users@kafka.apache.org<mailto:users@kafka.apache.org>> Subject: Kafka 0.9.0.1 failing on new leader election I have a setup with 2 brokers and it is going through leader re-election but seems to fail to complete. The behavior I start to see is that some published succeed but others will fail with NotLeader exceptions like this java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) My Kafka and zookeeper log file has errors like this [2016-07-26 02:01:12,842] ERROR [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), 1 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] [Controller 1]: Forcing the controller to resign Which is then followed by a null pointer exception [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error handling event ZkEvent[Children of /isr_change_notification changed sent to kafka.controller.IsrChangeNotificationListener@55ca3750] java.lang.IllegalStateException: java.lang.NullPointerException at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) at kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) at kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) at 
org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) Caused by: java.lang.NullPointerException at kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:369) ... 9 more I eventually restarted zookeeper and my brokers. This has happened twice in the last week. Any ideas? Thanks, Sean
Kafka 0.9.0.1 failing on new leader election
I have a setup with 2 brokers and it is going through leader re-election but seems to fail to complete. The behavior I start to see is that some published succeed but others will fail with NotLeader exceptions like this java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) My Kafka and zookeeper log file has errors like this [2016-07-26 02:01:12,842] ERROR [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)), 1 -> Map(eox-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-eportal-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), psirts-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1), notify-pushNotif-low-1 -> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1))) [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] [Controller 1]: Forcing the controller to resign Which is then followed by a null pointer exception [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error handling event ZkEvent[Children of /isr_change_notification changed sent to kafka.controller.IsrChangeNotificationListener@55ca3750] java.lang.IllegalStateException: java.lang.NullPointerException at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434) at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029) at kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) at kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352) at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) Caused by: java.lang.NullPointerException at kafka.controller.KafkaController.sendRequest(KafkaController.scala:699) at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403) at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:369) ... 9 more I eventually restarted zookeeper and my brokers. This has happened twice in the last week. Any ideas? Thanks, Sean
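As an aside, NotLeaderForPartitionException on the producer side is normally transient during a leader election: the client refreshes metadata and retries against the new leader. Below is a minimal sketch of a Java producer configured to retry (bootstrap servers and topic name are placeholders); this only smooths over the election window and does not address the controller NullPointerException shown in the broker logs.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder hosts
        props.put("acks", "all");
        // Retry transient errors (e.g. NotLeaderForPartitionException while a new
        // leader is being elected) instead of failing the send immediately.
        props.put("retries", "5");
        props.put("retry.backoff.ms", "500");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), // placeholder topic
                    (metadata, exception) -> {
                        if (exception != null) {
                            // Retries exhausted: surface the failure to the caller.
                            exception.printStackTrace();
                        }
                    });
        }
    }
}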
Can I use a single-partition topic for leader election?
Hello, I am developing a service in which all clustered nodes form a consumer group. I also need to run some logic on only one of the nodes. Can I use a special single-partition topic for leader election? That is, in each node I can use a ConsumerRebalanceListener to make sure that if the "leader" TopicPartition is assigned to that node, then it is now the leader; otherwise, it is not. Thanks, Yi
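For reference, a minimal Java sketch of the approach described in the question, assuming a dedicated single-partition topic named "leader" that all nodes subscribe to with a shared group id (bootstrap servers and group id are placeholders):

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SingleNodeLeaderSketch {
    private static final AtomicBoolean isLeader = new AtomicBoolean(false);

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "my-service");            // all nodes share this group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // "leader" is a single-partition topic; whichever node currently holds its
        // only partition is treated as the leader.
        consumer.subscribe(Collections.singletonList("leader"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                isLeader.set(!partitions.isEmpty());
            }

            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                isLeader.set(false);
            }
        });

        while (true) {
            consumer.poll(1000); // drives the group rebalance protocol
            if (isLeader.get()) {
                // run the leader-only logic here
            }
        }
    }
}

One caveat with this pattern: the leadership flag only holds while the consumer keeps polling within its session timeout, so long-running leader-only work should not block the polling thread.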
Re: leader election bug
So what could happen then? There is no broker registered in zookeeper, but it's still a leader somehow. On Mon, May 2, 2016 at 3:27 PM, Gwen Shapira <g...@confluent.io> wrote: > Thats a good version :) > > On Mon, May 2, 2016 at 11:04 AM, Kane Kim <kane.ist...@gmail.com> wrote: > > We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 > 09:09 > > GMT, does it have any known problems? > > > > On Fri, Apr 29, 2016 at 2:35 PM, James Brown <jbr...@easypost.com> > wrote: > > > >> What version of ZooKeeper are you on? There have been a few bugs over > >> the years where ZK has lost ephemeral nodes (and spontaneously > >> de-registered brokers). > >> > >> On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <kane.ist...@gmail.com> > wrote: > >> > Any idea why it's happening? I'm sure rolling restart would fix it. Is > >> it a > >> > bug? > >> > > >> > On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> > wrote: > >> > > >> >> Hello, > >> >> > >> >> Looks like we are hitting leader election bug. I've stopped one > broker > >> >> (104224873) on other brokers I see following: > >> >> > >> >> WARN kafka.controller.ControllerChannelManager - [Channel manager > on > >> >> controller 104224863]: Not sending request Name: StopReplicaRequest; > >> >> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: > false; > >> >> ControllerId: 104224863; ControllerEpoch: 8; Partitions: > [mp-auth,169] > >> to > >> >> broker 104224873, since it is offline. > >> >> > >> >> Also describing topics returns this: > >> >> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: > >> >> 104224874,104224873,104224875 Isr: 104224873,104224875 > >> >> > >> >> broker 104224873 is shut down, but it's still leader for the > partition > >> (at > >> >> least for a couple of hours as I monitor it). > >> >> Zookeeper cluster is healthy. > >> >> > >> >> ls /brokers/ids > >> >> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, > >> >> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, > >> >> 104224878, 104224879] > >> >> > >> >> That broker is not registered in ZK. > >> >> > >> > >> > >> > >> -- > >> James Brown > >> Engineer > >> >
Re: leader election bug
Thats a good version :) On Mon, May 2, 2016 at 11:04 AM, Kane Kim <kane.ist...@gmail.com> wrote: > We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 > GMT, does it have any known problems? > > On Fri, Apr 29, 2016 at 2:35 PM, James Brown <jbr...@easypost.com> wrote: > >> What version of ZooKeeper are you on? There have been a few bugs over >> the years where ZK has lost ephemeral nodes (and spontaneously >> de-registered brokers). >> >> On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <kane.ist...@gmail.com> wrote: >> > Any idea why it's happening? I'm sure rolling restart would fix it. Is >> it a >> > bug? >> > >> > On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> wrote: >> > >> >> Hello, >> >> >> >> Looks like we are hitting leader election bug. I've stopped one broker >> >> (104224873) on other brokers I see following: >> >> >> >> WARN kafka.controller.ControllerChannelManager - [Channel manager on >> >> controller 104224863]: Not sending request Name: StopReplicaRequest; >> >> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; >> >> ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] >> to >> >> broker 104224873, since it is offline. >> >> >> >> Also describing topics returns this: >> >> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: >> >> 104224874,104224873,104224875 Isr: 104224873,104224875 >> >> >> >> broker 104224873 is shut down, but it's still leader for the partition >> (at >> >> least for a couple of hours as I monitor it). >> >> Zookeeper cluster is healthy. >> >> >> >> ls /brokers/ids >> >> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, >> >> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, >> >> 104224878, 104224879] >> >> >> >> That broker is not registered in ZK. >> >> >> >> >> >> -- >> James Brown >> Engineer >>
Re: leader election bug
Also that broker is not registered in ZK as we can check with zk-shell, but kafka still thinks it's a leader for some partitions. On Mon, May 2, 2016 at 11:04 AM, Kane Kim <kane.ist...@gmail.com> wrote: > We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 > GMT, does it have any known problems? > > > On Fri, Apr 29, 2016 at 2:35 PM, James Brown <jbr...@easypost.com> wrote: > >> What version of ZooKeeper are you on? There have been a few bugs over >> the years where ZK has lost ephemeral nodes (and spontaneously >> de-registered brokers). >> >> On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <kane.ist...@gmail.com> wrote: >> > Any idea why it's happening? I'm sure rolling restart would fix it. Is >> it a >> > bug? >> > >> > On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> >> wrote: >> > >> >> Hello, >> >> >> >> Looks like we are hitting leader election bug. I've stopped one broker >> >> (104224873) on other brokers I see following: >> >> >> >> WARN kafka.controller.ControllerChannelManager - [Channel manager on >> >> controller 104224863]: Not sending request Name: StopReplicaRequest; >> >> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; >> >> ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] >> to >> >> broker 104224873, since it is offline. >> >> >> >> Also describing topics returns this: >> >> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: >> >> 104224874,104224873,104224875 Isr: 104224873,104224875 >> >> >> >> broker 104224873 is shut down, but it's still leader for the partition >> (at >> >> least for a couple of hours as I monitor it). >> >> Zookeeper cluster is healthy. >> >> >> >> ls /brokers/ids >> >> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, >> >> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, >> >> 104224878, 104224879] >> >> >> >> That broker is not registered in ZK. >> >> >> >> >> >> -- >> James Brown >> Engineer >> > >
Re: leader election bug
We are running Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT, does it have any known problems? On Fri, Apr 29, 2016 at 2:35 PM, James Brown <jbr...@easypost.com> wrote: > What version of ZooKeeper are you on? There have been a few bugs over > the years where ZK has lost ephemeral nodes (and spontaneously > de-registered brokers). > > On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <kane.ist...@gmail.com> wrote: > > Any idea why it's happening? I'm sure rolling restart would fix it. Is > it a > > bug? > > > > On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> wrote: > > > >> Hello, > >> > >> Looks like we are hitting leader election bug. I've stopped one broker > >> (104224873) on other brokers I see following: > >> > >> WARN kafka.controller.ControllerChannelManager - [Channel manager on > >> controller 104224863]: Not sending request Name: StopReplicaRequest; > >> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; > >> ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] > to > >> broker 104224873, since it is offline. > >> > >> Also describing topics returns this: > >> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: > >> 104224874,104224873,104224875 Isr: 104224873,104224875 > >> > >> broker 104224873 is shut down, but it's still leader for the partition > (at > >> least for a couple of hours as I monitor it). > >> Zookeeper cluster is healthy. > >> > >> ls /brokers/ids > >> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, > >> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, > >> 104224878, 104224879] > >> > >> That broker is not registered in ZK. > >> > > > > -- > James Brown > Engineer >
Re: leader election bug
What version of ZooKeeper are you on? There have been a few bugs over the years where ZK has lost ephemeral nodes (and spontaneously de-registered brokers). On Fri, Apr 29, 2016 at 11:30 AM, Kane Kim <kane.ist...@gmail.com> wrote: > Any idea why it's happening? I'm sure rolling restart would fix it. Is it a > bug? > > On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> wrote: > >> Hello, >> >> Looks like we are hitting leader election bug. I've stopped one broker >> (104224873) on other brokers I see following: >> >> WARN kafka.controller.ControllerChannelManager - [Channel manager on >> controller 104224863]: Not sending request Name: StopReplicaRequest; >> Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; >> ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] to >> broker 104224873, since it is offline. >> >> Also describing topics returns this: >> Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: >> 104224874,104224873,104224875 Isr: 104224873,104224875 >> >> broker 104224873 is shut down, but it's still leader for the partition (at >> least for a couple of hours as I monitor it). >> Zookeeper cluster is healthy. >> >> ls /brokers/ids >> [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, >> 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, >> 104224878, 104224879] >> >> That broker is not registered in ZK. >> -- James Brown Engineer
Re: leader election bug
Any idea why it's happening? I'm sure rolling restart would fix it. Is it a bug? On Wed, Apr 27, 2016 at 5:42 PM, Kane Kim <kane.ist...@gmail.com> wrote: > Hello, > > Looks like we are hitting leader election bug. I've stopped one broker > (104224873) on other brokers I see following: > > WARN kafka.controller.ControllerChannelManager - [Channel manager on > controller 104224863]: Not sending request Name: StopReplicaRequest; > Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; > ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] to > broker 104224873, since it is offline. > > Also describing topics returns this: > Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: > 104224874,104224873,104224875 Isr: 104224873,104224875 > > broker 104224873 is shut down, but it's still leader for the partition (at > least for a couple of hours as I monitor it). > Zookeeper cluster is healthy. > > ls /brokers/ids > [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, > 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, > 104224878, 104224879] > > That broker is not registered in ZK. >
leader election bug
Hello, Looks like we are hitting a leader election bug. I've stopped one broker (104224873); on the other brokers I see the following: WARN kafka.controller.ControllerChannelManager - [Channel manager on controller 104224863]: Not sending request Name: StopReplicaRequest; Version: 0; CorrelationId: 843100; ClientId: ; DeletePartitions: false; ControllerId: 104224863; ControllerEpoch: 8; Partitions: [mp-auth,169] to broker 104224873, since it is offline. Also, describing topics returns this: Topic: mp-unknown Partition: 597 Leader: 104224873 Replicas: 104224874,104224873,104224875 Isr: 104224873,104224875 Broker 104224873 is shut down, but it's still the leader for the partition (at least for a couple of hours as I monitor it). The Zookeeper cluster is healthy. ls /brokers/ids [104224874, 104224875, 104224863, 104224864, 104224871, 104224867, 104224868, 104224865, 104224866, 104224876, 104224877, 104224869, 104224878, 104224879] That broker is not registered in ZK.
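One way to confirm the inconsistency described above from code, rather than from the ZooKeeper shell, is to compare the ephemeral registrations under /brokers/ids with the leader recorded in the partition state znode. A minimal Java sketch using the plain ZooKeeper client follows; the connect string is a placeholder, and the topic/partition are the ones from the describe output above:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistrationCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; use the same zookeeper.connect as the brokers.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> { });
        try {
            // Live brokers are ephemeral children of /brokers/ids.
            System.out.println("registered brokers: " + zk.getChildren("/brokers/ids", false));

            // Leader/ISR as the controller last wrote it for one partition.
            byte[] state = zk.getData("/brokers/topics/mp-unknown/partitions/597/state", false, null);
            System.out.println("partition state: " + new String(state, StandardCharsets.UTF_8));
        } finally {
            zk.close();
        }
    }
}

If the leader id in the state znode is missing from the /brokers/ids list, the controller is holding stale state, which matches what the describe output shows.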
Re: Questions about unclean leader election and "Halting because log truncation is not allowed"
Anthony, I filed https://issues.apache.org/jira/browse/KAFKA-3410 to track this. -James > On Feb 25, 2016, at 2:16 PM, Anthony Sparks <anthony.spark...@gmail.com> > wrote: > > Hello James, > > We received this exact same error this past Tuesday (we are on 0.8.2). To > answer at least one of your bullet points -- this is a valid scenario. We > had the same questions, I'm starting to think this is a bug -- thank you > for the reproducing steps! > > I looked over the Release Notes to see if maybe there were some fixes in > newer versions -- this bug fix looked the most related: > https://issues.apache.org/jira/browse/KAFKA-2143 > > Thank you, > > Tony > > On Thu, Feb 25, 2016 at 3:46 PM, James Cheng <jch...@tivo.com> wrote: > >> Hi, >> >> I ran into a scenario where one of my brokers would continually shutdown, >> with the error message: >> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting >> because log truncation is not allowed for topic test, Current leader 1's >> latest offset 0 is less than replica 2's latest offset 151 >> (kafka.server.ReplicaFetcherThread) >> >> I managed to reproduce it with the following scenario: >> 1. Start broker1, with unclean.leader.election.enable=false >> 2. Start broker2, with unclean.leader.election.enable=false >> >> 3. Create topic, single partition, with replication-factor 2. >> 4. Write data to the topic. >> >> 5. At this point, both brokers are in the ISR. Broker1 is the partition >> leader. >> >> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 >> gets dropped out of ISR. Broker1 is still the leader. I can still write >> data to the partition. >> >> 7. Shutdown Broker1. Hard or controlled, doesn't matter. >> >> 8. rm -rf the log directory of broker1. (This simulates a disk replacement >> or full hardware replacement) >> >> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed >> because broker1 is down. At this point, the partition is offline. Can't >> write to it. >> >> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 >> attempts to join ISR, and immediately halts with the error message: >> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting >> because log truncation is not allowed for topic test, Current leader 1's >> latest offset 0 is less than replica 2's latest offset 151 >> (kafka.server.ReplicaFetcherThread) >> >> I am able to recover by setting unclean.leader.election.enable=true on my >> brokers. >> >> I'm trying to understand a couple things: >> * Is my scenario a valid supported one, or is it along the lines of "don't >> ever do that"? >> * In step 10, why is broker1 allowed to resume leadership even though it >> has no data? >> * In step 10, why is it necessary to stop the entire broker due to one >> partition that is in this state? Wouldn't it be possible for the broker to >> continue to serve traffic for all the other topics, and just mark this one >> as unavailable? >> * Would it make sense to allow an operator to manually specify which >> broker they want to become the new master? This would give me more control >> over how much data loss I am willing to handle. In this case, I would want >> broker2 to become the new master. Or, is that possible and I just don't >> know how to do it? >> * Would it be possible to make unclean.leader.election.enable to be a >> per-topic configuration? This would let me control how much data loss I am >> willing to handle. 
>> >> Btw, the comment in the source code for that error message indicates: >> >> https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166 >> >> // Prior to truncating the follower's log, ensure that doing so is >> not disallowed by the configuration for unclean leader election. >> // This situation could only happen if the unclean election >> configuration for a topic changes while a replica is down. Otherwise, >> // we should never encounter this situation since a non-ISR leader >> cannot be elected if disallowed by the broker configuration. >> >> But I don't believe that happened. I never changed the configuration. But >> I did venture into "unclean leader election" territory, so I'm not sure if >> the comment still applies. >> >> Thanks, >> -James >> >> >> >> __
Re: Questions about unclean leader election and "Halting because log truncation is not allowed"
unclean.leader.election.enable is actually a valid topic-level configuration, I opened https://issues.apache.org/jira/browse/KAFKA-3298 to get the documentation updated. That code comment doesn’t tell the complete story and could probably be updated for clarity as we’ve learned a lot since then. It’s still theoretically possible in certain severe split-brain situations such as the one your reproduction scenario introduces. Hopefully https://issues.apache.org/jira/browse/KAFKA-2143 helps to prevent the possibility from arising however. On 2/25/16, 3:46 PM, "James Cheng" <jch...@tivo.com> wrote: >Hi, > >I ran into a scenario where one of my brokers would continually shutdown, with >the error message: >[2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because >log truncation is not allowed for topic test, Current leader 1's latest offset >0 is less than replica 2's latest offset 151 >(kafka.server.ReplicaFetcherThread) > >I managed to reproduce it with the following scenario: >1. Start broker1, with unclean.leader.election.enable=false >2. Start broker2, with unclean.leader.election.enable=false > >3. Create topic, single partition, with replication-factor 2. >4. Write data to the topic. > >5. At this point, both brokers are in the ISR. Broker1 is the partition leader. > >6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets >dropped out of ISR. Broker1 is still the leader. I can still write data to the >partition. > >7. Shutdown Broker1. Hard or controlled, doesn't matter. > >8. rm -rf the log directory of broker1. (This simulates a disk replacement or >full hardware replacement) > >9. Resume broker2. It attempts to connect to broker1, but doesn't succeed >because broker1 is down. At this point, the partition is offline. Can't write >to it. > >10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts >to join ISR, and immediately halts with the error message: >[2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because >log truncation is not allowed for topic test, Current leader 1's latest offset >0 is less than replica 2's latest offset 151 >(kafka.server.ReplicaFetcherThread) > >I am able to recover by setting unclean.leader.election.enable=true on my >brokers. > >I'm trying to understand a couple things: >* Is my scenario a valid supported one, or is it along the lines of "don't >ever do that"? >* In step 10, why is broker1 allowed to resume leadership even though it has >no data? >* In step 10, why is it necessary to stop the entire broker due to one >partition that is in this state? Wouldn't it be possible for the broker to >continue to serve traffic for all the other topics, and just mark this one as >unavailable? >* Would it make sense to allow an operator to manually specify which broker >they want to become the new master? This would give me more control over how >much data loss I am willing to handle. In this case, I would want broker2 to >become the new master. Or, is that possible and I just don't know how to do it? >* Would it be possible to make unclean.leader.election.enable to be a >per-topic configuration? This would let me control how much data loss I am >willing to handle. 
> >Btw, the comment in the source code for that error message indicates: >https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166 > > // Prior to truncating the follower's log, ensure that doing so is not > disallowed by the configuration for unclean leader election. > // This situation could only happen if the unclean election > configuration for a topic changes while a replica is down. Otherwise, > // we should never encounter this situation since a non-ISR leader > cannot be elected if disallowed by the broker configuration. > >But I don't believe that happened. I never changed the configuration. But I >did venture into "unclean leader election" territory, so I'm not sure if the >comment still applies. > >Thanks, >-James > > > > > >This email and any attachments may contain confidential and privileged >material for the sole use of the intended recipient. Any review, copying, or >distribution of this email (or any attachments) by others is prohibited. If >you are not the intended recipient, please contact the sender immediately and >permanently delete this email and any attachments. No employee or agent of >TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo >Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed &g
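For readers on newer releases: the brokers discussed in this thread (0.8.x/0.9.x) only take topic-level overrides through the CLI tools, but since the AdminClient was added (0.11+) the same per-topic override can be applied from code. A minimal sketch, with bootstrap servers and topic name as placeholders:

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicConfigOverrideSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test"); // placeholder topic
            Config overrides = new Config(Arrays.asList(
                    // Keep the availability-vs-consistency trade-off explicit per topic.
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    new ConfigEntry("min.insync.replicas", "2")));
            // Note: the non-incremental alterConfigs call replaces the full set of
            // overrides for the topic, so include every override you want to keep.
            admin.alterConfigs(Collections.singletonMap(topic, overrides)).all().get();
        }
    }
}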
Re: Questions about unclean leader election and "Halting because log truncation is not allowed"
Hello James, We received this exact same error this past Tuesday (we are on 0.8.2). To answer at least one of your bullet points -- this is a valid scenario. We had the same questions, I'm starting to think this is a bug -- thank you for the reproducing steps! I looked over the Release Notes to see if maybe there were some fixes in newer versions -- this bug fix looked the most related: https://issues.apache.org/jira/browse/KAFKA-2143 Thank you, Tony On Thu, Feb 25, 2016 at 3:46 PM, James Cheng <jch...@tivo.com> wrote: > Hi, > > I ran into a scenario where one of my brokers would continually shutdown, > with the error message: > [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting > because log truncation is not allowed for topic test, Current leader 1's > latest offset 0 is less than replica 2's latest offset 151 > (kafka.server.ReplicaFetcherThread) > > I managed to reproduce it with the following scenario: > 1. Start broker1, with unclean.leader.election.enable=false > 2. Start broker2, with unclean.leader.election.enable=false > > 3. Create topic, single partition, with replication-factor 2. > 4. Write data to the topic. > > 5. At this point, both brokers are in the ISR. Broker1 is the partition > leader. > > 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 > gets dropped out of ISR. Broker1 is still the leader. I can still write > data to the partition. > > 7. Shutdown Broker1. Hard or controlled, doesn't matter. > > 8. rm -rf the log directory of broker1. (This simulates a disk replacement > or full hardware replacement) > > 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed > because broker1 is down. At this point, the partition is offline. Can't > write to it. > > 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 > attempts to join ISR, and immediately halts with the error message: > [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting > because log truncation is not allowed for topic test, Current leader 1's > latest offset 0 is less than replica 2's latest offset 151 > (kafka.server.ReplicaFetcherThread) > > I am able to recover by setting unclean.leader.election.enable=true on my > brokers. > > I'm trying to understand a couple things: > * Is my scenario a valid supported one, or is it along the lines of "don't > ever do that"? > * In step 10, why is broker1 allowed to resume leadership even though it > has no data? > * In step 10, why is it necessary to stop the entire broker due to one > partition that is in this state? Wouldn't it be possible for the broker to > continue to serve traffic for all the other topics, and just mark this one > as unavailable? > * Would it make sense to allow an operator to manually specify which > broker they want to become the new master? This would give me more control > over how much data loss I am willing to handle. In this case, I would want > broker2 to become the new master. Or, is that possible and I just don't > know how to do it? > * Would it be possible to make unclean.leader.election.enable to be a > per-topic configuration? This would let me control how much data loss I am > willing to handle. 
> > Btw, the comment in the source code for that error message indicates: > > https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166 > > // Prior to truncating the follower's log, ensure that doing so is > not disallowed by the configuration for unclean leader election. > // This situation could only happen if the unclean election > configuration for a topic changes while a replica is down. Otherwise, > // we should never encounter this situation since a non-ISR leader > cannot be elected if disallowed by the broker configuration. > > But I don't believe that happened. I never changed the configuration. But > I did venture into "unclean leader election" territory, so I'm not sure if > the comment still applies. > > Thanks, > -James > > > > > > This email and any attachments may contain confidential and privileged > material for the sole use of the intended recipient. Any review, copying, > or distribution of this email (or any attachments) by others is prohibited. > If you are not the intended recipient, please contact the sender > immediately and permanently delete this email and any attachments. No > employee or agent of TiVo Inc. is authorized to conclude any binding > agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo > Inc. may only be made by a signed written agreement. >
Questions about unclean leader election and "Halting because log truncation is not allowed"
Hi, I ran into a scenario where one of my brokers would continually shutdown, with the error message: [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed for topic test, Current leader 1's latest offset 0 is less than replica 2's latest offset 151 (kafka.server.ReplicaFetcherThread) I managed to reproduce it with the following scenario: 1. Start broker1, with unclean.leader.election.enable=false 2. Start broker2, with unclean.leader.election.enable=false 3. Create topic, single partition, with replication-factor 2. 4. Write data to the topic. 5. At this point, both brokers are in the ISR. Broker1 is the partition leader. 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets dropped out of ISR. Broker1 is still the leader. I can still write data to the partition. 7. Shutdown Broker1. Hard or controlled, doesn't matter. 8. rm -rf the log directory of broker1. (This simulates a disk replacement or full hardware replacement) 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed because broker1 is down. At this point, the partition is offline. Can't write to it. 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts to join ISR, and immediately halts with the error message: [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed for topic test, Current leader 1's latest offset 0 is less than replica 2's latest offset 151 (kafka.server.ReplicaFetcherThread) I am able to recover by setting unclean.leader.election.enable=true on my brokers. I'm trying to understand a couple things: * Is my scenario a valid supported one, or is it along the lines of "don't ever do that"? * In step 10, why is broker1 allowed to resume leadership even though it has no data? * In step 10, why is it necessary to stop the entire broker due to one partition that is in this state? Wouldn't it be possible for the broker to continue to serve traffic for all the other topics, and just mark this one as unavailable? * Would it make sense to allow an operator to manually specify which broker they want to become the new master? This would give me more control over how much data loss I am willing to handle. In this case, I would want broker2 to become the new master. Or, is that possible and I just don't know how to do it? * Would it be possible to make unclean.leader.election.enable to be a per-topic configuration? This would let me control how much data loss I am willing to handle. Btw, the comment in the source code for that error message indicates: https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166 // Prior to truncating the follower's log, ensure that doing so is not disallowed by the configuration for unclean leader election. // This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise, // we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration. But I don't believe that happened. I never changed the configuration. But I did venture into "unclean leader election" territory, so I'm not sure if the comment still applies. Thanks, -James This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. 
min ISR of 1 for consumer offsets topic and disabling unclean leader election - problem or not?
Hello Apache Kafka community, The consumer offsets topic's default configuration is somewhat different from that of all other topics, for good reason: e.g. its replication factor is 3 compared to 1 for other topics, and its number of partitions is 50 compared to 1 for other topics. Can somebody please explain why the consumer offsets topic defaults to min.insync.replicas of 1, or in other words, would it make sense to set it to 2? I guess that min.insync.replicas of 1 is preferred for higher availability of the consumer offsets topic partitions (whichever replica is alive is allowed to become leader, regardless of how far behind it was) compared to the higher consistency across replicas achieved with min.insync.replicas of 2 or higher. The problem I see there, and please correct me if wrong, is that unclean.leader.election.enable seems to be configurable only at the cluster/broker level; one cannot override it at the topic level (as documented at http://kafka.apache.org/documentation.html#topic-config ). On the other hand, one can override min.insync.replicas per topic and have it default to 1 at the broker level. So, if one disables unclean leader election and has min.insync.replicas set to 1 for the consumer offsets topic and to 2 for some or all of the other topics, one would actually make the consumer offsets topic partitions less available: losing the leader would make a partition unavailable until the leader recovers, since a replica that was not part of the ISR cannot take over, and with min.insync.replicas of 1 only the leader is guaranteed to be in the ISR. Would it make sense to allow overriding unclean.leader.election.enable at the topic level, introduce a separate offsets.topic.unclean.leader.election.enable defaulting to true, and introduce offsets.topic.min.insync.replicas defaulting to 1, documenting the rationale for all of these and suggesting that one increase offsets.topic.min.insync.replicas if offsets.topic.unclean.leader.election.enable is overridden to false? Kind regards, Stevo Slavic.
Leader Election
Hi Folks, I am trying to use the REST proxy, but I have some fundamental questions about how leader election works. My understanding is that the proxy hides all of that complexity and processes my produce/consume requests transparently. Is that the case, or do I need to manage that myself? Thanks Heath
Re: Leader Election
Hi Heath, I assume you're referring to the partition leader. This is not something you need to worry about when using the REST API. Kafka handles leader election, failure recovery, etc. for you behind the scenes. Do know that if a leader fails, you'll experience a small latency hit because a new leader needs to be elected for the partitions that were led by said failed leader. Alex On Wed, Jan 6, 2016 at 8:09 AM, Heath Ivie <hi...@autoanything.com> wrote: > Hi Folks, > > I am trying to use the REST proxy, but I have some fundamental questions > about how the leader election works. > > My understanding of how the way the leader elections work is that the > proxy hides all of that complexity and processes my produce/consume request > transparently. > > Is that the case or do I need to manage that myself? > > Thanks > Heath > > > > Warning: This e-mail may contain information proprietary to AutoAnything > Inc. and is intended only for the use of the intended recipient(s). If the > reader of this message is not the intended recipient(s), you have received > this message in error and any review, dissemination, distribution or > copying of this message is strictly prohibited. If you have received this > message in error, please notify the sender immediately and delete all > copies. > -- *Alex Loddengaard | **Solutions Architect | Confluent* *Download Apache Kafka and Confluent Platform: www.confluent.io/download <http://www.confluent.io/download>*
Kafka unclean leader election (0.8.2)
Howdy folks, If a host gets into an unclean leader election, Kafka (via ZK) will assign a new leader to each partition/topic; however, is there a metric that shows how the replication is doing (aka what is going on behind the scenes)? Thanks! -- Pablo
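There is no single election-progress gauge, but the brokers do expose replication health over JMX; the kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions gauge should fall back to 0 once followers have caught up after the election. A minimal Java sketch polling it remotely, assuming the broker JVM has a JMX port open (host and port are placeholders):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; the broker JVM must be started with remote JMX enabled.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName underReplicated =
                    new ObjectName("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            // Gauge value; non-zero means some followers have not caught up to their leaders yet.
            System.out.println("UnderReplicatedPartitions = " + mbs.getAttribute(underReplicated, "Value"));
        } finally {
            connector.close();
        }
    }
}

The same view is available from the command line with kafka-topics.sh --describe --under-replicated-partitions.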
Re: Unclean leader election docs outdated
Created https://issues.apache.org/jira/browse/KAFKA-2551 On Mon, Sep 14, 2015 at 7:22 PM, Guozhang Wang <wangg...@gmail.com> wrote: > Yes you are right. Could you file a JIRA to edit the documents? > > Guozhang > > On Fri, Sep 11, 2015 at 4:41 PM, Stevo Slavić <ssla...@gmail.com> wrote: > > > That sentence is in both > > https://svn.apache.org/repos/asf/kafka/site/083/design.html and > > https://svn.apache.org/repos/asf/kafka/site/082/design.html near the end > > of > > "Unclean leader election: What if they all die?" section. Next one, > > "Availability and Durability Guarantees", mentions ability to disable > > unclean leader election, so likely just this one reference needs to be > > updated. > > > > On Sat, Sep 12, 2015 at 1:05 AM, Guozhang Wang <wangg...@gmail.com> > wrote: > > > > > Hi Stevo, > > > > > > Could you point me to the link of the docs? > > > > > > Guozhang > > > > > > On Fri, Sep 11, 2015 at 5:47 AM, Stevo Slavić <ssla...@gmail.com> > wrote: > > > > > > > Hello Apache Kafka community, > > > > > > > > Current unclean leader election docs state: > > > > "In the future, we would like to make this configurable to better > > support > > > > use cases where downtime is preferable to inconsistency. " > > > > > > > > If I'm not mistaken, since 0.8.2, unclean leader election strategy > > > (whether > > > > to allow it or not) is already configurable via > > > > unclean.leader.election.enable broker config property. > > > > > > > > Kind regards, > > > > Stevo Slavic. > > > > > > > > > > > > > > > > -- > > > -- Guozhang > > > > > > > > > -- > -- Guozhang >
Re: Unclean leader election docs outdated
Yes you are right. Could you file a JIRA to edit the documents? Guozhang On Fri, Sep 11, 2015 at 4:41 PM, Stevo Slavić <ssla...@gmail.com> wrote: > That sentence is in both > https://svn.apache.org/repos/asf/kafka/site/083/design.html and > https://svn.apache.org/repos/asf/kafka/site/082/design.html near the end > of > "Unclean leader election: What if they all die?" section. Next one, > "Availability and Durability Guarantees", mentions ability to disable > unclean leader election, so likely just this one reference needs to be > updated. > > On Sat, Sep 12, 2015 at 1:05 AM, Guozhang Wang <wangg...@gmail.com> wrote: > > > Hi Stevo, > > > > Could you point me to the link of the docs? > > > > Guozhang > > > > On Fri, Sep 11, 2015 at 5:47 AM, Stevo Slavić <ssla...@gmail.com> wrote: > > > > > Hello Apache Kafka community, > > > > > > Current unclean leader election docs state: > > > "In the future, we would like to make this configurable to better > support > > > use cases where downtime is preferable to inconsistency. " > > > > > > If I'm not mistaken, since 0.8.2, unclean leader election strategy > > (whether > > > to allow it or not) is already configurable via > > > unclean.leader.election.enable broker config property. > > > > > > Kind regards, > > > Stevo Slavic. > > > > > > > > > > > -- > > -- Guozhang > > > -- -- Guozhang
Re: Unclean leader election docs outdated
That sentence is in both https://svn.apache.org/repos/asf/kafka/site/083/design.html and https://svn.apache.org/repos/asf/kafka/site/082/design.html near the end of "Unclean leader election: What if they all die?" section. Next one, "Availability and Durability Guarantees", mentions ability to disable unclean leader election, so likely just this one reference needs to be updated. On Sat, Sep 12, 2015 at 1:05 AM, Guozhang Wang <wangg...@gmail.com> wrote: > Hi Stevo, > > Could you point me to the link of the docs? > > Guozhang > > On Fri, Sep 11, 2015 at 5:47 AM, Stevo Slavić <ssla...@gmail.com> wrote: > > > Hello Apache Kafka community, > > > > Current unclean leader election docs state: > > "In the future, we would like to make this configurable to better support > > use cases where downtime is preferable to inconsistency. " > > > > If I'm not mistaken, since 0.8.2, unclean leader election strategy > (whether > > to allow it or not) is already configurable via > > unclean.leader.election.enable broker config property. > > > > Kind regards, > > Stevo Slavic. > > > > > > -- > -- Guozhang >
Re: Unclean leader election docs outdated
Hi Stevo, Could you point me to the link of the docs? Guozhang On Fri, Sep 11, 2015 at 5:47 AM, Stevo Slavić <ssla...@gmail.com> wrote: > Hello Apache Kafka community, > > Current unclean leader election docs state: > "In the future, we would like to make this configurable to better support > use cases where downtime is preferable to inconsistency. " > > If I'm not mistaken, since 0.8.2, unclean leader election strategy (whether > to allow it or not) is already configurable via > unclean.leader.election.enable broker config property. > > Kind regards, > Stevo Slavic. > -- -- Guozhang
Unclean leader election docs outdated
Hello Apache Kafka community, Current unclean leader election docs state: "In the future, we would like to make this configurable to better support use cases where downtime is preferable to inconsistency. " If I'm not mistaken, since 0.8.2, unclean leader election strategy (whether to allow it or not) is already configurable via unclean.leader.election.enable broker config property. Kind regards, Stevo Slavic.
Re: unclean leader election enable default value bug?
Ah, ok. --config should be given in front of each config parameter. This worked: --config min.insync.replicas=2 --config unclean.leader.election.enable=false So it leaves only the default value issue, i.e. if the override is not given, the value from server.properties should be used. On Mon, Jun 22, 2015 at 1:54 PM, Zaiming Shi zmst...@gmail.com wrote: Hi Again! Unlike min.insync.replicas, unclean.leader.election.enable isn't set to false even if it's given 'false' in the create topic command. Here is the command used to create the topic: $./kafka-topics.sh --create --topic bbb --zookeeper localhost --replication-factor 3 --partitions 3 --config min.insync.replicas=2 unclean.leader.election.enable=false Here is the log from server.log: [2015-06-22 11:47:28,521] INFO Created log for partition [bbb,0] in /var/lib/kafka with properties {segment.index.bytes -> 10485760, file.delete.delay.ms -> 6, segment.bytes -> 536870912, flush.ms -> 1000, delete.retention.ms -> 8640, index.interval.bytes -> 4096, retention.bytes -> -1, min.insync.replicas -> 2, cleanup.policy -> delete, unclean.leader.election.enable -> true, segment.ms -> 60480, max.message.bytes -> 112, flush.messages -> 1, min.cleanable.dirty.ratio -> 0.5, retention.ms -> 60480, segment.jitter.ms -> 0}. (kafka.log.LogManager) Is this a known issue? Is there a workaround? Or have I missed anything? On Thu, Jun 18, 2015 at 12:51 PM, Zaiming Shi zmst...@gmail.com wrote: Kafka 0.8.2.1. I have `unclean.leader.election.enable=false` in server.properties and I can see this log in server.log: [2015-06-18 09:57:18,961] INFO Property unclean.leader.election.enable is overridden to false (kafka.utils.VerifiableProperties) Yet the topic was created with `unclean.leader.election.enable -> true`. I see similarity in this issue: https://issues.apache.org/jira/browse/KAFKA-2114 Regards -Zaiming
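For reference, a corrected create command along the lines of the one quoted above (topic bbb and the ZooKeeper address are just the values already used in this thread) repeats --config in front of every setting, roughly:

  bin/kafka-topics.sh --create --topic bbb --zookeeper localhost --replication-factor 3 --partitions 3 --config min.insync.replicas=2 --config unclean.leader.election.enable=false

The effective per-topic overrides can then be checked with bin/kafka-topics.sh --describe --zookeeper localhost --topic bbb, where they show up under Configs.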
Re: unclean leader election enable default value bug?
Hi Again! Unlike min.insync.replicas, unclean.leader.election.enable isn't set to false even if it's given 'false' in the create topic command. Here is the command used to create the topic: $./kafka-topics.sh --create --topic bbb --zookeeper localhost --replication-factor 3 --partitions 3 --config min.insync.replicas=2 unclean.leader.election.enable=false Here is the log from server.log: [2015-06-22 11:47:28,521] INFO Created log for partition [bbb,0] in /var/lib/kafka with properties {segment.index.bytes -> 10485760, file.delete.delay.ms -> 6, segment.bytes -> 536870912, flush.ms -> 1000, delete.retention.ms -> 8640, index.interval.bytes -> 4096, retention.bytes -> -1, min.insync.replicas -> 2, cleanup.policy -> delete, unclean.leader.election.enable -> true, segment.ms -> 60480, max.message.bytes -> 112, flush.messages -> 1, min.cleanable.dirty.ratio -> 0.5, retention.ms -> 60480, segment.jitter.ms -> 0}. (kafka.log.LogManager) Is this a known issue? Is there a workaround? Or have I missed anything? On Thu, Jun 18, 2015 at 12:51 PM, Zaiming Shi zmst...@gmail.com wrote: Kafka 0.8.2.1. I have `unclean.leader.election.enable=false` in server.properties and I can see this log in server.log: [2015-06-18 09:57:18,961] INFO Property unclean.leader.election.enable is overridden to false (kafka.utils.VerifiableProperties) Yet the topic was created with `unclean.leader.election.enable -> true`. I see similarity in this issue: https://issues.apache.org/jira/browse/KAFKA-2114 Regards -Zaiming
Failure in Leader Election on broker shutdown
Hi, I have a Kafka cluster of three nodes. I have constructed a topic with the following command: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic testv1p3 So the topic testv1p3 has 3 partitions and the replication factor is 1. Here is the result of the describe command: kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic testv1p3
Topic:testv1p3  PartitionCount:3  ReplicationFactor:1  Configs:
Topic: testv1p3  Partition: 0  Leader: 1  Replicas: 1  Isr: 1
Topic: testv1p3  Partition: 1  Leader: 2  Replicas: 2  Isr: 2
Topic: testv1p3  Partition: 2  Leader: 0  Replicas: 0  Isr: 0
So far things are good. Now I tried to kill a broker using bin/kafka-server-stop.sh. The broker was stopped successfully. Now I wanted to ensure that there is a new leader for the partition which was hosted on the terminated broker. Here is the output of the describe command after broker termination:
Topic:testv1p3  PartitionCount:3  ReplicationFactor:1  Configs:
Topic: testv1p3  Partition: 0  Leader: 1  Replicas: 1  Isr: 1
Topic: testv1p3  Partition: 1  Leader: -1  Replicas: 2  Isr:
Topic: testv1p3  Partition: 2  Leader: 0  Replicas: 0  Isr: 0
The leader for partition 1 is -1, and the Java API for Kafka returns null for leader() in PartitionMetadata for partition 1. When I restarted the broker which was stopped earlier, things went back to normal. 1) Does leader selection happen automatically? 2) If yes, do I need any particular configuration in the broker or topic config? 3) If not, what is the command to ensure that I have a leader for partition 1 in case its lead broker goes down? FYI, I tried to run bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181. After running this script, the topic description still remains the same and there is no leader for partition 1. It will be great to get any help on this. Reference: Console log for (kafka-server-stop.sh): [2015-06-19 14:25:00,241] INFO [Kafka Server 2], shutting down (kafka.server.KafkaServer) [2015-06-19 14:25:00,243] INFO [Kafka Server 2], Starting controlled shutdown (kafka.server.KafkaServer) [2015-06-19 14:25:00,267] INFO [Kafka Server 2], Controlled shutdown succeeded (kafka.server.KafkaServer) [2015-06-19 14:25:00,273] INFO Deregistered broker 2 at path /brokers/ids/2. (kafka.utils.ZkUtils$) [2015-06-19 14:25:00,274] INFO [Socket Server on Broker 2], Shutting down (kafka.network.SocketServer) [2015-06-19 14:25:00,279] INFO [Socket Server on Broker 2], Shutdown completed (kafka.network.SocketServer) [2015-06-19 14:25:00,280] INFO [Kafka Request Handler on Broker 2], shutting down (kafka.server.KafkaRequestHandlerPool) [2015-06-19 14:25:00,282] INFO [Kafka Request Handler on Broker 2], shut down completely (kafka.server.KafkaRequestHandlerPool) [2015-06-19 14:25:00,600] INFO [Replica Manager on Broker 2]: Shut down (kafka.server.ReplicaManager) [2015-06-19 14:25:00,601] INFO [ReplicaFetcherManager on broker 2] shutting down (kafka.server.ReplicaFetcherManager) [2015-06-19 14:25:00,602] INFO [ReplicaFetcherManager on broker 2] shutdown completed (kafka.server.ReplicaFetcherManager) [2015-06-19 14:25:00,604] INFO [Replica Manager on Broker 2]: Shut down completely (kafka.server.ReplicaManager) [2015-06-19 14:25:00,605] INFO Shutting down. (kafka.log.LogManager) [2015-06-19 14:25:00,618] INFO Shutdown complete.
(kafka.log.LogManager) [2015-06-19 14:25:00,620] WARN Kafka scheduler has not been started (kafka.utils.Utils$) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController.shutdown(KafkaController.scala:664) at kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:287) at kafka.utils.Utils$.swallow(Utils.scala:172) at kafka.utils.Logging$class.swallowWarn(Logging.scala:92) at kafka.utils.Utils$.swallowWarn(Utils.scala:45) at kafka.utils.Logging$class.swallow(Logging.scala:94) at kafka.utils.Utils$.swallow(Utils.scala:45) at kafka.server.KafkaServer.shutdown(KafkaServer.scala:287) at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42) at kafka.Kafka$$anon$1.run(Kafka.scala:42) [2015-06-19 14:25:00,623] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread) [2015-06-19 14:25:00,625] INFO Session: 0x14de8e5f2b801f7 closed (org.apache.zookeeper.ZooKeeper) [2015-06-19 14:25:00,625] INFO EventThread shut down (org.apache.zookeeper.ClientCnxn) [2015-06-19 14:25:00,625] INFO [Kafka Server 2], shut down completed (kafka.server.KafkaServer) Regards, Sandeep
Re: Failure in Leader Election on broker shutdown
Sandeep, You need to have multiple replicas. Having single replica means you've one copy of the data and if that machine goes down there isn't another replica who can take over and be the leader for that partition.-Harsha _ From: Sandeep Bishnoi sandeepbishnoi.b...@gmail.com Sent: Friday, June 19, 2015 2:37 PM Subject: Failure in Leader Election on broker shutdown To: users@kafka.apache.org Hi, I have a kafka cluster of three nodes. I have constructed a topic with the following command: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic testv1p3 So the topic testv1p3 has 3 partitions and replication factor is 1. Here is the result of describe command: kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic testv1p3 Topic:testv1p3PartitionCount:3ReplicationFactor:1Configs: Topic: testv1p3Partition: 0Leader: 1Replicas: 1Isr: 1 Topic: testv1p3Partition: 1Leader: 2Replicas: 2Isr: 2 Topic: testv1p3Partition: 2Leader: 0Replicas: 0Isr: 0 So far things are good. Now I tried to kill a broker using bin/kafka-server-stop.sh The broker was stopped successfully. Now I wanted to ensure that there is a new leader for the partition which was hosted on the terminated broker. Here is the output of describe command post broker termination: Topic:testv1p3PartitionCount:3ReplicationFactor:1Configs: Topic: testv1p3Partition: 0Leader: 1Replicas: 1Isr: 1 Topic: testv1p3Partition: 1Leader: -1Replicas: 2Isr: Topic: testv1p3Partition: 2Leader: 0Replicas: 0Isr: 0 Leader for partition:1 is -1. Java API for kafka returns null for leader() in PartitionMetadata for partition 1. When I restarted the broker which was stopped earlier. Things go back to normal. 1) Does leader selection happen automatically ? 2) If yes, do I need any particular configuration in broker or topic config ? 3) If not, what is command to ensure that I have a leader for partition 1 in case its lead broker goes down. FYI I tried to run bin/bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 Post this script run, the topic description still remains same and no leader for partition 1. It will be great to get any help on this. Reference: Console log for (kafka-server-stop.sh): [2015-06-19 14:25:00,241] INFO [Kafka Server 2], shutting down (kafka.server.KafkaServer) [2015-06-19 14:25:00,243] INFO [Kafka Server 2], Starting controlled shutdown (kafka.server.KafkaServer) [2015-06-19 14:25:00,267] INFO [Kafka Server 2], Controlled shutdown succeeded (kafka.server.KafkaServer) [2015-06-19 14:25:00,273] INFO Deregistered broker 2 at path /brokers/ids/2. 
(kafka.utils.ZkUtils$) [2015-06-19 14:25:00,274] INFO [Socket Server on Broker 2], Shutting down (kafka.network.SocketServer) [2015-06-19 14:25:00,279] INFO [Socket Server on Broker 2], Shutdown completed (kafka.network.SocketServer) [2015-06-19 14:25:00,280] INFO [Kafka Request Handler on Broker 2], shutting down (kafka.server.KafkaRequestHandlerPool) [2015-06-19 14:25:00,282] INFO [Kafka Request Handler on Broker 2], shut down completely (kafka.server.KafkaRequestHandlerPool) [2015-06-19 14:25:00,600] INFO [Replica Manager on Broker 2]: Shut down (kafka.server.ReplicaManager) [2015-06-19 14:25:00,601] INFO [ReplicaFetcherManager on broker 2] shutting down (kafka.server.ReplicaFetcherManager) [2015-06-19 14:25:00,602] INFO [ReplicaFetcherManager on broker 2] shutdown completed (kafka.server.ReplicaFetcherManager) [2015-06-19 14:25:00,604] INFO [Replica Manager on Broker 2]: Shut down completely (kafka.server.ReplicaManager) [2015-06-19 14:25:00,605] INFO Shutting down. (kafka.log.LogManager) [2015-06-19 14:25:00,618] INFO Shutdown complete. (kafka.log.LogManager) [2015-06-19 14:25:00,620] WARN Kafka scheduler has not been started (kafka.utils.Utils$) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController.shutdown(KafkaController.scala:664) at kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:287) at kafka.utils.Utils$.swallow(Utils.scala:172) at kafka.utils.Logging$class.swallowWarn(Logging.scala:92) at kafka.utils.Utils$.swallowWarn(Utils.scala:45) at kafka.utils.Logging$class.swallow(Logging.scala:94) at kafka.utils.Utils$.swallow(Utils.scala:45) at kafka.server.KafkaServer.shutdown(KafkaServer.scala:287) at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42) at kafka.Kafka$$anon$1.run(Kafka.scala:42) [2015-06-19 14:25:00,623] INFO Terminate
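To make that concrete with the commands already used above (the values are only illustrative), a topic created with a replication factor of 3 gives each partition replicas on all three brokers, so a surviving in-sync replica can be elected leader when one broker stops:

  bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic testv1p3
  bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic testv1p3

After stopping one broker, --describe should then still show a live leader (not -1) for every partition, picked from the remaining ISR members.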
unclean leader election enable default value bug?
Kafka 0.8.2.1. I have `unclean.leader.election.enable=false` in server.properties and I can see this log in server.log: [2015-06-18 09:57:18,961] INFO Property unclean.leader.election.enable is overridden to false (kafka.utils.VerifiableProperties) Yet the topic was created with `unclean.leader.election.enable -> true`. I see similarity in this issue: https://issues.apache.org/jira/browse/KAFKA-2114 Regards -Zaiming
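One possible workaround for the ignored broker default, assuming the 0.8.2 topic tool accepts --alter together with --config (as later releases do), is to set the override explicitly on the existing topic and then confirm it under Configs with --describe:

  bin/kafka-topics.sh --alter --zookeeper localhost --topic bbb --config unclean.leader.election.enable=false
  bin/kafka-topics.sh --describe --zookeeper localhost --topic bbb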
leader election rate
Looking at the JMX stats from our Kafka cluster, I see a more or less constant leader election rate of around 2.5 from our controller. Is this expected, or does this mean that leaders are shifting around constantly? If they are shifting, how should I go about debugging, and what triggers a leader election? Thanks, Wes
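One way to double-check what the controller is doing is to read its election metric directly over JMX. A rough sketch using the JmxTool class that ships with Kafka (the JMX port 9999 is an assumption; substitute whatever your brokers expose):

  bin/kafka-run-class.sh kafka.tools.JmxTool --object-name 'kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs' --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi

A persistently non-zero rate means elections really are happening, and the controller and state-change logs should show which partitions are affected and why.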
unclean leader election debugging
We're showing a constant level of unclean leader election errors. I'd like to investigate but I'm not quite sure how to approach it. Is there a doc somewhere that goes into some detail on what to look at? Thanks, Wes
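A starting point is to correlate the controller's unclean election counter with the controller and state-change logs. A rough sketch (the JMX port and log locations are assumptions based on common defaults):

  bin/kafka-run-class.sh kafka.tools.JmxTool --object-name 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec' --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi
  grep -i OfflinePartitionLeaderSelector /var/log/kafka/state-change.log /var/log/kafka/controller.log

The grep surfaces messages like "No broker in ISR is alive for [topic,partition]... There's potential data loss.", which tell you which partitions had an empty ISR and when.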
Re: Question on ISR inclusion leader election for failed replica on catchup
When K1 crashes before K3 fully catches up, by default, Kafka allows K3 to become the new leader. In this case, data in batch 2 will be lost. Our default behavior favors availability over consistency. If you prefer consistency, you can set unclean.leader.election.enable to false on the broker. With this setting, Kafka will not select K3 as the new leader and instead, will wait for K1 to come back. Thanks, Jun On Fri, Feb 27, 2015 at 4:13 PM, Puneet Mehta mehta.p...@gmail.com wrote: Hi Gang, I am testing some of the durability guarantees given by Kafka 8.2.1 which involve min in-sync replicas and disabling unclean leader election. My question is: *When will the failed replica after successfully coming up will be included back in ISR? Is this governed by replica.lag.max.messages property or will it have to completely catch up with the leader to be back in ISR?* Alternately, In more detail, Will we loose a committed write in the following theoretical setup: - Single topic - 3 Kafka Brokers K1, K2, K3 - Replication : 3 - Minimum In-Sync Replica : 2 - Acks : -1 - Compression : Gzip - Producer type : Async - Batch size : 16000 - replica.lag.max.messages : 4000 There are 3 batches of data to be sent. Producer will retry if the batch of data fails on error callback. Batch 1 : Leader : K1 ; ISR : K1, K2, K3 Result: Data committed Batch 2 : Leader : K1 ; ISR : K1, K2 ( K3 crashed) Result: Data committed Batch 3 : Leader : K1 ; ISR : K1 (K2 crashed) Result: Data uncommitted due to min in-sync replica violation. K3 wakes up, Starts catching up with current leader. It doesn't have batch 2 data. At this point, broker K1 crashes and K3 has about 2K messages less than K1. Will K3 be elected the leader at this point as it's within 4K messages to be in ISR? If true, this probably will lead to committed data loss despite disabling the unclean leader election, if I am not wrong here? Thanks, Puneet Mehta
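A minimal sketch of the consistency-oriented configuration described above, using only the settings already mentioned in this thread (min.insync.replicas can be set in server.properties as a broker-wide default or per topic at creation time):

  # server.properties
  unclean.leader.election.enable=false
  min.insync.replicas=2

Combined with acks=-1 on the producer, a write is only acknowledged once it is in the in-sync replicas, and a replica outside the ISR (K3 in the example) cannot be chosen as leader, so the partition stays offline until K1 returns rather than losing batch 2.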
Re: Question on ISR inclusion leader election for failed replica on catchup
Hi Puneet, One of the conditions for K3 to get back into the ISR is that K3's log end offset is higher than K1's (the leader replica's) high watermark. If batch 2 is committed, then the leader high watermark will be above the offsets of the messages in batch 2. In order to be added into the ISR again, K3 has to at least have all of batch 2 in its log. So in this case, we will not lose committed messages. Jiangjie (Becket) Qin On 2/27/15, 4:13 PM, Puneet Mehta mehta.p...@gmail.com wrote: Hi Gang, I am testing some of the durability guarantees given by Kafka 8.2.1 which involve min in-sync replicas and disabling unclean leader election. My question is: *When will the failed replica after successfully coming up will be included back in ISR? Is this governed by replica.lag.max.messages property or will it have to completely catch up with the leader to be back in ISR?* Alternately, In more detail, Will we loose a committed write in the following theoretical setup: - Single topic - 3 Kafka Brokers K1, K2, K3 - Replication : 3 - Minimum In-Sync Replica : 2 - Acks : -1 - Compression : Gzip - Producer type : Async - Batch size : 16000 - replica.lag.max.messages : 4000 There are 3 batches of data to be sent. Producer will retry if the batch of data fails on error callback. Batch 1 : Leader : K1 ; ISR : K1, K2, K3 Result: Data committed Batch 2 : Leader : K1 ; ISR : K1, K2 ( K3 crashed) Result: Data committed Batch 3 : Leader : K1 ; ISR : K1 (K2 crashed) Result: Data uncommitted due to min in-sync replica violation. K3 wakes up, Starts catching up with current leader. It doesn't have batch 2 data. At this point, broker K1 crashes and K3 has about 2K messages less than K1. Will K3 be elected the leader at this point as it's within 4K messages to be in ISR? If true, this probably will lead to committed data loss despite disabling the unclean leader election, if I am not wrong here? Thanks, Puneet Mehta
Question on ISR inclusion leader election for failed replica on catchup
Hi Gang, I am testing some of the durability guarantees given by Kafka 0.8.2.1 which involve min in-sync replicas and disabling unclean leader election. My question is: *When will the failed replica, after successfully coming back up, be included back in the ISR? Is this governed by the replica.lag.max.messages property, or will it have to completely catch up with the leader to be back in the ISR?* Alternately, in more detail: will we lose a committed write in the following theoretical setup: - Single topic - 3 Kafka Brokers K1, K2, K3 - Replication : 3 - Minimum In-Sync Replica : 2 - Acks : -1 - Compression : Gzip - Producer type : Async - Batch size : 16000 - replica.lag.max.messages : 4000 There are 3 batches of data to be sent. The producer will retry if a batch of data fails on the error callback. Batch 1 : Leader : K1 ; ISR : K1, K2, K3 Result: Data committed Batch 2 : Leader : K1 ; ISR : K1, K2 (K3 crashed) Result: Data committed Batch 3 : Leader : K1 ; ISR : K1 (K2 crashed) Result: Data uncommitted due to min in-sync replica violation. K3 wakes up and starts catching up with the current leader. It doesn't have the batch 2 data. At this point, broker K1 crashes and K3 is about 2K messages behind K1. Will K3 be elected the leader at this point, as it's within 4K messages to be in the ISR? If true, this would probably lead to committed data loss despite disabling unclean leader election, if I am not wrong here? Thanks, Puneet Mehta
Re: Strange behavior during un-clean leader election
Bryan, Did you take down some brokers in your cluster while hitting KAFKA-1028? If yes, you may be hitting KAFKA-1647 also. Guozhang On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher bjb...@gmail.com wrote: Hi everyone, We run a 3 Kafka cluster using 0.8.1.1 with all topics having a replication factor of 3 meaning every broker has a replica of every partition. We recently ran into this issue ( https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss within Kafka. We understand why it happened and have plans to try to ensure it doesn't happen again. The strange part was that the broker that was chosen for the un-clean leader election seemed to drop all of its own data about the partition in the process as our monitoring shows the broker offset was reset to 0 for a number of partitions. Following the broker's server logs in chronological order for a particular partition that saw data loss I see this, 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026 2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss. 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6] 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0. 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00528024.index.deleted 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6. I'm not too worried about this since I'm hoping to move to Kafka 0.8.2 ASAP but I was curious if anyone could explain this behavior. -Bryan -- -- Guozhang
Re: Strange behavior during un-clean leader election
Yes the cluster was to a degree restarted in a rolling fashion but due to some other events causing the brokers to be rather confused the ISR for a number of partitions became empty and a new controller was elected. KAFKA-1647 sounds exactly like the problem I encountered. Thank you. On Tue, Oct 21, 2014 at 3:28 PM, Guozhang Wang wangg...@gmail.com wrote: Bryan, Did you take down some brokers in your cluster while hitting KAFKA-1028? If yes, you may be hitting KAFKA-1647 also. Guozhang On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher bjb...@gmail.com wrote: Hi everyone, We run a 3 Kafka cluster using 0.8.1.1 with all topics having a replication factor of 3 meaning every broker has a replica of every partition. We recently ran into this issue ( https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss within Kafka. We understand why it happened and have plans to try to ensure it doesn't happen again. The strange part was that the broker that was chosen for the un-clean leader election seemed to drop all of its own data about the partition in the process as our monitoring shows the broker offset was reset to 0 for a number of partitions. Following the broker's server logs in chronological order for a particular partition that saw data loss I see this, 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026 2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss. 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6] 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0. 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00528024.index.deleted 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6. I'm not too worried about this since I'm hoping to move to Kafka 0.8.2 ASAP but I was curious if anyone could explain this behavior. -Bryan -- -- Guozhang -- Bryan
Strange behavior during un-clean leader election
Hi everyone, We run a 3 Kafka cluster using 0.8.1.1 with all topics having a replication factor of 3 meaning every broker has a replica of every partition. We recently ran into this issue ( https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss within Kafka. We understand why it happened and have plans to try to ensure it doesn't happen again. The strange part was that the broker that was chosen for the un-clean leader election seemed to drop all of its own data about the partition in the process as our monitoring shows the broker offset was reset to 0 for a number of partitions. Following the broker's server logs in chronological order for a particular partition that saw data loss I see this, 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026 2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss. 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6] 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0. 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00528024.index.deleted 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6. I'm not too worried about this since I'm hoping to move to Kafka 0.8.2 ASAP but I was curious if anyone could explain this behavior. -Bryan
Re: Lost messages during leader election
Jun, Jad, I think in this case data loss can still happen, since the replica factor was previously one, and in handling the produce requests, if the server decides that all the produced partitions have a replica factor of 1 it will also directly send back the response instead of putting the request into purgatory even if currently the number of replicas is 2 (for details look at ReplicaManager.getReplicationFactorForPartition and search of the usage of Partition.replicationFactor). I now agree that this is not related to KAFKA-1211 but a different small bug. We need to probably file another JIRA for this. But I think after this one is fixed (which should be much easier than KAFKA-1211), Jad's scenario should not cause data loss anymore. Guozhang On Sun, Jul 27, 2014 at 6:11 PM, Jad Naous jad.na...@appdynamics.com wrote: So in summary, is it true to say that currently triggering leader reelection is not a safe operation? I have been able to reproduce that message loss pretty reliably in tests. If that is the case, isn't that an important operation in a large cluster where nodes go up and down? On Jul 25, 2014 10:00 PM, Jun Rao jun...@gmail.com wrote: Actually, I don't think KAFKA-1211 will happen with just 2 replicas. When a replica becomes a leader, it never truncates its log. Only when a replica becomes follower, it truncates its log to HW. So in this particular case, the new leader will not truncate data to offset 8. Thanks, Jun On Fri, Jul 25, 2014 at 3:37 PM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, Yes. In this case I think you are actually hitting KAFKA-1211. The summary of the issue is that, it takes one more fetch request round trip for the follower replica to advance the HW after the leader has advanced HW. So for your case, the whole process is like this: 1. leader LEO at 10, follower LEO at 8. Both leader and follower knows the LEO is at 8. 2. Follower fetch data on Leader starting at 8, leader records its LEO as 8. 3. Follower gets 9 and 10 and append to its local log. 4. Follower fetch data on Leader starting at 10, leader records its LEO as 10; now leader knows follower has caught up, it advances its HW to 10 and adds the follower to ISR (but follower does not know that yet! It still think the HW is 8). 5. Leader's fetch response gets back to follower, and now the follower knows that HW has been updated to 10. And let's say there is a leader election between step 4) and 5), for your case it is due to preferred leader election, but it could also be that current leader fails, etc. Then on becoming the new leader the follower will truncate its data to 8, which is the HW it knows. Hence the data loss. The proposed solution in KAFKA-1211 will tackle this issue. Guozhang On Fri, Jul 25, 2014 at 2:48 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi Guozhang, Yes, broker 1 is in the ISR (in fact, I wait for broker 1 to be in the ISR before triggering election). However, I think there is something still amiss. I still see data loss. Here are some relevant log lines from broker 0 (different test run). The log on broker 0 is getting truncated, losing some messages. [2014-07-25 10:40:02,134] [DEBUG] [kafka-request-handler-5] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,19] on broker 0: Old hw for partition [EventServiceUpsertTopic,19] is 8082. New hw is 8082. 
All leo's are 8111,8082 [2014-07-25 10:40:02,134] [DEBUG] [kafka-request-handler-5] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 1 ms [2014-07-25 10:40:02,134] [DEBUG] [main-SendThread(localhost:49893)] [org.apache.zookeeper.ClientCnxn] Reading reply sessionid:0x476e9a9e9a0001, packet:: clientPath:null serverPath:null finished:false header:: 729,5 replyHeader:: 729,4294968217,0 request:: '/brokers/topics/EventServiceUpsertTopic/partitions/5/state,#7b22636f6e74726f6c6c65725f65706f6368223a312c226c6561646572223a312c2276657273696f6e223a312c226c65616465725f65706f6368223a332c22697372223a5b302c315d7d,3 response:: s{4294967416,4294968217,1406309966419,1406310002132,4,0,0,0,74,0,4294967416} [2014-07-25 10:40:02,134] [DEBUG] [kafka-processor-49917-2] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 16248; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:50168 ;totalTime:1,requestQueueTime:0,localTime:1,remoteTime:0,responseQueueTime:0,sendTime:0 [2014-07-25 10:40:02,134] [DEBUG] [ZkClient-EventThread-13-localhost:49893,localhost:49896,localhost:49899] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/5/state with value {controller_epoch:1
Re: Lost messages during leader election
Hi Jad, Just to clarify, you also see data loss when you created the topic with replica factor 2, and two replicas running, and after an auto leader election triggered? If that is the case could you attach the logs of all involved brokers here? For your second question, KAFKA-1211 is designed to handle that case. Guozhang On Mon, Jul 28, 2014 at 10:18 AM, Jad Naous jad.na...@appdynamics.com wrote: Guozhang, I have actually also seen this happen when there are two replicas initially. So this problem is not limited to 1 replica. The issue is the truncation after leader election, which will also happen on the second replica. Coming back to your objections: First case: inconsistency between replicas 1) currently replica 1 is the current leader replica 1: m1 m2 m3 replica 2: m1 m2 2) replica 1 fails, and replica 2 becomes the new leader and accept messages 4 and 5: replica 1: m1 m2 m3 replica 2: m1 m2 m4 m5 3) replica 1 resumes, and does not truncate to HW, then it will still maintain m3, which is actually never committed. Say leader moves to replica 1 again, we can ended up with: replica 1: m1 m2 m3 m6 replica 2: m1 m2 m4 m5 I see. So when a replica resumes, it has to truncate to the last HW it saw before it died. Second case: inconsistency between server and clients: 1) producer send message m3 with ack=-1: replica 1: m1 m2 m3 replica 2: m1 m2 m3 replica 3: m1 m2 2) the response is held until all replicas also gets m3, say at this time current leader replica 1 fails and replica 3 re-elects. If replica 3 gets up to the largest LEO it will also get m3. replica 2: m1 m2 m3 replica 3: m1 m2 m3 3) But m3 is not actually committed by the time replica 1 fails; when producer gets the error at the time replica 1 fails, it will think that m3 was not successfully sent, so retry sending m3: replica 2: m1 m2 m3 m3 replica 3: m1 m2 m3 m3 So on failure, all nodes need to truncate to the HW. But if there's no failure, then truncating would lose data unnecessarily. Maybe those two scenarios need to be handled differently? Jad. On Mon, Jul 28, 2014 at 9:58 AM, Guozhang Wang wangg...@gmail.com wrote: Jun, Jad, I think in this case data loss can still happen, since the replica factor was previously one, and in handling the produce requests, if the server decides that all the produced partitions have a replica factor of 1 it will also directly send back the response instead of putting the request into purgatory even if currently the number of replicas is 2 (for details look at ReplicaManager.getReplicationFactorForPartition and search of the usage of Partition.replicationFactor). I now agree that this is not related to KAFKA-1211 but a different small bug. We need to probably file another JIRA for this. But I think after this one is fixed (which should be much easier than KAFKA-1211), Jad's scenario should not cause data loss anymore. Guozhang On Sun, Jul 27, 2014 at 6:11 PM, Jad Naous jad.na...@appdynamics.com wrote: So in summary, is it true to say that currently triggering leader reelection is not a safe operation? I have been able to reproduce that message loss pretty reliably in tests. If that is the case, isn't that an important operation in a large cluster where nodes go up and down? On Jul 25, 2014 10:00 PM, Jun Rao jun...@gmail.com wrote: Actually, I don't think KAFKA-1211 will happen with just 2 replicas. When a replica becomes a leader, it never truncates its log. Only when a replica becomes follower, it truncates its log to HW. 
So in this particular case, the new leader will not truncate data to offset 8. Thanks, Jun On Fri, Jul 25, 2014 at 3:37 PM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, Yes. In this case I think you are actually hitting KAFKA-1211. The summary of the issue is that, it takes one more fetch request round trip for the follower replica to advance the HW after the leader has advanced HW. So for your case, the whole process is like this: 1. leader LEO at 10, follower LEO at 8. Both leader and follower knows the LEO is at 8. 2. Follower fetch data on Leader starting at 8, leader records its LEO as 8. 3. Follower gets 9 and 10 and append to its local log. 4. Follower fetch data on Leader starting at 10, leader records its LEO as 10; now leader knows follower has caught up, it advances its HW to 10 and adds the follower to ISR (but follower does not know that yet! It still think the HW is 8). 5. Leader's fetch response gets back to follower, and now the follower knows that HW has been updated to 10. And let's say there is a leader election between step 4) and 5), for your case it is due to preferred leader election, but it could also
Re: Lost messages during leader election
Hi Jad, A follower replica can join ISR only when it has caught up to HW, which in this case would be the end of the leader replica. So in that scenario it should still be no data loss. On Thu, Jul 24, 2014 at 7:48 PM, Jad Naous jad.na...@appdynamics.com wrote: Actually, is the following scenario possible? - We start off with only 1 replica (the leader) - the producer continuously sends messages - a new replica (the preferred one) comes online - it becomes an ISR just after an ack is sent to the producer - the new replica gets elected as the new leader, but it's not fully caught up to the old leader and then we lose the last message... On Thu, Jul 24, 2014 at 6:29 PM, Jad Naous jad.na...@appdynamics.com wrote: Ah yes. OK, thanks! So it seems like we should only manually trigger re-election after seeing that all replicas are in the ISR. Is there a bug to follow this up? Thanks, Jad. On Thu, Jul 24, 2014 at 6:27 PM, Guozhang Wang wangg...@gmail.com wrote: With ack=-1 all messages produced to leader must have been acked by all replicas to respond. So that will not cause data loss. On Thu, Jul 24, 2014 at 6:07 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi Guozhang, Isn't it also possible to lose messages even if the preferred leader is in the ISR, when the current leader is ahead by a few messages, but the preferred leader still has not caught up? Thanks, Jad. On Thu, Jul 24, 2014 at 4:59 PM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, Thanks for bring this up. It seems to be a valid issue: in the current auto leader rebalancer thread's logic, if the imbalance ratio threshold is violated, then it will trigger the preferred leader election whether or not the preferred leader is in ISR or not. Guozhang On Thu, Jul 24, 2014 at 4:21 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on localhost:56821 2. Broker 1 is still catching up Broker 0 Log: [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971. 
All leo's are 975,971 [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms [2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:57086 ;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0 3. Leader election is triggered by the scheduler: Broker 0 Log: [2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0] [k.c.PreferredReplicaPartitionLeaderSelector] [PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [ EventServiceUpsertTopic,13] is not the preferred replica. Trigerring preferred replica leader election [2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/13/state with value {controller_epoch
Re: Lost messages during leader election
Thank you so much for your explanation and your patience! On Fri, Jul 25, 2014 at 10:08 AM, Guozhang Wang wangg...@gmail.com wrote: HW is updated as to the offset that the messages have been committed to all replicas. This is only updated by the leader, when it receives the fetch requests from other follower replicas, to the position of the minimum starting offsets of the fetch requests. For producer.ack=-1, the leader will only return the response once it knows the HW has been updated to be larger than the produce end offset. Guozhang On Fri, Jul 25, 2014 at 9:36 AM, Jad Naous jad.na...@appdynamics.com wrote: Hi Guozhang, I apologize for my misunderstanding, I would really like to understand this thoroughly. When/how is the HW set, and how does that interact with acks being sent to the producer? Is it that the hw sets the offset for messages for which acks have been sent, and so a replica only becomes in-sync if it has caught up with all the messages that have been acked? Thanks, Jad. On Fri, Jul 25, 2014 at 8:19 AM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, A follower replica can join ISR only when it has caught up to HW, which in this case would be the end of the leader replica. So in that scenario it should still be no data loss. On Thu, Jul 24, 2014 at 7:48 PM, Jad Naous jad.na...@appdynamics.com wrote: Actually, is the following scenario possible? - We start off with only 1 replica (the leader) - the producer continuously sends messages - a new replica (the preferred one) comes online - it becomes an ISR just after an ack is sent to the producer - the new replica gets elected as the new leader, but it's not fully caught up to the old leader and then we lose the last message... On Thu, Jul 24, 2014 at 6:29 PM, Jad Naous jad.na...@appdynamics.com wrote: Ah yes. OK, thanks! So it seems like we should only manually trigger re-election after seeing that all replicas are in the ISR. Is there a bug to follow this up? Thanks, Jad. On Thu, Jul 24, 2014 at 6:27 PM, Guozhang Wang wangg...@gmail.com wrote: With ack=-1 all messages produced to leader must have been acked by all replicas to respond. So that will not cause data loss. On Thu, Jul 24, 2014 at 6:07 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi Guozhang, Isn't it also possible to lose messages even if the preferred leader is in the ISR, when the current leader is ahead by a few messages, but the preferred leader still has not caught up? Thanks, Jad. On Thu, Jul 24, 2014 at 4:59 PM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, Thanks for bring this up. It seems to be a valid issue: in the current auto leader rebalancer thread's logic, if the imbalance ratio threshold is violated, then it will trigger the preferred leader election whether or not the preferred leader is in ISR or not. Guozhang On Thu, Jul 24, 2014 at 4:21 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. 
Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics
Re: Lost messages during leader election
Hi Guozhang, Yes, I think they are related. It seems odd to me that there should be any truncation at all since that is always an opportunity for data loss. It seems like we would want to avoid that at all costs, assuming we uphold the invariant that messages committed to an offset on any replica will always be identical. There seems to be two choices to make sure we don't lose data during leader election: 1- after leader election, requests going to the new leader should block until the leader is caught up to the highest LEO in the set of replicas (or until the HW reaches the highest LEO among replicas). I think this is less normal since it would mean that the new leader has to fetch messages from a non-leader. Maybe you can have the notion of a write-leader and a read-leader. After leader election, the write-leader becomes whatever the preferred leader today is, and the read-leader becomes the replica with the highest LEO. If the write-leader is not also the read-leader, then writes are blocked. All replicas and consumers fetch from the read-leader, and when the write-leader is caught up to the read-leader, the write-leader also becomes the read-leader and writes are unblocked. 2- After leader election, block actual leadership switching and block responding to new requests until the HW reaches the LEO on the old leader, then switch leadership, failing the blocked requests (or signaling a retry for a new leader), which should then retry the writes on the new leader, and things proceed as normal. Would either of these work? Thanks, Jad. On Fri, Jul 25, 2014 at 3:37 PM, Guozhang Wang wangg...@gmail.com wrote: Hi Jad, Yes. In this case I think you are actually hitting KAFKA-1211. The summary of the issue is that, it takes one more fetch request round trip for the follower replica to advance the HW after the leader has advanced HW. So for your case, the whole process is like this: 1. leader LEO at 10, follower LEO at 8. Both leader and follower knows the LEO is at 8. 2. Follower fetch data on Leader starting at 8, leader records its LEO as 8. 3. Follower gets 9 and 10 and append to its local log. 4. Follower fetch data on Leader starting at 10, leader records its LEO as 10; now leader knows follower has caught up, it advances its HW to 10 and adds the follower to ISR (but follower does not know that yet! It still think the HW is 8). 5. Leader's fetch response gets back to follower, and now the follower knows that HW has been updated to 10. And let's say there is a leader election between step 4) and 5), for your case it is due to preferred leader election, but it could also be that current leader fails, etc. Then on becoming the new leader the follower will truncate its data to 8, which is the HW it knows. Hence the data loss. The proposed solution in KAFKA-1211 will tackle this issue. Guozhang On Fri, Jul 25, 2014 at 2:48 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi Guozhang, Yes, broker 1 is in the ISR (in fact, I wait for broker 1 to be in the ISR before triggering election). However, I think there is something still amiss. I still see data loss. Here are some relevant log lines from broker 0 (different test run). The log on broker 0 is getting truncated, losing some messages. [2014-07-25 10:40:02,134] [DEBUG] [kafka-request-handler-5] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,19] on broker 0: Old hw for partition [EventServiceUpsertTopic,19] is 8082. New hw is 8082. 
All leo's are 8111,8082 [2014-07-25 10:40:02,134] [DEBUG] [kafka-request-handler-5] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 1 ms [2014-07-25 10:40:02,134] [DEBUG] [main-SendThread(localhost:49893)] [org.apache.zookeeper.ClientCnxn] Reading reply sessionid:0x476e9a9e9a0001, packet:: clientPath:null serverPath:null finished:false header:: 729,5 replyHeader:: 729,4294968217,0 request:: '/brokers/topics/EventServiceUpsertTopic/partitions/5/state,#7b22636f6e74726f6c6c65725f65706f6368223a312c226c6561646572223a312c2276657273696f6e223a312c226c65616465725f65706f6368223a332c22697372223a5b302c315d7d,3 response:: s{4294967416,4294968217,1406309966419,1406310002132,4,0,0,0,74,0,4294967416} [2014-07-25 10:40:02,134] [DEBUG] [kafka-processor-49917-2] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 16248; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:50168 ;totalTime:1,requestQueueTime:0,localTime:1,remoteTime:0,responseQueueTime:0,sendTime:0 [2014-07-25 10:40:02,134] [DEBUG] [ZkClient-EventThread-13-localhost:49893,localhost:49896,localhost:49899] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/5/state with value {controller_epoch:1,leader:1,version:1,leader_epoch:3,isr:[0,1]} and expected version 3 succeeded, returning the new version: 4
Lost messages during leader election
Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on localhost:56821 2. Broker 1 is still catching up Broker 0 Log: [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971. All leo's are 975,971 [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms [2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:57086 ;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0 3. Leader election is triggered by the scheduler: Broker 0 Log: [2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0] [k.c.PreferredReplicaPartitionLeaderSelector] [PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [ EventServiceUpsertTopic,13] is not the preferred replica. Trigerring preferred replica leader election [2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/13/state with value {controller_epoch:1,leader:1,version:1,leader_epoch:3,isr:[0,1]} and expected version 3 succeeded, returning the new version: 4 [2014-07-24 14:44:26,994] [DEBUG] [kafka-scheduler-0] [k.controller.PartitionStateMachine] [Partition state machine on Controller 0]: After leader election, leader cache is updated to Map(Snipped(Leader:1,ISR:0,1,LeaderEpoch:3,ControllerEpoch:1),EndSnip) [2014-07-24 14:44:26,994] [INFO ] [kafka-scheduler-0] [kafka.controller.KafkaController] [Controller 0]: Partition [ EventServiceUpsertTopic,13] completed preferred replica leader election. New leader is 1 4. Broker 1 is still behind, but it sets the high water mark to 971!!! 
Broker 1 Log: [2014-07-24 14:44:26,999] [INFO ] [kafka-request-handler-6] [kafka.server.ReplicaFetcherManager] [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [EventServiceUpsertTopic,13] [2014-07-24 14:44:27,000] [DEBUG] [kafka-request-handler-6] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Old hw for partition [EventServiceUpsertTopic,13] is 970. New hw is -1. All leo's are -1,971 [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-1] Maybe update partition HW due to fetch request: Name: FetchRequest; Version: 0; CorrelationId: 1; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [EventServiceUpsertTopic,13] - PartitionFetchInfo(971,1048576), Snipped [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Recording follower 0 position 971 for partition [ EventServiceUpsertTopic,13]. [2014-07-24 14:44:27,100] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Highwatermark for partition [EventServiceUpsertTopic,13] updated to 971 5. Consumer is none the wiser. All data that was in offsets 972-975 doesn't show up! I tried this with 2 initial replicas, and adding a 3rd which is supposed to be the leader for some new partitions, and this problem also happens there. The log on the old leader gets truncated to the offset on the new leader. What's the solution? Can I make a new broker leader for partitions that are currently active without losing data? Thanks, Jad. -- *Jad Naous* | Engineering
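A hedged sketch of the safer operational pattern discussed elsewhere in this thread (only moving leadership once the preferred replica is actually in the ISR; the topic name is the one from this test and the ZooKeeper address is a placeholder) is to turn off the automatic rebalance and trigger the election by hand:

  # server.properties
  auto.leader.rebalance.enable=false

  # wait until --describe shows the preferred replica listed in Isr for the partitions in question
  bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic EventServiceUpsertTopic

  # then trigger preferred replica election manually
  bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181

As the rest of the thread notes, this narrows but does not fully close the window described in KAFKA-1211, since the follower's high watermark can still lag the leader's by one fetch round trip.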
Re: Lost messages during leader election
Hi Jad, Thanks for bring this up. It seems to be a valid issue: in the current auto leader rebalancer thread's logic, if the imbalance ratio threshold is violated, then it will trigger the preferred leader election whether or not the preferred leader is in ISR or not. Guozhang On Thu, Jul 24, 2014 at 4:21 PM, Jad Naous jad.na...@appdynamics.com wrote: Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on localhost:56821 2. Broker 1 is still catching up Broker 0 Log: [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971. All leo's are 975,971 [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms [2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:57086 ;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0 3. Leader election is triggered by the scheduler: Broker 0 Log: [2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0] [k.c.PreferredReplicaPartitionLeaderSelector] [PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [ EventServiceUpsertTopic,13] is not the preferred replica. Trigerring preferred replica leader election [2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/13/state with value {controller_epoch:1,leader:1,version:1,leader_epoch:3,isr:[0,1]} and expected version 3 succeeded, returning the new version: 4 [2014-07-24 14:44:26,994] [DEBUG] [kafka-scheduler-0] [k.controller.PartitionStateMachine] [Partition state machine on Controller 0]: After leader election, leader cache is updated to Map(Snipped(Leader:1,ISR:0,1,LeaderEpoch:3,ControllerEpoch:1),EndSnip) [2014-07-24 14:44:26,994] [INFO ] [kafka-scheduler-0] [kafka.controller.KafkaController] [Controller 0]: Partition [ EventServiceUpsertTopic,13] completed preferred replica leader election. New leader is 1 4. 
Broker 1 is still behind, but it sets the high water mark to 971!!! Broker 1 Log: [2014-07-24 14:44:26,999] [INFO ] [kafka-request-handler-6] [kafka.server.ReplicaFetcherManager] [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [EventServiceUpsertTopic,13] [2014-07-24 14:44:27,000] [DEBUG] [kafka-request-handler-6] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Old hw for partition [EventServiceUpsertTopic,13] is 970. New hw is -1. All leo's are -1,971 [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-1] Maybe update partition HW due to fetch request: Name: FetchRequest; Version: 0; CorrelationId: 1; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [EventServiceUpsertTopic,13] - PartitionFetchInfo(971,1048576), Snipped [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Recording follower 0 position 971 for partition [ EventServiceUpsertTopic,13]. [2014-07-24 14:44:27,100] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Highwatermark for partition [EventServiceUpsertTopic,13] updated to 971 5. Consumer is none the wiser. All data that was in offsets 972-975 doesn't show up!
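One possible mitigation, given the rebalancer behaviour Guozhang confirms above, is to switch the automatic trigger off and run the election by hand once replicas have caught up. This is a hedged sketch of the broker-side setting only, not a fix proposed in the thread; the property is the one Jad already uses, just flipped:

  # config/server.properties
  # Stop the controller from automatically triggering preferred-replica
  # election while a preferred replica may still be catching up.
  auto.leader.rebalance.enable=false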
Re: Lost messages during leader election
Hi Guozhang, Isn't it also possible to lose messages even if the preferred leader is in the ISR, when the current leader is ahead by a few messages, but the preferred leader still has not caught up? Thanks, Jad.
Re: Lost messages during leader election
Ah yes. OK, thanks! So it seems like we should only manually trigger re-election after seeing that all replicas are in the ISR. Is there a bug to follow this up? Thanks, Jad. On Thu, Jul 24, 2014 at 6:27 PM, Guozhang Wang wangg...@gmail.com wrote: With ack=-1 all messages produced to leader must have been acked by all replicas to respond. So that will not cause data loss.
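Jad's conclusion above can be scripted: verify that nothing is under-replicated before moving leadership back. A rough sketch, assuming the 0.8.1 command-line tools and ZooKeeper on localhost:2181 (the under-replicated-partitions flag also appears later in this archive):

  # 1. Empty output here means every replica is back in the ISR
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions

  # 2. Only then trigger the preferred replica election
  bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181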
Re: Lost messages during leader election
Actually, is the following scenario possible? - We start off with only 1 replica (the leader) - the producer continuously sends messages - a new replica (the preferred one) comes online - it becomes an ISR just after an ack is sent to the producer - the new replica gets elected as the new leader, but it's not fully caught up to the old leader and then we lose the last message...
Re: Lost messages during leader election
I'm still not sure I understand after his reply - http://qnalist.com/questions/5034216/lost-messages-during-leader-election - I really need a tutorial on Kafka. I don't understand why they made it so complicated when Cassandra and Hbase are similar but simpler. * Ashwin Jayaprakash* | Engineering | AppDynamics http://appdynamics.com/
Re: Controlled shutdown and leader election issues
I think I've figured it out, and it still happens in the 0.8.1 branch. The code that is responsible for deleting the key from ZooKeeper is broken and will never be called when using the command line tool, so it will fail after the first use. I've created https://issues.apache.org/jira/browse/KAFKA-1365.
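If the tool is stuck because of the undeleted ZooKeeper key that KAFKA-1365 describes, the state can be inspected directly; the path below is the standard 0.8 location, but clearing it by hand is a workaround assumed here, not something stated in the JIRA:

  bin/zookeeper-shell.sh localhost:2181
  # inside the shell:
  get /admin/preferred_replica_election      # shows the partitions of the stuck election
  delete /admin/preferred_replica_election   # clear it so the tool can be run again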
Re: Controlled shutdown and leader election issues
Done. https://issues.apache.org/jira/browse/KAFKA-1360
Re: Controlled shutdown and leader election issues
Thanks Neha - Is there a maven repo for pulling snapshot CI builds from? Sorry if this is answered elsewhere.
Re: Controlled shutdown and leader election issues
Is there a maven repo for pulling snapshot CI builds from? We still need to get the CI build setup going, could you please file a JIRA for this? Meanwhile, you will have to just build the code yourself for now, unfortunately. Thanks, Neha
Re: Controlled shutdown and leader election issues
Was there an answer for 0.8.1 getting stuck in preferred leader election? I'm seeing this as well. Is there a JIRA ticket on this issue? On Fri, Mar 21, 2014 at 1:15 PM, Ryan Berdeen rberd...@hubspot.com wrote: So, for 0.8 without controlled.shutdown.enable, why would ShutdownBroker and restarting cause under-replication and producer exceptions? How can I upgrade gracefully? What's up with 0.8.1 getting stuck in preferred leader election?
Re: Controlled shutdown and leader election issues
I'm not so sure if I know the issue you are running into but we fixed a few bugs with similar symptoms and the fixes are on the 0.8.1 branch. It will be great if you give it a try to see if your issue is resolved. Thanks, Neha
Controlled shutdown and leader election issues
While upgrading from 0.8.0 to 0.8.1 in place, I observed some surprising behavior using kafka.admin.ShutdownBroker. At the start, there were no underreplicated partitions. After running bin/kafka-run-class.sh kafka.admin.ShutdownBroker --broker 10 ... Partitions that had replicas on broker 10 were under-replicated: bin/kafka-topics.sh --describe --under-replicated-partitions ... Topic: analytics-activity Partition: 2 Leader: 12 Replicas: 12,10 Isr: 12 Topic: analytics-activity Partition: 6 Leader: 11 Replicas: 11,10 Isr: 11 Topic: analytics-activity Partition: 14 Leader: 14 Replicas: 14,10 Isr: 14 ... While restarting the broker process, many produce requests failed with kafka.common.UnknownTopicOrPartitionException. After each broker restart, I used the preferred leader election tool for all topics. Now, after finishing all of the broker restarts, the cluster seems to be stuck in leader election. Running the tool fails with kafka.admin.AdminOperationException: Preferred replica leader election currently in progress... Are any of these known issues? Is there a safer way to shutdown and restart brokers that does not cause producer failures and under-replicated partitions?
Re: Controlled shutdown and leader election issues
We haven't been testing the ShutdownBroker command in 0.8.1 rigorously since in 0.8.1, one can do the controlled shutdown through the new config controlled.shutdown.enable. Instead of running the ShutdownBroker command during the upgrade, you can also wait until the under-replicated partition count drops to 0 after each restart before moving to the next one. Thanks, Jun
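Jun's suggestion maps to a broker setting plus a wait between restarts. A rough sketch, assuming ZooKeeper on localhost:2181 and that the describe call prints nothing when no partition is under-replicated:

  # config/server.properties (0.8.1): migrate leadership away before the broker stops
  controlled.shutdown.enable=true

  # after restarting each broker, wait for the under-replicated count to reach 0
  while [ -n "$(bin/kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions)" ]; do
    sleep 10
  done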
Re: Controlled shutdown and leader election issues
Which brings up the question - Do we need ShutdownBroker anymore? It seems like the config should handle controlled shutdown correctly anyway. Thanks, Neha
Regarding question Kafka metrics. Issue with unclean leader election rate
This is with regard to the question "Kafka metrics. Issue with unclean leader election rate" (http://www.marshut.com/inkitk/kafka-metrics-issue-with-unclean-leader-election-rate.html). We use Kafka 0.8.0.
Kafka metrics. Issue with unclean leader election rate
Yes we use 0.8.0 release
Re: Kafka metrics. Issue with unclean leader election rate
What's the output of list topic on that topic? Thanks, Jun On Thu, Jan 23, 2014 at 10:27 AM, Arathi Maddula amadd...@boardreader.com wrote: Yes we use 0.8.0 release
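For reference, the "list topic" tool Jun refers to is a separate script in 0.8.0 (kafka-topics.sh only arrived in later releases); a sketch, with the ZooKeeper address and topic name assumed:

  bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic mytopic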
Re: Kafka metrics. Issue with unclean leader election rate
Are you using the 0.8.0 release? Thanks, Jun On Wed, Jan 22, 2014 at 10:11 AM, Arathi Maddula amadd...@boardreader.com wrote: Hi, I have a 3 node Kafka cluster. I am monitoring the JMX metrics of these 3 nodes. All topics and partitions are distributed across all 3 nodes, but node1 is idle while node2 and node3 are actively getting input data. As per http://kafka.apache.org/documentation.html#monitoring, the unclean leader election rate and the leader election rate should be 0 if everything is fine, but I have non-zero values for both of these on node2 and node3 (PFA screenshot of the metrics). Could these values be the reason why node1 is idle? How can I find out what is wrong? Any pointers will be helpful. Thanks Arathi
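The two rates Arathi quotes are controller metrics exposed over JMX, so they are typically only non-zero on whichever broker holds (or recently held) the controller role. A hedged sketch of reading one of them with the bundled JmxTool; the JMX port and the exact MBean name are assumptions, and 0.8.0 used a quoted MBean naming style that later releases dropped:

  # broker must have been started with a JMX port, e.g. JMX_PORT=9999
  bin/kafka-run-class.sh kafka.tools.JmxTool \
    --jmx-url service:jmx:rmi:///jndi/rmi://node2:9999/jmxrmi \
    --object-name 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec'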
Re: Cannot start Kafka 0.8, exception in leader election
Guozhang, In this case, it seems like the controller is trying to talk to itself, as the controller establishes a channel with every broker in the cluster. Thanks, Neha On Mon, Oct 28, 2013 at 4:26 PM, Guozhang Wang wangg...@gmail.com wrote: Hello Nicholas, The log shows the controller cannot connect with one of the brokers due to java.net.ConnectException: Connection refused. Are all your brokers in the same cluster? Guozhang
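Given Neha's point that the controller opens a channel to every broker including itself, one common cause of this particular "Connection refused" is that the address the broker registered in ZooKeeper (coffeemachine:9092 in the log) is not connectable from the broker's own host. A hypothetical check, plus the 0.8 property that pins the registered hostname; neither is a fix confirmed in this thread:

  # can this machine reach the address it registered itself under?
  nc -vz coffeemachine 9092

  # config/server.properties
  host.name=localhost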
Cannot start Kafka 0.8, exception in leader election
Hi everyone, I'm having some problems getting Kafka 0.8 running. (I'm running this on Mac OS X 10.8 with the latest updates; Java version 1.6.0_51.) I followed the instructions from the quickstart for 0.8, both from-source and from-binary (cleaned up all related files each time). However, I am getting an exception when I try to start the Kafka server. Below is the relevant output from the console output. I cut the beginning and the end. The INFO 0 successfully elected as leader, ERROR while electing [...] and the exception recur until I terminate the process. Any help on this issue would be appreciated. Thanks, Nicholas Tietz Relevant console output: [...] [2013-10-28 18:35:03,055] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient) [2013-10-28 18:35:03,093] INFO Registered broker 0 at path /brokers/ids/0 with address coffeemachine:9092. (kafka.utils.ZkUtils$) [2013-10-28 18:35:03,094] INFO [Kafka Server 0], Connecting to ZK: localhost:2181 (kafka.server.KafkaServer) [2013-10-28 18:35:03,145] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$) [2013-10-28 18:35:03,151] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector) [2013-10-28 18:35:03,263] ERROR Error while electing or becoming leader on broker 0 (kafka.server.ZookeeperLeaderElector) java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:532) at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57) at kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:84) at kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:35) at kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:35) at scala.collection.immutable.Set$Set1.foreach(Set.scala:81) at kafka.controller.ControllerChannelManager.init(ControllerChannelManager.scala:35) at kafka.controller.KafkaController.startChannelManager(KafkaController.scala:503) at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:467) at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:215) at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:89) at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53) at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43) at kafka.controller.KafkaController.startup(KafkaController.scala:396) at kafka.server.KafkaServer.startup(KafkaServer.scala:96) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34) at kafka.Kafka$.main(Kafka.scala:46) at kafka.Kafka.main(Kafka.scala) [2013-10-28 18:35:03,267] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener) [2013-10-28 18:35:03,273] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector) [2013-10-28 18:35:03,276] INFO [Kafka Server 0], Started (kafka.server.KafkaServer) [2013-10-28 18:35:03,294] ERROR Error while electing or becoming leader on broker 0 (kafka.server.ZookeeperLeaderElector) java.net.ConnectException: Connection refused [...]
Question about preferred replica leader election tool
Hi, We have three brokers in our kafka cluster. For all topics, the replication factor is two. Here is the distribution of leaders. After I ran the leader election tool, nothing happened. In this list, the first broker in ISR is the leader. I assume that after running the tool, the first broker in replicas should be elected as the leader. Any idea why this does not work? Thanks.
topic: example.topic.one  partition: 0  leader: 1  replicas: 2,1  isr: 1,2
topic: example.topic.one  partition: 1  leader: 2  replicas: 3,2  isr: 2,3
topic: example.topic.one  partition: 2  leader: 1  replicas: 1,3  isr: 1,3
topic: example.topic.two  partition: 0  leader: 1  replicas: 3,1  isr: 1,3
topic: example.topic.two  partition: 1  leader: 1  replicas: 1,2  isr: 1,2
topic: example.topic.two  partition: 2  leader: 2  replicas: 2,3  isr: 2,3
Regards, Libo
Re: Question about preferred replica leader election tool
Your understanding is correct. Try running the tool again when both replicas are in ISR. If it still doesn't work, see if there is any error in the state-change log. Thanks, Jun
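Once both replicas of a partition are back in the ISR, the election tool can also be pointed at just the partitions that still have a non-preferred leader. A sketch using the JSON form accepted by the 0.8 tool; the file path and ZooKeeper address are assumptions, and the topic names come from Libo's listing:

  cat > /tmp/preferred-election.json <<'EOF'
  {"partitions": [
    {"topic": "example.topic.one", "partition": 0},
    {"topic": "example.topic.one", "partition": 1}
  ]}
  EOF
  bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 \
    --path-to-json-file /tmp/preferred-election.json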
Re: Questions about the leader election
Thank you Neha, that's very helpful information! I also read this article: http://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka For the section on Handling Failures, I have some questions: 1. "The leader and the ISR for each partition are also stored in Zookeeper and are used during the failover of the controller." Under what path are they stored? Is there any way to see this information in ZooKeeper? 2. If the controller fails, how is the new controller elected? Is it elected by ZooKeeper? How does ZooKeeper decide which node should be the controller? Many thanks! On Fri, Aug 23, 2013 at 11:59 AM, Neha Narkhede neha.narkh...@gmail.com wrote: The replication state machine and leader election mechanism is described here - http://kafka.apache.org/documentation.html#replication Let us know how the docs can be improved. Thanks, Neha On Thu, Aug 22, 2013 at 8:51 PM, James Wu jameswu...@gmail.com wrote: Hi, I am wondering what the mechanism is by which Kafka elects the leader of partitions. Is it handled by the controller process? If the leader crashes, who decides the new leader, and does that process run on ZooKeeper or on Kafka? Thanks. -- Friendly regards, *James Wu http://www.facebook.com/jameswu629* -- Friendly regards, *James Wu https://plus.google.com/u/0/100829801349304669533*
Re: Questions about the leader election
ZK paths for 0.8 are documented in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper If a controller fails, any live broker can become the leader. This is coordinated through an ephemeral path in ZK. Thanks, Jun
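Both of James's questions can be answered by inspecting the paths Jun mentions with the bundled ZooKeeper shell. A minimal sketch; the topic and partition shown are the ones from the logs earlier in this archive, and the connect string is assumed:

  bin/zookeeper-shell.sh localhost:2181
  # inside the shell:
  get /brokers/topics/EventServiceUpsertTopic/partitions/13/state   # leader, leader_epoch and ISR for one partition
  get /controller                                                   # ephemeral node naming the current controller broker
  ls /brokers/ids                                                   # live brokers (each an ephemeral registration)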