>> Roman, Without replication, Kafka can lose messages permanently if the
>> underlying storage system is damaged. Setting that aside, there are 2 ways
>> that you can achieve HA now. In either case, you need to set up a Kafka
>> cluster with at least 2 brokers.
Thanks for the clarification, Jun. But even then, with replication, you
could still lose messages, right?

>> [...] Unconsumed messages on that broker will not be available for
>> consumption until the broker comes up again.

How does a Consumer fetch those "old" messages, given that it has already
fetched "new" messages at a higher offset? What am I missing?

>> The second approach is to use the built-in ZK-based software load
>> balancer in Kafka (by setting zk.connect in the producer config). In this
>> case, we rely on ZK to detect broker failures.

This is the approach I've tried. I did use zk.connect. I started everything
locally:
- 2 Kafka brokers (broker id=0 & 1, single partition)
- 3 zookeeper nodes (all of these on a single box) with different election
  ports and different fs paths/ids
- 5 producer threads sending <1k msgs

Then I killed one of the Kafka brokers, and all my producer threads died.
What am I doing wrong?

Thanks!
Roman

-----Original Message-----
From: Jun Rao [mailto:[email protected]]
Sent: Tuesday, August 30, 2011 11:44 AM
To: [email protected]
Subject: Re: HA / failover

Roman,

Without replication, Kafka can lose messages permanently if the underlying
storage system is damaged. Setting that aside, there are 2 ways that you
can achieve HA now. In either case, you need to set up a Kafka cluster with
at least 2 brokers.

The first approach is to put the hosts of all Kafka brokers in a VIP and
rely on a hardware load balancer to do health checks and routing. In that
case, all producers send data through the VIP. If one of the brokers is
down temporarily, the load balancer will direct the produce requests to the
rest of the brokers. Unconsumed messages on that broker will not be
available for consumption until the broker comes up again.

The second approach is to use the built-in ZK-based software load balancer
in Kafka (by setting zk.connect in the producer config). In this case, we
rely on ZK to detect broker failures.

Thanks,

Jun

On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <[email protected]> wrote:
> Hi, I'm trying to figure out what my prod environment should look like,
> and still I don't seem to understand how to achieve HA / FO conditions.
>
> I realize this is going to be fully supported once there is
> replication, right?
>
> But what about right now? How do you guys achieve this?
>
> I understand at least LinkedIn has a Kafka cluster deployed.
>
> - How do you guys ensure no messages get lost before flush to disk
> happens?
>
> - How did you manage to always have a broker available and redirect
> producers to those during failure?
>
> I've tried using the Producer class with "sync" type and zookeeper, and
> killing one of the two brokers, but I got an exception. Should I handle
> and retry then?
>
> So, to sum up, any pointer on how I should set up a prod env will be
> appreciated! Any doc I might have missed or a simple short example
> would help.
> Thanks!
> Roman
>
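For reference, a minimal sketch of the ZK-based producer setup Jun describes,
assuming the 0.7-era Java producer API (Producer, ProducerConfig,
ProducerData); the topic name, ZK connect string, and serializer below are
illustrative placeholders, not values from the thread:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class ZkProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Point the producer at the ZK ensemble instead of a fixed broker
            // list, so it discovers brokers and learns about failures via ZK.
            props.put("zk.connect", "localhost:2181,localhost:2182,localhost:2183");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Synchronous sends, matching the "sync" producer type Roman used.
            props.put("producer.type", "sync");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));

            // "test-topic" is a placeholder topic name.
            producer.send(new ProducerData<String, String>("test-topic", "hello"));
            producer.close();
        }
    }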
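And a rough sketch of handling the send failure Roman hits when a broker
goes down: catch the exception thrown by the sync send and retry, giving ZK
time to notice the dead broker and the producer time to fall back to the
surviving one. The retry limit, backoff, and the broad exception type caught
here are assumptions, not behavior prescribed by Kafka; it reuses the imports
from the sketch above.

    // Hypothetical helper: retry a sync send a few times before giving up.
    static void sendWithRetry(Producer<String, String> producer,
                              String topic, String message)
            throws InterruptedException {
        final int maxAttempts = 5;                   // arbitrary retry limit
        for (int attempt = 1; ; attempt++) {
            try {
                producer.send(new ProducerData<String, String>(topic, message));
                return;                              // send succeeded
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;                         // still failing, give up
                }
                // Back off briefly so ZK can detect the dead broker and the
                // producer can move on to the remaining broker(s).
                Thread.sleep(1000L * attempt);
            }
        }
    }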
