>> Roman, Without replication, Kafka can lose messages permanently if the
>> underlying storage system is damaged. Setting that aside, there are 2 ways
>> that you can achieve HA now. In either case, you need to set up a Kafka
>> cluster with at least 2 brokers.
Thanks for the clarification, Jun. But even then, with replication, you
could still lose messages, right?

>> [...] Unconsumed messages on that broker will not be available for
>> consumption until the broker comes up again.

How does a Consumer fetch those "old" messages, given that it has already
fetched "new" messages at a higher offset? What am I missing?

>> The second approach is to use the built-in ZK-based software load
>> balancer in Kafka (by setting zk.connect in the producer config). In this
>> case, we rely on ZK to detect broker failures.

This is the approach I've tried. I did use zk.connect. I started everything
locally:
- 2 Kafka brokers (broker id=0 & 1, single partition)
- 3 zookeeper nodes (all of these on a single box) with different election
  ports and different fs paths/ids
- 5 producer threads sending <1k msgs

Then I killed one of the Kafka brokers, and all my producer threads died.
What am I doing wrong?

Thanks!
Roman

-----Original Message-----
From: Jun Rao [mailto:[email protected]]
Sent: Tuesday, August 30, 2011 11:44 AM
To: [email protected]
Subject: Re: HA / failover

Roman,

Without replication, Kafka can lose messages permanently if the underlying
storage system is damaged. Setting that aside, there are 2 ways that you
can achieve HA now. In either case, you need to set up a Kafka cluster with
at least 2 brokers.

The first approach is to put the hosts of all Kafka brokers in a VIP and
rely on a hardware load balancer to do health checks and routing. In that
case, all producers send data through the VIP. If one of the brokers is
down temporarily, the load balancer will direct the produce requests to the
rest of the brokers. Unconsumed messages on that broker will not be
available for consumption until the broker comes up again.

The second approach is to use the built-in ZK-based software load balancer
in Kafka (by setting zk.connect in the producer config). In this case, we
rely on ZK to detect broker failures.

Thanks,

Jun

On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <[email protected]> wrote:
> Hi, I'm trying to figure out what my prod environment should look like,
> and still I don't seem to understand how to achieve HA / FO conditions.
>
> I realize this is going to be fully supported once there is
> replication, right?
>
> But what about right now? How do you guys achieve this?
>
> I understand at least LinkedIn has a Kafka cluster deployed.
>
> - How do you guys ensure no messages get lost before flush to disk
> happens?
>
> - How did you manage to always have a broker available and redirect
> producers to those during failure?
>
> I've tried using the Producer class with "sync" type and zookeeper, and
> killing one of the two brokers, but I got an exception. Should I handle
> and retry then?
>
> So, to sum up, any pointer on how I should set up a prod env will be
> appreciated! Any doc I might have missed or a simple short example
> would help.
> Thanks!
> Roman
>
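For reference, a minimal sketch of the ZK-based producer setup Jun describes,
assuming the 0.7-era Java producer API (Producer, ProducerConfig,
ProducerData); the topic name, ZK connect string, and serializer below are
illustrative placeholders, not values from the thread:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class ZkProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Point the producer at the ZK ensemble instead of a fixed broker
            // list, so it discovers brokers and learns about failures via ZK.
            props.put("zk.connect", "localhost:2181,localhost:2182,localhost:2183");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Synchronous sends, matching the "sync" producer type Roman used.
            props.put("producer.type", "sync");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));

            // "test-topic" is a placeholder topic name.
            producer.send(new ProducerData<String, String>("test-topic", "hello"));
            producer.close();
        }
    }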
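And a rough sketch of handling the send failure Roman hits when a broker
goes down: catch the exception thrown by the sync send and retry, giving ZK
time to notice the dead broker and the producer time to fall back to the
surviving one. The retry limit, backoff, and the broad exception type caught
here are assumptions, not behavior prescribed by Kafka; it reuses the imports
from the sketch above.

    // Hypothetical helper: retry a sync send a few times before giving up.
    static void sendWithRetry(Producer<String, String> producer,
                              String topic, String message)
            throws InterruptedException {
        final int maxAttempts = 5;                   // arbitrary retry limit
        for (int attempt = 1; ; attempt++) {
            try {
                producer.send(new ProducerData<String, String>(topic, message));
                return;                              // send succeeded
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;                         // still failing, give up
                }
                // Back off briefly so ZK can detect the dead broker and the
                // producer can move on to the remaining broker(s).
                Thread.sleep(1000L * attempt);
            }
        }
    }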
