At Lithium, we have multiple datacenters and we distcp our data across our
Hadoop clusters. We have 2 DCs in NA and 1 in EU. We have a non-redundant
direct connect from our EU cluster to one of our NA DCs. If and when this
fails, we have automatic failover to a VPN that goes over the internet. The
amount of data that's moving across the clusters is not much, so we can get
away with this. We don't have Kafka replication set up yet, but we will be
setting it up using MirrorMaker, and the same constraints apply.

Of course, opening up your Kafka cluster to the internet would work too,
but IMHO a VPN is more secure and reduces the surface area of your
infrastructure that could come under attack. It sucks that you can't get
your executives on board for a point-to-point direct connect, as that is
the best solution.
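
If you do go the public-exposure route with "advertised.host.name" (see
the quoted reply below), it may help to check from the consumer's side
that each advertised name resolves to a public address and accepts TCP
connections. Here is a minimal sketch of such a check, using only the
Python standard library. It is not part of Kafka; the function names are
made up for illustration, it only handles IPv4, and a successful TCP
connect says nothing about whether the firewall rules are tight enough.

```python
import ipaddress
import socket


def is_private_address(ip):
    """Return True if ip is in a private or loopback range."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback


def check_broker(host, port, timeout=5.0):
    """Resolve host, flag private addresses, then try a TCP connect.

    Returns (ip, problem); problem is None when every step passed.
    """
    try:
        # IPv4 only; a literal IPv4 address is returned unchanged.
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return None, "does not resolve: %s" % exc
    if is_private_address(ip):
        # A remote consumer could not route to this address.
        return ip, "resolves to a private address"
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            pass
    except OSError as exc:
        return ip, "TCP connect failed: %s" % exc
    return ip, None


def report(brokers):
    """Print one line per (host, port) pair in brokers."""
    for host, port in brokers:
        ip, problem = check_broker(host, port)
        print(host, port, ip, problem or "ok")
```

You would run it from the remote datacenter with whatever names your
brokers advertise, e.g. report([("kafka1.example.com", 9092)]), where the
hostname and port are placeholders, not values from this thread.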

On Tue, Oct 6, 2015 at 5:48 PM, Gwen Shapira <g...@confluent.io> wrote:

> You can configure "advertised.host.name" for each broker, which is the
> name external consumers and producers will use to refer to the brokers.
>
> On Tue, Oct 6, 2015 at 3:31 PM, Tom Brown <tombrow...@gmail.com> wrote:
>
> > Hello,
> >
> > How do you consume a kafka topic from a remote location without a
> > dedicated connection? How do you protect the server?
> >
> > The setup: data streams into our datacenter. We process it, and publish
> > it to a kafka cluster. The consumer is located in a different datacenter
> > with no direct connection. The most efficient scenario would be to setup a
> > point-to-point link but that idea has no traction with our executives. We
> > can setup a VPN; While functional, our IT department assures us that it
> > won't be able to scale.
> >
> > What we're currently planning is to expose the kafka cluster IP addresses
> > to the internet, and only allow access via firewall. Each message will be
> > encrypted with a shared private key, so we're not worried about messages
> > being intercepted. What we are worried about is this: how brokers refer
> > to each other-- when a broker directs the consumer to the server that is in
> > charge of a particular region, does it use the host name (that could be
> > externally mapped to the public IP) or does it use the detected/private
> > IP address.
> >
> > What solution would you use to consume a remote cluster?
> >
> > --Tom
> >
>
