At Lithium, we have multiple datacenters, and we distcp our data across our Hadoop clusters. We have 2 DCs in NA and 1 in EU, with a non-redundant direct connect from our EU cluster to one of our NA DCs. If and when this fails, we automatically fail over to a VPN that goes over the internet. The amount of data moving across the clusters is small, so we can get away with this. We don't have Kafka replication set up yet, but we will be setting it up using MirrorMaker, and the same constraints apply.
Of course, opening up your Kafka cluster to be reachable from the internet would work too, but IMHO a VPN is more secure and reduces the surface area of your infrastructure that could come under attack. It sucks that you can't get your executives on board for a p2p direct connect, as that is the best solution.

On Tue, Oct 6, 2015 at 5:48 PM, Gwen Shapira <g...@confluent.io> wrote:

> You can configure "advertised.host.name" for each broker, which is the name
> external consumers and producers will use to refer to the brokers.
>
> On Tue, Oct 6, 2015 at 3:31 PM, Tom Brown <tombrow...@gmail.com> wrote:
>
> > Hello,
> >
> > How do you consume a kafka topic from a remote location without a dedicated
> > connection? How do you protect the server?
> >
> > The setup: data streams into our datacenter. We process it, and publish it
> > to a kafka cluster. The consumer is located in a different datacenter with
> > no direct connection. The most efficient scenario would be to setup a
> > point-to-point link but that idea has no traction with our executives. We
> > can setup a VPN; While functional, our IT department assures us that it
> > won't be able to scale.
> >
> > What we're currently planning is to expose the kafka cluster IP addresses
> > to the internet, and only allow access via firewall. Each message will be
> > encrypted with a shared private key, so we're not worried about messages
> > being intercepted. What we are worried about is this: how brokers refer to
> > each other-- when a broker directs the consumer to the server that is in
> > charge of a particular region, does it use the host name (that could be
> > externally mapped to the public IP) or does it use the detected/private IP
> > address.
> >
> > What solution would you use to consume a remote cluster?
> >
> > --Tom
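
P.S. For reference, Gwen's "advertised.host.name" suggestion would look roughly like the fragment below in each broker's server.properties. This is only a sketch: the hostname is a made-up placeholder, the port is just Kafka's usual default, and I am assuming the name resolves to an address your remote consumers can actually reach.

```properties
# server.properties on one broker (sketch; values are hypothetical placeholders)

# Name the broker hands out to clients in metadata responses.
# It should resolve to an address reachable from the remote datacenter.
advertised.host.name=kafka1.example.com

# Port clients should connect to on that name (assumed default here).
advertised.port=9092
```

Each broker needs its own advertised name, since consumers connect directly to whichever broker leads a given partition.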