Re: Proxying the Kafka protocol

2019-03-26 Thread Martin Gainty
mg>that depends on the underlying protocol you are attempting to proxy (see 
below)


From: James Grant 
Sent: Monday, March 25, 2019 1:21 PM
To: users@kafka.apache.org
Subject: Re: Proxying the Kafka protocol

Thank you all.

We have in the past exposed message streams backed by Kafka via an HTTP POST
and WebSocket service, which worked very well. We were able to filter
messages based on schema compliance, and it was very simple for the teams
that generate the data to use. It also had no trouble scaling to 100K
messages/sec.

However, not exposing the Kafka protocol has its drawbacks when you try to
bring in other tools and teams who are already familiar with Kafka.

So we looked for something that would provide:
* Native Kafka protocol support

MG>if the protocol you are trying to proxy is TCP/IP, then try the Juniper TCP proxy:
MG>https://www.juniper.net/documentation/en_US/junos/topics/concept/denial-of-service-network-tcp-proxy-understanding.html
A TCP proxy is a server that acts as an intermediary between a client and the
destination server. Clients establish connections to the TCP proxy server,
which then establishes a connection to the destination server.


MG>if the protocol you are trying to proxy is HTTP or HTTPS, then try deploying Squid:
http://www.squid-cache.org/
Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It
reduces bandwidth and improves response times by caching and reusing
frequently-requested web pages.

MG>either of the above proxies would apply its whitelist/blacklist and NAT
configuration to *all applications*
MG>if, on the other hand, all you need to do is rewrite metadata, then stick
with your "kafka proxy"

* Single endpoint access to make access between networks easier
* Schema (and possibly other business logic) enforcement.

I took a couple of weeks to create a PoC that works, at least, with the
producer and consumer command line tools. I have this working now and can
insert a predicate into the PRODUCE message handler that can reject
messages.

We plan to develop this further and take it beyond a PoC. I’d be keen to
know whether you think this kind of component would be a good addition to
the Kafka ecosystem. Are there any other capabilities that might be a good
fit with this proxy layer? And most importantly, does anybody foresee any
fundamental issues with this approach?

James Grant

Developer - Expedia Group


Re: Proxying the Kafka protocol

2019-03-25 Thread James Grant
Thank you all.

We have in the past exposed message streams backed by Kafka via an HTTP POST
and WebSocket service, which worked very well. We were able to filter
messages based on schema compliance, and it was very simple for the teams
that generate the data to use. It also had no trouble scaling to 100K
messages/sec.

However, not exposing the Kafka protocol has its drawbacks when you try to
bring in other tools and teams who are already familiar with Kafka.

So we looked for something that would provide:
* Native Kafka protocol support
* Single endpoint access to make access between networks easier
* Schema (and possibly other business logic) enforcement.

I took a couple of weeks to create a PoC that works, at least, with the
producer and consumer command line tools. I have this working now and can
insert a predicate into the PRODUCE message handler that can reject
messages.
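A rough idea of what such a predicate hook could look like, as a Go sketch: the types (Record, PartitionBatch, Predicate) and function names are hypothetical, not taken from the PoC or from any Kafka client library, and decoding record batches from the wire format is assumed to have happened already. The JSON-object check stands in for real schema validation. Rejecting a whole partition batch when any record in it fails is a design assumption; a real proxy would also need to answer the client with a per-partition error code in the PRODUCE response (POLICY_VIOLATION is one plausible choice) rather than silently dropping the data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a simplified, hypothetical stand-in for one record decoded from
// a PRODUCE request's record batch.
type Record struct {
	Key   []byte
	Value []byte
}

// PartitionBatch holds the records a PRODUCE request carries for one
// topic-partition (hypothetical type).
type PartitionBatch struct {
	Topic     string
	Partition int32
	Records   []Record
}

// Predicate returns a non-nil error if a record must not reach the brokers.
type Predicate func(topic string, r Record) error

// filterProduce splits a PRODUCE request into batches to forward and batches
// to reject. One bad record rejects its whole partition batch, since the
// client receives a single result per partition.
func filterProduce(batches []PartitionBatch, pred Predicate) ([]PartitionBatch, map[string]error) {
	var forward []PartitionBatch
	rejected := make(map[string]error)
	for _, b := range batches {
		var reason error
		for _, r := range b.Records {
			if err := pred(b.Topic, r); err != nil {
				reason = err
				break
			}
		}
		if reason != nil {
			rejected[fmt.Sprintf("%s-%d", b.Topic, b.Partition)] = reason
			continue
		}
		forward = append(forward, b)
	}
	return forward, rejected
}

// requireJSONObject stands in for real schema validation: the record value
// must decode as a (non-null) JSON object.
func requireJSONObject(topic string, r Record) error {
	var obj map[string]interface{}
	if err := json.Unmarshal(r.Value, &obj); err != nil || obj == nil {
		return fmt.Errorf("topic %s: value is not a JSON object", topic)
	}
	return nil
}

func main() {
	batches := []PartitionBatch{
		{Topic: "orders", Partition: 0, Records: []Record{{Value: []byte(`{"id":1}`)}}},
		{Topic: "orders", Partition: 1, Records: []Record{{Value: []byte(`not json`)}}},
	}
	forward, rejected := filterProduce(batches, requireJSONObject)
	fmt.Println(len(forward), len(rejected)) // 1 1
	fmt.Println(rejected["orders-1"])        // topic orders: value is not a JSON object
}
```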

We plan to develop this further and take it beyond a PoC. I’d be keen to
know whether you think this kind of component would be a good addition to
the Kafka ecosystem. Are there any other capabilities that might be a good
fit with this proxy layer? And most importantly, does anybody foresee any
fundamental issues with this approach?

James Grant

Developer - Expedia Group


Re: Proxying the Kafka protocol

2019-03-19 Thread Hans Jespersen


You might want to take a look at kafka-proxy (see
https://github.com/grepplabs/kafka-proxy).
It’s a true Kafka protocol proxy that modifies the metadata, such as advertised
listeners, so it works when there is no IP routing between the client and the
brokers.
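The metadata rewrite is what makes this work: a Kafka client bootstraps from one address and then connects directly to whatever host and port each broker advertises in the Metadata response, so a proxy that only forwarded bytes would hand clients internal addresses they cannot reach. The Go sketch below shows only that rewrite step. The Broker type, the rewriteBrokers function and the one-proxy-listener-per-broker mapping are assumptions for illustration, not kafka-proxy's actual code or configuration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Broker is a simplified broker entry from a Metadata response
// (hypothetical type; the real entry has more fields).
type Broker struct {
	NodeID int32
	Host   string
	Port   int32
}

// rewriteBrokers replaces each broker's advertised address with the
// client-reachable proxy listener that forwards to it. upstreamToProxy maps a
// real broker's "host:port" to its proxy listener's "host:port".
func rewriteBrokers(brokers []Broker, upstreamToProxy map[string]string) ([]Broker, error) {
	out := make([]Broker, 0, len(brokers))
	for _, b := range brokers {
		upstream := net.JoinHostPort(b.Host, strconv.Itoa(int(b.Port)))
		proxyAddr, ok := upstreamToProxy[upstream]
		if !ok {
			return nil, fmt.Errorf("no proxy listener for broker %d at %s", b.NodeID, upstream)
		}
		host, portStr, err := net.SplitHostPort(proxyAddr)
		if err != nil {
			return nil, err
		}
		port, err := strconv.Atoi(portStr)
		if err != nil {
			return nil, err
		}
		// Node IDs are kept so partition leaders still refer to valid brokers.
		out = append(out, Broker{NodeID: b.NodeID, Host: host, Port: int32(port)})
	}
	return out, nil
}

func main() {
	brokers := []Broker{
		{NodeID: 1, Host: "ip-10-0-1-11.internal", Port: 9092},
		{NodeID: 2, Host: "ip-10-0-1-12.internal", Port: 9092},
	}
	mapping := map[string]string{
		"ip-10-0-1-11.internal:9092": "kafka.example.com:32401",
		"ip-10-0-1-12.internal:9092": "kafka.example.com:32402",
	}
	rewritten, err := rewriteBrokers(brokers, mapping)
	if err != nil {
		panic(err)
	}
	for _, b := range rewritten {
		fmt.Printf("%d %s:%d\n", b.NodeID, b.Host, b.Port)
	}
}
```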

-hans


Re: Proxying the Kafka protocol

2019-03-19 Thread Matt Veitas
You might follow along with the Envoy proxy team and the work they are
doing to support the Kafka binary protocol:
https://github.com/envoyproxy/envoy/issues/2852


Re: Proxying the Kafka protocol

2019-03-19 Thread Peter Bukowinski
https://docs.confluent.io/3.0.0/kafka-rest/docs/intro.html

The Kafka REST proxy may be what you need. You can put multiple instances 
behind a load balancer to scale to your needs.


-- Peter (from phone)


Re: Proxying the Kafka protocol

2019-03-19 Thread Ryanne Dolan
Hello James, I'm not aware of anything like that for Kafka, but you can use
MirrorMaker for network segmentation. With this approach you have one Kafka
cluster in each segment and an MM cluster in the more privileged segment.
You don't need to expose the privileged segment at all -- you just need to
let MM reach the external segment(s).

Ryanne


Proxying the Kafka protocol

2019-03-19 Thread James Grant
Hello,

We would like to expose a Kafka cluster running on one network to clients
that are running on other networks without having to have full routing
between the two networks. In this case these networks are in different AWS
accounts but the concept applies more widely. We would like to access Kafka
over a single (or very few) host names.

In addition we would like to filter incoming messages to enforce some level
of data quality and also impose some access control.
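For the access-control part, a default-deny allow list checked before forwarding PRODUCE or FETCH requests might be enough to start with. The Go sketch below assumes the client's principal has already been established (for example via a TLS client certificate or SASL); ACL, Operation and the method names are hypothetical. Kafka brokers also support ACLs natively, so a check in the proxy matters mainly if the proxy connects to the brokers under a single identity of its own.

```go
package main

import "fmt"

// Operation is the kind of access a request needs.
type Operation int

const (
	Read  Operation = iota // e.g. FETCH
	Write                  // e.g. PRODUCE
)

type aclKey struct {
	Principal string
	Topic     string
	Op        Operation
}

// ACL is a default-deny allow list keyed by principal, topic and operation.
type ACL map[aclKey]bool

// Allow grants principal the given operation on topic.
func (a ACL) Allow(principal, topic string, op Operation) {
	a[aclKey{principal, topic, op}] = true
}

// Authorized reports whether principal may perform op on topic; anything
// not explicitly allowed is denied.
func (a ACL) Authorized(principal, topic string, op Operation) bool {
	return a[aclKey{principal, topic, op}]
}

func main() {
	acl := ACL{}
	acl.Allow("team-a", "orders", Write)
	fmt.Println(acl.Authorized("team-a", "orders", Write)) // true
	fmt.Println(acl.Authorized("team-a", "orders", Read))  // false
	fmt.Println(acl.Authorized("team-b", "orders", Write)) // false
}
```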

A solution we are looking into is to provide a Kafka-protocol-level proxy
that presents itself to clients as a single-node Kafka cluster holding all the
topics and partitions of the cluster behind it. This proxy would be able to
operate in a load balanced cluster behind a single DNS entry and would also
be able to intercept and filter/alter messages as they passed through.
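Presenting the whole cluster as a single node roughly means two things, sketched here in Go under stated assumptions: rewriting every partition's leader in the Metadata response to the proxy's own node ID, and keeping a routing table so the proxy can forward each partition's requests to the real leader. The types, function names and the choice of node ID 0 are hypothetical. The grouping step shows one consequence: a single PRODUCE or FETCH request from a client may name partitions led by different brokers, so the proxy would have to fan it out and merge the responses. Replica and ISR lists, and coordinator lookups (FindCoordinator) for consumer groups, would presumably need similar handling.

```go
package main

import "fmt"

// TopicPartition identifies one partition (hypothetical type).
type TopicPartition struct {
	Topic     string
	Partition int32
}

// PartitionMeta is a simplified partition entry from a Metadata response;
// the real entry also carries replica and ISR lists.
type PartitionMeta struct {
	TopicPartition
	Leader int32
}

// proxyNodeID is the single broker ID the proxy advertises (an assumption).
const proxyNodeID int32 = 0

// collapseToSingleNode makes the proxy the leader of every partition and
// returns a routing table from each partition to its real leader.
func collapseToSingleNode(parts []PartitionMeta) ([]PartitionMeta, map[TopicPartition]int32) {
	routes := make(map[TopicPartition]int32, len(parts))
	advertised := make([]PartitionMeta, len(parts))
	for i, p := range parts {
		routes[p.TopicPartition] = p.Leader
		advertised[i] = PartitionMeta{TopicPartition: p.TopicPartition, Leader: proxyNodeID}
	}
	return advertised, routes
}

// groupByLeader splits the partitions named in one client request into
// per-broker sub-requests. Partitions missing from the routing table are
// returned separately so the proxy can refresh its metadata or return an error.
func groupByLeader(tps []TopicPartition, routes map[TopicPartition]int32) (map[int32][]TopicPartition, []TopicPartition) {
	byLeader := make(map[int32][]TopicPartition)
	var unknown []TopicPartition
	for _, tp := range tps {
		leader, ok := routes[tp]
		if !ok {
			unknown = append(unknown, tp)
			continue
		}
		byLeader[leader] = append(byLeader[leader], tp)
	}
	return byLeader, unknown
}

func main() {
	meta := []PartitionMeta{
		{TopicPartition{"orders", 0}, 1},
		{TopicPartition{"orders", 1}, 2},
		{TopicPartition{"payments", 0}, 1},
	}
	advertised, routes := collapseToSingleNode(meta)
	fmt.Println(advertised[1].Leader) // 0

	request := []TopicPartition{{"orders", 0}, {"orders", 1}, {"payments", 0}, {"audit", 0}}
	byLeader, unknown := groupByLeader(request, routes)
	fmt.Println(len(byLeader[1]), len(byLeader[2]), len(unknown)) // 2 1 1
}
```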

The advantage we see in this approach over the HTTP proxy is that it
presents the Kafka protocol whilst also letting us use a typical TCP-level
load balancer, to which it is easy to route connections. This means that we
can continue to use native Kafka clients.

Does anything like this already exist? Does anybody think it would be useful?
Does anybody know of any reason it would be impossible (or a bad idea) to
do?

James Grant

Developer - Expedia Group