Re: Re: [DISCUSS] KIP-905: Broker interceptors

2023-10-23 Thread Andrew Otto
FWIW, this would be very useful for the Wikimedia Foundation's Event
Platform. We have some requirements for our event stream producers, and not
having to re-implement this logic in multiple programming languages and
frameworks would be really nice.

I had doubts about making brokers more complex as well, but
> One benefit of pluggable interceptors is that they don't affect users who
> don't need and don't use them, so the Kafka robustness remains at the
> baseline

were my thoughts too.  This is an opt-in feature.

It would be nice if there were a configuration option to includelist or
excludelist certain topics from passing through the interceptor logic. I
suppose the custom interceptor implementation could just pass the record
through if the topic shouldn't be intercepted. But I think I'd prefer that
custom code execution be avoided entirely for certain topics, just in case a
bug is deployed in the custom interceptor.
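
A minimal sketch of what such topic filtering could look like, purely for
illustration. Everything here is an assumption: the RecordInterceptor
interface, its processRecord signature, and the include/exclude sets are
hypothetical and not taken from the KIP. The point is only that the topic
check runs before any custom code is invoked.

```java
import java.util.Set;

/** Hypothetical interceptor contract; the interface in KIP-905 may differ. */
interface RecordInterceptor {
    String processRecord(String topic, String value);
}

/**
 * Wraps a custom interceptor and skips it entirely for filtered topics,
 * so a bug in the custom code cannot affect those topics.
 */
final class TopicFilteringInterceptor implements RecordInterceptor {
    private final RecordInterceptor delegate;
    private final Set<String> include; // empty set means "all topics"
    private final Set<String> exclude; // exclusion wins over inclusion

    TopicFilteringInterceptor(RecordInterceptor delegate,
                              Set<String> include,
                              Set<String> exclude) {
        this.delegate = delegate;
        this.include = include;
        this.exclude = exclude;
    }

    boolean shouldIntercept(String topic) {
        if (exclude.contains(topic)) {
            return false;
        }
        return include.isEmpty() || include.contains(topic);
    }

    @Override
    public String processRecord(String topic, String value) {
        // Filtered topics bypass the delegate: no custom code is executed.
        return shouldIntercept(topic) ? delegate.processRecord(topic, value) : value;
    }
}
```

With an exclude set containing "raw.logs", records for that topic would be
returned unchanged and the wrapped interceptor would never be called for them.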

On Fri, Oct 20, 2023 at 11:19 AM Ivan Yurchenko wrote:

> Hi David and Ahmed,
>
> First, thank you David for the KIP. It would be very valuable for multiple
> use cases. Products like Conduktor Gateway [1] validate the demand and
> offer many potential use cases [2].
>
> Now, I understand Ahmed's concerns about possible in-band interruptions;
> they are valid. However, certain use cases cannot be handled without
> intercepting the request flow to Kafka brokers (for example,
> broker-side schema validation). A number of open source and proprietary
> proxy solutions exist and they have their user base, for which the benefits
> outweigh the risks. In the current state, the broker itself already has
> injection points for custom code executed in the hot path of message
> handling, namely the Authorizer.
>
> One benefit of pluggable interceptors is that they don't affect users who
> don't need and don't use them, so the Kafka robustness remains at the
> baseline. Those who need this functionality can make their own conscious
> decision. So to me it seems this would be positive for the Kafka community and
> ecosystem.
>
> Best regards,
> Ivan
>
> [1] https://docs.conduktor.io/gateway/
> [2] https://marketplace.conduktor.io/
>
> On 2023/02/10 16:41:01 David Mariassy wrote:
> > Hi Ahmed,
> >
> > Thanks for taking a look at the KIP, and for your insightful feedback!
> >
> > I don't disagree with the sentiment that in-band interceptors could be a
> > potential source of bugs in a cluster.
> >
> > Having said that, I don't necessarily think that an in-band interceptor is
> > significantly riskier than an out-of-band pre-processor. Let's take the
> > example of platform-wide privacy scrubbing. In my opinion it doesn't really
> > matter if this feature is deployed as an out-of-band stream processor app
> > that consumes from all topics OR if the logic is implemented as an in-band
> > interceptor. Either way, a faulty release of the scrubber will result in
> > a platform-wide disruption of data flows. Thus, I'd argue that from the
> > perspective of the platform's overall health, the level of risk is very
> > comparable in both cases. However, in-band interceptors have a couple of
> > advantages in my opinion:
> > 1. They are significantly cheaper (no need to duplicate data between raw
> > and sanitized topics, and there are also a lot of potential savings in
> > network costs)
> > 2. They are easier to maintain (no need to set up additional infrastructure
> > for out-of-band processing)
> > 3. They can provide accurate produce responses to clients (since there is
> > no downstream processing that could render a client's messages invalid
> > async)
> >
> > Also, in-band interceptors could be as safe or risky as their authors
> > design them to be. There's nothing stopping someone from catching all
> > exceptions in a `processRecord` method, and letting all unprocessed
> > messages go through or sending them to a DLQ. Once the interceptor is
> > fixed, those unprocessed messages could get re-ingested into Kafka to
> > re-attempt pre-processing.
> >
> > Thanks and happy Friday,
> > David
> >
> >
> >
> >
> >
> > On Fri, Feb 10, 2023 at 8:23 AM Ahmed Abdalla wrote:
> >
> > > Hi David,
> > >
> > > That's a very interesting KIP and I wanted to share my two cents. I believe
> > > there's a lot of value and use cases for the ability to intercept, mutate
> > > and filter Kafka's messages, however I'm not sure if trying to achieve that
> > > via in-band interceptors is the best approach for this.
> > >
> > >    - My mental model around one of Kafka's core values is the brokers'
> > >    focus on a single functionality (more or less): highly available and
> > >    fault tolerant commit log. I see this in many design decisions such as
> > >    off-loading responsibilities to the clients (partitioner, assignor,
> > >    consumer groups coordination etc).
> > >    - And the impact of this KIP on the Kafka server would be adding another

RE: Re: [DISCUSS] KIP-905: Broker interceptors

2023-10-20 Thread Ivan Yurchenko
Hi David and Ahmed,

First, thank you David for the KIP. It would be very valuable for multiple use 
cases. Products like Conduktor Gateway [1] validate the demand and offer many 
potential use cases [2].

Now, I understand Ahmed's concerns about possible in-band interruptions; they
are valid. However, certain use cases cannot be handled without intercepting
the request flow to Kafka brokers (for example, broker-side schema
validation). A number of open source and proprietary proxy solutions exist and
they have their user base, for which the benefits outweigh the risks. In the 
current state, the broker itself already has injection points for custom code 
executed in the hot path of message handling, namely the Authorizer.

One benefit of pluggable interceptors is that they don't affect users who don't 
need and don't use them, so the Kafka robustness remains at the baseline. Those
who need this functionality can make their own conscious decision. So to me it
seems this would be positive for the Kafka community and ecosystem.

Best regards,
Ivan

[1] https://docs.conduktor.io/gateway/
[2] https://marketplace.conduktor.io/

On 2023/02/10 16:41:01 David Mariassy wrote:
> Hi Ahmed,
> 
> Thanks for taking a look at the KIP, and for your insightful feedback!
> 
> I don't disagree with the sentiment that in-band interceptors could be a
> potential source of bugs in a cluster.
> 
> Having said that, I don't necessarily think that an in-band interceptor is
> significantly riskier than an out-of-band pre-processor. Let's take the
> example of platform-wide privacy scrubbing. In my opinion it doesn't really
> matter if this feature is deployed as an out-of-band stream processor app
> that consumes from all topics OR if the logic is implemented as an in-band
> interceptor. Either way, a faulty release of the scrubber will result in
> a platform-wide disruption of data flows. Thus, I'd argue that from the
> perspective of the platform's overall health, the level of risk is very
> comparable in both cases. However, in-band interceptors have a couple of
> advantages in my opinion:
> 1. They are significantly cheaper (no need to duplicate data between raw
> and sanitized topics, and there are also a lot of potential savings in
> network costs)
> 2. They are easier to maintain (no need to set up additional infrastructure
> for out-of-band processing)
> 3. They can provide accurate produce responses to clients (since there is
> no downstream processing that could render a client's messages invalid
> async)
> 
> Also, in-band interceptors could be as safe or risky as their authors
> design them to be. There's nothing stopping someone from catching all
> exceptions in a `processRecord` method, and letting all unprocessed
> messages go through or sending them to a DLQ. Once the interceptor is
> fixed, those unprocessed messages could get re-ingested into Kafka to
> re-attempt pre-processing.
> 
> Thanks and happy Friday,
> David
> 
> 
> 
> 
> 
> On Fri, Feb 10, 2023 at 8:23 AM Ahmed Abdalla wrote:
> 
> > Hi David,
> >
> > That's a very interesting KIP and I wanted to share my two cents. I believe
> > there's a lot of value and use cases for the ability to intercept, mutate
> > and filter Kafka's messages, however I'm not sure if trying to achieve that
> > via in-band interceptors is the best approach for this.
> >
> >    - My mental model around one of Kafka's core values is the brokers'
> >    focus on a single functionality (more or less): a highly available and
> >    fault-tolerant commit log. I see this in many design decisions such as
> >    off-loading responsibilities to the clients (partitioner, assignor,
> >    consumer group coordination, etc.).
> >    - And the impact of this KIP on the Kafka server would be adding another
> >    moving part to their "state of the world" that they try to maintain.
> >    What if an interceptor goes bad? What if there's a version mismatch?
> >    These are a lot of responsibilities that can be managed very
> >    efficiently out-of-band IMHO.
> >    - The comparison to NginX and Kubernetes is IMHO comparing apples to
> >    oranges
> >       - NginX
> >          - Doesn't maintain persisted data.
> >          - It's designed as a middleware, it's an interceptor by nature.
> >       - Kubernetes
> >          - CRDs extend the API surface, they don't "augment" existing APIs.
> >          I think admission webhooks
> >          <https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/>
> >          are Kubernetes' solution for providing interceptors.
> >          - The admission webhooks are out-of-band, and in fact they're a
> >          great example of "opening up your cluster for extensibility"
> >          going wrong. Installing a misbehaving webhook can brick the
> >          whole cluster.
> >
> > As I mentioned, I see a value for users being able to intercept and
> > transform Kafka's messages. But I'm worried that having this as a