@Nacho

> - Brokers can't see the headers (part of the "V" black box)
>
> (Also, it would be nice if we had a way to access the headers from the
> brokers, something that is not trivial at this time with the current
> broker architecture).

I think this can be addressed with broker interceptors, which we touched on
in KIP-42
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors>.
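
(To make that concrete: KIP-42 as shipped covers producer- and
consumer-side interceptors, so a broker-side hook would be new work. As a
rough sketch of the pattern, a producer interceptor that stamps infra
metadata could look like the following - MetadataPrefix is a made-up
helper for illustration, not an existing API.)

  import java.util.Map;
  import org.apache.kafka.clients.producer.ProducerInterceptor;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.clients.producer.RecordMetadata;

  // Rough sketch: stamps an infra-owned prefix onto the value before send.
  // MetadataPrefix is a hypothetical helper, not an existing API.
  public class MetadataInterceptor implements ProducerInterceptor<byte[], byte[]> {
    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
      byte[] wrapped = MetadataPrefix.prepend(record.value(), System.currentTimeMillis());
      return new ProducerRecord<>(record.topic(), record.partition(),
          record.timestamp(), record.key(), wrapped);
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
  }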

@Gwen

You are right that the wrapper thingy “works”, but it has some drawbacks,
which Nacho and Radai have covered in detail; I can add a few more comments
here.

At LinkedIn, we *get by* without the proposed Kafka record headers by
dumping such metadata in one or two places:

   - Most of our applications use Avro, so for the most part we can use an
   explicit header field in the Avro schema. Topic owners are supposed to
   include this header in their schemas.
   - A prefix to the payload that primarily contains the schema’s ID so we
   can deserialize the Avro (see the sketch after this list). We could use
   this for other use cases as well, e.g., move some of the above into this
   prefix blob.
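
For illustration, here is a minimal sketch of such a prefix framing. The
magic byte and 4-byte big-endian schema ID below are assumptions modeled on
common schema-registry wire formats, not necessarily our exact layout:

  import java.nio.ByteBuffer;

  // Sketch of a schema-ID prefix blob: [magic byte][4-byte schema id][avro bytes].
  // The exact layout is an assumption for illustration.
  public final class PrefixedPayload {
    private static final byte MAGIC = 0x0;

    public static byte[] encode(int schemaId, byte[] avroBytes) {
      return ByteBuffer.allocate(1 + 4 + avroBytes.length)
          .put(MAGIC)
          .putInt(schemaId)
          .put(avroBytes)
          .array();
    }

    public static int schemaId(byte[] payload) {
      ByteBuffer buf = ByteBuffer.wrap(payload);
      if (buf.get() != MAGIC) {
        throw new IllegalArgumentException("Unknown magic byte");
      }
      return buf.getInt(); // read the schema ID without deserializing the Avro
    }
  }

Note that a consumer can pick out the schema ID from the prefix without
deserializing the Avro itself, which is exactly what the application cannot
do when the metadata lives inside the Avro schema.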

Dumping headers in the Avro schema pollutes the application’s data model
with data/service-infra-related fields that are unrelated to the underlying
topic, and it forces the application to deserialize the entire blob whether
or not the headers are actually used. Conversely, from an infrastructure
perspective, we would really like not to touch any application data. This
infiltration of the application’s schema is a major reason why many at
LinkedIn assume that we (Kafka folks) are the shepherds for all things
Avro :)

Another drawback is that all of this works only if everyone in the
organization is a good citizen: includes the header and uses our wrapper
libraries. That is good practice IMO, but it may not always be easy for
open source projects that wish to use the Apache producer/client directly.
If instead we allowed these headers to be inserted via suitable
interceptors, outside the application payloads, it would remove these
issues of data-model separation and client choice (see the sketch below).
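
The nice part is that interceptors attach via plain client configuration
rather than a wrapper library, so any client picks them up unchanged. A
minimal sketch of that wiring, reusing the hypothetical MetadataInterceptor
from above:

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;

  public class InterceptorWiring {
    public static void main(String[] args) {
      Properties props = new Properties();
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.ByteArraySerializer");
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.ByteArraySerializer");
      // KIP-42: interceptors are attached via config, so the application
      // produces as usual and metadata is added transparently.
      props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
          "com.example.MetadataInterceptor"); // hypothetical class from above
      try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
        // application code sends records here, unaware of the infra metadata
      }
    }
  }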

Radai has enumerated a number of use cases
<https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers>,
and I’m sure the broader community will have a lot more to add. The feature as
such would enable an ecosystem of plugins from different vendors that users
can mix and match in their data pipelines without requiring any specific
payload formats or client libraries.

Thanks,

Joel



> On Wed, Oct 5, 2016 at 2:20 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Since LinkedIn has some kind of wrapper thingy that adds the headers,
> > where they could have added them to Apache Kafka - I'm very curious to
> > hear what drove that decision and the pros/cons of managing the
> > headers outside Kafka itself.
