Re: [DISCUSS] KIP-82 - Add Record Headers

Michael Pearce Mon, 07 Nov 2016 23:10:11 -0800

For me 5c and 5a are almost identical.

The idea in the KIP (5a) is that the core message just has a header length
followed by the header bytes, which are encoded in a pre-agreed sub wire
protocol as described.


5c, instead of having a pre-agreed wire format, allows custom serialisation of
a map of <int, byte[]>.

The advantage of 5a is that it never closes the door on the broker one day
understanding the headers, if that is ever needed.

Also, 5a would allow for the future possibility of the broker handling any
changes in the wire format, upgrading and downgrading depending on client
version, as we do for the message.

The obvious disadvantage is no custom serialisation. But here we are just
talking about how to serialise a vector of ints and byte[]; is there that much
benefit in making that custom?
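
To make the 5a idea concrete, here is a minimal sketch of a fixed sub wire
format for a map of <int, byte[]>: a total length, then repeated (key, value
length, value bytes) entries. The layout, the class name HeaderCodec and the
use of 4-byte ints are assumptions for illustration only; the actual format is
whatever the KIP describes.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: this layout is assumed for illustration and is not
// taken from the KIP-82 wire format.
public final class HeaderCodec {
    private HeaderCodec() {}

    // Assumed layout: [int32 body length] then, per header,
    // [int32 key][int32 value length][value bytes].
    public static byte[] encode(Map<Integer, byte[]> headers) {
        int bodyLength = 0;
        for (byte[] value : headers.values()) {
            bodyLength += 4 + 4 + value.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + bodyLength);
        buf.putInt(bodyLength);
        for (Map.Entry<Integer, byte[]> e : headers.entrySet()) {
            buf.putInt(e.getKey());
            buf.putInt(e.getValue().length);
            buf.put(e.getValue());
        }
        return buf.array();
    }

    // Reads the length prefix, then walks the entries until the body ends.
    public static Map<Integer, byte[]> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int bodyLength = buf.getInt();
        int end = buf.position() + bodyLength;
        Map<Integer, byte[]> headers = new TreeMap<>();
        while (buf.position() < end) {
            int key = buf.getInt();
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            headers.put(key, value);
        }
        return headers;
    }
}
```

Because the layout is fixed, any component (client, broker or plugin) could
list the keys without knowing how the values were produced.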
________________________________________
From: Roger Hoover <roger.hoo...@gmail.com>
Sent: Tuesday, November 8, 2016 12:01:38 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

Nacho,

Thanks for the summary.  #5 is not a binary decision, right?

5a) headers could be "fully" native as proposed - meaning both clients and
brokers would be able to list all keys.
5b) headers could be inside the existing value field.  in this case, only
clients would understand the container format and brokers would remain
unchanged.
5c) headers could be inside a new "metadata" field which would be opaque
bytes as far as the core broker protocol and on-disk format (not part of
the existing value field) but understood by clients.

I guess I'm asking what the reasons are to favor 5a over 5c.  For the case
of broker plugins, those plugins could also understand the common header
format.
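
As a rough illustration of 5c, the sketch below treats a record as three
opaque byte arrays from the broker's side and leaves the metadata encoding to
a client-side plug-in. All names here (OpaqueMetadataSketch,
MetadataSerializer, WireRecord, LineSerializer) are hypothetical and are not
existing Kafka interfaces; the line-based encoding is only a stand-in for
whatever common header format clients would agree on.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of option 5c; none of these types exist in Kafka.
public final class OpaqueMetadataSketch {
    private OpaqueMetadataSketch() {}

    // Assumed client-side plug-in point for encoding the metadata field.
    public interface MetadataSerializer<M> {
        byte[] serialize(M metadata);
        M deserialize(byte[] bytes);
    }

    // What the broker would see: three byte arrays it never interprets.
    public static final class WireRecord {
        public final byte[] key;
        public final byte[] metadata;
        public final byte[] value;

        public WireRecord(byte[] key, byte[] metadata, byte[] value) {
            this.key = key;
            this.metadata = metadata;
            this.value = value;
        }
    }

    // Toy serializer: one "key=value" pair per line, UTF-8 encoded.
    // Assumes keys contain no '=' and neither keys nor values contain '\n'.
    public static final class LineSerializer
            implements MetadataSerializer<Map<String, String>> {
        @Override
        public byte[] serialize(Map<String, String> metadata) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Map<String, String> deserialize(byte[] bytes) {
            Map<String, String> out = new LinkedHashMap<>();
            String text = new String(bytes, StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    out.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
            return out;
        }
    }
}
```

A broker plugin that wanted to read headers would need the same serializer as
the clients, which is the trade-off against 5a.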

Cheers,

Roger


On Mon, Nov 7, 2016 at 3:25 PM, Nacho Solis <nso...@linkedin.com.invalid>
wrote:

> Hey Roger.
>
> The original design involved:
> 1- a header set per message (an array of key+values)
> 2- a message level API to set/get headers.
> 3- byte[] header-values
> 4- int header-keys
> 5- headers encoded at the protocol/core level
>
>
> 1- I think most (not all) people would agree that having metadata per
> message is a good thing. Headers is one way to provide this.
>
> 2- There are many use cases for the headers. Quite a number of them are at
> the message level. Given this we expect the best way to do this is by
> giving an API at the message level.  Agreement is not at 100% here on
> providing an API to get/set headers available to all.  Some believe this
> should be done purely by interceptors instead of direct API calls.  How
> this "map" is presented to the user via the API is still being fine-tuned.
>
> 3- byte[] header values allow the encoding of anything.  This is a black
> box that does not need to be understood by anybody other than the
> plugin/code that wrote the header to start with.  A plugin, if it so
> wishes, could have a custom serializer.  So in here, if somebody wanted to
> use protobuf or avro or what have you, you could do that.
>
> 4- int header keys are in the proposal. This offers a very compact
> representation with an easy ability to segment the space. Coordination is
> needed in one way or another, whether ints are used or strings are used.
> In our testing ints are faster than strings... is this performance boost
> worth it?  We have differing opinions.  A lot of people would argue that
> the flexibility of strings plus their ability to have long lengths make
> coordination easier, and that compression will take care of the overhead.
> I will make a quick note that HTTP/2, which in theory uses strings as
> headers, uses static header compression, effectively using ints for the core
> headers and a precomputed Huffman table for other strings. (
> https://tools.ietf.org/html/rfc7541).
>
> 5- This is the big sticking point.  Should headers be done at the protocol
> level (native) or as a container/wrapper inside the V part of the message.
>
> Benefits of doing container:
> - no modification to the broker
> - no modification to the open source client.
>
> Benefits of doing native:
> - core can use headers (compaction, exactly-once, etc)
> - broker can have plugins
> - open source client can have plugins
> - no need to worry about aliasing (interoperability between headers and no
> header supporting clients)
>
>
> There are a few other benefits that seem to come bundled into the native
> implementation but could be made available in the container format.
>
> For example, we could develop a shared open source client that offers a
> container format. This would allow us to:
> - have other open source projects depend on headers
> - create a community to share plugins
>
> This container format client could be completely separate from Apache Kafka
> or it could be part of Apache Kafka. The people that would like to use
> headers can use that client, and the people that think it's an overhead can
> use the one without.
>
>
> Nacho
>
>
> On Mon, Nov 7, 2016 at 2:54 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Radai,
> >
> > If the broker must parse headers, then I agree that the serialization
> > probably should not be configurable.  However, if the broker sees
> > metadata only as bytes and clients are the only components that serialize
> > and deserialize the headers, then pluggability seems reasonable.
> >
> > Cheers,
> >
> > Roger
> >
> > On Sun, Nov 6, 2016 at 9:25 AM, radai <radai.rosenbl...@gmail.com>
> > wrote:
> >
> > > making header _key_ serialization configurable potentially undermines
> > > the broad usefulness of the feature (any point along the path must be
> > > able to read the header keys. the values may be whatever and require
> > > more intimate knowledge of the code that produced specific headers, but
> > > keys should be universally readable).
> > >
> > > it would also make it hard to write really portable plugins - say i
> > > wrote a large message splitter/combiner - if i rely on key
> > > "largeMessage" and values of the form "1/20" someone who uses
> > > (contrived example) Map<Byte[], Double> wouldn't be able to re-use my
> > > code.
> > >
> > > not the end of the world within an organization, but problematic if you
> > > want to enable an ecosystem
> > >
> > > On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoo...@gmail.com>
> > > wrote:
> > >
> > > > As others have laid out, I see strong reasons for a common message
> > > > metadata structure for the Kafka ecosystem.  In particular, I've seen
> > > > that even within a single organization, infrastructure teams often
> > > > own the message metadata while application teams own the
> > > > application-level data format.  Allowing metadata and content to have
> > > > different structure and evolve separately is very helpful for this.
> > > > Also, I think there's a lot of value to having a common metadata
> > > > structure shared across the Kafka ecosystem so that tools which
> > > > leverage metadata can more easily be shared across organizations and
> > > > integrated together.
> > > >
> > > > The question is, where does the metadata structure belong?  Here's my
> > > > take:
> > > >
> > > > We change the Kafka wire and on-disk format from a (key, value) model
> > > > to a (key, metadata, value) model where all three are byte arrays
> > > > from the broker's point of view.  The primary reason for this is that
> > > > it provides a backward compatible migration path forward.  Producers
> > > > can start populating metadata fields before all consumers understand
> > > > the metadata structure.  For people who already have custom envelope
> > > > structures, they can populate their existing structure and the new
> > > > structure for a while as they make the transition.
> > > >
> > > > We could stop there and let the clients plug in a KeySerializer,
> > > > MetadataSerializer, and ValueSerializer but I think it would also be
> > > > useful to have a default MetadataSerializer that implements a
> > > > key-value model similar to AMQP or HTTP headers.  Or we could go even
> > > > further and prescribe a Map<String, byte[]> or Map<String, String>
> > > > data model for headers in the clients (while still allowing custom
> > > > serialization of the header data model).
> > > >
> > > > I think this would address Radai's concerns:
> > > > 1. All client code would not need to be updated to know about the
> > > > container.
> > > > 2. Middleware friendly clients would have a standard header data
> > > > model to work with.
> > > > 3. KIP is required both b/c of broker changes and because of client
> > > > API changes.
> > > >
> > > > Cheers,
> > > >
> > > > Roger
> > > >
> > > >
> > > > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenbl...@gmail.com>
> > > > wrote:
> > > >
> > > > > my biggest issues with a "standard" wrapper format:
> > > > >
> > > > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be
> > > > > updated to know about the container, because any old naive code
> > > > > trying to directly deserialize its own payload would keel over and
> > > > > die (it needs to know to deserialize a container, and then dig in
> > > > > there for its payload).
> > > > > 2. in order to write middleware-friendly clients that utilize such
> > > > > a container one would basically have to write their own
> > > > > producer/consumer API on top of the open source kafka one.
> > > > > 3. if you were going to go with a wrapper format you really don't
> > > > > need to bother with a kip (just open source your own client stack
> > > > > from #2 above so others could stop re-inventing it)
> > > > >
> > > > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushuja...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > How exactly would this work? Or maybe that's out of scope for
> > > > > > this email.
>
> --
> Nacho (Ignacio) Solis
> Kafka
> nso...@linkedin.com
>