Re: [DISCUSS] KIP-82 - Add Record Headers

Michael Pearce Fri, 18 Nov 2016 09:32:18 -0800

#jay #jun any concerns on 1 and 2 still?

@all
To get this moving along a bit more, I'd also like to get clarity on the last
points below:


3) I believe we're all roughly happy with the header value being a byte[]?

4) I believe consensus has been for a namespace-based int approach {int,int}
for the key. Any objections if this is what we go with?

5) Assuming (4) is correct and we use {int,int} keys, should both ints be int16
or int32?
I'm for them being int16 (2 bytes each): combined they take 4 bytes, the same
as the original proposal, give plenty of combinations for the foreseeable
future, and keep the overhead small.
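A minimal sketch of the {int16, int16} key from point 5, assuming both halves are signed and written big-endian (as elsewhere in the Kafka protocol). `pack_key` and `unpack_key` are hypothetical helper names, not anything defined by the KIP.

```python
import struct

# Hypothetical {namespace, id} header key: two signed int16 values,
# big-endian (">hh"), for a fixed 4 bytes per key.
KEY_FORMAT = struct.Struct(">hh")


def pack_key(namespace: int, header_id: int) -> bytes:
    """Encode a {namespace, id} key; struct.error is raised if either value is outside int16 range."""
    return KEY_FORMAT.pack(namespace, header_id)


def unpack_key(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a {namespace, id} key starting at `offset`."""
    return KEY_FORMAT.unpack_from(data, offset)


key = pack_key(3, 42)
print(len(key), key.hex())  # 4 0003002a
print(unpack_key(key))      # (3, 42)
```

Even with signed halves this leaves 2^16 namespaces, each with 2^16 ids, which is the "plenty of combinations" argument in numbers.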

Do we see any benefit in another KIP call to discuss these at all?

Cheers
Mike
________________________________________
From: K Burstev <[email protected]>
Sent: Friday, November 18, 2016 7:07:07 AM
To: [email protected]
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

For what it's worth, I also agree. As a user:

 1) Yes - Headers are worthwhile
 2) Yes - Headers should be a top level option

14.11.2016, 21:15, "Ignacio Solis" <[email protected]>:
> 1) Yes - Headers are worthwhile
> 2) Yes - Headers should be a top level option
>
> On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <[email protected]>
> wrote:
>
>>  Hi Roger,
>>
>>  The KIP details/examples the original proposal for key spacing, not the
>>  new namespace idea mentioned in this discussion.
>>
>>  We will need to update the KIP once we agree this is a better approach
>>  (which seems to be the case, if I have understood the general feeling in
>>  the conversation).
>>
>>  Re the variable ints: we did think about this at a very early stage. I
>>  think the added complexity isn't worth the saving. If we want to reduce
>>  overhead and size, I'd rather go with int16 (2-byte) keys, as that keeps
>>  it simple.
>>
>>  On the note of no headers: as per the KIP, we use an attribute bit to
>>  denote whether headers are present, so there is currently zero overhead
>>  if headers are not used.
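To illustrate the zero-overhead claim, here is a rough sketch of a headers-present flag in the record attributes byte. The bit position (`1 << 4`) and the helper names are assumptions for illustration only; this thread does not say which bit the KIP uses.

```python
# Assumption: bit 4 of the attributes byte flags "headers present".
HEADERS_PRESENT = 1 << 4


def with_headers_flag(attributes: int, present: bool) -> int:
    """Return `attributes` with the headers-present bit set or cleared."""
    if present:
        return attributes | HEADERS_PRESENT
    return attributes & ~HEADERS_PRESENT


def has_headers(attributes: int) -> bool:
    """True if a reader should expect a headers section in this record."""
    return bool(attributes & HEADERS_PRESENT)


attrs = with_headers_flag(0x01, present=True)
print(hex(attrs), has_headers(attrs))  # 0x11 True
print(has_headers(0x01))               # False
```

A record written without headers keeps the bit clear and carries no headers section, so the only cost is one otherwise-unused bit.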
>>
>>  As radai mentions, I think it would be good to first get clarity on
>>  whether we now have general consensus that (1) headers are worthwhile and
>>  useful, and (2) we want them as a top-level entity.
>>
>>  Just to state the obvious: I believe (1) headers are worthwhile, and (2)
>>  they should be a top-level entity.
>>
>>  Cheers
>>  Mike
>>  ________________________________________
>>  From: Roger Hoover <[email protected]>
>>  Sent: Wednesday, November 9, 2016 9:10:47 PM
>>  To: [email protected]
>>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>
>>  Sorry for going a little into the weeds, but thanks for the replies
>>  regarding varint.
>>
>>  Agreed that a prefix and {int, int} can be the same. It doesn't look like
>>  that's what the KIP is saying in the "Open" section, though. The example
>>  shows 2100001 for New Relic and 210002 for App Dynamics, implying that the
>>  New Relic organization will have only a single header id to work with. Or
>>  is 2100001 a prefix? The main point of a namespace or prefix is to reduce
>>  the overhead of config mapping or registration, depending on how
>>  namespaces/prefixes are managed.
>>
>>  Would love to hear more feedback on the higher-level questions though...
>>
>>  Cheers,
>>
>>  Roger
>>
>>  On Wed, Nov 9, 2016 at 11:38 AM, radai <[email protected]> wrote:
>>
>>  > I think this discussion is getting a bit into the weeds on technical
>>  > implementation details.
>>  > I'd like to step back a minute and try to establish where we are in the
>>  > larger picture:
>>  >
>>  > (re-wording nacho's last paragraph)
>>  > 1. are we all in agreement that headers are a worthwhile and useful
>>  > addition to have? this was contested early on
>>  > 2. are we all in agreement on headers as a top-level entity vs headers
>>  > squirreled away in V (the value)?
>>  >
>>  > are there still concerns around these two points (#jay? #jun?)?
>>  >
>>  > (and now back to our normal programming ...)
>>  >
>>  > varints are nice. having said that, it's adding complexity (see
>>  > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
>>  > as the 1st google result) and would require anyone writing other clients
>>  > (C? Python? Go? Bash? ;-) ) to get/implement the same, for relatively
>>  > little gain (int vs string is an order of magnitude, this isn't).
>>  >
>>  > int namespacing vs {int, int} namespacing are basically the same thing -
>>  > you're just namespacing an int64 and giving people whole 2^32 ranges at
>>  > a time. the part i like about this is letting people have a large swath
>>  > of numbers with one registration so they don't have to come back for
>>  > every single plugin/header they want to "reserve".
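radai's point that {int, int} namespacing is "just namespacing an int64" can be made concrete: put the namespace in the high 32 bits and the id in the low 32 bits, and each namespace owns one contiguous 2^32 block. This sketch treats both halves as unsigned for readability (a signed int64 wire field would need two's-complement handling), and the function names are hypothetical.

```python
MASK_32 = 0xFFFFFFFF


def to_int64(namespace: int, header_id: int) -> int:
    """Combine {namespace, id} into one 64-bit key (namespace in the high half)."""
    if not (0 <= namespace <= MASK_32 and 0 <= header_id <= MASK_32):
        raise ValueError("namespace and id must each fit in 32 unsigned bits")
    return (namespace << 32) | header_id


def from_int64(key: int) -> tuple[int, int]:
    """Split a 64-bit key back into (namespace, id)."""
    return key >> 32, key & MASK_32


def namespace_block(namespace: int) -> range:
    """All 64-bit keys owned by one namespace registration."""
    start = namespace << 32
    return range(start, start + (1 << 32))


key = to_int64(3, 7)
print(from_int64(key))            # (3, 7)
print(key in namespace_block(3))  # True
print(key in namespace_block(4))  # False
```

One registration (the namespace) hands out the whole block, which is the "come back once, not per plugin" property radai describes.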
>>  >
>>  >
>>  > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <[email protected]>
>>  > wrote:
>>  >
>>  > > Since some of the debate has been about overhead + performance, I'm
>>  > > wondering if we have considered a varint encoding
>>  > > (https://developers.google.com/protocol-buffers/docs/encoding#varints)
>>  > > for the header length field (int32 in the proposal) and for header
>>  > > ids? If you don't use headers, the overhead would be a single byte, and
>>  > > each header id < 128 would also need only a single byte?
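For reference, the unsigned base-128 varint scheme from the protobuf page Roger links looks roughly like this: 7 payload bits per byte, with the high bit set when more bytes follow. This is a sketch of that general technique, not an encoding the KIP proposes; negative values (which protobuf handles with ZigZag for `sint` types) are out of scope here.

```python
def encode_varint(value: int) -> bytes:
    """Unsigned base-128 varint: low 7 bits first, high bit = 'more bytes follow'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


print(encode_varint(0).hex())                # 00
print(encode_varint(300).hex())              # ac02
print(decode_varint(bytes.fromhex("ac02")))  # (300, 2)
```

A headers length of 0, or any id below 128, costs one byte, while a full 32-bit value grows to five bytes; radai's counterpoint is that every non-Java client would have to carry this code too.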
>>  > >
>>  > >
>>  > >
>>  > > On Wed, Nov 9, 2016 at 6:43 AM, radai <[email protected]>
>>  > > wrote:
>>  > >
>>  > > > @magnus - and very dangerous (you're essentially downloading and
>>  > > > executing arbitrary code off the internet on your servers ... bad
>>  > > > idea without a sandbox, even with one)
>>  > > >
>>  > > > as for it being a purely administrative task - i disagree.
>>  > > >
>>  > > > i wish it were, really, because then my earlier point on the
>>  > > > complexity of the remapping process would be invalid, but at
>>  > > > linkedin, for example, we (the team i'm in) run kafka as a service.
>>  > > > we don't really know what our users (developing applications that
>>  > > > use kafka) are up to at any given moment. it is very possible (given
>>  > > > the existence of headers and a corresponding plugin ecosystem) for
>>  > > > some application to "equip" their producers and consumers with the
>>  > > > required plugin without us knowing. i don't mean to imply that's
>>  > > > bad, i just want to make the point that keeping it in sync across a
>>  > > > large-enough organization is not that simple.
>>  > > >
>>  > > >
>>  > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <[email protected]>
>>  > > > wrote:
>>  > > >
>>  > > > > I think there is a piece missing in the Strings discussion, where
>>  > > > > pro-Stringers reason that by providing unique string identifiers for
>>  > > > > each header everything will just magically work for all parts of
>>  > > > > the stream pipeline.
>>  > > > >
>>  > > > > But the strings don't mean anything by themselves, and while we
>>  > > > > could probably envision some auto plugin loader that downloads,
>>  > > > > compiles, links and runs plugins on-demand as soon as they're seen
>>  > > > > by a consumer, I don't really see a use-case for something so
>>  > > > > dynamic (and fragile) in practice.
>>  > > > >
>>  > > > > In the real world an application will be configured with a set of
>>  > > > > plugins to either add (producer) or read (consumer) headers.
>>  > > > > This is an administrative task based on what features a client
>>  > > > > needs/provides and results in some sort of configuration to enable
>>  > > > > and configure the desired plugins.
>>  > > > >
>>  > > > > Since this needs to be kept somewhat in sync across an organisation
>>  > > > > (there is no point in having producers add headers no consumers
>>  > > > > will read, and vice versa), the added complexity of assigning an id
>>  > > > > namespace for each plugin as it is being configured should be
>>  > > > > tolerable.
>>  > > > >
>>  > > > >
>>  > > > > /Magnus
>>  > > > >
>>  > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <[email protected]>:
>>  > > > >
>>  > > > > > Just following/catching up on what seems to be an active night :)
>>  > > > > >
>>  > > > > > @Radai sorry if it may seem obvious but what does MD stand for?
>>  > > > > >
>>  > > > > > My take on String vs Int:
>>  > > > > >
>>  > > > > > I will state first I am pro Int (16 or 32).
>>  > > > > >
>>  > > > > > Playing devil's advocate, though, I do see a big plus in the
>>  > > > > > argument for String keys: integrating into an existing ecosystem.
>>  > > > > >
>>  > > > > > As many other systems use String-based headers (Flume, JMS), string
>>  > > > > > keys would make it much easier to integrate those systems with
>>  > > > > > Kafka.
>>  > > > > >
>>  > > > > > With Int-based headers, how could we provide a way/guidance to make
>>  > > > > > this integration simple/easy when transitioning flows over to Kafka?
>>  > > > > >
>>  > > > > > * tough luck buddy you're on your own
>>  > > > > > * simply hash the string into int code and hope for no collisions
>>  > > (how
>>  > > > to
>>  > > > > > convert back though?)
>>  > > > > > * http2 style as mentioned by nacho.
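Mike's second option (hash the string into an int) is easy to sketch, and the sketch also shows why "how to convert back" is the hard part: the hash cannot be inverted, so a reader still needs a side table from id to name. CRC-32 truncated to 16 bits is an arbitrary choice for illustration, and `header_id`/`build_id_table` are hypothetical names.

```python
import zlib


def header_id(key: str, bits: int = 16) -> int:
    """Hash a string header key to an int id: CRC-32 of the UTF-8 bytes, truncated to `bits` bits."""
    return zlib.crc32(key.encode("utf-8")) & ((1 << bits) - 1)


def build_id_table(keys, bits: int = 16) -> dict[int, str]:
    """Build the id -> name table a consumer would need, failing loudly on collisions."""
    table: dict[int, str] = {}
    for key in keys:
        hid = header_id(key, bits)
        existing = table.get(hid)
        if existing is not None and existing != key:
            raise ValueError(f"collision: {key!r} and {existing!r} both map to {hid}")
        table[hid] = key
    return table


for name in ["JMSCorrelationID", "JMSType"]:
    print(name, "->", header_id(name))
```

With only 16 bits, collisions become likely after a few hundred distinct keys (the birthday bound), which is the "hope for no collisions" risk in concrete terms.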
>>  > > > > >
>>  > > > > > cheers,
>>  > > > > > Mike
>>  > > > > >
>>  > > > > >
>>  > > > > > ________________________________________
>>  > > > > > From: radai <[email protected]>
>>  > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
>>  > > > > > To: [email protected]
>>  > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>  > > > > >
>>  > > > > > thinking about it some more, the best way to transmit the header
>>  > > > > > remapping data to consumers would be to put it in the MD response
>>  > > > > > payload, so maybe it should be discussed now.
>>  > > > > >
>>  > > > > >
>>  > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <[email protected]>
>>  > > > > > wrote:
>>  > > > > >
>>  > > > > > > i'm not opposed to the idea of namespace mapping. all i'm saying
>>  > > > > > > is that it's not part of the "mvp" and, since it requires no wire
>>  > > > > > > format change, it can always be added later.
>>  > > > > > > also, it's not as simple as just configuring MM to do the
>>  > > > > > > transform: let's say i've implemented large message support as
>>  > > > > > > {666,1} and on some mirror target cluster it's been remapped to
>>  > > > > > > {999,1}. the consumer plugin code would also need to be told to
>>  > > > > > > look for the large message "part X of Y" header under {999,1}.
>>  > > > > > > doable, but tricky.
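The mirror-remapping problem radai describes can be sketched with headers modelled as (namespace, id) keys: the mirror rewrites namespace 666 to 999, and a consumer plugin that still looks up {666,1} silently finds nothing. The list-of-pairs representation and both function names are assumptions for illustration.

```python
from typing import Optional

Header = tuple[tuple[int, int], bytes]  # ((namespace, id), value)


def remap_namespaces(headers: list[Header], namespace_map: dict[int, int]) -> list[Header]:
    """Rewrite header namespaces as a mirroring step might; unmapped namespaces pass through."""
    return [((namespace_map.get(ns, ns), hid), value) for (ns, hid), value in headers]


def find_header(headers: list[Header], key: tuple[int, int]) -> Optional[bytes]:
    """Return the first value stored under `key`, or None if absent."""
    for header_key, value in headers:
        if header_key == key:
            return value
    return None


source = [((666, 1), b"part 1 of 3")]
mirrored = remap_namespaces(source, {666: 999})
print(find_header(mirrored, (666, 1)))  # None
print(find_header(mirrored, (999, 1)))  # b'part 1 of 3'
```

The remap itself is trivial; the "tricky" part is that the consumer's lookup key has to change in step with it.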
>>  > > > > > >
>>  > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <[email protected]>
>>  > > > > > > wrote:
>>  > > > > > >
>>  > > > > > >> While you can do whatever you want with a namespace and your
>>  > > > > > >> code, what I'd expect is for each app to make namespaces
>>  > > > > > >> configurable...
>>  > > > > > >>
>>  > > > > > >> So if I accidentally used 666 for my HR department, and still
>>  > > > > > >> want to run RadaiApp, I can config "namespace=42" for RadaiApp
>>  > > > > > >> and everything will look normal.
>>  > > > > > >>
>>  > > > > > >> This means you only need to sync usage inside your own
>>  > > > > > >> organization. Still hard, but somewhat easier than syncing with
>>  > > > > > >> the entire world.
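Gwen's suggestion amounts to each app hard-coding only its internal ids and reading its namespace from deployment config. A rough sketch, where the `namespace` config key follows her example and the class and helper are hypothetical:

```python
class RadaiAppHeaders:
    """Hypothetical plugin: internal header ids are fixed, the namespace is deployment config."""

    PART_OF = 1  # internal id chosen by the app's developers

    def __init__(self, config: dict[str, str]):
        self.namespace = int(config["namespace"])  # chosen by whoever deploys the app

    def key(self, internal_id: int) -> tuple[int, int]:
        return (self.namespace, internal_id)


def check_unique_namespaces(assignments: dict[str, int]) -> None:
    """Fail fast if two apps in one organization were given the same namespace."""
    seen: dict[int, str] = {}
    for app, namespace in assignments.items():
        if namespace in seen:
            raise ValueError(f"{app} and {seen[namespace]} both use namespace {namespace}")
        seen[namespace] = app


check_unique_namespaces({"HR": 666, "RadaiApp": 42})
app = RadaiAppHeaders({"namespace": "42"})
print(app.key(RadaiAppHeaders.PART_OF))  # (42, 1)
```

The uniqueness check only has to hold within one organization, which is the trade-off Gwen points out.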
>>  > > > > > >>
>>  > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <[email protected]>
>>  > > > > > >> wrote:
>>  > > > > > >> > and we can start with {namespace, id} and no re-mapping
>>  > > > > > >> > support, and always add it later on if/when collisions actually
>>  > > > > > >> > happen (i don't think they'd be a problem).
>>  > > > > > >> >
>>  > > > > > >> > every interested party (orgs or individuals) could then register
>>  > > > > > >> > a prefix (0 = reserved, 1 = confluent ... 666 = me :-) ) and do
>>  > > > > > >> > whatever with the 2nd ID - so once linkedin registers, say, 3,
>>  > > > > > >> > linkedin devs are free to use {3, *} with a reasonable
>>  > > > > > >> > expectation not to collide with anything else. further
>>  > > > > > >> > partitioning of that * becomes linkedin's problem, but the
>>  > > > > > >> > "upstream registration" of a namespace only has to happen once.
>>  > > > > > >> >
>>  > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <[email protected]>
>>  > > > > > >> > wrote:
>>  > > > > > >> >
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <[email protected]> wrote:
>>  > > > > > >> >> >
>>  > > > > > >> >> > Thank you so much for this clear and fair summary of the
>>  > > > > > >> >> > arguments.
>>  > > > > > >> >> >
>>  > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
>>  > > > > > >> >> >
>>  > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion with
>>  > > > > > >> >> > Roger's tweak: add a namespace for headers. This will allow
>>  > > > > > >> >> > each app to just use whatever IDs it wants internally, and
>>  > > > > > >> >> > then let the admin deploying the app figure out an available
>>  > > > > > >> >> > namespace ID for the app to live in. So
>>  > > > > > >> >> > io.confluent.schema-registry can be namespace 0x01 on my
>>  > > > > > >> >> > deployment and 0x57 on yours, and the poor guys developing
>>  > > > > > >> >> > the app don't need to worry about that.
>>  > > > > > >> >> >
>>  > > > > > >> >>
>>  > > > > > >> >> Gwen, if I understand your example right, an application
>>  > > > > > >> >> deployer might decide to use 0x01 in one deployment, and that
>>  > > > > > >> >> means that once the message is written into the broker, it
>>  > > > > > >> >> will be saved on the broker with that specific namespace
>>  > > > > > >> >> (0x01).
>>  > > > > > >> >>
>>  > > > > > >> >> If you were to mirror that message into another cluster, the
>>  > > > > > >> >> 0x01 would accompany the message, right? What if the deployers
>>  > > > > > >> >> of the same app in the other cluster use 0x57? They won't
>>  > > > > > >> >> understand each other?
>>  > > > > > >> >>
>>  > > > > > >> >> I'm not sure that's an avoidable problem. I think it simply
>>  > > > > > >> >> means that in order to share data, you have to also have a
>>  > > > > > >> >> shared (agreed upon) understanding of what the namespaces mean.
>>  > > > > > >> >> Which I think makes sense, because the alternative (sharing
>>  > > > > > >> >> *nothing* at all) would mean that there would be no way to
>>  > > > > > >> >> understand each other.
>>  > > > > > >> >>
>>  > > > > > >> >> -James
>>  > > > > > >> >>
>>  > > > > > >> >> > Gwen
>>  > > > > > >> >> >
>>  > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM, radai <[email protected]>
>>  > > > > > >> >> > wrote:
>>  > > > > > >> >> >> +1 for sean's document. it covers pretty much all the
>>  > > > > > >> >> >> trade-offs and provides concrete figures to argue about :-)
>>  > > > > > >> >> >> (nit-picking - used the same xkcd twice, also trove has been
>>  > > > > > >> >> >> superseded for purposes of high performance collections: look
>>  > > > > > >> >> >> at https://github.com/leventov/Koloboke)
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> so to sum up the string vs int debate:
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> performance - you can do 140k ops/sec _per thread_ with
>>  > > > > > >> >> >> string headers. you could do x2-3 better with ints. there's
>>  > > > > > >> >> >> no arguing the relative diff between the two, there's only
>>  > > > > > >> >> >> the question of whether or not _the rest of kafka_ operates
>>  > > > > > >> >> >> fast enough to care. if we want to make choices solely based
>>  > > > > > >> >> >> on performance we need ints. if we are willing to
>>  > > > > > >> >> >> settle/compromise for a nicer (to some) API, then strings are
>>  > > > > > >> >> >> good enough for the current state of affairs.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> message size - with batching and compression it comes down
>>  > > > > > >> >> >> to a ~5% difference (internal testing, not in the doc; maybe
>>  > > > > > >> >> >> it would help to add it if this becomes a point of
>>  > > > > > >> >> >> contention?). this means it won't really affect kafka in
>>  > > > > > >> >> >> "throughput mode" (large, compressed batches). in "low
>>  > > > > > >> >> >> latency" mode (meaning less/no batching and compression) the
>>  > > > > > >> >> >> difference can be extreme (it'll easily be an order of
>>  > > > > > >> >> >> magnitude with small payloads like stock ticks and header
>>  > > > > > >> >> >> keys of the form "com.acme.infraTeam.kafka.hiMom.auditPlugin").
>>  > > > > > >> >> >> we have a few such topics at linkedin where actual payloads
>>  > > > > > >> >> >> are ~2 ints and are eclipsed by our in-house audit "header",
>>  > > > > > >> >> >> which is why we liked ints to begin with.
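A back-of-the-envelope version of the low-latency case, assuming a string key costs an int16 length prefix plus its UTF-8 bytes and an int key is the {int16, int16} pair discussed elsewhere in the thread. Header values, value lengths and other record overhead are deliberately left out.

```python
STRING_KEY = "com.acme.infraTeam.kafka.hiMom.auditPlugin"
PAYLOAD_BYTES = 8  # "~2 ints" of actual payload


def string_key_bytes(key: str) -> int:
    # assumption: int16 length prefix + UTF-8 bytes
    return 2 + len(key.encode("utf-8"))


def int_key_bytes() -> int:
    # assumption: {int16 namespace, int16 id}
    return 4


s, i = string_key_bytes(STRING_KEY), int_key_bytes()
print(s, i)                                  # 44 4
print(s / PAYLOAD_BYTES, i / PAYLOAD_BYTES)  # 5.5 0.5
print(s / i)                                 # 11.0
```

For this key alone the string form is 11 times the int form and several times the payload, consistent with the order-of-magnitude estimate for unbatched, uncompressed records.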
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> "ease of use" - strings would probably still require _some_
>>  > > > > > >> >> >> degree of partitioning by convention (imagine if everyone
>>  > > > > > >> >> >> used the key "infra"...) but it's very intuitive for java
>>  > > > > > >> >> >> devs to do anyway (reverse-domain is ingrained into java
>>  > > > > > >> >> >> developers at a young age :-) ). also most java devs find
>>  > > > > > >> >> >> Map<String, whatever> more intuitive than Map<Integer, whatever>
>>  > > > > > >> >> >> - probably because of other text-based protocols like http.
>>  > > > > > >> >> >> ints would require a number registry. if you think number
>>  > > > > > >> >> >> registries are hard just look at the wiki page for KIPs
>>  > > > > > >> >> >> (specifically the number for the next available KIP) and
>>  > > > > > >> >> >> think again - we are probably talking about the same volume
>>  > > > > > >> >> >> of requests. also this would only be "required" (good
>>  > > > > > >> >> >> citizenship, more like) if you want to publish your plugin
>>  > > > > > >> >> >> for others to use. within your org do whatever you want -
>>  > > > > > >> >> >> just know that if you use [some "reserved" range] and a
>>  > > > > > >> >> >> future kafka update breaks it, it's your problem. RTFM.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> personally i'm in favor of ints.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> having said that (and like nacho) I will settle if int vs
>>  > > > > > >> >> >> string remains the only obstacle to this.
>>  > > > > > >> >> >>
>>  > > > > > >> >> >> On Tue, Nov 8, 2016 at 3:53 PM, Nacho Solis <[email protected]>
>>  > > > > > >> >> >> wrote:
>>  > > > > > >> >> >>
>>  > > > > > >> >> >>> I think it's well known I've been pushing for ints (and I
>>  > > > > > >> >> >>> could switch to 16 bit shorts if pressed).
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> - efficient (space)
>>  > > > > > >> >> >>> - efficient (processing)
>>  > > > > > >> >> >>> - easily partitionable
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> However, if the only thing that is keeping us from adopting
>>  > > > > > >> >> >>> headers is the use of strings vs ints as keys, then I would
>>  > > > > > >> >> >>> cave in and accept strings. If we do so, I would like to
>>  > > > > > >> >> >>> limit string keys to 128 bytes in length. This way 1) I
>>  > > > > > >> >> >>> could use a 3 letter string if I wanted (effectively using 4
>>  > > > > > >> >> >>> total bytes), 2) limit overall impact of possible keys (don't
>>  > > > > > >> >> >>> really want people to send a 16K header string key).
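Nacho's "3 letter string in 4 total bytes" implies a one-byte length prefix, which comfortably covers his 128-byte cap. A sketch under that assumption (unsigned length byte, UTF-8 keys; function names hypothetical):

```python
MAX_KEY_BYTES = 128


def encode_string_key(key: str) -> bytes:
    """Length-prefix a UTF-8 key with a single unsigned byte, enforcing the 128-byte cap."""
    raw = key.encode("utf-8")
    if len(raw) > MAX_KEY_BYTES:
        raise ValueError(f"header key is {len(raw)} bytes; the limit is {MAX_KEY_BYTES}")
    return bytes([len(raw)]) + raw


def decode_string_key(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Return (key, offset just past the key)."""
    length = data[offset]
    start, end = offset + 1, offset + 1 + length
    if end > len(data):
        raise ValueError("truncated header key")
    return data[start:end].decode("utf-8"), end


encoded = encode_string_key("abc")
print(encoded, len(encoded))       # b'\x03abc' 4
print(decode_string_key(encoded))  # ('abc', 4)
```

The cap is enforced on encoded bytes rather than characters, so multi-byte UTF-8 keys hit the limit sooner.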
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> Nacho
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> On Tue, Nov 8, 2016 at 3:35 PM, Gwen Shapira <[email protected]>
>>  > > > > > >> >> >>> wrote:
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>> Forgot to mention: Thank you for quantifying the trade-off -
>>  > > > > > >> >> >>>> it is helpful and important regardless of what we end up
>>  > > > > > >> >> >>>> deciding.
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>> On Tue, Nov 8, 2016 at 3:12 PM, Sean McCauliff
>>  > > > > > >> >> >>>> <[email protected]> wrote:
>>  > > > > > >> >> >>>>> On Tue, Nov 8, 2016 at 2:15 PM, Gwen Shapira <[email protected]>
>>  > > > > > >> >> >>>>> wrote:
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>> Since Kafka specifically targets high-throughput, low-latency
>>  > > > > > >> >> >>>>>> use-cases, I don't think we should trade them off that
>>  > > > > > >> >> >>>>>> easily.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> I find these kinds of design goals not to be really helpful
>>  > > > > > >> >> >>>>> unless they're quantified in some way, because it's always
>>  > > > > > >> >> >>>>> possible to argue against something as either being not
>>  > > > > > >> >> >>>>> performant or just an implementation detail.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> This is a single-threaded benchmark, so all the
>>  > > > > > >> >> >>>>> measurements are per thread.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> For 1M messages/s/thread, if header keys are ints and you
>>  > > > > > >> >> >>>>> had even a single header key, value pair, then parsing it
>>  > > > > > >> >> >>>>> still takes about 2^-2 microseconds, which means you only
>>  > > > > > >> >> >>>>> have another 0.75 microseconds to do everything else you
>>  > > > > > >> >> >>>>> want to do with a message (1M messages/s means 1
>>  > > > > > >> >> >>>>> microsecond per message). With string header keys there is
>>  > > > > > >> >> >>>>> still 0.5 microseconds to process a message.
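Sean's per-message budget arithmetic, spelled out: at a given rate each message gets 1e6 / rate microseconds, and header parsing spends part of that. The per-header costs plugged in below (2^-2 µs for int keys, 0.5 µs for string keys) are read off his message rather than measured here.

```python
def remaining_budget_us(messages_per_sec: float, header_cost_us: float) -> float:
    """Microseconds left per message after header parsing, for one thread."""
    per_message_us = 1_000_000 / messages_per_sec
    return per_message_us - header_cost_us


RATE = 1_000_000  # 1M messages/s/thread
print(remaining_budget_us(RATE, 2 ** -2))  # 0.75 (int keys)
print(remaining_budget_us(RATE, 0.5))      # 0.5 (string keys)
```

Halving the rate doubles the per-message budget, which is why the int/string gap matters mainly near the 1M messages/s mark.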
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>> I love strings as much as the next guy (we had them in
>>  > > > > > >> >> >>>>>> Flume), but I was convinced by Magnus/Michael/Radai that
>>  > > > > > >> >> >>>>>> strings don't actually have strong benefits as opposed to
>>  > > > > > >> >> >>>>>> ints (you'll need a string registry anyway - otherwise, how
>>  > > > > > >> >> >>>>>> will you know what the "profile_id" header refers to?) and
>>  > > > > > >> >> >>>>>> I want to keep closer to our original design goals for
>>  > > > > > >> >> >>>>>> Kafka.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> "confluent.profile_id"
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> If someone likes strings in the headers and doesn't do
>>  > > > > > >> >> >>>>>> millions of messages a sec, they probably have lots of other
>>  > > > > > >> >> >>>>>> systems they can use instead.
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>> None of them will scale like Kafka. Horizontal scaling is
>>  > > > > > >> >> >>>>> still good.
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> On Tue, Nov 8, 2016 at 1:22 PM, Sean McCauliff
>>  > > > > > >> >> >>>>>> <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>> +1 for String keys.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> I've been doing some benchmarking and it seems like the
>>  > > > > > >> >> >>>>>>> speedup for using integer keys is about 2-5x depending on
>>  > > > > > >> >> >>>>>>> the length of the strings and what collections are being
>>  > > > > > >> >> >>>>>>> used. The overall amount of time spent parsing a set of
>>  > > > > > >> >> >>>>>>> header key, value pairs probably does not matter unless you
>>  > > > > > >> >> >>>>>>> are getting close to 1M messages per consumer, in which case
>>  > > > > > >> >> >>>>>>> you probably shouldn't use headers. There is also the option
>>  > > > > > >> >> >>>>>>> to use very short strings, some of which are even shorter
>>  > > > > > >> >> >>>>>>> than integers.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> Partitioning the string key space will be easier than
>>  > > > > > >> >> >>>>>>> partitioning an integer key space. We won't need a global
>>  > > > > > >> >> >>>>>>> registry. Kafka internally can reserve some prefix like "_"
>>  > > > > > >> >> >>>>>>> as its namespace. Everyone else can use their company or
>>  > > > > > >> >> >>>>>>> project name as a namespace prefix and life should be good.
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> Here's the link to some of the benchmarking info:
>>  > > > > > >> >> >>>>>>> https://docs.google.com/document/d/1tfT-6SZdnKOLyWGDH82kS30PnUkmgb7nPLdw6p65pAI/edit?usp=sharing
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> --
>>  > > > > > >> >> >>>>>>> Sean McCauliff
>>  > > > > > >> >> >>>>>>> Staff Software Engineer
>>  > > > > > >> >> >>>>>>> Kafka
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> [email protected]
>>  > > > > > >> >> >>>>>>> linkedin.com/in/sean-mccauliff-b563192
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>> On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <[email protected]>
>>  > > > > > >> >> >>>>>>> wrote:
>>  > > > > > >> >> >>>>>>>
>>  > > > > > >> >> >>>>>>>> +1 on this slimmer version of our proposal
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I definitely think we can reduce the Id space from the
>>  > > > > > >> >> >>>>>>>> proposed int32 (4 bytes) down to int16 (2 bytes): it saves
>>  > > > > > >> >> >>>>>>>> on space, and we wouldn't expect the number of headers
>>  > > > > > >> >> >>>>>>>> being used concurrently to be that high.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I did wonder whether we should still make the value byte
>>  > > > > > >> >> >>>>>>>> array length int32, though, as this is the standard max
>>  > > > > > >> >> >>>>>>>> array length in Java. That said, it is a header, and I
>>  > > > > > >> >> >>>>>>>> guess limiting the size is sensible and would work for all
>>  > > > > > >> >> >>>>>>>> the use cases we have in mind, so I'm happy with limiting
>>  > > > > > >> >> >>>>>>>> this.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Do people generally concur on Magnus's slimmer version?
>>  > > > > > >> >> >>>>>>>> Does anyone see any issues if we moved from int32 to int16?
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Re configurable ids per plugin over a global registry: that
>>  > > > > > >> >> >>>>>>>> would also work for us. As such, if this has better
>>  > > > > > >> >> >>>>>>>> consensus than the proposed global registry, I'd be happy to
>>  > > > > > >> >> >>>>>>>> change that.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I was already sold on ints over strings for keys ;)
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Cheers
>>  > > > > > >> >> >>>>>>>> Mike
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> ________________________________________
>>  > > > > > >> >> >>>>>>>> From: Magnus Edenhill <[email protected]>
>>  > > > > > >> >> >>>>>>>> Sent: Monday, November 7, 2016 10:10:21 PM
>>  > > > > > >> >> >>>>>>>> To: [email protected]
>>  > > > > > >> >> >>>>>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Hi,
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> I'm +1 for adding generic message headers, but I do share
>>  > > > > > >> >> >>>>>>>> the concerns previously aired on this thread and during the
>>  > > > > > >> >> >>>>>>>> KIP meeting.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> So let me propose a slimmer alternative that does not
>>  > > > > > >> >> >>>>>>>> require any sort of global header registry, does not affect
>>  > > > > > >> >> >>>>>>>> broker performance or operations, and adds as little
>>  > > > > > >> >> >>>>>>>> overhead as possible.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Message
>>  > > > > > >> >> >>>>>>>> ------------
>>  > > > > > >> >> >>>>>>>> The protocol Message type is extended with a Headers array consisting of
>>  > > > > > >> >> >>>>>>>> Tags, where a Tag is defined as:
>>  > > > > > >> >> >>>>>>>> int16 Id
>>  > > > > > >> >> >>>>>>>> int16 Len // binary_data length
>>  > > > > > >> >> >>>>>>>> binary_data[Len] // opaque binary data
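A minimal Python sketch of the Tag layout described above, for illustration only. The proposal fixes the Tag fields (int16 Id, int16 Len, opaque bytes); everything else here is an assumption: big-endian integers, an int32 element count in front of the Headers array (the way Kafka's protocol encodes arrays generally), and the hypothetical helper names `encode_tags` and `decode_tags`.

```python
import struct
from typing import List, Tuple

# Assumed framing (not specified in the proposal): int32 count, then per Tag
# int16 Id, int16 Len, Len bytes of opaque data. All integers big-endian.
_COUNT = struct.Struct(">i")
_TAG_HDR = struct.Struct(">hh")
MAX_TAG_LEN = 2**15 - 1  # Len is a signed int16


def encode_tags(tags: List[Tuple[int, bytes]]) -> bytes:
    """Encode (tag_id, data) pairs into the assumed wire layout."""
    out = bytearray(_COUNT.pack(len(tags)))
    for tag_id, data in tags:
        if len(data) > MAX_TAG_LEN:
            raise ValueError(f"tag {tag_id}: {len(data)} bytes exceeds int16 Len")
        out += _TAG_HDR.pack(tag_id, len(data))
        out += data
    return bytes(out)


def decode_tags(buf: bytes) -> List[Tuple[int, bytes]]:
    """Decode the assumed wire layout back into (tag_id, data) pairs."""
    (count,) = _COUNT.unpack_from(buf, 0)
    offset = _COUNT.size
    tags = []
    for _ in range(count):
        tag_id, length = _TAG_HDR.unpack_from(buf, offset)
        offset += _TAG_HDR.size
        if length < 0 or offset + length > len(buf):
            raise ValueError("truncated or malformed tag")
        tags.append((tag_id, bytes(buf[offset:offset + length])))
        offset += length
    if offset != len(buf):
        raise ValueError("trailing bytes after tags")
    return tags
```

Each Tag costs a fixed 4 bytes (Id plus Len) on top of its data, which is the "as little overhead as possible" property the proposal is after.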
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Ids
>>  > > > > > >> >> >>>>>>>> ---
>>  > > > > > >> >> >>>>>>>> The Id space is not centrally managed, so whenever an application needs to
>>  > > > > > >> >> >>>>>>>> add headers, or use an ecosystem plugin that does, its Id allocation will
>>  > > > > > >> >> >>>>>>>> need to be manually configured.
>>  > > > > > >> >> >>>>>>>> This moves the allocation concern from the global space down to the
>>  > > > > > >> >> >>>>>>>> organization level and avoids the risk of id conflicts.
>>  > > > > > >> >> >>>>>>>> Example pseudo-config for some app:
>>  > > > > > >> >> >>>>>>>> sometrackerplugin.tag.sourcev3.id=1000
>>  > > > > > >> >> >>>>>>>> dbthing.tag.tablename.id=1001
>>  > > > > > >> >> >>>>>>>> myschemareg.tag.schemaname.id=1002
>>  > > > > > >> >> >>>>>>>> myschemareg.tag.schemaversion.id=1003
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Each header-writing or header-reading plugin must provide means (typically
>>  > > > > > >> >> >>>>>>>> through configuration) to specify the tag for each header it uses.
>>  > > > > > >> >> >>>>>>>> Defaults should be avoided.
>>  > > > > > >> >> >>>>>>>> A consumer silently ignores tags it does not have a mapping for (since the
>>  > > > > > >> >> >>>>>>>> binary_data can't be parsed without knowing what it is).
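The "silently ignore unknown tags" rule can be sketched as a filter over already-decoded (Id, data) pairs. `known_headers` and the shape of the id-to-name mapping are hypothetical; under the proposal the mapping would come from per-application configuration like the pseudo-config above.

```python
from typing import Dict, Iterable, Tuple


def known_headers(tags: Iterable[Tuple[int, bytes]],
                  id_to_name: Dict[int, str]) -> Dict[str, bytes]:
    """Map decoded (tag_id, data) pairs to this application's header names.

    Tags whose id has no configured mapping are dropped without error: with
    no mapping, the consumer has no way to know how to parse their bytes.
    """
    return {id_to_name[tag_id]: data
            for tag_id, data in tags
            if tag_id in id_to_name}


# Hypothetical mapping, e.g. built from myschemareg.tag.*.id settings.
mapping = {1002: "schemaname", 1003: "schemaversion"}
decoded = [(1002, b"orders"), (1003, b"\x00\x07"), (4242, b"\x13\x37")]
print(known_headers(decoded, mapping))
# {'schemaname': b'orders', 'schemaversion': b'\x00\x07'}
```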
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Id range 0..999 is reserved for future use by the broker and must not be
>>  > > > > > >> >> >>>>>>>> used by plugins.
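A sketch of how an application might load the pseudo-config above and enforce the two rules stated in this section: Ids must not collide within the organization, and 0..999 is reserved for the broker. The `<plugin>.tag.<name>.id` key pattern is read off the example lines and is an assumption, as is the name `load_tag_ids`; the range check only reflects the int16 Id field.

```python
import re

RESERVED_IDS = range(0, 1000)    # 0..999: reserved for future broker use
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1  # Ids travel in an int16 field

# Assumed key pattern, read off the pseudo-config: <plugin>.tag.<name>.id
_TAG_KEY = re.compile(r"^(?P<plugin>\w+)\.tag\.(?P<name>\w+)\.id$")


def load_tag_ids(lines):
    """Parse key=value lines into {(plugin, name): tag_id}, rejecting bad ids."""
    ids = {}
    owners = {}  # tag_id -> (plugin, name) that first claimed it
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        match = _TAG_KEY.match(key)
        if match is None:
            continue  # not a tag-id setting; other config is ignored here
        tag_id = int(value.strip())
        where = (match["plugin"], match["name"])
        if not INT16_MIN <= tag_id <= INT16_MAX:
            raise ValueError(f"{key}: id {tag_id} does not fit in an int16")
        if tag_id in RESERVED_IDS:
            raise ValueError(f"{key}: id {tag_id} is in the reserved range 0..999")
        if tag_id in owners:
            raise ValueError(f"{key}: id {tag_id} already used by {owners[tag_id]}")
        owners[tag_id] = where
        ids[where] = tag_id
    return ids
```

Catching a duplicate Id at startup is the organization-level substitute for a global registry: the conflict surfaces in one team's config rather than as misparsed bytes downstream.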
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Broker
>>  > > > > > >> >> >>>>>>>> ---------
>>  > > > > > >> >> >>>>>>>> The broker does not process the tags (other than the standard protocol
>>  > > > > > >> >> >>>>>>>> syntax verification); it simply stores and forwards them as opaque data.
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Standard message translation (removal of Headers) kicks in for older
>>  > > > > > >> >> >>>>>>>> clients.
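The broker's role as described, framing checks only plus header removal for older clients, might look roughly like this. It assumes big-endian integers and an int32 count in front of the Tag array (neither is specified in the proposal) and models a message as a plain dict with a made-up `headers` field; it is an illustration, not actual broker code.

```python
import struct
from typing import Any, Dict


def verify_tags_syntax(buf: bytes) -> bool:
    """Check only the framing: int32 count, then count x (int16 Id, int16 Len, data).

    Ids and data are never interpreted, so the broker needs no registry or plugins.
    """
    if len(buf) < 4:
        return False
    (count,) = struct.unpack_from(">i", buf, 0)
    if count < 0:
        return False
    offset = 4
    for _ in range(count):
        if offset + 4 > len(buf):
            return False
        _tag_id, length = struct.unpack_from(">hh", buf, offset)
        offset += 4
        if length < 0 or offset + length > len(buf):
            return False
        offset += length  # skip the opaque data without looking at it
    return offset == len(buf)


def down_convert(message: Dict[str, Any]) -> Dict[str, Any]:
    """For an older client: the same message with the Headers field removed."""
    return {field: v for field, v in message.items() if field != "headers"}
```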
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Why not string ids?
>>  > > > > > >> >> >>>>>>>> -------------------------
>>  > > > > > >> >> >>>>>>>> String ids might seem like a good idea, but:
>>  > > > > > >> >> >>>>>>>> * does not really solve uniqueness
>>  > > > > > >> >> >>>>>>>> * consumes a lot of space (2 byte string length + string, per header) to be meaningful
>>  > > > > > >> >> >>>>>>>> * doesn't really say anything about how to parse the tag's data, so it is in effect useless on its own.
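To put numbers on the space argument: with int16 ids each header key costs 2 bytes, while a string id costs a 2-byte length plus the name itself. The string names below are hypothetical equivalents of the pseudo-config entries, not names taken from the proposal.

```python
def key_bytes(header_names, string_ids):
    """Bytes spent on header keys per message (Len fields and values excluded)."""
    if string_ids:
        # int16 length prefix + UTF-8 encoded name, per header
        return sum(2 + len(name.encode("utf-8")) for name in header_names)
    return 2 * len(header_names)  # one int16 Id per header


# Hypothetical string equivalents of the four pseudo-config ids.
names = [
    "sometrackerplugin.sourcev3",
    "dbthing.tablename",
    "myschemareg.schemaname",
    "myschemareg.schemaversion",
]
print(key_bytes(names, string_ids=True))   # 98
print(key_bytes(names, string_ids=False))  # 8
```

For these four headers that is 98 bytes of keys per message versus 8, before any header values are counted, and the cost is paid on every message.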
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> Regards,
>>  > > > > > >> >> >>>>>>>> Magnus
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>> 2016-11-07 18:32 GMT+01:00 Michael Pearce <[email protected]>:
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Hi Roger,
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Thanks for the support.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> I think the key thing is to have a common key space to make an ecosystem;
>>  > > > > > >> >> >>>>>>>>> there does have to be some level of contract for people to play nicely.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Having map<String, byte[]>, or, as per the current proposal in the KIP, a
>>  > > > > > >> >> >>>>>>>>> numerical key space of map<int, byte[]>, is a level of contract that most
>>  > > > > > >> >> >>>>>>>>> people would expect.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> I think the example someone else made in a previous comment, linking to the
>>  > > > > > >> >> >>>>>>>>> AWS blog and the API they implemented, is a good one: originally they
>>  > > > > > >> >> >>>>>>>>> didn't have a header space but now they do, where keys are uniform but the
>>  > > > > > >> >> >>>>>>>>> value can be string, int, anything.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Having a custom MetadataSerializer is something we had played with, but we
>>  > > > > > >> >> >>>>>>>>> discounted the idea: if you want everyone in the ecosystem to work the same
>>  > > > > > >> >> >>>>>>>>> way, having this customizable as well makes it a bit harder.
>>  > > > > > >> >> >>>>>>>>> Think about making the whole message record custom serializable; this
>>  > > > > > >> >> >>>>>>>>> would make it fairly tricky (though not impossible) to make work nicely.
>>  > > > > > >> >> >>>>>>>>> Having the value customizable we thought is a reasonable tradeoff here of
>>  > > > > > >> >> >>>>>>>>> flexibility over contract of interaction between different parties.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Is there a particular case or benefit of having serialization customizable
>>  > > > > > >> >> >>>>>>>>> that you have in mind?
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Saying this, it is obviously something that could be implemented if there
>>  > > > > > >> >> >>>>>>>>> is a need. If we did go this avenue, I think a default serializer
>>  > > > > > >> >> >>>>>>>>> implementation should exist so that, for the 80:20 rule, people can just
>>  > > > > > >> >> >>>>>>>>> have the broker and clients get default behavior.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> Cheers
>>  > > > > > >> >> >>>>>>>>> Mike
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> On 11/6/16, 5:25 PM, "radai" <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> making header _key_ serialization configurable potentially undermines the
>>  > > > > > >> >> >>>>>>>>> broad usefulness of the feature (any point along the path must be able to
>>  > > > > > >> >> >>>>>>>>> read the header keys. the values may be whatever and require more intimate
>>  > > > > > >> >> >>>>>>>>> knowledge of the code that produced specific headers, but keys should be
>>  > > > > > >> >> >>>>>>>>> universally readable).
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> it would also make it hard to write really portable plugins - say i wrote a
>>  > > > > > >> >> >>>>>>>>> large message splitter/combiner - if i rely on key "largeMessage" and
>>  > > > > > >> >> >>>>>>>>> values of the form "1/20", someone who uses (contrived example)
>>  > > > > > >> >> >>>>>>>>> Map<Byte[], Double> wouldn't be able to re-use my code.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> not the end of the world within an organization, but problematic if you
>>  > > > > > >> >> >>>>>>>>> want to enable an ecosystem
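radai's splitter/combiner example, sketched with headers modelled as a plain `dict[str, bytes]`. The functions and the record model are hypothetical; the point is that `combine_large` only works because every producer agrees on the key "largeMessage" and the "i/n" value format, which is the shared contract a configurable key serializer would undermine.

```python
from typing import Dict, List, Tuple

Record = Tuple[Dict[str, bytes], bytes]  # (headers, value): a hypothetical model

HEADER_KEY = "largeMessage"  # the key every producer and consumer must agree on


def split_large(payload: bytes, chunk_size: int) -> List[Record]:
    """Split payload into records whose header value is "i/n" (1-based)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    n = len(chunks)
    return [({HEADER_KEY: f"{i}/{n}".encode("utf-8")}, chunk)
            for i, chunk in enumerate(chunks, start=1)]


def combine_large(records: List[Record]) -> bytes:
    """Reassemble chunks from split_large, whatever order they arrive in."""
    parts: Dict[int, bytes] = {}
    total = None
    for headers, chunk in records:
        index, _, count = headers[HEADER_KEY].decode("utf-8").partition("/")
        parts[int(index)] = chunk
        total = int(count)
    if total is None or sorted(parts) != list(range(1, total + 1)):
        raise ValueError("missing or unexpected chunks")
    return b"".join(parts[i] for i in range(1, total + 1))
```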
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> As others have laid out, I see strong reasons for a common message
>>  > > > > > >> >> >>>>>>>>>> metadata structure for the Kafka ecosystem. In particular, I've seen that
>>  > > > > > >> >> >>>>>>>>>> even within a single organization, infrastructure teams often own the
>>  > > > > > >> >> >>>>>>>>>> message metadata while application teams own the application-level data
>>  > > > > > >> >> >>>>>>>>>> format. Allowing metadata and content to have different structure and
>>  > > > > > >> >> >>>>>>>>>> evolve separately is very helpful for this. Also, I think there's a lot of
>>  > > > > > >> >> >>>>>>>>>> value to having a common metadata structure shared across the Kafka
>>  > > > > > >> >> >>>>>>>>>> ecosystem so that tools which leverage metadata can more easily be shared
>>  > > > > > >> >> >>>>>>>>>> across organizations and integrated together.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> The question is, where does the metadata structure belong? Here's my take:
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> We change the Kafka wire and on-disk format from a (key, value) model to a
>>  > > > > > >> >> >>>>>>>>>> (key, metadata, value) model where all three are byte arrays from the
>>  > > > > > >> >> >>>>>>>>>> brokers' point of view. The primary reason for this is that it provides a
>>  > > > > > >> >> >>>>>>>>>> backward compatible migration path forward. Producers can start populating
>>  > > > > > >> >> >>>>>>>>>> metadata fields before all consumers understand the metadata structure.
>>  > > > > > >> >> >>>>>>>>>> For people who already have custom envelope structures, they can populate
>>  > > > > > >> >> >>>>>>>>>> their existing structure and the new structure for a while as they make the
>>  > > > > > >> >> >>>>>>>>>> transition.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> We could stop there and let the clients plug in a KeySerializer,
>>  > > > > > >> >> >>>>>>>>>> MetadataSerializer, and ValueSerializer, but I think it would also be
>>  > > > > > >> >> >>>>>>>>>> useful to have a default MetadataSerializer that implements a key-value
>>  > > > > > >> >> >>>>>>>>>> model similar to AMQP or HTTP headers. Or we could go even further and
>>  > > > > > >> >> >>>>>>>>>> prescribe a Map<String, byte[]> or Map<String, String> data model for
>>  > > > > > >> >> >>>>>>>>>> headers in the clients (while still allowing custom serialization of the
>>  > > > > > >> >> >>>>>>>>>> header data model).
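Roger's (key, metadata, value) model with a pluggable MetadataSerializer could be sketched as follows. `MetadataSerializer` is the name used in his proposal, not an existing Kafka client interface, and `to_wire` is a hypothetical helper; the default key-value encoding (int32 count, then an int16-length UTF-8 key and an int32-length value per entry, big-endian) is an assumption chosen only for illustration.

```python
import struct
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class MetadataSerializer(ABC):
    """Hypothetical plug point: turns a client-side header model into bytes."""

    @abstractmethod
    def serialize(self, headers) -> bytes: ...

    @abstractmethod
    def deserialize(self, data: bytes): ...


class DefaultMetadataSerializer(MetadataSerializer):
    """Default key-value model (Map<String, byte[]>), loosely like HTTP/AMQP headers."""

    def serialize(self, headers: Dict[str, bytes]) -> bytes:
        out = bytearray(struct.pack(">i", len(headers)))
        for key, value in headers.items():
            k = key.encode("utf-8")
            out += struct.pack(">h", len(k)) + k         # int16 key length + key
            out += struct.pack(">i", len(value)) + value  # int32 value length + value
        return bytes(out)

    def deserialize(self, data: bytes) -> Dict[str, bytes]:
        (count,) = struct.unpack_from(">i", data, 0)
        offset, headers = 4, {}
        for _ in range(count):
            (klen,) = struct.unpack_from(">h", data, offset)
            offset += 2
            key = data[offset:offset + klen].decode("utf-8")
            offset += klen
            (vlen,) = struct.unpack_from(">i", data, offset)
            offset += 4
            headers[key] = data[offset:offset + vlen]
            offset += vlen
        return headers


def to_wire(key: bytes, headers, value: bytes,
            ser: MetadataSerializer) -> Tuple[bytes, bytes, bytes]:
    """The broker only ever sees these three opaque byte arrays."""
    return (key, ser.serialize(headers), value)
```

Because the broker treats metadata as a third byte array, an organization could swap in its own serializer without broker changes; the trade-off raised elsewhere in the thread is that tools from different organizations then no longer share a readable key format.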
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> I think this would address Radai's concerns:
>>  > > > > > >> >> >>>>>>>>>> 1. Not all client code would need to be updated to know about the
>>  > > > > > >> >> >>>>>>>>>> container.
>>  > > > > > >> >> >>>>>>>>>> 2. Middleware-friendly clients would have a standard header data model to
>>  > > > > > >> >> >>>>>>>>>> work with.
>>  > > > > > >> >> >>>>>>>>>> 3. A KIP is required both because of broker changes and because of client
>>  > > > > > >> >> >>>>>>>>>> API changes.
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> Cheers,
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> Roger
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>> On Wed, Nov 2, 2016 at 4:38 PM, radai <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> my biggest issues with a "standard" wrapper format:
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be updated to
>>  > > > > > >> >> >>>>>>>>>>> know about the container, because any old naive code trying to directly
>>  > > > > > >> >> >>>>>>>>>>> deserialize its own payload would keel over and die (it needs to know to
>>  > > > > > >> >> >>>>>>>>>>> deserialize a container, and then dig in there for its payload).
>>  > > > > > >> >> >>>>>>>>>>> 2. in order to write middleware-friendly clients that utilize such a
>>  > > > > > >> >> >>>>>>>>>>> container one would basically have to write their own producer/consumer
>>  > > > > > >> >> >>>>>>>>>>> API on top of the open source kafka one.
>>  > > > > > >> >> >>>>>>>>>>> 3. if you were going to go with a wrapper format you really don't need to
>>  > > > > > >> >> >>>>>>>>>>> bother with a kip (just open source your own client stack from #2 above so
>>  > > > > > >> >> >>>>>>>>>>> others could stop re-inventing it)
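A toy illustration of radai's point 1, using a hypothetical JSON envelope (not a format proposed in this thread). Depending on the payload format, naive code either fails to parse a wrapped value or, as with JSON here, silently gets the envelope instead of its own fields; either way every consumer has to learn to unwrap.

```python
import json


def naive_deserialize(raw: bytes) -> dict:
    """Old consumer code: assumes the record value *is* its own JSON payload."""
    return json.loads(raw)


def wrap(payload: bytes, headers: dict) -> bytes:
    """Hypothetical wrapper format: headers and payload packed into the value."""
    envelope = {"headers": headers, "payload": payload.decode("utf-8")}
    return json.dumps(envelope).encode("utf-8")


def container_aware_deserialize(raw: bytes) -> dict:
    """Updated consumer code: unwrap the container, then parse the payload."""
    return json.loads(json.loads(raw)["payload"])


payload = json.dumps({"order_id": 7}).encode("utf-8")
wrapped = wrap(payload, {"trace-id": "abc"})
print(naive_deserialize(payload)["order_id"])            # 7
print("order_id" in naive_deserialize(wrapped))           # False
print(container_aware_deserialize(wrapped)["order_id"])  # 7
```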
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>> On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <[email protected]> wrote:
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>>> How exactly would this work? Or maybe that's out of scope for this email.
>>  > > > > > >> >> >>>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>> The information contained in this email is strictly confidential and for
>>  > > > > > >> >> >>>>>>>>> the use of the addressee only, unless otherwise indicated. If you are not
>>  > > > > > >> >> >>>>>>>>> the intended recipient, please do not read, copy, use or disclose to others
>>  > > > > > >> >> >>>>>>>>> this message or any attachment. Please also notify the sender by replying
>>  > > > > > >> >> >>>>>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>>  > > > > > >> >> >>>>>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>>  > > > > > >> >> >>>>>>>>> official business of this company shall be understood as neither given nor
>>  > > > > > >> >> >>>>>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>  > > > > > >> >> >>>>>>>>> registered in England and Wales, company number 04008957) and IG Index
>>  > > > > > >> >> >>>>>>>>> Limited (a company registered in England and Wales, company number
>>  > > > > > >> >> >>>>>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>  > > > > > >> >> >>>>>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>>  > > > > > >> >> >>>>>>>>> Index Limited (register number 114059) are authorised and regulated by the
>>  > > > > > >> >> >>>>>>>>> Financial Conduct Authority.
>>  > > > > > >> >> >>>>>>>>>
>>  > > > > > >> >> >>>>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>>> --
>>  > > > > > >> >> >>>>>> Gwen Shapira
>>  > > > > > >> >> >>>>>> Product Manager | Confluent
>>  > > > > > >> >> >>>>>> 650.450.2760 | @gwenshap
>>  > > > > > >> >> >>>>>> Follow us: Twitter | blog
>>  > > > > > >> >> >>>>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>> --
>>  > > > > > >> >> >>>> Gwen Shapira
>>  > > > > > >> >> >>>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >>> --
>>  > > > > > >> >> >>> Nacho (Ignacio) Solis
>>  > > > > > >> >> >>> Kafka
>>  > > > > > >> >> >>> [email protected]
>>  > > > > > >> >> >>>
>>  > > > > > >> >> >
>>  > > > > > >> >> >
>>  > > > > > >> >> >
>>  > > > > > >> >> > --
>>  > > > > > >> >> > Gwen Shapira
>>  > > > > > >> >>
>>  > > > > > >> >>
>>  > > > > > >>
>>  > > > > > >>
>>  > > > > > >>
>>  > > > > > >> --
>>  > > > > > >> Gwen Shapira
>>  > > > > > >>
>>  > > > > > >
>>  > > > > > >
>>  > > > > >
>>  > > > >
>>  > > >
>>  > >
>>  >
>
> --
> Nacho - Ignacio Solis - [email protected]
