Re: [DISCUSS] KIP-82 - Add Record Headers

Ignacio Solis Mon, 14 Nov 2016 13:15:54 -0800

1) Yes - Headers are worthwhile
2) Yes - Headers should be a top level option


On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <[email protected]>
wrote:

> Hi Roger,
>
> The KIP's details/examples cover the original proposal for key spacing, not
> the new namespace idea raised in this discussion.
>
> We will need to update the kip, when we get agreement this is a better
> approach (which seems to be the case if I have understood the general
> feeling in the conversation)
>
> Re the variable ints, at a very early stage we did think about this. I think
> the added complexity isn't worth the saving. If we want to reduce overhead
> and size, I'd rather go with int16 (2 bytes) keys, as it keeps it simple.
>
> On the note of no headers: as per the KIP, we use an attribute bit to denote
> whether headers are present or not, which currently gives zero overhead if
> headers are not used.
>
> I think, as radai mentions, it would be good first to get clarity on whether
> we now have general consensus that (1) headers are worthwhile and useful,
> and (2) we want it as a top level entity.
>
>
> Just to state the obvious i believe (1) headers are worthwhile and (2)
> agree as a top level entity.
>
> Cheers
> Mike
> ________________________________________
> From: Roger Hoover <[email protected]>
> Sent: Wednesday, November 9, 2016 9:10:47 PM
> To: [email protected]
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Sorry for going a little in the weeds but thanks for the replies regarding
> varint.
>
> Agreed that a prefix and {int, int} can be the same.  It doesn't look like
> that's what the KIP is saying in the "Open" section.   The example shows
> 2100001 for New Relic and 210002 for App Dynamics implying that the New Relic
> organization will have only a single header id to work with.  Or is 2100001
> a prefix?  The main point of a namespace or prefix is to reduce the
> overhead of config mapping or registration depending on how
> namespaces/prefixes are managed.
>
> Would love to hear more feedback on the higher-level questions though...
>
> Cheers,
>
> Roger
>
>
> On Wed, Nov 9, 2016 at 11:38 AM, radai <[email protected]> wrote:
>
> > I think this discussion is getting a bit into the weeds on technical
> > implementation details.
> > I'd like to step back a minute and try and establish where we are in the
> > larger picture:
> >
> > (re-wording nacho's last paragraph)
> > 1. are we all in agreement that headers are a worthwhile and useful
> > addition to have? this was contested early on
> > 2. are we all in agreement on headers as top level entity vs headers
> > squirreled-away in V?
> >
> > are there still concerns around these 2 points (#jay? #jun?)?
> >
> > (and now back to our normal programming ...)
> >
> > varints are nice. having said that, its adding complexity (see
> > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
> > as 1st google result) and would require anyone writing other clients (C?
> > Python? Go? Bash? ;-) ) to get/implement the same, and for relatively
> > little gain (int vs string is order of magnitude, this isnt).
> >
> > int namespacing vs {int, int} namespacing are basically the same thing -
> > youre just namespacing an int64 and giving people whole 2^32 ranges at a
> > time. the part i like about this is letting people have a large swath of
> > numbers with one registration so they dont have to come back for every
> > single plugin/header they want to "reserve".
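
To make the "int namespacing vs {int, int}" equivalence concrete, here is a minimal sketch. It assumes a 32-bit namespace in the high half and a 32-bit local id in the low half of an int64; the function names are made up for illustration and are not part of the KIP.

```python
from typing import Tuple


def pack_header_key(namespace: int, local_id: int) -> int:
    """Combine a 32-bit namespace and a 32-bit local id into one 64-bit key."""
    if not (0 <= namespace < 2**32 and 0 <= local_id < 2**32):
        raise ValueError("namespace and local_id must each fit in 32 unsigned bits")
    # Namespace occupies the high 32 bits, local id the low 32 bits.
    return (namespace << 32) | local_id


def unpack_header_key(key: int) -> Tuple[int, int]:
    """Split a 64-bit key back into (namespace, local_id)."""
    return key >> 32, key & 0xFFFFFFFF
```

Registering namespace 3 once would then cover every key from `pack_header_key(3, 0)` up to `pack_header_key(3, 2**32 - 1)`.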
> >
> >
> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <[email protected]>
> > wrote:
> >
> > > Since some of the debate has been about overhead + performance, I'm
> > > wondering if we have considered a varint encoding (
> > > https://developers.google.com/protocol-buffers/docs/encoding#varints)
> > > for the header length field (int32 in the proposal) and for header ids?
> > > If you don't use headers, the overhead would be a single byte, and each
> > > header id < 128 would also need only a single byte?
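
For reference, a rough sketch of the base-128 varint scheme described at the protocol-buffers link above (unsigned values only). This is an illustration of the encoding, not code from the KIP or from any Kafka client; the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

With this scheme an empty header section (length 0) and any header id below 128 each take one byte, while ids from 128 to 16383 take two.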
> > >
> > >
> > >
> > > On Wed, Nov 9, 2016 at 6:43 AM, radai <[email protected]>
> > wrote:
> > >
> > > > @magnus - and very dangerous (youre essentially downloading and
> > > > executing arbitrary code off the internet on your servers ... bad idea
> > > > without a sandbox, even with)
> > > >
> > > > as for it being a purely administrative task - i disagree.
> > > >
> > > > i wish it would, really, because then my earlier point on the
> > > > complexity of the remapping process would be invalid, but at linkedin,
> > > > for example, we (the team im in) run kafka as a service. we dont really
> > > > know what our users (developing applications that use kafka) are up to
> > > > at any given moment. it is very possible (given the existence of
> > > > headers and a corresponding plugin ecosystem) for some application to
> > > > "equip" their producers and consumers with the required plugin without
> > > > us knowing. i dont mean to imply thats bad, i just want to make the
> > > > point that its not as simple keeping it in sync across a large-enough
> > > > organization.
> > > >
> > > >
> > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <[email protected]>
> > > > wrote:
> > > >
> > > > > I think there is a piece missing in the Strings discussion, where
> > > > > pro-Stringers reason that by providing unique string identifiers for
> > > > > each header everything will just magically work for all parts of the
> > > > > stream pipeline.
> > > > >
> > > > > But the strings dont mean anything by themselves, and while we could
> > > > > probably envision some auto plugin loader that downloads, compiles,
> > > > > links and runs plugins on-demand as soon as they're seen by a
> > > > > consumer, I dont really see a use-case for something so dynamic (and
> > > > > fragile) in practice.
> > > > >
> > > > > In the real world an application will be configured with a set of
> > > > > plugins to either add (producer) or read (consumer) headers.
> > > > > This is an administrative task based on what features a client
> > > > > needs/provides and results in some sort of configuration to enable
> > > > > and configure the desired plugins.
> > > > >
> > > > > Since this needs to be kept somewhat in sync across an organisation
> > > > > (there is no point in having producers add headers no consumers will
> > > > > read, and vice versa), the added complexity of assigning an id
> > > > > namespace for each plugin as it is being configured should be
> > > > > tolerable.
> > > > >
> > > > >
> > > > > /Magnus
> > > > >
> > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <[email protected]>:
> > > > >
> > > > > > Just following/catching up on what seems to be an active night :)
> > > > > >
> > > > > > @Radai sorry if it may seem obvious but what does MD stand for?
> > > > > >
> > > > > > My take on String vs Int:
> > > > > >
> > > > > > I will state first I am pro Int (16 or 32).
> > > > > >
> > > > > > Playing devil's advocate, though, I do see a big plus in the
> > > > > > argument for String keys: integrating into an existing eco-system.
> > > > > >
> > > > > > As many other systems use String based headers (Flume, JMS), String
> > > > > > keys make it much easier for these to be integrated.
> > > > > >
> > > > > > With Int based headers, how could we provide a way/guidance to make
> > > > > > this integration simple / easy for flows transitioning over to
> > > > > > kafka?
> > > > > >
> > > > > > * tough luck buddy you're on your own
> > > > > > * simply hash the string into int code and hope for no collisions
> > > > > >   (how to convert back though?)
> > > > > > * http2 style as mentioned by nacho.
> > > > > >
> > > > > > cheers,
> > > > > > Mike
> > > > > >
> > > > > >
> > > > > > ________________________________________
> > > > > > From: radai <[email protected]>
> > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
> > > > > > To: [email protected]
> > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > > > > >
> > > > > > thinking about it some more, the best way to transmit the header
> > > > > > remapping data to consumers would be to put it in the MD response
> > > > > > payload, so maybe it should be discussed now.
> > > > > >
> > > > > >
> > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <[email protected]> wrote:
> > > > > >
> > > > > > > im not opposed to the idea of namespace mapping. all im saying is
> > > > > > > that its not part of the "mvp" and, since it requires no wire
> > > > > > > format change, can always be added later.
> > > > > > > also, its not as simple as just configuring MM to do the
> > > > > > > transform: lets say i've implemented large message support as
> > > > > > > {666,1} and on some mirror target cluster its been remapped to
> > > > > > > {999,1}. the consumer plugin code would also need to be told to
> > > > > > > look for the large message "part X of Y" header under {999,1}.
> > > > > > > doable, but tricky.
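
A small sketch of why the remapping is "doable, but tricky", using the {666,1} to {999,1} example above. Headers are modelled as a plain dict keyed by (namespace, id); `remap_headers` and `LargeMessagePlugin` are hypothetical names for illustration, not existing MirrorMaker or client APIs.

```python
from typing import Dict, Optional, Tuple

HeaderKey = Tuple[int, int]  # (namespace, id)


def remap_headers(headers: Dict[HeaderKey, bytes],
                  namespace_map: Dict[int, int]) -> Dict[HeaderKey, bytes]:
    """Rewrite namespaces while mirroring; unmapped namespaces pass through."""
    return {(namespace_map.get(ns, ns), local_id): value
            for (ns, local_id), value in headers.items()}


class LargeMessagePlugin:
    """Hypothetical consumer plugin that reads a 'part X of Y' header."""

    def __init__(self, namespace: int = 666) -> None:
        # The namespace has to be configurable, otherwise the plugin keeps
        # looking under {666,1} on a cluster where it was remapped.
        self.part_key: HeaderKey = (namespace, 1)

    def read_part(self, headers: Dict[HeaderKey, bytes]) -> Optional[bytes]:
        return headers.get(self.part_key)
```

A plugin left at its default namespace finds nothing on the mirrored cluster; only one constructed with `namespace=999` sees the header, which is the extra piece of configuration that has to reach every consumer.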
> > > > > > >
> > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <[email protected]> wrote:
> > > > > > >
> > > > > > >> While you can do whatever you want with a namespace and your code,
> > > > > > >> what I'd expect is for each app to make namespaces configurable...
> > > > > > >>
> > > > > > >> So if I accidentally used 666 for my HR department, and still want
> > > > > > >> to run RadaiApp, I can config "namespace=42" for RadaiApp and
> > > > > > >> everything will look normal.
> > > > > > >>
> > > > > > >> This means you only need to sync usage inside your own
> > > > > > >> organization. Still hard, but somewhat easier than syncing with
> > > > > > >> the entire world.
> > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <[email protected]> wrote:
> > > > > > >> > and we can start with {namespace, id} and no re-mapping support
> > > > > > >> > and always add it later on if/when collisions actually happen
> > > > > > >> > (i dont think they'd be a problem).
> > > > > > >> >
> > > > > > >> > every interested party (so orgs or individuals) could then
> > > > > > >> > register a prefix (0 = reserved, 1 = confluent ... 666 = me :-) )
> > > > > > >> > and do whatever with the 2nd ID - so once linkedin registers,
> > > > > > >> > say 3, then linkedin devs are free to use {3, *} with a
> > > > > > >> > reasonable expectation not to collide with anything else.
> > > > > > >> > further partitioning of that * becomes linkedin's problem, but
> > > > > > >> > the "upstream registration" of a namespace only has to happen
> > > > > > >> > once.
> > > > > > >> >
> > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <[email protected]> wrote:
> > > > > > >> >
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <[email protected]> wrote:
> > > > > > >> >> >
> > > > > > >> >> > Thank you so much for this clear and fair summary of the
> > > > > > >> >> > arguments.
> > > > > > >> >> >
> > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
> > > > > > >> >> >
> > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion with
> > > > > > >> >> > Roger's tweak: add a namespace for headers. This will allow each
> > > > > > >> >> > app to just use whatever IDs it wants internally, and then let
> > > > > > >> >> > the admin deploying the app figure out an available namespace ID
> > > > > > >> >> > for the app to live in.
> > > > > > >> >> > So io.confluent.schema-registry can be namespace 0x01 on my
> > > > > > >> >> > deployment and 0x57 on yours, and the poor guys developing the
> > > > > > >> >> > app don't need to worry about that.
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >>
> > > > > > >> >> Gwen, if I understand your example right, an application deployer
> > > > > > >> >> might decide to use 0x01 in one deployment, and that means that
> > > > > > >> >> once the message is written into the broker, it will be saved on
> > > > > > >> >> the broker with that specific namespace (0x01).
> > > > > > >> >>
> > > > > > >> >> If you were to mirror that message into another cluster, the 0x01
> > > > > > >> >> would accompany the message, right? What if the deployers of the
> > > > > > >> >> same app in the other cluster use 0x57? They won't understand each
> > > > > > >> >> other?
> > > > > > >> >>
> > > > > > >> >> I'm not sure that's an avoidable problem. I think it simply means
> > > > > > >> >> that in order to share data, you have to also have a shared
> > > > > > >> >> (agreed upon) understanding of what the namespaces mean. Which I
> > > > > > >> >> think makes sense, because the alternative (sharing *nothing* at
> > > > > > >> >> all) would mean that there would be no way to understand each
> > > > > > >> >> other.
> > > > > > >> >>
> > > > > > >> >> -James
> > > > > > >> >>
> > > > > > >> >> > Gwen
> > > > > > >> >> >
> > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM, radai <[email protected]> wrote:
> > > > > > >> >> >> +1 for sean's document. it covers pretty much all the trade-offs
> > > > > > >> >> >> and provides concrete figures to argue about :-)
> > > > > > >> >> >> (nit-picking - used the same xkcd twice, also trove has been
> > > > > > >> >> >> superseded for purposes of high performance collections: look at
> > > > > > >> >> >> https://github.com/leventov/Koloboke)
> > > > > > >> >> >>
> > > > > > >> >> >> so to sum up the string vs int debate:
> > > > > > >> >> >>
> > > > > > >> >> >> performance - you can do 140k ops/sec _per thread_ with string
> > > > > > >> >> >> headers. you could do x2-3 better with ints. there's no arguing
> > > > > > >> >> >> the relative diff between the two, there's only the question of
> > > > > > >> >> >> whether or not _the rest of kafka_ operates fast enough to care.
> > > > > > >> >> >> if we want to make choices solely based on performance we need
> > > > > > >> >> >> ints. if we are willing to settle/compromise for a nicer (to
> > > > > > >> >> >> some) API then strings are good enough for the current state of
> > > > > > >> >> >> affairs.
> > > > > > >> >> >>
> > > > > > >> >> >> message size - with batching and compression it comes down to a
> > > > > > >> >> >> ~5% difference (internal testing, not in the doc. maybe it would
> > > > > > >> >> >> help to add it if this becomes a point of contention?). this
> > > > > > >> >> >> means it wont really affect kafka in "throughput mode" (large,
> > > > > > >> >> >> compressed batches). in "low latency" mode (meaning less/no
> > > > > > >> >> >> batching and compression) the difference can be extreme (it'll
> > > > > > >> >> >> easily be an order of magnitude with small payloads like stock
> > > > > > >> >> >> ticks and header keys of the form
> > > > > > >> >> >> "com.acme.infraTeam.kafka.hiMom.auditPlugin"). we have a few
> > > > > > >> >> >> such topics at linkedin where actual payloads are ~2 ints and
> > > > > > >> >> >> are eclipsed by our in-house audit "header" which is why we
> > > > > > >> >> >> liked ints to begin with.
> > > > > > >> >> >>
> > > > > > >> >> >> "ease of use" - strings would probably still require _some_
> > > > > > >> >> >> degree of partitioning by convention (imagine if everyone used
> > > > > > >> >> >> the key "infra"...) but its very intuitive for java devs to do
> > > > > > >> >> >> anyway (reverse-domain is ingrained into java developers at a
> > > > > > >> >> >> young age :-) ). also most java devs find Map<String, whatever>
> > > > > > >> >> >> more intuitive than Map<Integer, whatever> - probably because of
> > > > > > >> >> >> other text-based protocols like http. ints would require a
> > > > > > >> >> >> number registry. if you think number registries are hard just
> > > > > > >> >> >> look at the wiki page for KIPs (specifically the number for next
> > > > > > >> >> >> available KIP) and think again - we are probably talking about
> > > > > > >> >> >> the same volume of requests. also this would only be "required"
> > > > > > >> >> >> (good citizenship, more like) if you want to publish your plugin
> > > > > > >> >> >> for others to use. within your org do whatever you want - just
> > > > > > >> >> >> know that if you use [some "reserved" range] and a future kafka
> > > > > > >> >> >> update breaks it its your problem. RTFM.
> > > > > > >> >> >>
> > > > > > >> >> >> personally im in favor of ints.
> > > > > > >> >> >>
> > > > > > >> >> >> having said that (and like nacho) I will settle if int vs string
> > > > > > >> >> >> remains the only obstacle to this.
> > > > > > >> >> >>
> > > > > > >> >> >> On Tue, Nov 8, 2016 at 3:53 PM, Nacho Solis <[email protected]>
> > > > > > >> >> >> wrote:
> > > > > > >> >> >>
> > > > > > >> >> >>> I think it's well known I've been pushing for ints (and I could
> > > > > > >> >> >>> switch to 16 bit shorts if pressed).
> > > > > > >> >> >>>
> > > > > > >> >> >>> - efficient (space)
> > > > > > >> >> >>> - efficient (processing)
> > > > > > >> >> >>> - easily partitionable
> > > > > > >> >> >>>
> > > > > > >> >> >>>
> > > > > > >> >> >>> However, if the only thing that is keeping us from adopting
> > > > > > >> >> >>> headers is the use of strings vs ints as keys, then I would cave
> > > > > > >> >> >>> in and accept strings. If we do so, I would like to limit string
> > > > > > >> >> >>> keys to 128 bytes in length. This way 1) I could use a 3 letter
> > > > > > >> >> >>> string if I wanted (effectively using 4 total bytes), 2) limit
> > > > > > >> >> >>> overall impact of possible keys (don't really want people to
> > > > > > >> >> >>> send a 16K header string key).
> > > > > > >> >> >>>
> > > > > > >> >> >>> Nacho
> > > > > > >> >> >>>
> > > > > > >> >> >>>
> > > > > > >> >> >>> On Tue, Nov 8, 2016 at 3:35 PM, Gwen Shapira <[email protected]> wrote:
> > > > > > >> >> >>>
> > > > > > >> >> >>>> Forgot to mention: Thank you for quantifying the trade-off - it
> > > > > > >> >> >>>> is helpful and important regardless of what we end up deciding.
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> On Tue, Nov 8, 2016 at 3:12 PM, Sean McCauliff
> > > > > > >> >> >>>> <[email protected]> wrote:
> > > > > > >> >> >>>>> On Tue, Nov 8, 2016 at 2:15 PM, Gwen Shapira <[email protected]> wrote:
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>> Since Kafka specifically targets high-throughput, low-latency
> > > > > > >> >> >>>>>> use-cases, I don't think we should trade them off that easily.
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> I find these kinds of design goals not to be really helpful
> > > > > > >> >> >>>>> unless they are quantified in some way, because it's always
> > > > > > >> >> >>>>> possible to argue against something as either being not
> > > > > > >> >> >>>>> performant or just an implementation detail.
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> These are single-threaded benchmarks, so all the measurements
> > > > > > >> >> >>>>> are per thread.
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> For 1M messages/s/thread, if header keys are int and you had
> > > > > > >> >> >>>>> even a single header key, value pair then it's still about 2^-2
> > > > > > >> >> >>>>> microseconds, which means you only have another 0.75
> > > > > > >> >> >>>>> microseconds to do everything else you want to do with a
> > > > > > >> >> >>>>> message (1M messages/s means 1 microsecond per message). With
> > > > > > >> >> >>>>> string header keys there is still 0.5 microseconds to process a
> > > > > > >> >> >>>>> message.
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>> I love strings as much as the next guy (we had them in Flume),
> > > > > > >> >> >>>>>> but I was convinced by Magnus/Michael/Radai that strings don't
> > > > > > >> >> >>>>>> actually have strong benefits as opposed to ints (you'll need a
> > > > > > >> >> >>>>>> string registry anyway - otherwise, how will you know what the
> > > > > > >> >> >>>>>> "profile_id" header refers to?) and I want to keep closer to
> > > > > > >> >> >>>>>> our original design goals for Kafka.
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> "confluent.profile_id"
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> If someone likes strings in the headers and doesn't do millions
> > > > > > >> >> >>>>>> of messages a sec, they probably have lots of other systems
> > > > > > >> >> >>>>>> they can use instead.
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> None of them will scale like Kafka.  Horizontal scaling is
> > > > > > >> >> >>>>> still good.
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> On Tue, Nov 8, 2016 at 1:22 PM, Sean McCauliff
> > > > > > >> >> >>>>>> <[email protected]> wrote:
> > > > > > >> >> >>>>>>> +1 for String keys.
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> I've been doing some benchmarking and it seems like the
> > > > > > >> >> >>>>>>> speedup for using integer keys is about 2-5x depending on the
> > > > > > >> >> >>>>>>> length of the strings and what collections are being used.
> > > > > > >> >> >>>>>>> The overall amount of time spent parsing a set of header key,
> > > > > > >> >> >>>>>>> value pairs probably does not matter unless you are getting
> > > > > > >> >> >>>>>>> close to 1M messages per consumer.  In which case probably
> > > > > > >> >> >>>>>>> don't use headers.  There is also the option to use very
> > > > > > >> >> >>>>>>> short strings; some that are even shorter than integers.
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> Partitioning the string key space will be easier than
> > > > > > >> >> >>>>>>> partitioning an integer key space. We won't need a global
> > > > > > >> >> >>>>>>> registry.  Kafka internally can reserve some prefix like "_"
> > > > > > >> >> >>>>>>> as its namespace.  Everyone else can use their company or
> > > > > > >> >> >>>>>>> project name as namespace prefix and life should be good.
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> Here's the link to some of the benchmarking info:
> > > > > > >> >> >>>>>>> https://docs.google.com/document/d/1tfT-6SZdnKOLyWGDH82kS30PnUkmgb7nPLdw6p65pAI/edit?usp=sharing
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> --
> > > > > > >> >> >>>>>>> Sean McCauliff
> > > > > > >> >> >>>>>>> Staff Software Engineer
> > > > > > >> >> >>>>>>> Kafka
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> [email protected]
> > > > > > >> >> >>>>>>> linkedin.com/in/sean-mccauliff-b563192
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <[email protected]>
> > > > > > >> >> >>>>>>> wrote:
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>>> +1 on this slimmer version of our proposal
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> I def think we can reduce the Id space from the proposed
> > > > > > >> >> >>>>>>>> int32 (4 bytes) down to int16 (2 bytes); it saves on space
> > > > > > >> >> >>>>>>>> and, as these are headers, we wouldn't expect the number of
> > > > > > >> >> >>>>>>>> headers being used concurrently to be that high.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> I would wonder if we should make the value byte array length
> > > > > > >> >> >>>>>>>> still int32 though, as this is the standard max array length
> > > > > > >> >> >>>>>>>> in Java. Saying that, it is a header, and I guess limiting
> > > > > > >> >> >>>>>>>> the size is sensible and would work for all the use cases we
> > > > > > >> >> >>>>>>>> have in mind, so happy with limiting this.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Do people generally concur on Magnus's slimmer version?
> > > > > > >> >> >>>>>>>> Anyone see any issues if we moved from int32 to int16?
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Re configurable ids per plugin over a global registry: this
> > > > > > >> >> >>>>>>>> also would work for us.  As such, if this has better
> > > > > > >> >> >>>>>>>> consensus over the proposed global registry I'd be happy to
> > > > > > >> >> >>>>>>>> change that.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> I was already sold on ints over strings for keys ;)
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Cheers
> > > > > > >> >> >>>>>>>> Mike
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> ________________________________________
> > > > > > >> >> >>>>>>>> From: Magnus Edenhill <[email protected]>
> > > > > > >> >> >>>>>>>> Sent: Monday, November 7, 2016 10:10:21 PM
> > > > > > >> >> >>>>>>>> To: [email protected]
> > > > > > >> >> >>>>>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Hi,
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> I'm +1 for adding generic message headers, but I do share
> > > > > > >> >> >>>>>>>> the concerns previously aired on this thread and during the
> > > > > > >> >> >>>>>>>> KIP meeting.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> So let me propose a slimmer alternative that does not
> > > > > > >> >> >>>>>>>> require any sort of global header registry, does not affect
> > > > > > >> >> >>>>>>>> broker performance or operations, and adds as little
> > > > > > >> >> >>>>>>>> overhead as possible.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Message
> > > > > > >> >> >>>>>>>> ------------
> > > > > > >> >> >>>>>>>> The protocol Message type is extended with a Headers array
> > > > > > >> >> >>>>>>>> consisting of Tags, where a Tag is defined as:
> > > > > > >> >> >>>>>>>>   int16 Id
> > > > > > >> >> >>>>>>>>   int16 Len              // binary_data length
> > > > > > >> >> >>>>>>>>   binary_data[Len]  // opaque binary data
> > > > > > >> >> >>>>>>>>
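
A rough sketch of what reading and writing such Tags could look like. It assumes big-endian signed int16 fields (as used for Kafka protocol primitives) and leaves out whatever framing the enclosing Headers array would carry, since the proposal above does not spell that out; the function names are illustrative only.

```python
import struct
from typing import List, Tuple

# int16 Id followed by int16 Len, both big-endian (assumed byte order).
TAG_HEADER = struct.Struct(">hh")


def encode_tag(tag_id: int, data: bytes) -> bytes:
    """Serialize one Tag: Id, Len, then Len bytes of opaque data."""
    if not 0 <= tag_id <= 0x7FFF:
        raise ValueError("tag id must fit in a non-negative int16")
    if len(data) > 0x7FFF:
        raise ValueError("tag data too long for an int16 length")
    return TAG_HEADER.pack(tag_id, len(data)) + data


def decode_tags(buf: bytes) -> List[Tuple[int, bytes]]:
    """Parse back-to-back Tags into (id, data) pairs without interpreting data."""
    tags = []
    offset = 0
    while offset < len(buf):
        tag_id, length = TAG_HEADER.unpack_from(buf, offset)
        offset += TAG_HEADER.size
        tags.append((tag_id, buf[offset:offset + length]))
        offset += length
    return tags
```

Each Tag costs 4 bytes of overhead on top of its data, and a reader can skip a Tag it has no mapping for just by advancing `Len` bytes.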
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Ids
> > > > > > >> >> >>>>>>>> ---
> > > > > > >> >> >>>>>>>> The Id space is not centrally managed, so whenever an
> > > > > > >> >> >>>>>>>> application needs to add headers, or use an eco-system
> > > > > > >> >> >>>>>>>> plugin that does, its Id allocation will need to be manually
> > > > > > >> >> >>>>>>>> configured.
> > > > > > >> >> >>>>>>>> This moves the allocation concern from the global space down
> > > > > > >> >> >>>>>>>> to organization level and avoids the risk for id conflicts.
> > > > > > >> >> >>>>>>>> Example pseudo-config for some app:
> > > > > > >> >> >>>>>>>>    sometrackerplugin.tag.sourcev3.id=1000
> > > > > > >> >> >>>>>>>>    dbthing.tag.tablename.id=1001
> > > > > > >> >> >>>>>>>>    myschemareg.tag.schemaname.id=1002
> > > > > > >> >> >>>>>>>>    myschemareg.tag.schemaversion.id=1003
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Each header-writing or header-reading plugin must provide
> > > > > > >> >> >>>>>>>> means (typically through configuration) to specify the tag
> > > > > > >> >> >>>>>>>> for each header it uses. Defaults should be avoided.
> > > > > > >> >> >>>>>>>> A consumer silently ignores tags it does not have a mapping
> > > > > > >> >> >>>>>>>> for (since the binary_data can't be parsed without knowing
> > > > > > >> >> >>>>>>>> what it is).
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Id range 0..999 is reserved for future use by the broker and
> > > > > > >> >> >>>>>>>> must not be used by plugins.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Broker
> > > > > > >> >> >>>>>>>> ---------
> > > > > > >> >> >>>>>>>> The broker does not process the tags (other than the
> > > > > > >> >> >>>>>>>> standard protocol syntax verification), it simply stores and
> > > > > > >> >> >>>>>>>> forwards them as opaque data.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Standard message translation (removal of Headers) kicks in
> > > > > > >> >> >>>>>>>> for older clients.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Why not string ids?
> > > > > > >> >> >>>>>>>> -------------------------
> > > > > > >> >> >>>>>>>> String ids might seem like a good idea, but:
> > > > > > >> >> >>>>>>>> * does not really solve uniqueness
> > > > > > >> >> >>>>>>>> * consumes a lot of space (2 byte string length +
> > > > string,
> > > > > > per
> > > > > > >> >> >>>> header)
> > > > > > >> >> >>>>>> to
> > > > > > >> >> >>>>>>>> be meaningful
> > > > > > > >> >> >>>>>>>> * doesn't really say anything about how to parse the
> tag's
> > > > data,
> > > > > > so
> > > > > > >> it
> > > > > > >> >> >>> is
> > > > > > >> >> >>>> in
> > > > > > >> >> >>>>>>>> effect useless on its own.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Regards,
> > > > > > >> >> >>>>>>>> Magnus
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> 2016-11-07 18:32 GMT+01:00 Michael Pearce <
> > > > > > >> [email protected]
> > > > > > >> >> >:
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>> Hi Roger,
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Thanks for the support.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> I think the key thing is to have a common key
> space
> > > to
> > > > > make
> > > > > > >> an
> > > > > > >> >> >>>>>> ecosystem,
> > > > > > >> >> >>>>>>>>> there does have to be some level of contract for
> > > people
> > > > > to
> > > > > > >> play
> > > > > > >> >> >>>>>> nicely.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Having map<String, byte[]> or as per current
> > proposed
> > > > in
> > > > > > kip
> > > > > > >> of
> > > > > > >> >> >>>>>> having a
> > > > > > >> >> >>>>>>>>> numerical key space of  map<int, byte[]> is a
> level
> > > of
> > > > > the
> > > > > > >> >> >>> contract
> > > > > > >> >> >>>>>> that
> > > > > > >> >> >>>>>>>>> most people would expect.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> I think the example in a previous comment someone
> > > else
> > > > > made
> > > > > > >> >> >>>> linking to
> > > > > > >> >> >>>>>>>> AWS
> > > > > > >> >> >>>>>>>>> blog and also implemented api where originally
> they
> > > > > didn't
> > > > > > >> have a
> > > > > > >> >> >>>>>> header
> > > > > > > >> >> >>>>>>>>> space but now they do, where keys are uniform but
> > the
> > > > > value
> > > > > > >> can
> > > > > > >> >> >>> be
> > > > > > >> >> >>>>>>>> string,
> > > > > > >> >> >>>>>>>>> int, anything is a good example.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Having a custom MetadataSerializer is something
> we
> > > had
> > > > > > played
> > > > > > >> >> >>> with,
> > > > > > >> >> >>>>>> but
> > > > > > >> >> >>>>>>>>> discounted the idea, as if you wanted everyone to
> > > work
> > > > > the
> > > > > > >> same
> > > > > > >> >> >>>> way in
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>> ecosystem, having to have this also customizable
> > > makes
> > > > > it a
> > > > > > >> bit
> > > > > > >> >> >>>>>> harder.
> > > > > > >> >> >>>>>>>>> Think about making the whole message record
> custom
> > > > > > >> serializable,
> > > > > > >> >> >>>> this
> > > > > > >> >> >>>>>>>> would
> > > > > > >> >> >>>>>>>>> make it fairly tricky (though it would not be
> > > > impossible)
> > > > > > to
> > > > > > >> have
> > > > > > >> >> >>>> made
> > > > > > >> >> >>>>>>>> work
> > > > > > >> >> >>>>>>>>> nicely. Having the value customizable we thought
> > is a
> > > > > > >> reasonable
> > > > > > >> >> >>>>>> tradeoff
> > > > > > >> >> >>>>>>>>> here of flexibility over contract of interaction
> > > > between
> > > > > > >> >> >>> different
> > > > > > >> >> >>>>>>>> parties.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Is there a particular case or benefit of having
> > > > > > serialization
> > > > > > >> >> >>>>>>>> customizable
> > > > > > >> >> >>>>>>>>> that you have in mind?
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Saying this it is obviously something that could
> be
> > > > > > >> implemented,
> > > > > > >> >> >>> if
> > > > > > >> >> >>>>>> there
> > > > > > >> >> >>>>>>>>> is a need. If we did go this avenue I think a
> > > defaulted
> > > > > > >> >> >>> serializer
> > > > > > >> >> >>>>>>>>> implementation should exist so for the 80:20
> rule,
> > > > people
> > > > > > can
> > > > > > >> >> >>> just
> > > > > > >> >> >>>>>> have
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>> broker and clients get default behavior.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Cheers
> > > > > > >> >> >>>>>>>>> Mike
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> On 11/6/16, 5:25 PM, "radai" <
> > > > [email protected]
> > > > > >
> > > > > > >> wrote:
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>    making header _key_ serialization configurable
> > > > > > potentially
> > > > > > >> >> >>>>>> undermines
> > > > > > >> >> >>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>    broad usefulness of the feature (any point
> along
> > > the
> > > > > > path
> > > > > > >> >> >>> must
> > > > > > >> >> >>>> be
> > > > > > >> >> >>>>>>>> able
> > > > > > >> >> >>>>>>>>> to
> > > > > > >> >> >>>>>>>>>    read the header keys. the values may be
> whatever
> > > and
> > > > > > >> require
> > > > > > >> >> >>>> more
> > > > > > >> >> >>>>>>>>> intimate
> > > > > > >> >> >>>>>>>>>    knowledge of the code that produced specific
> > > > headers,
> > > > > > but
> > > > > > >> >> >>> keys
> > > > > > >> >> >>>>>> should
> > > > > > >> >> >>>>>>>>> be
> > > > > > >> >> >>>>>>>>>    universally readable).
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>    it would also make it hard to write really
> > > portable
> > > > > > >> plugins -
> > > > > > >> >> >>>> say
> > > > > > >> >> >>>>>> i
> > > > > > >> >> >>>>>>>>> wrote a
> > > > > > >> >> >>>>>>>>>    large message splitter/combiner - if i rely on
> > key
> > > > > > >> >> >>>> "largeMessage"
> > > > > > >> >> >>>>>> and
> > > > > > >> >> >>>>>>>>>    values of the form "1/20" someone who uses
> > > > (contrived
> > > > > > >> >> >>> example)
> > > > > > >> >> >>>>>>>>> Map<Byte[],
> > > > > > > >> >> >>>>>>>>>    Double> wouldn't be able to re-use my code.
> > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>>    not the end of the world within an
> > organization,
> > > > but
> > > > > > >> >> >>>>>> problematic if
> > > > > > >> >> >>>>>>>>> you
> > > > > > >> >> >>>>>>>>>    want to enable an ecosystem
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>    On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <
> > > > > > >> >> >>>>>> [email protected]
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>> As others have laid out, I see strong reasons
> for
> > a
> > > > > common
> > > > > > >> >> >>>>>> message
> > > > > > >> >> >>>>>>>>>> metadata structure for the Kafka ecosystem.  In
> > > > > > particular,
> > > > > > >> >> >>>> I've
> > > > > > >> >> >>>>>>>>> seen that
> > > > > > >> >> >>>>>>>>>> even within a single organization,
> infrastructure
> > > > teams
> > > > > > >> >> >>> often
> > > > > > >> >> >>>>>> own
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>> message metadata while application teams own the
> > > > > > >> >> >>>>>> application-level
> > > > > > >> >> >>>>>>>>> data
> > > > > > >> >> >>>>>>>>>> format.  Allowing metadata and content to have
> > > > different
> > > > > > >> >> >>>>>> structure
> > > > > > >> >> >>>>>>>>> and
> > > > > > >> >> >>>>>>>>>> evolve separately is very helpful for this.
> > Also, I
> > > > > think
> > > > > > >> >> >>>>>> there's
> > > > > > >> >> >>>>>>>> a
> > > > > > >> >> >>>>>>>>> lot of
> > > > > > >> >> >>>>>>>>>> value to having a common metadata structure
> shared
> > > > > across
> > > > > > >> >> >>> the
> > > > > > >> >> >>>>>> Kafka
> > > > > > >> >> >>>>>>>>>> ecosystem so that tools which leverage metadata
> > can
> > > > more
> > > > > > >> >> >>>> easily
> > > > > > >> >> >>>>>> be
> > > > > > >> >> >>>>>>>>> shared
> > > > > > >> >> >>>>>>>>>> across organizations and integrated together.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> The question is, where does the metadata
> structure
> > > > > belong?
> > > > > > >> >> >>>>>> Here's
> > > > > > >> >> >>>>>>>>> my take:
> > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> We change the Kafka wire and on-disk format
> > from
> > > a
> > > > > > (key,
> > > > > > >> >> >>>>>> value)
> > > > > > >> >> >>>>>>>>> model to
> > > > > > >> >> >>>>>>>>>> a (key, metadata, value) model where all three
> are
> > > > byte
> > > > > > >> >> >>>> arrays
> > > > > > >> >> >>>>>> from
> > > > > > >> >> >>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>> broker's point of view.  The primary reason for
> > this
> > > is
> > > > > > that
> > > > > > >> >> >>>> it
> > > > > > >> >> >>>>>>>>> provides a
> > > > > > >> >> >>>>>>>>>> backward compatible migration path forward.
> > > Producers
> > > > > can
> > > > > > >> >> >>>> start
> > > > > > >> >> >>>>>>>>> populating
> > > > > > >> >> >>>>>>>>>> metadata fields before all consumers understand
> > the
> > > > > > >> >> >>> metadata
> > > > > > >> >> >>>>>>>>> structure.
> > > > > > >> >> >>>>>>>>>> For people who already have custom envelope
> > > > structures,
> > > > > > >> >> >>> they
> > > > > > >> >> >>>> can
> > > > > > >> >> >>>>>>>>> populate
> > > > > > >> >> >>>>>>>>>> their existing structure and the new structure
> > for a
> > > > > while
> > > > > > >> >> >>> as
> > > > > > >> >> >>>>>> they
> > > > > > >> >> >>>>>>>>> make the
> > > > > > >> >> >>>>>>>>>> transition.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> We could stop there and let the clients plug in
> a
> > > > > > >> >> >>>> KeySerializer,
> > > > > > >> >> >>>>>>>>>> MetadataSerializer, and ValueSerializer but I
> > think
> > > it
> > > > > would
> > > > > > >> >> >>>> also
> > > > > > >> >> >>>>>> be
> > > > > > >> >> >>>>>>>>> useful to
> > > > > > >> >> >>>>>>>>>> have a default MetadataSerializer that
> implements
> > a
> > > > > > >> >> >>> key-value
> > > > > > >> >> >>>>>> model
> > > > > > >> >> >>>>>>>>> similar
> > > > > > >> >> >>>>>>>>>> to AMQP or HTTP headers.  Or we could go even
> > > further
> > > > > and
> > > > > > >> >> >>>>>>>> prescribe a
> > > > > > >> >> >>>>>>>>>> Map<String, byte[]> or Map<String, String> data
> > > model
> > > > > for
> > > > > > >> >> >>>>>> headers
> > > > > > >> >> >>>>>>>> in
> > > > > > >> >> >>>>>>>>> the
> > > > > > >> >> >>>>>>>>>> clients (while still allowing custom
> serialization
> > > of
> > > > > the
> > > > > > >> >> >>>> header
> > > > > > >> >> >>>>>>>> data
> > > > > > >> >> >>>>>>>>>> model).
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> I think this would address Radai's concerns:
> > > > > > >> >> >>>>>>>>>> 1. All client code would not need to be updated
> to
> > > > know
> > > > > > >> >> >>> about
> > > > > > >> >> >>>>>> the
> > > > > > >> >> >>>>>>>>>> container.
> > > > > > >> >> >>>>>>>>>> 2. Middleware friendly clients would have a
> > standard
> > > > > > header
> > > > > > >> >> >>>> data
> > > > > > >> >> >>>>>>>>> model to
> > > > > > >> >> >>>>>>>>>> work with.
> > > > > > >> >> >>>>>>>>>> 3. KIP is required both b/c of broker changes
> and
> > > > > because
> > > > > > >> >> >>> of
> > > > > > >> >> >>>>>> client
> > > > > > >> >> >>>>>>>>> API
> > > > > > >> >> >>>>>>>>>> changes.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> Cheers,
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> Roger
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> On Wed, Nov 2, 2016 at 4:38 PM, radai <
> > > > > > >> >> >>>>>> [email protected]>
> > > > > > >> >> >>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> my biggest issues with a "standard" wrapper
> > format:
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> 1. _ALL_ client _CODE_ (as opposed to kafka lib
> > > > > version)
> > > > > > >> >> >>>> must
> > > > > > >> >> >>>>>> be
> > > > > > >> >> >>>>>>>>> updated
> > > > > > >> >> >>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>> know about the container, because any old naive
> > > code
> > > > > > >> >> >>>> trying to
> > > > > > >> >> >>>>>>>>> directly
> > > > > > >> >> >>>>>>>>>>> deserialize its own payload would keel over and
> > die
> > > > (it
> > > > > > >> >> >>>> needs
> > > > > > >> >> >>>>>> to
> > > > > > >> >> >>>>>>>>> know to
> > > > > > >> >> >>>>>>>>>>> deserialize a container, and then dig in there
> > for
> > > > its
> > > > > > >> >> >>>>>> payload).
> > > > > > >> >> >>>>>>>>>>> 2. in order to write middleware-friendly
> clients
> > > that
> > > > > > >> >> >>>> utilize
> > > > > > >> >> >>>>>>>> such
> > > > > > >> >> >>>>>>>>> a
> > > > > > >> >> >>>>>>>>>>> container one would basically have to write
> their
> > > own
> > > > > > >> >> >>>>>>>>> producer/consumer
> > > > > > >> >> >>>>>>>>>> API
> > > > > > >> >> >>>>>>>>>>> on top of the open source kafka one.
> > > > > > >> >> >>>>>>>>>>> 3. if you were going to go with a wrapper
> format
> > > you
> > > > > > >> >> >>> really
> > > > > > >> >> >>>>>> dont
> > > > > > >> >> >>>>>>>>> need to
> > > > > > >> >> >>>>>>>>>>> bother with a kip (just open source your own
> > client
> > > > > stack
> > > > > > >> >> >>>>>> from #2
> > > > > > >> >> >>>>>>>>> above
> > > > > > >> >> >>>>>>>>>> so
> > > > > > >> >> >>>>>>>>>>> others could stop re-inventing it)
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <
> > > > > > >> >> >>>>>>>> [email protected]>
> > > > > > >> >> >>>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>> How exactly would this work? Or maybe that's
> out
> > > of
> > > > > > >> >> >>> scope
> > > > > > >> >> >>>>>> for
> > > > > > >> >> >>>>>>>>> this
> > > > > > >> >> >>>>>>>>>> email.
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> The information contained in this email is
> strictly
> > > > > > >> confidential
> > > > > > >> >> >>>> and
> > > > > > >> >> >>>>>> for
> > > > > > >> >> >>>>>>>>> the use of the addressee only, unless otherwise
> > > > > indicated.
> > > > > > >> If you
> > > > > > >> >> >>>> are
> > > > > > >> >> >>>>>> not
> > > > > > >> >> >>>>>>>>> the intended recipient, please do not read, copy,
> > use
> > > > or
> > > > > > >> disclose
> > > > > > >> >> >>>> to
> > > > > > >> >> >>>>>>>> others
> > > > > > >> >> >>>>>>>>> this message or any attachment. Please also
> notify
> > > the
> > > > > > >> sender by
> > > > > > >> >> >>>>>> replying
> > > > > > >> >> >>>>>>>>> to this email or by telephone (+44(020 7896 0011)
> > and
> > > > > then
> > > > > > >> delete
> > > > > > >> >> >>>> the
> > > > > > >> >> >>>>>>>> email
> > > > > > >> >> >>>>>>>>> and any copies of it. Opinions, conclusion (etc)
> > that
> > > > do
> > > > > > not
> > > > > > >> >> >>>> relate to
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>> official business of this company shall be
> > understood
> > > > as
> > > > > > >> neither
> > > > > > >> >> >>>> given
> > > > > > >> >> >>>>>>>> nor
> > > > > > >> >> >>>>>>>>> endorsed by it. IG is a trading name of IG
> Markets
> > > > > Limited
> > > > > > (a
> > > > > > >> >> >>>> company
> > > > > > >> >> >>>>>>>>> registered in England and Wales, company number
> > > > 04008957)
> > > > > > >> and IG
> > > > > > >> >> >>>> Index
> > > > > > >> >> >>>>>>>>> Limited (a company registered in England and
> Wales,
> > > > > company
> > > > > > >> >> >>> number
> > > > > > >> >> >>>>>>>>> 01190902). Registered address at Cannon Bridge
> > House,
> > > > 25
> > > > > > >> Dowgate
> > > > > > >> >> >>>> Hill,
> > > > > > >> >> >>>>>>>>> London EC4R 2YA. Both IG Markets Limited
> (register
> > > > number
> > > > > > >> 195355)
> > > > > > >> >> >>>> and
> > > > > > >> >> >>>>>> IG
> > > > > > >> >> >>>>>>>>> Index Limited (register number 114059) are
> > authorised
> > > > and
> > > > > > >> >> >>>> regulated by
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>> Financial Conduct Authority.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> --
> > > > > > >> >> >>>>>> Gwen Shapira
> > > > > > >> >> >>>>>> Product Manager | Confluent
> > > > > > >> >> >>>>>> 650.450.2760 | @gwenshap
> > > > > > >> >> >>>>>> Follow us: Twitter | blog
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>
> > > > > > >> >> >>>>
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> --
> > > > > > >> >> >>>> Gwen Shapira
> > > > > > >> >> >>>> Product Manager | Confluent
> > > > > > >> >> >>>> 650.450.2760 | @gwenshap
> > > > > > >> >> >>>> Follow us: Twitter | blog
> > > > > > >> >> >>>>
> > > > > > >> >> >>>
> > > > > > >> >> >>>
> > > > > > >> >> >>>
> > > > > > >> >> >>> --
> > > > > > >> >> >>> Nacho (Ignacio) Solis
> > > > > > >> >> >>> Kafka
> > > > > > >> >> >>> [email protected]
> > > > > > >> >> >>>
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> > --
> > > > > > >> >> > Gwen Shapira
> > > > > > >> >> > Product Manager | Confluent
> > > > > > >> >> > 650.450.2760 | @gwenshap
> > > > > > >> >> > Follow us: Twitter | blog
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Gwen Shapira
> > > > > > >> Product Manager | Confluent
> > > > > > >> 650.450.2760 | @gwenshap
> > > > > > >> Follow us: Twitter | blog
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Nacho - Ignacio Solis - [email protected]
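
For readers following the quoted thread: the integer-tag scheme described in Magnus's message (each plugin's tag id set through configuration, ids 0..999 reserved for the broker, consumers silently ignoring tags they have no mapping for) could be sketched roughly as below. This is only an illustration in Python under stated assumptions, not Kafka code and not part of KIP-82: `TagRegistry`, `register`, `decode_headers`, and the representation of headers as a list of (id, bytes) pairs are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Ids 0..999 are reserved for the broker in the quoted proposal.
RESERVED_MAX = 999


class TagRegistry:
    """Hypothetical helper: maps configured integer tag ids to parsers."""

    def __init__(self) -> None:
        self._parsers: Dict[int, Tuple[str, Callable[[bytes], object]]] = {}

    def register(self, name: str, tag_id: int,
                 parser: Callable[[bytes], object]) -> None:
        # Plugins must not use the reserved range.
        if tag_id <= RESERVED_MAX:
            raise ValueError(f"tag id {tag_id} is in the reserved range")
        # Within one organization's config, two tags must not share an id.
        if tag_id in self._parsers:
            raise ValueError(f"tag id {tag_id} is already mapped")
        self._parsers[tag_id] = (name, parser)

    def decode_headers(self, headers: List[Tuple[int, bytes]]) -> Dict[str, object]:
        decoded: Dict[str, object] = {}
        for tag_id, data in headers:
            entry = self._parsers.get(tag_id)
            if entry is None:
                # No mapping: the bytes cannot be interpreted, so skip them.
                continue
            name, parser = entry
            decoded[name] = parser(data)
        return decoded


if __name__ == "__main__":
    registry = TagRegistry()
    # Ids mirror the pseudo-config in the quoted message.
    registry.register("dbthing.tag.tablename", 1001, lambda b: b.decode("utf-8"))
    registry.register("myschemareg.tag.schemaversion", 1003,
                      lambda b: int.from_bytes(b, "big"))
    print(registry.decode_headers(
        [(1001, b"orders"), (1003, b"\x00\x07"), (4242, b"unmapped")]))
```

In this sketch the header with id 4242 is dropped because nothing is registered for it, which matches the "silently ignores" behaviour described in the thread; how a real client would surface configuration errors is left open.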
