Re:
“The point about creation of maps seems orthogonal. We can still represent
the headers as a slice of bytes until the time it is accessed.”
That’s exactly what we’re doing: the headers are a slice of bytes, which then
get parsed later if needed, or can be parsed right away. The headers are part
of the protocol, so they can still be validated if wanted.
If you had a header count, then you would have to go through each header's
key length and value length to work out how much to skip to reach, say, the
value or any future component in the message after the headers. Having it as
a byte[] with a length prefix makes this a lot easier to skip.
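A minimal Java sketch of the skipping argument follows. It assumes two hypothetical record layouts, neither of which is the actual Kafka message format. In the first, the headers are a size-prefixed blob. In the second, they are a count-prefixed array of length-prefixed keys and values. `HeaderSkipping` and its methods are illustrative names only.

```java
import java.nio.ByteBuffer;

// Hypothetical layouts for illustration only; not the real Kafka record format.
final class HeaderSkipping {

    // Layout: [int32 headersSize][header bytes][int32 valueLength][value].
    // A single read of headersSize is enough to jump past every header.
    static byte[] readValueSizePrefixed(ByteBuffer buf) {
        int headersSize = buf.getInt();
        buf.position(buf.position() + headersSize);
        return readLengthPrefixed(buf);
    }

    // Layout: [int32 count]{[int32 keyLen][key][int32 valLen][val]}*[int32 valueLength][value].
    // Each header's key and value lengths must be read to find where the value starts.
    static byte[] readValueCountPrefixed(ByteBuffer buf) {
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int keyLength = buf.getInt();
            buf.position(buf.position() + keyLength);
            int headerValueLength = buf.getInt();
            buf.position(buf.position() + headerValueLength);
        }
        return readLengthPrefixed(buf);
    }

    // Reads an int32 length followed by that many bytes.
    static byte[] readLengthPrefixed(ByteBuffer buf) {
        byte[] out = new byte[buf.getInt()];
        buf.get(out);
        return out;
    }
}
```

With the size prefix, reaching the value takes one read no matter how many headers there are. With the count prefix, the work grows with the number of headers.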
On 17/02/2017, 20:37, "Michael Pearce" <[email protected]> wrote:
What’s the issue with exposing a getHeaders method on the producer/consumer
record? It doesn’t break anything. We don’t need any special version.
The current batch consumer model and consumer interceptors don’t work where
headers need to be acted on at a per-message level at the time of processing.
A key case is APM (the core one), where the header value is used to continue
tracing. JMS/HTTP etc. all expose these without issues. I would NOT want to
lock this down to only be accessible via interceptors, as we’d fail one of
the main goals.
Regards
Mike
On 17/02/2017, 20:21, "Jason Gustafson" <[email protected]> wrote:
The point about creation of maps seems orthogonal. We can still represent
the headers as a slice of bytes until the time it is accessed.
> Yes, exactly: we have access to the records, which is why the header should
> be accessible via it and not hidden for only interceptors to access.
As explained above, the point is to make the intended usage clear.
Applications should continue to rely on the key/value fields to serialize
their own headers, and it would be better if we could avoid leaking
third-party headers into applications. This is difficult to do with the
current interceptors because they share the record objects with the common
API. What I had in mind is something like an extension of the current
interceptors which exposes a different object (e.g. `RecordAndHeaders`).
The challenge is for MM-like use cases. Let me see if I can come up with a
concrete proposal for that problem.
-Jason
On Fri, Feb 17, 2017 at 11:55 AM, Michael Pearce <[email protected]> wrote:
> I am happy to move the definition of the header into the message body, but
> it would cause us not to lazily initialise/parse the headers, as obviously
> we would have to traverse these when reading the message.
>
> This was actually one of Jay’s requests:
>
> “2. I think we should think about creating the Map lazily to avoid
> parsing out all the headers into little objects. HashMaps themselves are
> kind of expensive and the consumer is very perf sensitive, so making
> gazillions of hashmaps that may or may not get used is probably a bad
> idea.”
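One way to get the lazy behaviour Jay describes is to keep the raw header bytes and build the map only on first access. The sketch below assumes the `{int32 keyLength, key, int32 valueLength, value}*` blob layout discussed further down the thread. `LazyHeaders` is a hypothetical class, not the KIP-82 `Headers` API, and it is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class: holds the serialized header blob and only parses it into
// a Map the first time a header is requested. Not thread-safe.
final class LazyHeaders {
    private final byte[] raw;           // {int32 keyLen, key, int32 valLen, val}*
    private Map<String, byte[]> parsed; // stays null until first lookup

    LazyHeaders(byte[] raw) { this.raw = raw; }

    // Pass-through path (e.g. mirroring): hand the bytes on without parsing.
    byte[] rawBytes() { return raw; }

    boolean isParsed() { return parsed != null; }

    byte[] get(String key) {
        if (parsed == null) {
            parsed = parse(raw);
        }
        return parsed.get(key);
    }

    // If a key repeats, the last occurrence wins in this sketch.
    private static Map<String, byte[]> parse(byte[] raw) {
        Map<String, byte[]> map = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(raw);
        while (buf.hasRemaining()) {
            byte[] key = new byte[buf.getInt()];
            buf.get(key);
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            map.put(new String(key, StandardCharsets.UTF_8), value);
        }
        return map;
    }
}
```

A consumer that never calls `get` pays only for holding a reference to the byte array. No HashMap or per-header objects are created.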
>
>
>
>
>
> On 17/02/2017, 19:44, "Michael Pearce" <[email protected]> wrote:
>
> Yes, exactly: we have access to the records, which is why the header
> should be accessible via it and not hidden for only interceptors to access.
>
> Sent using OWA for iPhone
> ________________________________________
> From: Magnus Edenhill <[email protected]>
> Sent: Friday, February 17, 2017 7:34:49 PM
> To: [email protected]
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Big +1 on VarInts.
> CPUs are fast, memory is slow.
>
> I agree with Jason that we'll want to continue verifying messages,
> including their headers, so while I appreciate the idea of the opaque
> header blob, it won't be useful in practice.
>
> /Magnus
>
> 2017-02-17 10:41 GMT-08:00 Jason Gustafson <[email protected]>:
>
> > Sorry, my mistake. The consumer interceptor is per batch, though I'm not
> > sure that's an actual limitation since you still have access to the
> > individual records.
> >
> > -Jason
> >
> > On Fri, Feb 17, 2017 at 10:39 AM, Jason Gustafson <[email protected]> wrote:
> >
> > >> Re headers as byte array and future use by broker: this doesn't take
> > >> away from that at all, nor does it make it difficult at all, in my
> > >> opinion.
> > >
> > >
> > > Yeah, I didn't say it was difficult, only awkward. You wouldn't write
> > > the schema that way if you were planning to use it on the brokers from
> > > the beginning. Note also that one of the benefits of letting the broker
> > > understand headers is that it can validate that they are properly
> > > formatted. If cost is the only concern, we should confirm its impact
> > > through performance testing.
> > >
> > >> One of the key use cases requires access on consume at the per
> > >> event/message level at the point that the message is being processed;
> > >> as such, the batch interceptors and batch consume API aren't suitable.
> > >> It needs to be at the record level.
> > >
> > >
> > > I'm not sure I understand the point about batching. Interceptors are
> > > applied per-message, right?
> > >
> > > My intent on interceptors is to keep the usage of headers well-defined
> > > so that they don't start leaking unnecessarily into applications. My
> > > guess is that it's probably inevitable, but isolating it in the
> > > interceptors would at least give people a second thought before
> > > deciding to use it. The main challenge in my mind is figuring out how
> > > an MM use case would work. It would be more cumbersome to replicate
> > > headers through an interceptor, though arguably MM should be working at
> > > a lower level anyway.
> > >
> > > -Jason
> > >
> > > On Fri, Feb 17, 2017 at 10:16 AM, Michael Pearce <[email protected]> wrote:
> > >
> > >> Re headers available on the record via interceptors only
> > >>
> > >> One of the key use cases requires access on consume at the per
> > >> event/message level at the point that the message is being processed;
> > >> as such, the batch interceptors and batch consume API aren't suitable.
> > >> It needs to be at the record level.
> > >>
> > >> This anyhow is similar to JMS/HTTP/AMQP, where headers are available to
> > >> consuming applications.
> > >>
> > >> Re headers as byte array and future use by broker: this doesn't take
> > >> away from that at all, nor does it make it difficult at all, in my
> > >> opinion.
> > >>
> > >>
> > >>
> > >> ________________________________________
> > >> From: Jason Gustafson <[email protected]>
> > >> Sent: Friday, February 17, 2017 5:55:42 PM
> > >> To: [email protected]
> > >> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> > >>
> > >> >
> > >> > Would you be proposing in KIP-98 to convert the other message ints
> > >> > (key length, value length) to varints as well, to keep it uniform?
> > >> > Also, I assume there will be a static or helper method made to
> > >> > write/read these in the client and server.
> > >>
> > >>
> > >> Yes, that is what we are proposing, so using varints for headers would
> > >> be consistent with the rest of the message. We have used static helper
> > >> methods in our prototype implementation.
> > >>
> > >> > The cost of parsing: we want to parse/interpret the headers lazily
> > >> > (this is a key point brought up earlier in discussions)
> > >>
> > >>
> > >> I'm a bit skeptical of this. Has anyone done the performance testing?
> > >> I can probably implement it and test it if no one else has. I was also
> > >> under the impression that there may be use cases down the road where
> > >> the broker would need to interpret headers. That wouldn't be off the
> > >> table in the future if it's represented as bytes, but it would be
> > >> quite a bit more awkward, right?
> > >>
> > >> By the way, one question I have been wondering about. My understanding
> > >> is that headers are primarily for use cases where third-party
> > >> components want to enrich messages without needing to understand or
> > >> modify the schema of the message key and value. For the applications
> > >> which directly produce and consume the messages and control the
> > >> key/value schema directly, it seems we would rather have them
> > >> implement headers directly in their own schema. Supposing for the sake
> > >> of argument that it was possible, my question is whether it would be
> > >> sufficient to expose the headers in the interceptor API and not in the
> > >> common API?
> > >>
> > >> -Jason
> > >>
> > >> On Fri, Feb 17, 2017 at 3:26 AM, Michael Pearce <[email protected]> wrote:
> > >>
> > >> > On the point of varInts
> > >> >
> > >> > Would you be proposing in KIP-98 to convert the other message ints
> > >> > (key length, value length) to varints as well, to keep it uniform?
> > >> > Also, I assume there will be a static or helper method made to
> > >> > write/read these in the client and server.
> > >> >
> > >> > Cheers
> > >> > Mike
> > >> >
> > >> >
> > >> >
> > >> > On 17/02/2017, 11:22, "Michael Pearce" <[email protected]> wrote:
> > >> >
> > >> > On the point re: headers in the message protocol being a byte array
> > >> > and not a count of elements followed by the elements: again, this was
> > >> > discussed/argued previously.
> > >> >
> > >> > It was agreed on for a few reasons, some of which you have obviously
> > >> > picked up on:
> > >> >
> > >> > Broker is able to pass it through opaquely
> > >> > The cost of parsing: we want to parse/interpret the headers lazily
> > >> > (this is a key point brought up earlier in discussions)
> > >> > Headers can be copied from consumer record to producer record (aka
> > >> > mirror makers etc.) without parsing if no changes are being made or
> > >> > they are not being looked at.
> > >> > Keeps the broker agnostic to the format
> > >> > You need an int32 either for the byte size of the headers or for the
> > >> > count of elements, so overheads are the same, but going with an
> > >> > opaque byte array has the above advantages.
> > >> >
> > >> > Cheers
> > >> > Mike
> > >> >
> > >> >
> > >> > On 17/02/2017, 02:50, "Jason Gustafson" <[email protected]> wrote:
> > >> >
> > >> > Sorry, should have noted that the performance testing was done
> > >> > using the producer performance tool shipped with Kafka.
> > >> >
> > >> > -Jason
> > >> >
> > >> > On Thu, Feb 16, 2017 at 6:44 PM, Jason Gustafson <[email protected]> wrote:
> > >> >
> > >> > > Hey Nacho,
> > >> > >
> > >> > > I've compared performance of our KIP-98 implementation with and
> > >> > > without varints. For messages around 128 bytes, we see an increase
> > >> > > in throughput of about 30% using the default configuration
> > >> > > settings. At 256 bytes, the increase is around 16%. Obviously the
> > >> > > performance converges as messages get larger, but it seems well
> > >> > > worth the cost. Note that we are also seeing a substantial
> > >> > > performance increase against trunk, primarily because of the much
> > >> > > more efficient packing that varints provide us. Anything adding to
> > >> > > message overhead, such as record headers, would only increase the
> > >> > > relative difference. (Of course, take these numbers with a grain of
> > >> > > salt since I have only used the default settings with both the
> > >> > > producer and broker on my local machine. We intend to provide more
> > >> > > extensive performance details as part of the work for KIP-98.)
> > >> > >
> > >> > > The implementation we are using is from protobuf
> > >> > > (https://developers.google.com/protocol-buffers/docs/encoding),
> > >> > > which is also used in HBase. It is trivial to implement and, as far
> > >> > > as I know, doesn't suffer from the aliasing problem you are
> > >> > > describing. I checked with Magnus (the author of librdkafka) and he
> > >> > > agreed that the savings seemed worth the cost of implementation.
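Here is a minimal Java sketch of the protobuf-style encoding referred to above, for reference. It uses base-128 varints, with 7 payload bits per byte, low-order group first, and the high bit set when more bytes follow. Signed values go through a ZigZag mapping. The class name `VarInts` is hypothetical. This is not the KIP-98 prototype code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Sketch of protobuf-style varints with ZigZag for signed values.
final class VarInts {
    // Unsigned varint: emit 7 bits at a time; 0x80 marks "more bytes follow".
    static void writeUnsigned(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads at most 5 bytes, which is enough for any 32-bit value.
    static int readUnsigned(ByteBuffer buf) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }

    // ZigZag keeps small negatives short: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) { return (n << 1) ^ (n >> 31); }
    static int unZigZag(int n) { return (n >>> 1) ^ -(n & 1); }

    static byte[] encodeSigned(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsigned(zigZag(value), out);
        return out.toByteArray();
    }

    static int decodeSigned(byte[] bytes) {
        return unZigZag(readUnsigned(ByteBuffer.wrap(bytes)));
    }
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, where a fixed int32 takes four. Values below 128 take a single byte.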
> > >> > >
> > >> > > -Jason
> > >> > >
> > >> > > On Thu, Feb 16, 2017 at 4:32 PM, Ignacio Solis <[email protected]> wrote:
> > >> > >
> > >> > >> -VarInts
> > >> > >>
> > >> > >> I'm one of the people (if not the most) opposed to VarInts. VarInts
> > >> > >> have a place, but this is not it. (We had a large discussion about
> > >> > >> them at the beginning of KIP-82 time.)
> > >> > >>
> > >> > >> If anybody has real-life performance numbers of VarInts improving
> > >> > >> things or significantly reducing resources, I would like to know
> > >> > >> what that case may be. Yes, you can save some bytes here and there,
> > >> > >> but this is probably insignificant to the overall system behavior
> > >> > >> and storage requirements. I say this with respect to using VarInts
> > >> > >> in the protocol itself, not as part of the data.
> > >> > >>
> > >> > >> VarInts require you to parse the Int before using it and, depending
> > >> > >> on the encoding, they can suffer from aliasing (multiple
> > >> > >> representations for the same value).
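To make the aliasing concern concrete: a naive base-128 decoder accepts overlong encodings, so `0x00` and `0x80 0x00` both decode to 0. Whether this matters in practice depends on whether decoders insist on the minimal encoding. The sketch below (hypothetical `VarIntAliasing` class) shows both behaviours. It rejects a multi-byte encoding whose final byte is zero. It does not check the unused high bits of a fifth byte, which are another possible source of aliases.

```java
import java.nio.ByteBuffer;

// Illustrates varint aliasing: with rejectOverlong == false, several byte
// strings decode to the same value; with true, only the minimal form is accepted.
final class VarIntAliasing {
    static int decode(byte[] bytes, boolean rejectOverlong) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int result = 0;
        for (int i = 0, shift = 0; i < 5; i++, shift += 7) {
            byte b = buf.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                // A zero final byte after other bytes adds nothing: an overlong form.
                if (rejectOverlong && i > 0 && b == 0) {
                    throw new IllegalArgumentException("non-minimal varint encoding");
                }
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }
}
```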
> > >> > >>
> > >> > >> Why add complexity?
> > >> > >>
> > >> > >> Nacho
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Feb 16, 2017 at 10:29 AM, Colin McCabe <[email protected]> wrote:
> > >> > >> > +1 for varints here; it would save quite a bit of space. They
> > >> > >> > are pretty quick to implement as well.
> > >> > >> >
> > >> > >> > I think it makes sense for values to be byte arrays. Users might
> > >> > >> > want to attach arbitrary payloads; they shouldn't be forced to
> > >> > >> > serialize everything to Java strings.
> > >> > >> >
> > >> > >> > best,
> > >> > >> > Colin
> > >> > >> >
> > >> > >> >
> > >> > >> On Thu, Feb 16, 2017, at 09:52, Jason Gustafson wrote:
> wrote:
> > >> > >> >> Hey Michael,
> > >> > >> >>
> > >> > >> >> Hmm, I guess the point of representing it as bytes is to allow
> > >> > >> >> the broker to pass it through opaquely? Is the cost of parsing
> > >> > >> >> them a concern, or are we simply trying to ensure that the broker
> > >> > >> >> stays agnostic to the format?
> > >> > >> >>
> > >> > >> >> On varints, I think adding support for them makes less sense for
> > >> > >> >> an isolated use case, but as part of a more holistic change (such
> > >> > >> >> as what we have proposed in KIP-98), I think they are
> > >> > >> >> justifiable. If we add them, then the need to use attributes
> > >> > >> >> becomes quite a bit weaker, right? The other thing I find
> > >> > >> >> slightly odd is the fact that null headers have no actual
> > >> > >> >> semantic meaning for the message (unlike null keys and values).
> > >> > >> >> It is just a space optimization. It seems a bit better to always
> > >> > >> >> use size 0 to indicate having no headers.
> > >> > >> >>
> > >> > >> >> Overall, the main point is ensuring that the message schema
> > >> > >> >> remains consistent, either within the larger protocol or, at a
> > >> > >> >> minimum, within the message itself.
> > >> > >> >>
> > >> > >> >> -Jason
> > >> > >> >>
> > >> > >> >> On Thu, Feb 16, 2017 at 6:39 AM, Michael Pearce <[email protected]> wrote:
> > >> > >> >>
> > >> > >> >> > Hi Jason,
> > >> > >> >> >
> > >> > >> >> > On point 1): in the message protocol the headers are simply a
> > >> > >> >> > byte array, like the key or value; this is to clearly
> > >> > >> >> > demarcate the headers in the core message. The header byte
> > >> > >> >> > array in the core message is then an array of key, value
> > >> > >> >> > pairs. This is what it is denoting.
> > >> > >> >> >
> > >> > >> >> > This would then be, I guess, in the given notation:
> > >> > >> >> >
> > >> > >> >> > Headers => [KeyLength, Key, ValueLength, Value]
> > >> > >> >> >   KeyLength => int32    <---- NEW size of the byte[] of the serialised header key
> > >> > >> >> >   Key => bytes          <---- NEW serialised string (UTF-8) bytes of the header key
> > >> > >> >> >   ValueLength => int32  <---- NEW size of the byte[] of the serialised header value
> > >> > >> >> >   Value => bytes        <---- NEW serialised form of the header value
> > >> > >> >> >
> > >> > >> >> > The key length and value length match the way the protocol is
> > >> > >> >> > currently defined in the core message.
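A minimal Java sketch of encoding a header map into the blob layout above follows. It assumes big-endian int32 lengths (Kafka's network byte order) and UTF-8 keys. `Int32HeaderEncoder` is a hypothetical name. In the message, the resulting blob would itself carry an int32 length prefix, like the key and value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class Int32HeaderEncoder {
    // Serializes each header as KeyLength (int32), Key (UTF-8 bytes),
    // ValueLength (int32), Value (raw bytes). DataOutputStream writes
    // big-endian ints, matching Kafka's network byte order.
    static byte[] encode(Map<String, byte[]> headers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> header : headers.entrySet()) {
                byte[] key = header.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.write(key);
                out.writeInt(header.getValue().length);
                out.write(header.getValue());
            }
        } catch (IOException e) {
            // Not expected: writes to a ByteArrayOutputStream do not fail.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```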
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> > On point 2)
> > >> > >> >> > Var-sized ints: this was discussed much earlier on; in fact, I
> > >> > >> >> > had suggested it myself (with Hadoop references). The
> > >> > >> >> > complexity of this compared to having a simpler protocol was
> > >> > >> >> > argued, and it was agreed it wasn’t worth the complexity, as
> > >> > >> >> > all other clients in other languages would need to ensure
> > >> > >> >> > they're using the right var size algorithm, as there are a few.
> > >> > >> >> >
> > >> > >> >> > On point 3)
> > >> > >> >> > We took the attributes/optional approach because originally
> > >> > >> >> > there was marked concern that headers would cause a message
> > >> > >> >> > size overhead for others who don’t want them. As such, this is
> > >> > >> >> > the clean solution to achieve that. If that no longer holds,
> > >> > >> >> > and we don’t care that we add 4 bytes of overhead, then I'm
> > >> > >> >> > happy to remove it.
> > >> > >> >> >
> > >> > >> >> > I’m personally in favour of keeping the message as small as
> > >> > >> >> > possible so people don’t get shocks in perf and throughput due
> > >> > >> >> > to message size unless they actively use the feature; as such,
> > >> > >> >> > I do prefer the bitwise attribute feature-flag approach myself.
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> > On 16/02/2017, 05:40, "Jason Gustafson" <[email protected]> wrote:
> > >> > >> >> >
> > >> > >> >> > We have proposed a few significant changes to the message
> > >> > >> >> > format in KIP-98, which now seems likely to pass (perhaps with
> > >> > >> >> > some iterations on implementation details). It would be good to
> > >> > >> >> > try and coordinate the changes in both of the proposals to make
> > >> > >> >> > sure they are consistent and compatible.
> > >> > >> >> >
> > >> > >> >> > I think using the attributes to indicate null headers is a
> > >> > >> >> > reasonable approach. We have proposed to do the same thing for
> > >> > >> >> > the message key and value. That said, I sympathize with Jay's
> > >> > >> >> > argument. Having multiple ways to specify a null value
> > >> > >> >> > increases the overall complexity of the protocol. You can see
> > >> > >> >> > this just from the fact that you need the extra verbiage in the
> > >> > >> >> > protocol specification in this KIP and in KIP-98 to describe
> > >> > >> >> > the dependence between the fields and the attributes. It seems
> > >> > >> >> > like a slippery slope if you start allowing different request
> > >> > >> >> > types to implement the protocol specification differently.
> > >> > >> >> >
> > >> > >> >> > You can also argue that the messages already are, and are
> > >> > >> >> > likely to remain, a special case. For example, there is
> > >> > >> >> > currently no generality in how compressed message sets are
> > >> > >> >> > represented that would be applicable for other request types.
> > >> > >> >> > Some might see this divergence as an unfortunate protocol
> > >> > >> >> > deficiency which should be fixed; others might see it as sort
> > >> > >> >> > of the inevitability of needing to optimize where it counts
> > >> > >> >> > most. I'm probably somewhere in between, but I think we probably
> > >> > >> >> > all share the intuition that the protocol should be kept as
> > >> > >> >> > consistent as possible. With that in mind, here are a few
> > >> > >> >> > comments:
> > >> > >> >> >
> > >> > >> >> > 1. One thing I found a little odd when reading the current
> > >> > >> >> > proposal is that the headers are both represented as an array
> > >> > >> >> > of bytes and as an array of key/value pairs. I'd probably
> > >> > >> >> > suggest something like this:
> > >> > >> >> >
> > >> > >> >> > Headers => [HeaderKey HeaderValue]
> > >> > >> >> >   HeaderKey => String
> > >> > >> >> >   HeaderValue => Bytes
> > >> > >> >> >
> > >> > >> >> > An array in the Kafka protocol is represented as a 4-byte
> > >> > >> >> > integer indicating the number of elements in the array, followed
> > >> > >> >> > by the serialization of the elements. Unless I'm
> > >> > >> >> > misunderstanding, what you have instead is the total size of
> > >> > >> >> > the headers in bytes followed by the elements. I'm not sure I
> > >> > >> >> > see any reason for this inconsistency.
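For contrast, here is Jason's suggested schema written with the classic Kafka protocol primitives. An array is prefixed by an int32 element count, a `String` by an int16 length followed by UTF-8 bytes, and `Bytes` by an int32 length. The sketch below (hypothetical `ProtocolArrayHeaders`) encodes a header map that way. Compared with the blob layout, the leading int32 counts entries rather than bytes, and keys use an int16 length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Headers     => [HeaderKey HeaderValue]  array: int32 element count
// HeaderKey   => String                   int16 length + UTF-8 bytes
// HeaderValue => Bytes                    int32 length + raw bytes
final class ProtocolArrayHeaders {
    static byte[] encode(Map<String, byte[]> headers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(headers.size()); // a count of entries, not a byte size
            for (Map.Entry<String, byte[]> header : headers.entrySet()) {
                byte[] key = header.getKey().getBytes(StandardCharsets.UTF_8);
                if (key.length > Short.MAX_VALUE) {
                    throw new IllegalArgumentException("header key too long for an int16 length");
                }
                out.writeShort(key.length);
                out.write(key);
                out.writeInt(header.getValue().length);
                out.write(header.getValue());
            }
        } catch (IOException e) {
            // Not expected: writes to a ByteArrayOutputStream do not fail.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```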
> > >> > >> >> >
> > >> > >> >> > 2. In KIP-98, we've introduced variable-length integer fields.
> > >> > >> >> > Effectively, we've enriched (or "complicated" as Jay might say
> > >> > >> >> > ;) the protocol specification to include the following types:
> > >> > >> >> > VarInt, VarLong, UnsignedVarInt and UnsignedVarLong.
> > >> > >> >> >
> > >> > >> >> > Along with these primitives, we could introduce the following
> > >> > >> >> > types:
> > >> > >> >> >
> > >> > >> >> > VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
> > >> > >> >> >   NumberOfItems => UnsignedVarInt
> > >> > >> >> >
> > >> > >> >> > VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
> > >> > >> >> >   NumberOfItemsOrNull => VarInt (-1 means null)
> > >> > >> >> >
> > >> > >> >> > And similarly for the `String` and `Bytes` types. These types
> > >> > >> >> > can save a considerable amount of space in this proposal
> > >> > >> >> > because they can be used for both the number of headers
> > >> > >> >> > included in the message and the lengths of the header keys and
> > >> > >> >> > values. We could do this instead:
> > >> > >> >> >
> > >> > >> >> > Headers => VarSizeArray[HeaderKey HeaderValue]
> > >> > >> >> >   HeaderKey => VarSizeString
> > >> > >> >> >   HeaderValue => VarSizeBytes
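A minimal sketch of the proposed `VarSizeArray` and `VarSizeNullableArray` framing follows, with items that are already serialized. It assumes `VarInt` means a ZigZag-encoded signed varint (as in protobuf's `sint32`) and `UnsignedVarInt` means a plain base-128 varint. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Illustrative encoder for the proposed variable-size array types.
final class VarSizeArrays {
    static void writeUnsignedVarInt(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Assumed here: VarInt = ZigZag-mapped signed value, then an unsigned varint.
    static void writeVarInt(int value, ByteArrayOutputStream out) {
        writeUnsignedVarInt((value << 1) ^ (value >> 31), out);
    }

    // VarSizeArray => NumberOfItems (UnsignedVarInt) Item1 .. ItemN
    static byte[] varSizeArray(List<byte[]> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarInt(items.size(), out);
        for (byte[] item : items) {
            out.write(item, 0, item.length);
        }
        return out.toByteArray();
    }

    // VarSizeNullableArray => NumberOfItemsOrNull (VarInt, -1 means null) Item1 .. ItemN
    static byte[] varSizeNullableArray(List<byte[]> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (items == null) {
            writeVarInt(-1, out);
            return out.toByteArray();
        }
        writeVarInt(items.size(), out);
        for (byte[] item : items) {
            out.write(item, 0, item.length);
        }
        return out.toByteArray();
    }
}
```

Under these assumptions, an empty array and a null array each cost a single byte.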
> > >> > >> >> >
> > >> > >> >> > Combining the savings from the use of variable-length fields,
> > >> > >> >> > the benefit of using the attributes to represent null seems
> > >> > >> >> > pretty small.
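The sketch below puts rough numbers on that. It compares the framing overhead of the two layouts for a hypothetical record with two headers, each with a 10-byte key and a 16-byte value. For the current proposal, it assumes int32 key and value lengths plus an int32 blob size. For the alternative, it assumes unsigned varint counts and lengths. The sizes are illustrative arithmetic, not measurements, and `HeaderOverhead` is a hypothetical name.

```java
// Compares encoded header-section sizes under two framing assumptions.
final class HeaderOverhead {
    // Number of bytes an unsigned base-128 varint needs for this value.
    static int unsignedVarIntSize(int value) {
        int bytes = 1;
        while ((value & 0xFFFFFF80) != 0) {
            bytes++;
            value >>>= 7;
        }
        return bytes;
    }

    // int32 blob size, then int32 key and value lengths per header.
    static int int32Framed(int[] keyLens, int[] valueLens) {
        int size = 4;
        for (int i = 0; i < keyLens.length; i++) {
            size += 4 + keyLens[i] + 4 + valueLens[i];
        }
        return size;
    }

    // Varint header count, then varint key and value lengths per header.
    static int varIntFramed(int[] keyLens, int[] valueLens) {
        int size = unsignedVarIntSize(keyLens.length);
        for (int i = 0; i < keyLens.length; i++) {
            size += unsignedVarIntSize(keyLens[i]) + keyLens[i]
                  + unsignedVarIntSize(valueLens[i]) + valueLens[i];
        }
        return size;
    }
}
```

Under those assumptions, the headers section is 72 bytes versus 57. An empty headers section costs 4 bytes versus 1, which is why a separate null flag in the attributes saves very little once varints are used.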
> > >> > >> >> >
> > >> > >> >> > 3. Whichever way we go (whether we use the attributes or not),
> > >> > >> >> > we should at least be consistent between this KIP and KIP-98.
> > >> > >> >> > It would be very strange to have two ways to represent null
> > >> > >> >> > values in the same schema. Either way is OK with me. I think
> > >> > >> >> > some message-level optimizations are justifiable, but the
> > >> > >> >> > savings here seem minimal (a few bytes per message), so maybe
> > >> > >> >> > it's not worth the cost of letting the message diverge even
> > >> > >> >> > further from the rest of the protocol.
> > >> > >> >> >
> > >> > >> >> > -Jason
> > >> > >> >> >
> > >> > >> >> >
> > >> > >> >> > On Wed, Feb 15, 2017 at 8:52 AM, radai <[email protected]> wrote:
> > >> > >> >> >
> > >> > >> >> > > I've trimmed the inline contents as this mail is getting too
> > >> > >> >> > > big for the Apache mailing list software to deliver :-(
> > >> > >> >> > >
> > >> > >> >> > > 1. The important thing for interoperability is for different
> > >> > >> >> > > "interested parties" (plugins, infra layers/wrappers,
> > >> > >> >> > > user code) to be able to stick pieces of metadata onto msgs
> > >> > >> >> > > without getting in each other's way. A common key scheme
> > >> > >> >> > > (Strings, as of the time of this writing?) is all that's
> > >> > >> >> > > required for that. It is assumed that the other end interested
> > >> > >> >> > > in any such piece of metadata knows the encoding, and byte[]
> > >> > >> >> > > provides the most flexibility. I believe this is the same
> > >> > >> >> > > logic behind core Kafka being byte[]/byte[]: Strings are more
> > >> > >> >> > > "usable", but bytes are flexible and so were chosen.
> > >> > >> >> > > Also, core Kafka doesn't even do that good of a job on
> > >> > >> >> > > usability of the payload (example: I have to specify the nop
> > >> > >> >> > > byte[] "decoders" explicitly in conf), and again sacrifices
> > >> > >> >> > > usability for the sake of performance (no convenient
> > >> > >> >> > > single-record processing as poll is a batch, lots of obscure
> > >> > >> >> > > little config details exposing internals of the batching
> > >> > >> >> > > mechanism, etc.)
> > >> > >> >> > >
> > >> > >> >> > > This is also why I really dislike the idea of a "type system"
> > >> > >> >> > > for header values: it further degrades the usability, adds
> > >> > >> >> > > complexity and will eventually get in people's way. Also, it
> > >> > >> >> > > would be the 2nd/3rd home-grown serialization mechanism in
> > >> > >> >> > > core Kafka (counting 2 iterations of the "type definition
> > >> > >> >> > > DSL").
> > >> > >> >> > >
> > >> > >> >> > > 2. This is an implementation detail, and not even a very
> > >> > >> >> > > "user facing" one? To the best of my understanding, the vote
> > >> > >> >> > > process is on proposed API/behaviour. Also, since we're willing
> > >> > >> >> > > to go with strings, just serialize a 0-sized header blob and,
> > >> > >> >> > > IIUC, you don't need any optionals anymore.
> > >> > >> >> > >
> > >> > >> >> > > 3. yes, we can :-)
> > >> > >> >> > >
> > >> > >> >> > > On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce <[email protected]> wrote:
> > >> > >> >> > >
> > >> > >> >> > > > Hi Jay,
> > >> > >> >> > > >
> > >> > >> >> > > > 1) There was some initial debate on the value part; as you'll
> > >> > >> >> > > > note, String, String headers were discounted early on. The
> > >> > >> >> > > > reason for this is flexibility, and keeping in line with the
> > >> > >> >> > > > flexibility of the key and value of the message object itself.
> > >> > >> >> > > > I don’t think it takes away from an ecosystem, as each plugin
> > >> > >> >> > > > will care for its own key; this way ints, booleans, and exotic
> > >> > >> >> > > > custom binary can all be catered for.
> > >> > >> >> > > > a. If you really wanted to push for a typed value interface,
> > >> > >> >> > > > I wouldn’t want just String values supported, but the
> > >> > >> >> > > > primitives plus string, while still keeping the ability to
> > >> > >> >> > > > have a binary for custom binaries that some organisations may
> > >> > >> >> > > > have.
> > >> > >> >> > > > i. I have written this slight alternative here:
> > >> > >> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers+-+Typed
> > >> > >> >> > > > ii. Essentially, the value bytes have a leading byte overhead.
> > >> > >> >> > > > 1. This tells you what type the value is before reading the
> > >> > >> >> > > > rest of the bytes, allowing serialisation/deserialisation to
> > >> > >> >> > > > and from the primitives, string and byte[]. This is akin to
> > >> > >> >> > > > some other messaging systems.
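The leading-type-byte idea can be sketched as below. The tag values (0 for bytes, 1 for boolean, and so on) are invented for illustration and are not taken from the KIP-82 Typed page. `TypedHeaderValue` is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative type-tagged header value: one leading byte says how to read the
// rest. The tag numbers are made up for this sketch.
final class TypedHeaderValue {
    static final byte BYTES = 0, BOOLEAN = 1, INT = 2, LONG = 3, STRING = 4;

    static byte[] ofInt(int v) {
        return ByteBuffer.allocate(5).put(INT).putInt(v).array();
    }

    static byte[] ofBoolean(boolean b) {
        return new byte[]{BOOLEAN, (byte) (b ? 1 : 0)};
    }

    static byte[] ofString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + utf8.length).put(STRING).put(utf8).array();
    }

    // Reads the tag first, then decodes the remaining bytes accordingly.
    static Object decode(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte tag = buf.get();
        switch (tag) {
            case BOOLEAN: return buf.get() != 0;
            case INT:     return buf.getInt();
            case LONG:    return buf.getLong();
            case STRING: {
                byte[] rest = new byte[buf.remaining()];
                buf.get(rest);
                return new String(rest, StandardCharsets.UTF_8);
            }
            case BYTES: {
                byte[] rest = new byte[buf.remaining()];
                buf.get(rest);
                return rest;
            }
            default: throw new IllegalArgumentException("unknown type tag " + tag);
        }
    }
}
```

The cost is one extra byte per header value. In exchange, a generic tool can print or compare a value without knowing which plugin wrote it.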
> > >> > >> >> > > > 2) We are making it optional so that those not wanting
> > >> > >> >> > > > headers have 0 bytes overhead (think of it as a feature
> > >> > >> >> > > > flag). I don’t think this is complex, especially compared to
> > >> > >> >> > > > changes proposed in other KIPs like KIP-98.
> > >> > >> >> > > > a. If you really, really don’t like this, we can drop it, but
> > >> > >> >> > > > it would mean buying into 4 bytes of extra overhead for users
> > >> > >> >> > > > who do not want to use headers.
> > >> > >> >> > > > 3) In the summary, yes, it is at a higher level, but I think
> > >> > >> >> > > > this is well documented in the proposed changes section.
> > >> > >> >> > > > a. Added a getHeaders method to Producer/Consumer record (that
> > >> > >> >> > > > is it)
> > >> > >> >> > > > b. We’ve also detailed the new Headers class that this method
> > >> > >> >> > > > returns, which encapsulates the headers protocol and logic.
> > >> > >> >> > > >
> > >> > >> >> > > > Best,
> > >> > >> >> > > > Mike
> > >> > >> >> > > >
> > >> > >> >> > > > ==Original questions from the vote thread from Jay.==
> > >> > >> >> > > >
> > >> > >> >> > > > Couple of things I think we still need to work out:
> > >> > >> >> > > >
> > >> > >> >> > > > 1. I think we agree about the key, but I think we haven't
> > >> > >> >> > > > talked about the value yet. I think if our goal is an open
> > >> > >> >> > > > ecosystem of these headers spread across many plugins from
> > >> > >> >> > > > many systems, we should consider making this a string as well
> > >> > >> >> > > > so it can be printed, set via a UI, set in config, etc.
> > >> > >> >> > > > Basically, encouraging pluggable serialization formats here
> > >> > >> >> > > > will lead to a bit of a tower of babel.
> > >> > >> >> > > > 2. This proposal still includes
a
> pretty
> > big
> > >> > change to our
> > >> > >> >> > > serialization
> > >> > >> >> > > > and protocol definition layer.
> Essentially
> > >> it is
> > >> > >> introducing an
> > >> > >> >> > > optional
> > >> > >> >> > > > type, where the format is data
> dependent. I
> > >> > think this is
> > >> > >> >> > actually a
> > >> > >> >> > > big
> > >> > >> >> > > > change though it doesn't seem
like
> it. It
> > >> means
> > >> > you can no
> > >> > >> >> > longer
> > >> > >> >> > > > specify
> > >> > >> >> > > > this type with our type
definition
> DSL, and
> > >> > likewise it
> > >> > >> requires
> > >> > >> >> > > custom
> > >> > >> >> > > > handling in client libs. This
isn't
> a huge
> > >> > thing, since
> > >> > >> the
> > >> > >> >> > Record
> > >> > >> >> > > > definition is custom anyway,
but I
> think
> > this
> > >> > kind of
> > >> > >> protocol
> > >> > >> >> > > > inconsistency is very
non-desirable
> and
> > ties
> > >> > you to
> > >> > >> hand-coding
> > >> > >> >> > > things.
> > >> > >> >> > > > I
> > >> > >> >> > > > think the type should instead
by [Key
> > Value]
> > >> in
> > >> > our BNF,
> > >> > >> where
> > >> > >> >> > key and
> > >> > >> >> > > > value are both short strings as
used
> > >> elsewhere.
> > >> > This
> > >> > >> brings it
> > >> > >> >> > in line
> > >> > >> >> > > > with
> > >> > >> >> > > > the rest of the protocol.
> > >> > >> >> > > > 3. Could we get more specific
about
> the
> > exact
> > >> > Java API
> > >> > >> change to
> > >> > >> >> > > > ProducerRecord, ConsumerRecord,
> Record,
> > etc?

-Jay

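Jay's second point suggests expressing headers as [Key Value] in the protocol BNF, with both fields as short strings. Below is a sketch of that encoding in Java. It assumes the classic Kafka protocol conventions: a string is an int16 length followed by UTF-8 bytes, and an array [X] is an int32 count followed by the elements. StringHeaderCodec is a hypothetical helper, not part of any Kafka library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of:
//   Headers => [Key Value]
//     Key   => string   (int16 length, then UTF-8 bytes)
//     Value => string
// where [X] is an int32 element count followed by the elements.
final class StringHeaderCodec {

    static byte[] encode(List<Map.Entry<String, String>> headers) {
        List<byte[]> parts = new ArrayList<>();
        int size = 4; // int32 array count
        for (Map.Entry<String, String> h : headers) {
            byte[] k = h.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] v = h.getValue().getBytes(StandardCharsets.UTF_8);
            parts.add(k);
            parts.add(v);
            size += 2 + k.length + 2 + v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(headers.size());
        for (byte[] p : parts) {
            writeShortString(buf, p); // alternates key, value, key, value...
        }
        return buf.array();
    }

    static List<Map.Entry<String, String>> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        List<Map.Entry<String, String>> headers = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String key = readShortString(buf);
            String value = readShortString(buf);
            headers.add(new SimpleEntry<>(key, value));
        }
        return headers;
    }

    private static void writeShortString(ByteBuffer buf, byte[] utf8) {
        if (utf8.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("string too long for an int16 length");
        }
        buf.putShort((short) utf8.length);
        buf.put(utf8);
    }

    private static String readShortString(ByteBuffer buf) {
        byte[] utf8 = new byte[buf.getShort()];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

With plain strings on both sides, headers can be printed, set via a UI or set in config, as point 1 asks. The trade-off against a type-tagged value is that non-string data must be turned into text by whoever sets the header.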
The information contained in this email is strictly confidential and for
the use of the addressee only, unless otherwise indicated. If you are not the
intended recipient, please do not read, copy, use or disclose to others this
message or any attachment. Please also notify the sender by replying to this
email or by telephone (+44(020 7896 0011) and then delete the email and any
copies of it. Opinions, conclusion (etc) that do not relate to the official
business of this company shall be understood as neither given nor endorsed by
it. IG is a trading name of IG Markets Limited (a company registered in England
and Wales, company number 04008957) and IG Index Limited (a company registered
in England and Wales, company number 01190902). Registered address at Cannon
Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited
(register number 195355) and IG Index Limited (register number 114059) are
authorised and regulated by the Financial Conduct Authority.

--
Nacho - Ignacio Solis - [email protected]