Re: [DISCUSS] KIP-82 - Add Record Headers

Gwen Shapira Fri, 02 Dec 2016 12:55:40 -0800

Daniel,

Can you share specifics of how you'd use headers (the same way Todd
did)? I think it may help the discussion.


Thanks!

On Fri, Dec 2, 2016 at 12:30 AM, Daniel Schierbeck <[email protected]> wrote:
> I don't have a lot of feedback on this, but at Zendesk we could definitely
> use a standardized header system. Using ints as keys sounds tedious, but if
> that's a necessary tradeoff I'd be okay with it.
>
> On Fri, Dec 2, 2016 at 5:44 AM Todd Palino <[email protected]> wrote:
>
>> Come on, I’ve done at least 2 talks on this one :)
>>
>> Producing counts to a topic is part of it, but that’s only part. So you
>> count you have 100 messages in topic A. When you mirror topic A to another
>> cluster, you have 99 messages. Where was your problem? Or worse, you have
>> 100 messages, but one producer duplicated messages and another one lost
>> messages. You need details about where the message came from in order to
>> pinpoint problems when they happen. Source producer info, where it was
>> produced into your infrastructure, and when it was produced. This requires
>> you to add the information to the message.
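Todd's point is that aggregate counts alone cannot tell you which producer lost or duplicated data. The sketch below shows how per-record provenance lets you reconcile counts per producer. It assumes hypothetical header names (`audit.producer`, `audit.dc`, `audit.ts`) and models records as plain Python objects rather than any real Kafka client API:

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    value: bytes
    headers: dict = field(default_factory=dict)  # hypothetical header map: name -> bytes


def stamp_provenance(record: Record, producer_id: str, datacenter: str) -> Record:
    """Attach who/where/when as headers; the payload is never touched."""
    record.headers["audit.producer"] = producer_id.encode()
    record.headers["audit.dc"] = datacenter.encode()
    record.headers["audit.ts"] = str(int(time.time() * 1000)).encode()
    return record


def counts_by_producer(records) -> Counter:
    return Counter(r.headers["audit.producer"].decode() for r in records)


def diff_counts(source, mirror) -> dict:
    """Per-producer discrepancy: positive = lost in the mirror, negative = duplicated."""
    src, dst = counts_by_producer(source), counts_by_producer(mirror)
    return {p: src[p] - dst[p] for p in src.keys() | dst.keys() if src[p] != dst[p]}


# Todd's scenario: both sides hold 100 messages, yet producer A duplicated one
# and producer B lost one on the way through the mirror.
source = [stamp_provenance(Record(b"a"), "producer-a", "dc1") for _ in range(50)]
source += [stamp_provenance(Record(b"b"), "producer-b", "dc1") for _ in range(50)]
mirror = source[:50] + [source[0]] + source[50:99]

print(len(source), len(mirror), diff_counts(source, mirror))
```

The totals match, but the per-producer diff still points at both problems. A side topic that carries only counts cannot recover this information.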
>>
>> And yes, you still need to maintain your clients. So maybe my original
>> example was not the best. My thoughts on not wanting to be responsible for
>> message formats stand, because that’s very much separate from the client.
>> As you know, we have our own internal client library that can insert the
>> right headers, and right now inserts the right audit information into the
>> message fields, if they exist and assuming the message is Avro encoded.
>> What if someone wants to use JSON instead for a good reason? What if user X
>> wants to encrypt messages, but user Y does not? Maintaining the client
>> library is still much easier than maintaining the message formats.
>>
>>
>> -Todd
>>
>>
>>
>> On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira <[email protected]> wrote:
>>
>> > Based on your last sentence, consider me convinced :)
>> >
>> > I get why headers are critical for Mirroring (you need tags to prevent
>> > loops and sometimes to route messages to the correct destination).
>> > But why do you need headers to audit? We are auditing by producing
>> > counts to a side topic (and I was under the impression you do the
>> > same), so we never need to modify the message.
>> >
>> > Another thing - after we add headers, wouldn't you be in the
>> > business of making sure everyone uses them properly? Making sure
>> > everyone includes the right headers you need, not using the header
>> > names you intend to use, etc. I don't think the "policing" business
>> > will ever go away.
>> >
>> > On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <[email protected]> wrote:
>> > > Got it. As an ops guy, I'm not very happy with the workaround. Avro means
>> > > that I have to be concerned with the format of the messages in order to run
>> > > the infrastructure (audit, mirroring, etc.). That means that I have to
>> > > handle the schemas, and I have to enforce rules about good formats. This is
>> > > not something I want to be in the business of, because I should be able to
>> > > run a service infrastructure without needing to be in the weeds of dealing
>> > > with customer data formats.
>> > >
>> > > Trust me, a sizable portion of my support time is spent dealing with schema
>> > > issues. I really would like to get away from that. Maybe I'd have more time
>> > > for other hobbies. Like writing. ;)
>> > >
>> > > -Todd
>> > >
>> > > On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <[email protected]> wrote:
>> > >
>> > >> I'm pretty satisfied with the current workarounds (Avro container
>> > >> format), so I'm not too excited about the extra work required to do
>> > >> headers in Kafka. I absolutely don't mind it if you do it...
>> > >> I think the Apache convention for "good idea, but not willing to put
>> > >> any work toward it" is +0.5? Anyway, that's what I was trying to
>> > >> convey :)
>> > >>
>> > >> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <[email protected]> wrote:
>> > >> > Well I guess my question for you, then, is what is holding you back from
>> > >> > full support for headers? What’s the bit that you’re missing that has you
>> > >> > under a full +1?
>> > >> >
>> > >> > -Todd
>> > >> >
>> > >> >
>> > >> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <[email protected]> wrote:
>> > >> >
>> > >> >> I know why people who support headers support them, and I've seen what
>> > >> >> the discussion is like.
>> > >> >>
>> > >> >> This is why I'm asking people who are against headers (especially
>> > >> >> committers) what will make them change their mind - so we can get this
>> > >> >> part over one way or another.
>> > >> >>
>> > >> >> If I sound frustrated it is not at Radai, Jun or you (Todd)... I am
>> > >> >> just looking for something concrete we can do to move the discussion
>> > >> >> along to the yummy design details (which is the argument I really am
>> > >> >> looking forward to).
>> > >> >>
>> > >> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <[email protected]> wrote:
>> > >> >> > So, Gwen, to your question (even though I’m not a committer)...
>> > >> >> >
>> > >> >> > I have always been a strong supporter of introducing the concept of an
>> > >> >> > envelope to messages, which headers accomplish. The message key is
>> > >> >> > already an example of a piece of envelope information. By providing a means
>> > >> >> > to do this within Kafka itself, and not relying on use-case-specific
>> > >> >> > implementations, you make it much easier for components to interoperate. It
>> > >> >> > simplifies development of all these things (message routing, auditing,
>> > >> >> > encryption, etc.) because each one does not have to reinvent the wheel.
>> > >> >> >
>> > >> >> > It also makes it much easier from a client point of view if the headers are
>> > >> >> > defined as part of the protocol and/or message format in general, because
>> > >> >> > you can easily produce and consume messages without having to take into
>> > >> >> > account specific cases. For example, I want to route messages, but client A
>> > >> >> > doesn’t support the way audit implemented headers, and client B doesn’t
>> > >> >> > support the way encryption or routing implemented headers, so now my
>> > >> >> > application has to create some really fragile (my autocorrect just tried to
>> > >> >> > make that “tragic”, which is probably appropriate too) code to strip
>> > >> >> > everything off, rather than just consuming the messages, picking out the 1
>> > >> >> > or 2 headers it’s interested in, and performing its function.
>> > >> >> >
>> > >> >> > Honestly, this discussion has been going on for a long time, and it’s
>> > >> >> > always “Oh, you came up with 2 use cases, and yeah, those use cases are
>> > >> >> > real things that someone would want to do. Here’s an alternate way to
>> > >> >> > implement them so let’s not do headers.” If we have a few use cases that we
>> > >> >> > actually came up with, you can be sure that over the next year there’s a
>> > >> >> > dozen others that we didn’t think of that someone would like to do. I
>> > >> >> > really think it’s time to stop rehashing this discussion and instead focus
>> > >> >> > on a workable standard that we can adopt.
>> > >> >> >
>> > >> >> > -Todd
>> > >> >> >
>> > >> >> >
>> > >> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <[email protected]> wrote:
>> > >> >> >
>> > >> >> >> C. per message encryption
>> > >> >> >>> One drawback of this approach is that this significantly reduces the
>> > >> >> >>> effectiveness of compression, which happens on a set of serialized
>> > >> >> >>> messages. An alternative is to enable SSL for wire encryption and rely on
>> > >> >> >>> the storage system (e.g. LUKS) for at rest encryption.
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Jun, this is not sufficient. While this does cover the case of removing a
>> > >> >> >> drive from the system, it will not satisfy most compliance requirements for
>> > >> >> >> encryption of data, as whoever has access to the broker itself still has
>> > >> >> >> access to the unencrypted data. For end-to-end encryption you need to
>> > >> >> >> encrypt at the producer, before it enters the system, and decrypt at the
>> > >> >> >> consumer, after it exits the system.
>> > >> >> >>
>> > >> >> >> -Todd
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai <[email protected]> wrote:
>> > >> >> >>
>> > >> >> >>> another big plus of headers in the protocol is that it would enable rapid
>> > >> >> >>> iteration on ideas outside of core kafka and would reduce the number of
>> > >> >> >>> future wire format changes required.
>> > >> >> >>>
>> > >> >> >>> a lot of what is currently a KIP represents use cases that are not 100%
>> > >> >> >>> relevant to all users, and some of them require rather invasive wire
>> > >> >> >>> protocol changes. a good recent example of this is kip-98.
>> > >> >> >>> tx-utilizing traffic is expected to be a very small fraction of total
>> > >> >> >>> traffic and yet the changes are invasive.
>> > >> >> >>>
>> > >> >> >>> every such wire format change translates into painful and slow adoption of
>> > >> >> >>> new versions.
>> > >> >> >>>
>> > >> >> >>> i think a lot of functionality currently in KIPs could be "spun out" and
>> > >> >> >>> implemented as opt-in plugins transmitting data over headers. this would
>> > >> >> >>> keep the core wire format stable(r), core codebase smaller, and avoid the
>> > >> >> >>> "burden of proof" that's sometimes required to prove a certain feature is
>> > >> >> >>> useful enough for a wide-enough audience to warrant a wire format change
>> > >> >> >>> and code complexity additions.
>> > >> >> >>>
>> > >> >> >>> (to be clear - kip-98 goes beyond "mere" wire format changes and i'm not
>> > >> >> >>> saying it could have been completely done with headers, but exactly-once
>> > >> >> >>> delivery certainly could)
>> > >> >> >>>
>> > >> >> >>> On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <[email protected]> wrote:
>> > >> >> >>>
>> > >> >> >>> > On Thu, Dec 1, 2016 at 10:24 AM, radai <[email protected]> wrote:
>> > >> >> >>> > > "For use cases within an organization, one could always use other
>> > >> >> >>> > > approaches such as company-wise containers"
>> > >> >> >>> > > this is what linkedin has traditionally done but there are now cases (read
>> > >> >> >>> > > - topics) where this is not acceptable. this makes headers useful even
>> > >> >> >>> > > within single orgs for cases where one-container-fits-all cannot apply.
>> > >> >> >>> > >
>> > >> >> >>> > > as for the particular use cases listed, i don't want this to devolve to a
>> > >> >> >>> > > discussion of particular use cases - i think it's enough that some of them
>> > >> >> >>> >
>> > >> >> >>> > I think a main point of contention is this: we identified a few
>> > >> >> >>> > use-cases where headers are useful; do we want Kafka to be a system
>> > >> >> >>> > that supports those use-cases?
>> > >> >> >>> >
>> > >> >> >>> > For example, Jun said:
>> > >> >> >>> > "Not sure how widely useful record-level lineage is though since the
>> > >> >> >>> > overhead could be significant."
>> > >> >> >>> >
>> > >> >> >>> > We know NiFi supports record level lineage. I don't think it was
>> > >> >> >>> > developed for lols; I think it is safe to assume that the NSA needed
>> > >> >> >>> > that functionality. We also know that certain financial institutions
>> > >> >> >>> > need to track tampering with records at a record level and there are
>> > >> >> >>> > federal regulations that absolutely require this. They also need to
>> > >> >> >>> > prove that routing apps that "touch" the messages and either read
>> > >> >> >>> > or update headers couldn't have possibly modified the payload itself.
>> > >> >> >>> > They use record level encryption to do that - apps can read and
>> > >> >> >>> > (sometimes) modify headers but can't touch the payload.
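The requirement Gwen describes is that routers may read or rewrite headers but must provably not alter the payload. A keyed MAC over the payload alone can illustrate this. It is a swapped-in technique: HMAC-SHA256 from the Python standard library gives tamper evidence, not confidentiality. The header name `payload.mac` and the shared key below are hypothetical.

```python
import hashlib
import hmac

MAC_HEADER = "payload.mac"  # hypothetical header name


def seal(payload: bytes, key: bytes) -> dict:
    """Producer side: MAC the payload only, and carry the MAC in a header."""
    return {MAC_HEADER: hmac.new(key, payload, hashlib.sha256).digest()}


def verify(payload: bytes, headers: dict, key: bytes) -> bool:
    """Consumer side: recompute the MAC; edits to other headers do not affect it."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, headers.get(MAC_HEADER, b""))


KEY = b"demo-key-shared-by-producer-and-consumer"  # hypothetical; never hard-code real keys
payload = b'{"account": 42, "amount": "100.00"}'
headers = seal(payload, KEY)

# A routing app that does not hold the key adds a routing header.
headers["route.dest"] = b"settlement"
print(verify(payload, headers, KEY))   # True: payload unchanged

tampered = payload.replace(b"100.00", b"999.00")
print(verify(tampered, headers, KEY))  # False: payload modified
```

Because the MAC covers only the payload, header rewrites by intermediaries pass verification, while any payload change fails. A router could still strip or replace the MAC header itself, so the consumer must treat a missing MAC as a failure.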
>> > >> >> >>> >
>> > >> >> >>> > We can totally say "those are corner cases and not worth adding
>> > >> >> >>> > headers to Kafka for", they should use a different pubsub system for
>> > >> >> >>> > that (NiFi or one of the other 1000 that cater specifically to the
>> > >> >> >>> > financial industry).
>> > >> >> >>> >
>> > >> >> >>> > But this gets us into a catch-22:
>> > >> >> >>> > If we discuss a specific use-case, someone can always say it isn't
>> > >> >> >>> > interesting enough for Kafka. If we discuss more general trends,
>> > >> >> >>> > others can say "well, we are not sure any of them really need headers
>> > >> >> >>> > specifically. This is just hand waving and not interesting."
>> > >> >> >>> >
>> > >> >> >>> > I think discussing use-cases in specifics is super important to decide
>> > >> >> >>> > implementation details for headers (my use-cases lean toward numerical
>> > >> >> >>> > keys with namespaces and object values, others differ), but I think we
>> > >> >> >>> > need to answer the general "Are we going to have headers" question
>> > >> >> >>> > first.
>> > >> >> >>> >
>> > >> >> >>> > I'd love to hear from the other committers in the discussion:
>> > >> >> >>> > What would it take to convince you that headers in Kafka are a good
>> > >> >> >>> > idea in general, so we can move ahead and try to agree on the details?
>> > >> >> >>> >
>> > >> >> >>> > I feel like we keep moving the goal posts and this is truly exhausting.
>> > >> >> >>> >
>> > >> >> >>> > For the record, I mildly support adding headers to Kafka (+0.5?).
>> > >> >> >>> > The community can continue to find workarounds to the issue and there
>> > >> >> >>> > are some benefits to keeping the message format and clients simpler.
>> > >> >> >>> > But I see the usefulness of headers to many use-cases and if we can
>> > >> >> >>> > find a good and generally useful way to add it to Kafka, it will make
>> > >> >> >>> > Kafka easier to use for many - a worthy goal in my eyes.
>> > >> >> >>> >
>> > >> >> >>> > > are interesting/feasible, but:
>> > >> >> >>> > > A+B. i think there are use cases for polyglot topics, especially if kafka
>> > >> >> >>> > > is being used to "trunk" something else.
>> > >> >> >>> > > D. multiple topics would make it harder to write portable consumer code.
>> > >> >> >>> > > partition remapping would mess with locality of consumption guarantees.
>> > >> >> >>> > > E+F. a use case I see for lineage/metadata is billing/chargeback. for that
>> > >> >> >>> > > use case it is not enough to simply record the point of origin, but every
>> > >> >> >>> > > replication stop (think mirror maker) must also add a record to form a
>> > >> >> >>> > > "transit log".
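A transit log in the sense radai describes could be a single header whose value grows by one entry per hop. The sketch below assumes a hypothetical binary layout carried as a byte[] header value: a 2-byte length, the UTF-8 cluster id, and an 8-byte epoch-millis timestamp per hop. The cluster names are made up.

```python
import struct

LEN = struct.Struct(">H")  # 2-byte length of the cluster id
TS = struct.Struct(">q")   # 8-byte epoch-millis timestamp


def append_hop(transit: bytes, cluster_id: str, ts_ms: int) -> bytes:
    """Return a new transit-log value with one more hop appended."""
    cid = cluster_id.encode("utf-8")
    return transit + LEN.pack(len(cid)) + cid + TS.pack(ts_ms)


def read_hops(transit: bytes) -> list:
    """Decode the transit log back into (cluster_id, ts_ms) pairs, in hop order."""
    hops, pos = [], 0
    while pos < len(transit):
        (n,) = LEN.unpack_from(transit, pos)
        pos += LEN.size
        cid = transit[pos:pos + n].decode("utf-8")
        pos += n
        (ts,) = TS.unpack_from(transit, pos)
        pos += TS.size
        hops.append((cid, ts))
    return hops


log = append_hop(b"", "dc1-local", 1000)      # point of origin
log = append_hop(log, "dc1-aggregate", 1250)  # first mirror hop
log = append_hop(log, "dc2-aggregate", 1900)  # second mirror hop
print(read_hops(log))
# [('dc1-local', 1000), ('dc1-aggregate', 1250), ('dc2-aggregate', 1900)]
```

For chargeback, each cluster that appears in the decoded log could be billed for its leg of the journey. The cost is that the header grows with every hop.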
>> > >> >> >>> > >
>> > >> >> >>> > > as for stream processing on top of kafka - i know samza has a metadata map
>> > >> >> >>> > > which they carry around in addition to user values. headers are the perfect
>> > >> >> >>> > > fit for these things.
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > > On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <[email protected]> wrote:
>> > >> >> >>> > >
>> > >> >> >>> > >> Hi, Michael,
>> > >> >> >>> > >>
>> > >> >> >>> > >> In order to answer the first two questions, it would be helpful if we could
>> > >> >> >>> > >> identify 1 or 2 strong use cases for headers in the space for third-party
>> > >> >> >>> > >> vendors. For use cases within an organization, one could always use other
>> > >> >> >>> > >> approaches such as company-wise containers to get by without headers. I
>> > >> >> >>> > >> went through the use cases in the KIP and in Radai's wiki
>> > >> >> >>> > >> (https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers).
>> > >> >> >>> > >> The following are the ones that I understand and that could be in the
>> > >> >> >>> > >> third-party use case category.
>> > >> >> >>> > >>
>> > >> >> >>> > >> A. content-type
>> > >> >> >>> > >> It seems that in general, content-type should be set at the topic level.
>> > >> >> >>> > >> Not sure if mixing messages with different content types should be
>> > >> >> >>> > >> encouraged.
>> > >> >> >>> > >>
>> > >> >> >>> > >> B. schema id
>> > >> >> >>> > >> Since the value is mostly useless without schema id, it seems that storing
>> > >> >> >>> > >> the schema id together with serialized bytes in the value is better?
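Jun's alternative keeps the schema id inside the value. The sketch below shows that framing, modelled on the convention used by Confluent's Schema Registry serializers: a 0 magic byte, then a 4-byte big-endian schema id, then the serialized bytes. The function names are hypothetical.

```python
import struct

MAGIC = 0
FRAME = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema id (5 bytes)


def frame_value(schema_id: int, serialized: bytes) -> bytes:
    """Prepend the magic byte and schema id to already-serialized bytes."""
    return FRAME.pack(MAGIC, schema_id) + serialized


def unframe_value(value: bytes):
    """Split a framed value back into (schema_id, serialized bytes)."""
    magic, schema_id = FRAME.unpack_from(value, 0)
    if magic != MAGIC:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, value[FRAME.size:]


framed = frame_value(17, b"avro-bytes")
print(framed[:5].hex(), unframe_value(framed))  # 0000000011 (17, b'avro-bytes')
```

The trade-off is that every consumer, including infrastructure such as auditing or mirroring, has to understand this framing before it can find the schema id. With headers, the same 4 bytes could travel in a header and leave the value opaque, which is closer to what Todd asks for above.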
>> > >> >> >>> > >>
>> > >> >> >>> > >> C. per message encryption
>> > >> >> >>> > >> One drawback of this approach is that this significantly reduces the
>> > >> >> >>> > >> effectiveness of compression, which happens on a set of serialized
>> > >> >> >>> > >> messages. An alternative is to enable SSL for wire encryption and rely on
>> > >> >> >>> > >> the storage system (e.g. LUKS) for at rest encryption.
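Jun's compression concern can be checked directly. Kafka compresses a set of messages together, but per-message ciphertext looks random, so the set no longer shares redundancy. The rough sketch below runs zlib (DEFLATE, the algorithm behind gzip) over made-up JSON events. Random bytes stand in for ciphertext, since the Python standard library has no block cipher.

```python
import json
import os
import zlib

# Made-up, highly repetitive events, typical of a single topic.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home", "status": "ok"}).encode()
    for i in range(200)
]


def fake_encrypt(msg: bytes) -> bytes:
    # Stand-in for a real cipher: good ciphertext is indistinguishable from random bytes.
    return os.urandom(len(msg))


plain_batch = b"".join(messages)
cipher_batch = b"".join(fake_encrypt(m) for m in messages)

plain_ratio = len(zlib.compress(plain_batch)) / len(plain_batch)
cipher_ratio = len(zlib.compress(cipher_batch)) / len(cipher_batch)
print(f"plaintext set compresses to {plain_ratio:.0%}, encrypted set to {cipher_ratio:.0%}")
```

One common mitigation is to compress each payload before encrypting it. This recovers some of the savings but loses the cross-message redundancy that set-level compression exploits.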
>> > >> >> >>> > >>
>> > >> >> >>> > >> D. cluster ID for mirroring across Kafka clusters
>> > >> >> >>> > >> This is actually interesting. Today, to avoid introducing cycles when doing
>> > >> >> >>> > >> mirroring across data centers, one would either have to set up two Kafka
>> > >> >> >>> > >> clusters (a local and an aggregate) per data center or rename topics.
>> > >> >> >>> > >> Neither is ideal. With headers, the producer could tag each message with
>> > >> >> >>> > >> the producing cluster ID in the header. MirrorMaker could then avoid
>> > >> >> >>> > >> mirroring messages to a cluster if they are tagged with the same cluster
>> > >> >> >>> > >> id.
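The cluster-ID tagging Jun describes reduces to a filter in the mirroring process. The sketch below assumes a hypothetical `origin.cluster` header and models headers as a dict. It does not use MirrorMaker's actual API.

```python
ORIGIN = "origin.cluster"  # hypothetical header name


def tag_on_produce(headers: dict, local_cluster_id: str) -> dict:
    """Set the origin only if absent, so mirrored copies keep the original tag."""
    headers.setdefault(ORIGIN, local_cluster_id.encode())
    return headers


def should_mirror(headers: dict, dest_cluster_id: str) -> bool:
    """Skip records that originated in the destination cluster (breaks A -> B -> A loops)."""
    return headers.get(ORIGIN, b"").decode() != dest_cluster_id


# Two data centers mirroring the same topic in both directions.
h = tag_on_produce({}, "dc1")   # produced in dc1
print(should_mirror(h, "dc2"))  # True: dc1 -> dc2 is allowed
h = tag_on_produce(h, "dc2")    # re-produced by the dc2 mirror; origin stays dc1
print(should_mirror(h, "dc1"))  # False: dc2 -> dc1 would be a loop
```

Checking only the origin is enough for two clusters. With three or more clusters mirroring to each other, a record can still arrive twice (dc1 -> dc2 -> dc3 alongside dc1 -> dc3), which is one argument for recording every hop rather than only the origin.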
>> > >> >> >>> > >>
>> > >> >> >>> > >> However, an alternative approach is to introduce something like a hierarchical
>> > >> >> >>> > >> topic and store messages from different clusters in different partitions
>> > >> >> >>> > >> under the same topic. This approach avoids filtering out unneeded data and
>> > >> >> >>> > >> makes offset preserving easier to support. It may make compaction trickier
>> > >> >> >>> > >> though, since the same key may show up in different partitions.
>> > >> >> >>> > >>
>> > >> >> >>> > >> E. record-level lineage
>> > >> >> >>> > >> For example, a source connector could store in the message the metadata
>> > >> >> >>> > >> (e.g. UUID) of the source record. Similarly, if a stream job transforms
>> > >> >> >>> > >> messages from topic A to topic B, the library could include the source
>> > >> >> >>> > >> message offset in the header of each transformed message. Not sure
>> > >> >> >>> > >> how widely useful record-level lineage is though since the overhead could
>> > >> >> >>> > >> be significant.
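The stream-job case Jun mentions can be sketched as a transform that copies the source coordinates into a header on each output record. The header name `lineage.source`, the record classes, and the binary layout (2-byte topic length, topic, int32 partition, int64 offset) are all hypothetical.

```python
import struct
from dataclasses import dataclass, field

LINEAGE = "lineage.source"  # hypothetical header name


@dataclass
class ConsumedRecord:
    topic: str
    partition: int
    offset: int
    value: bytes


@dataclass
class OutRecord:
    value: bytes
    headers: dict = field(default_factory=dict)


def encode_source(topic: str, partition: int, offset: int) -> bytes:
    """2-byte topic length, topic bytes, int32 partition, int64 offset."""
    t = topic.encode("utf-8")
    return struct.pack(">H", len(t)) + t + struct.pack(">iq", partition, offset)


def decode_source(raw: bytes) -> tuple:
    (n,) = struct.unpack_from(">H", raw, 0)
    topic = raw[2:2 + n].decode("utf-8")
    partition, offset = struct.unpack_from(">iq", raw, 2 + n)
    return topic, partition, offset


def transform(rec: ConsumedRecord) -> OutRecord:
    """Toy A -> B transformation that records where each output came from."""
    out = OutRecord(value=rec.value.upper())
    out.headers[LINEAGE] = encode_source(rec.topic, rec.partition, rec.offset)
    return out


out = transform(ConsumedRecord("topic-a", 3, 1042, b"hello"))
print(out.value, decode_source(out.headers[LINEAGE]), len(out.headers[LINEAGE]))
# b'HELLO' ('topic-a', 3, 1042) 21
```

Even with a short topic name, this adds 21 bytes per record before any per-header framing. That is the kind of overhead Jun is weighing against the value of lineage.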
>> > >> >> >>> > >>
>> > >> >> >>> > >> F. auditing metadata
>> > >> >> >>> > >> We could put things like clientId/host/user in the header in each message
>> > >> >> >>> > >> for auditing. These metadata are really at the producer level though. So, a
>> > >> >> >>> > >> more efficient way is to only include a "producerId" per message and send
>> > >> >> >>> > >> the producerId -> metadata mapping independently. KIP-98 is actually
>> > >> >> >>> > >> proposing including such a producerId natively in the message.
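Jun's efficiency argument is about bytes per message. The rough sketch below compares two options: repeating clientId/host/user on every message, or sending only an id and publishing the mapping separately. It uses made-up metadata values and assumes an 8-byte producer id. The `registry` dict is a stand-in for whatever side channel would carry the mapping.

```python
import struct

# Made-up producer metadata for illustration.
producer_meta = {"clientId": "billing-service-7", "host": "app-host-0412.example.com", "user": "svc-billing"}


def per_message_full(meta: dict) -> bytes:
    """Option 1: every message repeats clientId/host/user."""
    return b"|".join(f"{k}={v}".encode() for k, v in sorted(meta.items()))


def per_message_id(producer_id: int) -> bytes:
    """Option 2: each message carries only an (assumed) 8-byte producer id."""
    return struct.pack(">q", producer_id)


# The id -> metadata mapping is sent once, out of band (stand-in for a side topic).
registry = {1001: producer_meta}

n = 1_000_000
full_bytes = n * len(per_message_full(producer_meta))
id_bytes = n * len(per_message_id(1001))
print(f"{n:,} messages: {full_bytes:,} bytes of metadata vs {id_bytes:,} bytes of ids")

# An auditor resolves the id back to the full metadata.
(pid,) = struct.unpack(">q", per_message_id(1001))
print(registry[pid]["host"])
```

The saving depends on the mapping being reliably available to every auditor. This is the extra moving part that Todd's per-message approach avoids.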
>> > >> >> >>> > >>
>> > >> >> >>> > >> So, overall, I am not yet fully convinced of the strong third-party
>> > >> >> >>> > >> use cases of headers. Perhaps we could discuss a bit more to identify one
>> > >> >> >>> > >> or two really convincing use cases.
>> > >> >> >>> > >>
>> > >> >> >>> > >> Another orthogonal question is whether headers should be exposed in stream
>> > >> >> >>> > >> processing systems such as Kafka Streams, Samza, and Spark Streaming.
>> > >> >> >>> > >> Currently, those systems just deal with key/value pairs. Should we expose
>> > >> >> >>> > >> headers there as a third element too, or somehow map headers to the key or value?
>> > >> >> >>> > >>
>> > >> >> >>> > >> Thanks,
>> > >> >> >>> > >>
>> > >> >> >>> > >> Jun
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <[email protected]> wrote:
>> > >> >> >>> > >>
>> > >> >> >>> > >> > I assume, after a period of a week, that there are no concerns now
>> > >> >> >>> > >> > with points 1 and 2, and that we now have agreement that headers are useful
>> > >> >> >>> > >> > and needed in Kafka. As such, if put to a KIP vote, this wouldn’t be a reason
>> > >> >> >>> > >> > to reject.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > @Ignacio on point 4).
>> > >> >> >>> > >> > I think, for the purpose of getting this KIP moving past this, we can state the
>> > >> >> >>> > >> > key will be a 4-byte space that will naturally be interpreted as an
>> > >> >> >>> > >> > Int32 (if namespacing is later wanted, you can easily split this into two
>> > >> >> >>> > >> > int16 spaces). From the wire protocol implementation, I don’t believe this
>> > >> >> >>> > >> > makes any difference. Is this reasonable to all?
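Michael's point, and Ignacio's below, is that the protocol can treat the key as one opaque 4-byte value compared only for equality, while a layer above may read it as two 16-bit halves. A minimal sketch, with hypothetical function names and the namespace assumed to sit in the high half:

```python
import struct


def make_key(namespace: int, local_id: int) -> int:
    """Pack two unsigned 16-bit halves into one 32-bit header key (namespace high)."""
    if not (0 <= namespace <= 0xFFFF and 0 <= local_id <= 0xFFFF):
        raise ValueError("namespace and local_id must each fit in 16 bits")
    return (namespace << 16) | local_id


def split_key(key: int) -> tuple:
    """Recover (namespace, local_id); only the layer above needs to do this."""
    return key >> 16, key & 0xFFFF


def to_wire(key: int) -> bytes:
    """The protocol just sees 4 big-endian bytes and compares them for equality."""
    return struct.pack(">I", key)


key = make_key(21, 1)
print(key, split_key(key), to_wire(key).hex())  # 1376257 (21, 1) 00150001
```

Packing as unsigned is only a readability choice here. If a namespace uses the top bit, reading the same 4 bytes as a signed Int32 yields a different number, but byte-for-byte equality is unaffected.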
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On 5), as per point 4, I'm therefore happy we keep with 32 bits.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On 18/11/2016, 20:34, "[email protected] on behalf of Ignacio Solis" <[email protected] on behalf of [email protected]> wrote:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Summary:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 3) Yes - Header value as byte[]
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 4a) Int,Int - No
>> > >> >> >>> > >> > 4b) Int - Yes
>> > >> >> >>> > >> > 4c) String - Reluctant maybe
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 5) I believe the header system should take a single int. I think 32 bits is
>> > >> >> >>> > >> > a good size; if you want to interpret this as two 16-bit numbers in the layer
>> > >> >> >>> > >> > above, go right ahead. If somebody wants to argue for 16 bits or 64 bits of
>> > >> >> >>> > >> > header key space, I would listen.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Discussion:
>> > >> >> >>> > >> > Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at
>> > >> >> >>> > >> > this layer. Are we going to start providing APIs to get all the
>> > >> >> >>> > >> > sub_key_1s? Or all the sub_key_2s? If there are no distinguishing functions
>> > >> >> >>> > >> > that are applied to each one, then they should be a single value. At this
>> > >> >> >>> > >> > layer all we're doing is equality.
>> > >> >> >>> > >> > If the above layer wants to interpret this as 2, 3 or more values, that's a
>> > >> >> >>> > >> > different question. I personally think it's all one keyspace that is
>> > >> >> >>> > >> > getting assigned using some structure, but if you want to sub-assign parts
>> > >> >> >>> > >> > of it then that's fine.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > The same discussion applies to strings. If somebody argued for strings,
>> > >> >> >>> > >> > would we be arguing to divide the strings with dots ('.') as a requirement?
>> > >> >> >>> > >> > Would we want them to give us the different name segments separately?
>> > >> >> >>> > >> > Would we be performing any actions on this key other than matching?
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Nacho
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <[email protected]> wrote:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > > #jay #jun any concerns on 1 and 2 still?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > @all
>> > >> >> >>> > >> > > To get this moving along a bit more, I'd also like to get clarity on
>> > >> >> >>> > >> > > the last points below:
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 3) I believe we're all roughly happy with the header value being a
>> > >> >> >>> > >> > > byte[]?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 4) I believe consensus has been for a namespace-based int approach
>> > >> >> >>> > >> > > {int,int} for the key. Any objections if this is what we go with?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 5) If the assumption in (4) is correct, we have {int,int} keys.
>> > >> >> >>> > >> > > Should both ints be int16 or int32?
>> > >> >> >>> > >> > > I'm for them being int16 (2 bytes), as combined they give a space of 4 bytes
>> > >> >> >>> > >> > > as per the original, which gives plenty of combinations for the foreseeable
>> > >> >> >>> > >> > > future and keeps the overhead small.
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > Do we see any benefit in another KIP call to discuss these at all?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > Cheers
>> > >> >> >>> > >> > > Mike
>> > >> >> >>> > >> > > ________________________________________
>> > >> >> >>> > >> > > From: K Burstev <[email protected]>
>> > >> >> >>> > >> > > Sent: Friday, November 18, 2016 7:07:07 AM
>> > >> >> >>> > >> > > To: [email protected]
>> > >> >> >>> > >> > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > For what it is worth, I also agree. As a user:
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 1) Yes - Headers are worthwhile
>> > >> >> >>> > >> > > 2) Yes - Headers should be a top level option
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 14.11.2016, 21:15, "Ignacio Solis" <[email protected]>:
>> > >> >> >>> > >> > > > 1) Yes - Headers are worthwhile
>> > >> >> >>> > >> > > > 2) Yes - Headers should be a top level option
>> > >> >> >>> > >> > > >
>> > >> >> >>> > >> > > > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <[email protected]> wrote:
>> > >> >> >>> > >> > > >
>> > >> >> >>> > >> > > >> Hi Roger,
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> The KIP details/examples show the original proposal for key spacing, not the
>> > >> >> >>> > >> > > >> new namespace idea mentioned in the discussion.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> We will need to update the KIP when we get agreement that this is a better
>> > >> >> >>> > >> > > >> approach (which seems to be the case, if I have understood the general
>> > >> >> >>> > >> > > >> feeling in the conversation).
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Re the variable ints: at a very early stage we did think about this. I think
>> > >> >> >>> > >> > > >> the added complexity for the saving isn't worth it. If we want to reduce
>> > >> >> >>> > >> > > >> overhead and size, I'd rather go with int16 (2-byte) keys, as it keeps it
>> > >> >> >>> > >> > > >> simple.
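To put numbers on the varint trade-off, the sketch below implements a generic unsigned varint (LEB128 / protobuf style: 7 payload bits per byte, high bit set when more bytes follow). It is illustrative only; the KIP does not specify this encoding. It compares encoded sizes against fixed 2-byte and 4-byte keys, using Roger's example id 2100001 among the sample keys.

```python
def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, MSB = continuation."""
    if n < 0:
        raise ValueError("unsigned varint only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes):
    """Return (value, bytes consumed)."""
    result = shift = 0
    for i, byte in enumerate(buf):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


for key in (1, 127, 128, 16_383, 16_384, 2_100_001):
    print(key, len(encode_uvarint(key)), "bytes as varint vs 2 (int16) / 4 (int32)")
# sizes: 1, 1, 2, 2, 3, 4
```

Small keys save a byte or two, while large ids such as 2100001 cost as much as a fixed int32. The saving therefore depends on how keys are allocated, which supports Michael's view that a fixed size keeps things simpler.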
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> On the note of no headers: as per the KIP, we use an attribute bit to denote
>> > >> >> >>> > >> > > >> whether headers are present, so there is currently zero overhead if headers
>> > >> >> >>> > >> > > >> are not used.
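The zero-overhead claim rests on a flag in the message attributes byte: if the flag is unset, no header section is written or parsed. In message format v1, bits 0-2 hold the compression codec and bit 3 the timestamp type. The sketch assumes bit 4 for the headers flag and a made-up header section layout. Both are hypothetical, and the KIP's actual choices may differ.

```python
import struct

COMPRESSION_MASK = 0x07     # bits 0-2: compression codec (message format v0/v1)
TIMESTAMP_TYPE_FLAG = 0x08  # bit 3: timestamp type (message format v1), shown for context
HEADERS_FLAG = 0x10         # hypothetical: bit 4 = "headers present"


def encode_headers(headers: dict) -> bytes:
    """Hypothetical layout: int32 count, then int32 key, int32 length, value bytes."""
    out = struct.pack(">i", len(headers))
    for key, value in headers.items():
        out += struct.pack(">ii", key, len(value)) + value
    return out


def build(attributes: int, headers: dict):
    """Return (attributes, header_bytes); no headers means no flag and no bytes."""
    if not headers:
        return attributes & ~HEADERS_FLAG, b""
    return attributes | HEADERS_FLAG, encode_headers(headers)


def has_headers(attributes: int) -> bool:
    return bool(attributes & HEADERS_FLAG)


attrs, extra = build(0x01, {})             # gzip-compressed message, no headers
print(has_headers(attrs), len(extra))      # False 0
attrs, extra = build(0x01, {2100001: b"abc"})
print(has_headers(attrs), attrs & COMPRESSION_MASK, len(extra))  # True 1 15
```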
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> I think, as radai mentions, it would be good first to get clarity on whether
>> > >> >> >>> > >> > > >> we now have general consensus that (1) headers are worthwhile and useful,
>> > >> >> >>> > >> > > >> and (2) we want them as a top-level entity.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Just to state the obvious, I believe (1) headers are worthwhile and (2)
>> > >> >> >>> > >> > > >> agree with them as a top-level entity.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Cheers
>> > >> >> >>> > >> > > >> Mike
>> > >> >> >>> > >> > > >> ________________________________________
>> > >> >> >>> > >> > > >> From: Roger Hoover <[email protected]>
>> > >> >> >>> > >> > > >> Sent: Wednesday, November 9, 2016 9:10:47 PM
>> > >> >> >>> > >> > > >> To: [email protected]
>> > >> >> >>> > >> > > >> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Sorry for going a little into the weeds, but thanks for the
>> > >> >> >>> > >> > > >> replies regarding varints.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Agreed that a prefix and {int, int} can be the same. It doesn't
>> > >> >> >>> > >> > > >> look like that's what the KIP is saying in the "Open" section,
>> > >> >> >>> > >> > > >> though. The example shows 2100001 for New Relic and 210002 for App
>> > >> >> >>> > >> > > >> Dynamics, implying that the New Relic organization will have only
>> > >> >> >>> > >> > > >> a single header id to work with. Or is 2100001 a prefix? The main
>> > >> >> >>> > >> > > >> point of a namespace or prefix is to reduce the overhead of config
>> > >> >> >>> > >> > > >> mapping or registration, depending on how namespaces/prefixes are
>> > >> >> >>> > >> > > >> managed.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Would love to hear more feedback on the higher-level questions
>> > >> >> >>> > >> > > >> though...
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Cheers,
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Roger
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> On Wed, Nov 9, 2016 at 11:38 AM, radai <[email protected]>
>> > >> >> >>> > >> > > >> wrote:
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> > I think this discussion is getting a bit into the weeds on
>> > >> >> >>> > >> > > >> > technical implementation details.
>> > >> >> >>> > >> > > >> > I'd like to step back a minute and try to establish where we
>> > >> >> >>> > >> > > >> > are in the larger picture:
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > (rewording Nacho's last paragraph)
>> > >> >> >>> > >> > > >> > 1. Are we all in agreement that headers are a worthwhile and
>> > >> >> >>> > >> > > >> >    useful addition to have? This was contested early on.
>> > >> >> >>> > >> > > >> > 2. Are we all in agreement on headers as a top-level entity vs.
>> > >> >> >>> > >> > > >> >    headers squirreled away in V?
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > Are there still concerns around these 2 points (#jay? #jun?)?
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > (and now back to our normal programming ...)
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > Varints are nice. Having said that, they add complexity (see
>> > >> >> >>> > >> > > >> > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
>> > >> >> >>> > >> > > >> > as the 1st Google result) and would require anyone writing other
>> > >> >> >>> > >> > > >> > clients (C? Python? Go? Bash? ;-) ) to get/implement the same, for
>> > >> >> >>> > >> > > >> > relatively little gain (int vs. string is an order of magnitude;
>> > >> >> >>> > >> > > >> > this isn't).
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > Int namespacing vs. {int, int} namespacing are basically the same
>> > >> >> >>> > >> > > >> > thing: you're just namespacing an int64 and giving people whole
>> > >> >> >>> > >> > > >> > 2^32 ranges at a time. The part I like about this is letting people
>> > >> >> >>> > >> > > >> > have a large swath of numbers with one registration, so they don't
>> > >> >> >>> > >> > > >> > have to come back for every single plugin/header they want to
>> > >> >> >>> > >> > > >> > "reserve".
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> >
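radai's observation that {int, int} namespacing amounts to splitting an int64 can be sketched as follows. This assumes unsigned 32-bit halves. The helper names are hypothetical, and nothing here comes from the KIP.

```python
from __future__ import annotations


def pack_key(namespace: int, local_id: int) -> int:
    """Combine a 32-bit namespace and a 32-bit id into one 64-bit key."""
    if not (0 <= namespace < 2**32 and 0 <= local_id < 2**32):
        raise ValueError("namespace and id must each fit in 32 unsigned bits")
    return (namespace << 32) | local_id


def unpack_key(key: int) -> tuple[int, int]:
    """Split a 64-bit key back into (namespace, id)."""
    return key >> 32, key & 0xFFFFFFFF


def namespace_range(namespace: int) -> range:
    """All 2^32 keys covered by a single namespace registration."""
    return range(namespace << 32, (namespace + 1) << 32)
```

A single registration of namespace 3 covers every key in `namespace_range(3)`. That is the "large swath of numbers" radai describes, with no further registration needed per plugin.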
>> > >> >> >>> > >> > > >> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <[email protected]>
>> > >> >> >>> > >> > > >> > wrote:
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > > Since some of the debate has been about overhead + performance, I'm
>> > >> >> >>> > >> > > >> > > wondering if we have considered a varint encoding
>> > >> >> >>> > >> > > >> > > (https://developers.google.com/protocol-buffers/docs/encoding#varints)
>> > >> >> >>> > >> > > >> > > for the header length field (int32 in the proposal) and for header
>> > >> >> >>> > >> > > >> > > ids? If you don't use headers, the overhead would be a single byte,
>> > >> >> >>> > >> > > >> > > and each header id < 128 would also need only a single byte.
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > >
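Roger's suggestion refers to the protocol-buffers base-128 varint. A minimal unsigned version in Python might look like the sketch below, which also gives a sense of the implementation cost radai raises. The function names are illustrative. Negative values would additionally need something like zigzag encoding, which is not shown.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Unsigned base-128 varint: 7 bits per byte, least significant group
    first, with the high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("unsigned varints cannot hold negative numbers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Read one varint starting at offset; return (value, next_offset)."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

`encode_varint(0)` is the single byte Roger mentions for the no-headers case, and any id below 128 also fits in one byte. A fixed int32 always takes 4 bytes, and a large value such as 2^31 - 1 needs 5 bytes as a varint.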
>> > >> >> >>> > >> > > >> > > On Wed, Nov 9, 2016 at 6:43 AM, radai <[email protected]>
>> > >> >> >>> > >> > > >> > > wrote:
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > > > @magnus: and very dangerous (you're essentially downloading and
>> > >> >> >>> > >> > > >> > > > executing arbitrary code off the internet on your servers... a bad
>> > >> >> >>> > >> > > >> > > > idea without a sandbox, and even with one).
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > As for it being a purely administrative task, I disagree.
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > I wish it were, really, because then my earlier point on the
>> > >> >> >>> > >> > > >> > > > complexity of the remapping process would be invalid, but at
>> > >> >> >>> > >> > > >> > > > LinkedIn, for example, we (the team I'm in) run Kafka as a service.
>> > >> >> >>> > >> > > >> > > > We don't really know what our users (developing applications that
>> > >> >> >>> > >> > > >> > > > use Kafka) are up to at any given moment. It is very possible (given
>> > >> >> >>> > >> > > >> > > > the existence of headers and a corresponding plugin ecosystem) for
>> > >> >> >>> > >> > > >> > > > some application to "equip" its producers and consumers with the
>> > >> >> >>> > >> > > >> > > > required plugin without us knowing. I don't mean to imply that's
>> > >> >> >>> > >> > > >> > > > bad; I just want to make the point that keeping it in sync across a
>> > >> >> >>> > >> > > >> > > > large-enough organization is not that simple.
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <[email protected]>
>> > >> >> >>> > >> > > >> > > > wrote:
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > > I think there is a piece missing in the Strings discussion, where
>> > >> >> >>> > >> > > >> > > > > pro-Stringers reason that by providing unique string identifiers
>> > >> >> >>> > >> > > >> > > > > for each header, everything will just magically work for all parts
>> > >> >> >>> > >> > > >> > > > > of the stream pipeline.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > But the strings don't mean anything by themselves, and while we
>> > >> >> >>> > >> > > >> > > > > could probably envision some auto plugin loader that downloads,
>> > >> >> >>> > >> > > >> > > > > compiles, links and runs plugins on demand as soon as they're seen
>> > >> >> >>> > >> > > >> > > > > by a consumer, I don't really see a use case for something so
>> > >> >> >>> > >> > > >> > > > > dynamic (and fragile) in practice.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > In the real world, an application will be configured with a set of
>> > >> >> >>> > >> > > >> > > > > plugins to either add (producer) or read (consumer) headers.
>> > >> >> >>> > >> > > >> > > > > This is an administrative task based on what features a client
>> > >> >> >>> > >> > > >> > > > > needs/provides, and it results in some sort of configuration to
>> > >> >> >>> > >> > > >> > > > > enable and configure the desired plugins.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > Since this needs to be kept somewhat in sync across an organisation
>> > >> >> >>> > >> > > >> > > > > (there is no point in having producers add headers no consumers
>> > >> >> >>> > >> > > >> > > > > will read, and vice versa), the added complexity of assigning an id
>> > >> >> >>> > >> > > >> > > > > namespace for each plugin as it is being configured should be
>> > >> >> >>> > >> > > >> > > > > tolerable.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > /Magnus
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <[email protected]>:
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > > Just following/catching up on what seems to be an active night :)
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > @Radai, sorry if it seems obvious, but what does MD stand for?
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > My take on String vs Int:
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > I will state first that I am pro Int (16 or 32).
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > Playing devil's advocate, though, I do see a big plus in the
>> > >> >> >>> > >> > > >> > > > > > argument for String keys: integrating into an existing ecosystem.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > As many other systems use String-based headers (Flume, JMS), String
>> > >> >> >>> > >> > > >> > > > > > keys make it much easier for those systems to be
>> > >> >> >>> > >> > > >> > > > > > incorporated/integrated.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > With Int-based headers, how could we provide a way/guidance to make
>> > >> >> >>> > >> > > >> > > > > > this integration simple/easy when transitioning flows over to Kafka?
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > * tough luck buddy, you're on your own
>> > >> >> >>> > >> > > >> > > > > > * simply hash the string into an int code and hope for no collisions
>> > >> >> >>> > >> > > >> > > > > >   (how to convert back, though?)
>> > >> >> >>> > >> > > >> > > > > > * http2 style, as mentioned by Nacho.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > cheers,
>> > >> >> >>> > >> > > >> > > > > > Mike
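Mike's second option, hashing string names into int codes, could look roughly like the sketch below. It uses the first 4 bytes of SHA-256 as a stable 32-bit code, because Python's built-in hash() is salted per process and would not agree between producer and consumer. It also keeps a reverse table to answer "how to convert back". The class and method names are hypothetical.

```python
from __future__ import annotations

import hashlib


class HashedHeaderKeys:
    """Hypothetical mapping from string header names to 32-bit int codes."""

    def __init__(self) -> None:
        # Reverse table: int code -> original string name.
        self._names: dict[int, str] = {}

    @staticmethod
    def code_for(name: str) -> int:
        # Stable across processes, unlike Python's salted built-in hash().
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def register(self, name: str) -> int:
        code = self.code_for(name)
        existing = self._names.get(code)
        if existing is not None and existing != name:
            # Two different names hashed to the same int: a collision.
            raise ValueError(f"{name!r} collides with {existing!r} on code {code}")
        self._names[code] = name
        return code

    def name_for(self, code: int) -> str:
        # Only works for codes registered in this table.
        return self._names[code]
```

The reverse table only helps where every participant has registered the same names. That is the same synchronization problem discussed elsewhere in the thread.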
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > ________________________________________
>> > >> >> >>> > >> > > >> > > > > > From: radai <[email protected]>
>> > >> >> >>> > >> > > >> > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
>> > >> >> >>> > >> > > >> > > > > > To: [email protected]
>> > >> >> >>> > >> > > >> > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > Thinking about it some more, the best way to transmit the header
>> > >> >> >>> > >> > > >> > > > > > remapping data to consumers would be to put it in the MD response
>> > >> >> >>> > >> > > >> > > > > > payload, so maybe it should be discussed now.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <[email protected]>
>> > >> >> >>> > >> > > >> > > > > > wrote:
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > > I'm not opposed to the idea of namespace mapping. All I'm saying is
>> > >> >> >>> > >> > > >> > > > > > > that it's not part of the "MVP" and, since it requires no wire
>> > >> >> >>> > >> > > >> > > > > > > format change, it can always be added later.
>> > >> >> >>> > >> > > >> > > > > > > Also, it's not as simple as just configuring MM to do the
>> > >> >> >>> > >> > > >> > > > > > > transform: let's say I've implemented large message support as
>> > >> >> >>> > >> > > >> > > > > > > {666,1} and on some mirror target cluster it's been remapped to
>> > >> >> >>> > >> > > >> > > > > > > {999,1}. The consumer plugin code would also need to be told to
>> > >> >> >>> > >> > > >> > > > > > > look for the large message "part X of Y" header under {999,1}.
>> > >> >> >>> > >> > > >> > > > > > > Doable, but tricky.
>> > >> >> >>> > >> > > >> > > > > > >
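radai's {666,1} to {999,1} example has two parts, sketched below. The first is a mirror-side function that rewrites header namespaces. The second is a consumer plugin whose namespace is a configuration value. Both the function and the `LargeMessagePlugin` class are hypothetical illustrations, not MirrorMaker or Kafka client APIs.

```python
from __future__ import annotations


def remap_headers(headers: dict[tuple[int, int], bytes],
                  namespace_map: dict[int, int]) -> dict[tuple[int, int], bytes]:
    """Rewrite header namespaces while mirroring, e.g. {666: 999}.

    Namespaces missing from the map pass through unchanged; a remap that
    would make two headers share a key is rejected rather than overwritten.
    """
    out: dict[tuple[int, int], bytes] = {}
    for (namespace, local_id), value in headers.items():
        key = (namespace_map.get(namespace, namespace), local_id)
        if key in out:
            raise ValueError(f"remapping produced duplicate header key {key}")
        out[key] = value
    return out


class LargeMessagePlugin:
    """Hypothetical consumer plugin reading a "part X of Y" header."""

    PART_HEADER_ID = 1

    def __init__(self, namespace: int) -> None:
        # Must be configured to match the cluster it reads from
        # (666 at the source, 999 on the remapped mirror target).
        self.namespace = namespace

    def part_header(self, headers: dict[tuple[int, int], bytes]) -> bytes | None:
        return headers.get((self.namespace, self.PART_HEADER_ID))
```

A plugin still configured with 666 finds nothing on the mirror target. This is the "doable, but tricky" part: the remapping table has to reach the consumers as well as the mirror.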
>> > >> >> >>> > >> > > >> > > > > > > On Tue, Nov 8, 2016 at 10:29 PM,
>> > Gwen
>> > >> >> >>> Shapira <
>> > >> >> >>> > >> > > >> [email protected]
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > > > > wrote:
>> > >> >> >>> > >> > > >> > > > > > >
>> > >> >> >>> > >> > > >> > > > > > >> While you can do whatever you want with a namespace and your code,
>> > >> >> >>> > >> > > >> > > > > > >> what I'd expect is for each app to make namespaces configurable...
>> > >> >> >>> > >> > > >> > > > > > >>
>> > >> >> >>> > >> > > >> > > > > > >> So if I accidentally used 666 for my HR department, and still want
>> > >> >> >>> > >> > > >> > > > > > >> to run RadaiApp, I can configure "namespace=42" for RadaiApp and
>> > >> >> >>> > >> > > >> > > > > > >> everything will look normal.
>> > >> >> >>> > >> > > >> > > > > > >>
>> > >> >> >>> > >> > > >> > > > > > >> This means you only need to sync usage inside your own organization.
>> > >> >> >>> > >> > > >> > > > > > >> Still hard, but somewhat easier than syncing with the entire world.
>> > >> >> >>> > >> > > >> > > > > > >>
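Gwen's per-app configurable namespace (RadaiApp moved from 666 to 42 because HR already uses 666) could work by resolving each app's namespace from its default plus local overrides. Collisions are then checked only among the apps one organization runs. The function name and the dictionaries below are made up for illustration.

```python
from __future__ import annotations


def effective_namespaces(defaults: dict[str, int],
                         overrides: dict[str, int]) -> dict[str, int]:
    """Resolve each app's header namespace from its default and local config.

    Collisions only need to be checked among the apps one organization runs.
    """
    resolved = {app: overrides.get(app, ns) for app, ns in defaults.items()}
    owner_of: dict[int, str] = {}
    for app, ns in sorted(resolved.items()):
        if ns in owner_of:
            raise ValueError(f"{app} and {owner_of[ns]} both use namespace {ns}")
        owner_of[ns] = app
    return resolved
```

Without the override, HR and RadaiApp both claim 666 and resolution fails. With `{"RadaiApp": 42}` the clash disappears, and no one outside the organization needs to be consulted.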
>> > >> >> >>> > >> > > >> > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <[email protected]>
>> > >> >> >>> > >> > > >> > > > > > >> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> > And we can start with {namespace, id} and no remapping support, and
>> > >> >> >>> > >> > > >> > > > > > >> > always add it later on if/when collisions actually happen (I don't
>> > >> >> >>> > >> > > >> > > > > > >> > think they'd be a problem).
>> > >> >> >>> > >> > > >> > > > > > >> >
>> > >> >> >>> > >> > > >> > > > > > >> > Every interested party (so orgs or individuals) could then register
>> > >> >> >>> > >> > > >> > > > > > >> > a prefix (0 = reserved, 1 = Confluent ... 666 = me :-) ) and do
>> > >> >> >>> > >> > > >> > > > > > >> > whatever with the 2nd ID. So once LinkedIn registers, say, 3,
>> > >> >> >>> > >> > > >> > > > > > >> > LinkedIn devs are free to use {3, *} with a reasonable expectation
>> > >> >> >>> > >> > > >> > > > > > >> > not to collide with anything else. Further partitioning of that *
>> > >> >> >>> > >> > > >> > > > > > >> > becomes LinkedIn's problem, but the "upstream registration" of a
>> > >> >> >>> > >> > > >> > > > > > >> > namespace only has to happen once.
>> > >> >> >>> > >> > > >> > > > > > >> >
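The upstream registration radai describes (0 reserved, one prefix per org, each org owning {ns, *}) could be modeled as a tiny registry. This is a hypothetical sketch; no such registry is defined in the thread or the KIP.

```python
from __future__ import annotations


class NamespaceRegistry:
    """Hypothetical upstream registry: one namespace per org, 0 reserved.

    Owning namespace n means owning every key {n, *}; how the owner splits
    the second id among its own teams is up to the owner.
    """

    RESERVED = 0

    def __init__(self) -> None:
        self._owners: dict[int, str] = {self.RESERVED: "reserved"}

    def register(self, namespace: int, owner: str) -> None:
        current = self._owners.get(namespace)
        if current is not None and current != owner:
            raise ValueError(f"namespace {namespace} is already taken by {current}")
        self._owners[namespace] = owner

    def owner_of(self, key: tuple[int, int]) -> str | None:
        namespace, _local_id = key
        return self._owners.get(namespace)
```

Once namespace 3 is registered to LinkedIn, any key such as {3, 12345} resolves to LinkedIn without a further upstream step, and a second claim on 3 is rejected.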
>> > >> >> >>> > >> > > >> > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <[email protected]>
>> > >> >> >>> > >> > > >> > > > > > >> > wrote:
>> > >> >> >>> > >> > > >> > > > > > >> >
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <[email protected]> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > Thank you so much for this clear and fair summary of the arguments.
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion with
>> > >> >> >>> > >> > > >> > > > > > >> >> > Roger's tweak: add a namespace for headers. This
>> > >> >> >>
>>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
