Woah, I wasn't aware this is something we'll do. It wasn't in the KIP, right?

I guess we could do it the same way ACLs currently work. I had in mind
something that will allow admins to apply rules to the new
create/delete/config topic APIs. So Todd can decide to reject "create topic"
requests that ask for more than 40 partitions, or require exactly 3 replicas,
or no more than 50GB partition size, etc.

ACLs were added a bit ad-hoc, if we are planning to apply more rules to
requests (and I think we should), we may want a bit more generic design
around that.

On Fri, Dec 2, 2016 at 7:16 AM, radai <radai.rosenbl...@gmail.com> wrote:

> "wouldn't you be in the business of making sure everyone uses them
> properly?"
>
> thats where a broker-side plugin would come handy - any incoming message
> that does not conform to org policy (read - does not have the proper
> headers) gets thrown out (with an error returned to user)
>
> On Thu, Dec 1, 2016 at 8:44 PM, Todd Palino <tpal...@gmail.com> wrote:
>
>> Come on, I’ve done at least 2 talks on this one :)
>>
>> Producing counts to a topic is part of it, but that’s only part. So you
>> count you have 100 messages in topic A. When you mirror topic A to another
>> cluster, you have 99 messages. Where was your problem? Or worse, you have
>> 100 messages, but one producer duplicated messages and another one lost
>> messages. You need details about where the message came from in order to
>> pinpoint problems when they happen. Source producer info, where it was
>> produced into your infrastructure, and when it was produced. This requires
>> you to add the information to the message.
>>
>> And yes, you still need to maintain your clients. So maybe my original
>> example was not the best. My thoughts on not wanting to be responsible for
>> message formats stands, because that’s very much separate from the client.
>> As you know, we have our own internal client library that can insert the
>> right headers, and right now inserts the right audit information into the
>> message fields.
>> If they exist, and assuming the message is Avro encoded.
>> What if someone wants to use JSON instead for a good reason? What if user X
>> wants to encrypt messages, but user Y does not? Maintaining the client
>> library is still much easier than maintaining the message formats.
>>
>> -Todd
>>
>>
>> On Thu, Dec 1, 2016 at 6:21 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> > Based on your last sentence, consider me convinced :)
>> >
>> > I get why headers are critical for Mirroring (you need tags to prevent
>> > loops and sometimes to route messages to the correct destination).
>> > But why do you need headers to audit? We are auditing by producing
>> > counts to a side topic (and I was under the impression you do the
>> > same), so we never need to modify the message.
>> >
>> > Another thing - after we added headers, wouldn't you be in the
>> > business of making sure everyone uses them properly? Making sure
>> > everyone includes the right headers you need, not using the header
>> > names you intend to use, etc. I don't think the "policing" business
>> > will ever go away.
>> >
>> > On Thu, Dec 1, 2016 at 5:25 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > > Got it. As an ops guy, I'm not very happy with the workaround. Avro means
>> > > that I have to be concerned with the format of the messages in order to run
>> > > the infrastructure (audit, mirroring, etc.). That means that I have to
>> > > handle the schemas, and I have to enforce rules about good formats. This is
>> > > not something I want to be in the business of, because I should be able to
>> > > run a service infrastructure without needing to be in the weeds of dealing
>> > > with customer data formats.
>> > >
>> > > Trust me, a sizable portion of my support time is spent dealing with schema
>> > > issues. I really would like to get away from that. Maybe I'd have more time
>> > > for other hobbies. Like writing. ;)
>> > >
>> > > -Todd
>> > >
>> > > On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <g...@confluent.io> wrote:
>> > >
>> > >> I'm pretty satisfied with the current workarounds (Avro container
>> > >> format), so I'm not too excited about the extra work required to do
>> > >> headers in Kafka. I absolutely don't mind it if you do it...
>> > >> I think the Apache convention for "good idea, but not willing to put
>> > >> any work toward it" is +0.5? anyway, that's what I was trying to
>> > >> convey :)
>> > >>
>> > >> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >> > Well I guess my question for you, then, is what is holding you back from
>> > >> > full support for headers? What’s the bit that you’re missing that has you
>> > >> > under a full +1?
>> > >> >
>> > >> > -Todd
>> > >> >
>> > >> >
>> > >> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <g...@confluent.io> wrote:
>> > >> >
>> > >> >> I know why people who support headers support them, and I've seen what
>> > >> >> the discussion is like.
>> > >> >>
>> > >> >> This is why I'm asking people who are against headers (especially
>> > >> >> committers) what will make them change their mind - so we can get this
>> > >> >> part over one way or another.
>> > >> >>
>> > >> >> If I sound frustrated it is not at Radai, Jun or you (Todd)... I am
>> > >> >> just looking for something concrete we can do to move the discussion
>> > >> >> along to the yummy design details (which is the argument I really am
>> > >> >> looking forward to).
>> > >> >>
>> > >> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >> >> > So, Gwen, to your question (even though I’m not a committer)...
>> > >> >> >
>> > >> >> > I have always been a strong supporter of introducing the concept of an
>> > >> >> > envelope to messages, which headers accomplishes. The message key is
>> > >> >> > already an example of a piece of envelope information.
>> > >> >> > By providing a means to do this within Kafka itself, and not relying on
>> > >> >> > use-case specific implementations, you make it much easier for components
>> > >> >> > to interoperate. It simplifies development of all these things (message
>> > >> >> > routing, auditing, encryption, etc.) because each one does not have to
>> > >> >> > reinvent the wheel.
>> > >> >> >
>> > >> >> > It also makes it much easier from a client point of view if the headers are
>> > >> >> > defined as part of the protocol and/or message format in general because
>> > >> >> > you can easily produce and consume messages without having to take into
>> > >> >> > account specific cases. For example, I want to route messages, but client A
>> > >> >> > doesn’t support the way audit implemented headers, and client B doesn’t
>> > >> >> > support the way encryption or routing implemented headers, so now my
>> > >> >> > application has to create some really fragile (my autocorrect just tried to
>> > >> >> > make that “tragic”, which is probably appropriate too) code to strip
>> > >> >> > everything off, rather than just consuming the messages, picking out the 1
>> > >> >> > or 2 headers it’s interested in, and performing its function.
>> > >> >> >
>> > >> >> > Honestly, this discussion has been going on for a long time, and it’s
>> > >> >> > always “Oh, you came up with 2 use cases, and yeah, those use cases are
>> > >> >> > real things that someone would want to do. Here’s an alternate way to
>> > >> >> > implement them so let’s not do headers.” If we have a few use cases that we
>> > >> >> > actually came up with, you can be sure that over the next year there’s a
>> > >> >> > dozen others that we didn’t think of that someone would like to do. I
>> > >> >> > really think it’s time to stop rehashing this discussion and instead focus
>> > >> >> > on a workable standard that we can adopt.
>> > >> >> >
>> > >> >> > -Todd
>> > >> >> >
>> > >> >> >
>> > >> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >> >> >
>> > >> >> >> C. per message encryption
>> > >> >> >>> One drawback of this approach is that this significantly reduce the
>> > >> >> >>> effectiveness of compression, which happens on a set of serialized
>> > >> >> >>> messages. An alternative is to enable SSL for wire encryption and rely on
>> > >> >> >>> the storage system (e.g. LUKS) for at rest encryption.
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Jun, this is not sufficient. While this does cover the case of removing a
>> > >> >> >> drive from the system, it will not satisfy most compliance requirements for
>> > >> >> >> encryption of data as whoever has access to the broker itself still has
>> > >> >> >> access to the unencrypted data. For end-to-end encryption you need to
>> > >> >> >> encrypt at the producer, before it enters the system, and decrypt at the
>> > >> >> >> consumer, after it exits the system.
>> > >> >> >>
>> > >> >> >> -Todd
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>
>> > >> >> >>> another big plus of headers in the protocol is that it would enable rapid
>> > >> >> >>> iteration on ideas outside of core kafka and would reduce the number of
>> > >> >> >>> future wire format changes required.
>> > >> >> >>>
>> > >> >> >>> a lot of what is currently a KIP represents use cases that are not 100%
>> > >> >> >>> relevant to all users, and some of them require rather invasive wire
>> > >> >> >>> protocol changes.
>> > >> >> >>> a good recent example of this is kip-98.
>> > >> >> >>> tx-utilizing traffic is expected to be a very small fraction of total
>> > >> >> >>> traffic and yet the changes are invasive.
>> > >> >> >>>
>> > >> >> >>> every such wire format change translates into painful and slow adoption of
>> > >> >> >>> new versions.
>> > >> >> >>>
>> > >> >> >>> i think a lot of functionality currently in KIPs could be "spun out" and
>> > >> >> >>> implemented as opt-in plugins transmitting data over headers. this would
>> > >> >> >>> keep the core wire format stable(r), core codebase smaller, and avoid the
>> > >> >> >>> "burden of proof" thats sometimes required to prove a certain feature is
>> > >> >> >>> useful enough for a wide-enough audience to warrant a wire format change
>> > >> >> >>> and code complexity additions.
>> > >> >> >>>
>> > >> >> >>> (to be clear - kip-98 goes beyond "mere" wire format changes and im not
>> > >> >> >>> saying it could have been completely done with headers, but exactly-once
>> > >> >> >>> delivery certainly could)
>> > >> >> >>>
>> > >> >> >>> On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <g...@confluent.io> wrote:
>> > >> >> >>>
>> > >> >> >>> > On Thu, Dec 1, 2016 at 10:24 AM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > > "For use cases within an organization, one could always use other
>> > >> >> >>> > > approaches such as company-wise containers"
>> > >> >> >>> > > this is what linkedin has traditionally done but there are now cases (read
>> > >> >> >>> > > - topics) where this is not acceptable. this makes headers useful even
>> > >> >> >>> > > within single orgs for cases where one-container-fits-all cannot apply.
>> > >> >> >>> > >
>> > >> >> >>> > > as for the particular use cases listed, i dont want this to devolve to a
>> > >> >> >>> > > discussion of particular use cases - i think its enough that some of them
>> > >> >> >>> >
>> > >> >> >>> > I think a main point of contention is that: We identified few
>> > >> >> >>> > use-cases where headers are useful, do we want Kafka to be a system
>> > >> >> >>> > that supports those use-cases?
>> > >> >> >>> >
>> > >> >> >>> > For example, Jun said:
>> > >> >> >>> > "Not sure how widely useful record-level lineage is though since the
>> > >> >> >>> > overhead could be significant."
>> > >> >> >>> >
>> > >> >> >>> > We know NiFi supports record level lineage. I don't think it was
>> > >> >> >>> > developed for lols, I think it is safe to assume that the NSA needed
>> > >> >> >>> > that functionality. We also know that certain financial institutes
>> > >> >> >>> > need to track tampering with records at a record level and there are
>> > >> >> >>> > federal regulations that absolutely require this. They also need to
>> > >> >> >>> > prove that routing apps that "touches" the messages and either reads
>> > >> >> >>> > or updates headers couldn't have possibly modified the payload itself.
>> > >> >> >>> > They use record level encryption to do that - apps can read and
>> > >> >> >>> > (sometimes) modify headers but can't touch the payload.
>> > >> >> >>> >
>> > >> >> >>> > We can totally say "those are corner cases and not worth adding
>> > >> >> >>> > headers to Kafka for", they should use a different pubsub message for
>> > >> >> >>> > that (Nifi or one of the other 1000 that cater specifically to the
>> > >> >> >>> > financial industry).
>> > >> >> >>> >
>> > >> >> >>> > But this gets us into a catch 22:
>> > >> >> >>> > If we discuss a specific use-case, someone can always say it isn't
>> > >> >> >>> > interesting enough for Kafka. If we discuss more general trends,
>> > >> >> >>> > others can say "well, we are not sure any of them really needs headers
>> > >> >> >>> > specifically. This is just hand waving and not interesting.".
>> > >> >> >>> >
>> > >> >> >>> > I think discussing use-cases in specifics is super important to decide
>> > >> >> >>> > implementation details for headers (my use-cases lean toward numerical
>> > >> >> >>> > keys with namespaces and object values, others differ), but I think we
>> > >> >> >>> > need to answer the general "Are we going to have headers" question
>> > >> >> >>> > first.
>> > >> >> >>> >
>> > >> >> >>> > I'd love to hear from the other committers in the discussion:
>> > >> >> >>> > What would it take to convince you that headers in Kafka are a good
>> > >> >> >>> > idea in general, so we can move ahead and try to agree on the details?
>> > >> >> >>> >
>> > >> >> >>> > I feel like we keep moving the goal posts and this is truly exhausting.
>> > >> >> >>> >
>> > >> >> >>> > For the record, I mildly support adding headers to Kafka (+0.5?).
>> > >> >> >>> > The community can continue to find workarounds to the issue and there
>> > >> >> >>> > are some benefits to keeping the message format and clients simpler.
>> > >> >> >>> > But I see the usefulness of headers to many use-cases and if we can
>> > >> >> >>> > find a good and generally useful way to add it to Kafka, it will make
>> > >> >> >>> > Kafka easier to use for many - worthy goal in my eyes.
>> > >> >> >>> >
>> > >> >> >>> > > are interesting/feasible, but:
>> > >> >> >>> > > A+B. i think there are use cases for polyglot topics. especially if kafka
>> > >> >> >>> > > is being used to "trunk" something else.
>> > >> >> >>> > > D. multiple topics would make it harder to write portable consumer code.
>> > >> >> >>> > > partition remapping would mess with locality of consumption guarantees.
>> > >> >> >>> > > E+F. a use case I see for lineage/metadata is billing/chargeback. for that
>> > >> >> >>> > > use case it is not enough to simply record the point of origin, but every
>> > >> >> >>> > > replication stop (think mirror maker) must also add a record to form a
>> > >> >> >>> > > "transit log".
>> > >> >> >>> > >
>> > >> >> >>> > > as for stream processing on top of kafka - i know samza has a metadata map
>> > >> >> >>> > > which they carry around in addition to user values. headers are the perfect
>> > >> >> >>> > > fit for these things.
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > >
>> > >> >> >>> > > On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <j...@confluent.io> wrote:
>> > >> >> >>> > >
>> > >> >> >>> > >> Hi, Michael,
>> > >> >> >>> > >>
>> > >> >> >>> > >> In order to answer the first two questions, it would be helpful if we could
>> > >> >> >>> > >> identify 1 or 2 strong use cases for headers in the space for third-party
>> > >> >> >>> > >> vendors. For use cases within an organization, one could always use other
>> > >> >> >>> > >> approaches such as company-wise containers to get around w/o headers. I
>> > >> >> >>> > >> went through the use cases in the KIP and in Radai's wiki (
>> > >> >> >>> > >> https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers
>> > >> >> >>> > >> ).
>> > >> >> >>> > >> The following are the ones that I understand and could be in the
>> > >> >> >>> > >> third-party use case category.
>> > >> >> >>> > >>
>> > >> >> >>> > >> A. content-type
>> > >> >> >>> > >> It seems that in general, content-type should be set at the topic level.
>> > >> >> >>> > >> Not sure if mixing messages with different content types should be
>> > >> >> >>> > >> encouraged.
>> > >> >> >>> > >>
>> > >> >> >>> > >> B. schema id
>> > >> >> >>> > >> Since the value is mostly useless without schema id, it seems that storing
>> > >> >> >>> > >> the schema id together with serialized bytes in the value is better?
>> > >> >> >>> > >>
>> > >> >> >>> > >> C. per message encryption
>> > >> >> >>> > >> One drawback of this approach is that this significantly reduce the
>> > >> >> >>> > >> effectiveness of compression, which happens on a set of serialized
>> > >> >> >>> > >> messages. An alternative is to enable SSL for wire encryption and rely on
>> > >> >> >>> > >> the storage system (e.g. LUKS) for at rest encryption.
>> > >> >> >>> > >>
>> > >> >> >>> > >> D. cluster ID for mirroring across Kafka clusters
>> > >> >> >>> > >> This is actually interesting. Today, to avoid introducing cycles when doing
>> > >> >> >>> > >> mirroring across data centers, one would either have to set up two Kafka
>> > >> >> >>> > >> clusters (a local and an aggregate) per data center or rename topics.
>> > >> >> >>> > >> Neither is ideal. With headers, the producer could tag each message with
>> > >> >> >>> > >> the producing cluster ID in the header.
>> > >> >> >>> > >> MirrorMaker could then avoid mirroring messages to a cluster if they are
>> > >> >> >>> > >> tagged with the same cluster id.
>> > >> >> >>> > >>
>> > >> >> >>> > >> However, an alternative approach is to introduce sth like hierarchical
>> > >> >> >>> > >> topic and store messages from different clusters in different partitions
>> > >> >> >>> > >> under the same topic. This approach avoids filtering out unneeded data and
>> > >> >> >>> > >> makes offset preserving easier to support. It may make compaction trickier
>> > >> >> >>> > >> though since the same key may show up in different partitions.
>> > >> >> >>> > >>
>> > >> >> >>> > >> E. record-level lineage
>> > >> >> >>> > >> For example, a source connector could store in the message the metadata
>> > >> >> >>> > >> (e.g. UUID) of the source record. Similarly, if a stream job transforms
>> > >> >> >>> > >> messages from topic A to topic B, the library could include the source
>> > >> >> >>> > >> message offset in each of the transformed message in the header. Not sure
>> > >> >> >>> > >> how widely useful record-level lineage is though since the overhead could
>> > >> >> >>> > >> be significant.
>> > >> >> >>> > >>
>> > >> >> >>> > >> F. auditing metadata
>> > >> >> >>> > >> We could put things like clientId/host/user in the header in each message
>> > >> >> >>> > >> for auditing. These metadata are really at the producer level though. So, a
>> > >> >> >>> > >> more efficient way is to only include a "producerId" per message and send
>> > >> >> >>> > >> the producerId -> metadata mapping independently.
>> > >> >> >>> > >> KIP-98 is actually
>> > >> >> >>> > >> proposing including such a producerId natively in the message.
>> > >> >> >>> > >>
>> > >> >> >>> > >> So, overall, I am not sure that I am fully convinced of the strong third-party
>> > >> >> >>> > >> use cases of headers yet. Perhaps we could discuss a bit more to make one
>> > >> >> >>> > >> or two really convincing use cases.
>> > >> >> >>> > >>
>> > >> >> >>> > >> Another orthogonal question is whether header should be exposed in stream
>> > >> >> >>> > >> processing systems such as Kafka Streams, Samza, and Spark Streaming.
>> > >> >> >>> > >> Currently, those systems just deal with key/value pairs. Should we expose a
>> > >> >> >>> > >> third thing header there too or somehow map header to key or value?
>> > >> >> >>> > >>
>> > >> >> >>> > >> Thanks,
>> > >> >> >>> > >>
>> > >> >> >>> > >> Jun
>> > >> >> >>> > >>
>> > >> >> >>> > >>
>> > >> >> >>> > >> On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <michael.pea...@ig.com>
>> > >> >> >>> > >> wrote:
>> > >> >> >>> > >>
>> > >> >> >>> > >> > I assume, that after a period of a week, that there is no concerns now
>> > >> >> >>> > >> > with points 1, and 2 and now we have agreement that headers are useful and
>> > >> >> >>> > >> > needed in Kafka. As such if put to a KIP vote, this wouldn’t be a reason to
>> > >> >> >>> > >> > reject.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > @Ignacio on point 4).
>> > >> >> >>> > >> > I think for purpose of getting this KIP moving past this, we can state the
>> > >> >> >>> > >> > key will be a 4 bytes space that will be naturally interpreted as an
>> > >> >> >>> > >> > Int32 (if namespacing is later wanted you can easily split this into two
>> > >> >> >>> > >> > int16 spaces), from the wire protocol implementation this makes no
>> > >> >> >>> > >> > difference I don’t believe. Is this reasonable to all?
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On 5) as per point 4 therefore happy we keep with 32 bits.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On 18/11/2016, 20:34, "ignacio.so...@gmail.com on behalf of Ignacio
>> > >> >> >>> > >> > Solis" <ignacio.so...@gmail.com on behalf of iso...@igso.net> wrote:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Summary:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 3) Yes - Header value as byte[]
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 4a) Int,Int - No
>> > >> >> >>> > >> > 4b) Int - Yes
>> > >> >> >>> > >> > 4c) String - Reluctant maybe
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > 5) I believe the header system should take a single int. I think 32bits is
>> > >> >> >>> > >> > a good size, if you want to interpret this as two 16bit numbers in the layer
>> > >> >> >>> > >> > above go right ahead. If somebody wants to argue for 16 bits or 64 bits of
>> > >> >> >>> > >> > header key space I would listen.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Discussion:
>> > >> >> >>> > >> > Dividing the key space into sub_key_1 and sub_key_2 makes no sense to me at
>> > >> >> >>> > >> > this layer. Are we going to start providing APIs to get all the
>> > >> >> >>> > >> > sub_key_1s? or all the sub_key_2s? If there is no distinguishing functions
>> > >> >> >>> > >> > that are applied to each one then they should be a single value. At this
>> > >> >> >>> > >> > layer all we're doing is equality.
>> > >> >> >>> > >> > If the above layer wants to interpret this as 2, 3 or more values that's a
>> > >> >> >>> > >> > different question. I personally think it's all one keyspace that is
>> > >> >> >>> > >> > getting assigned using some structure, but if you want to sub-assign parts
>> > >> >> >>> > >> > of it then that's fine.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > The same discussion applies to strings. If somebody argued for strings,
>> > >> >> >>> > >> > would we be arguing to divide the strings with dots ('.') as a requirement?
>> > >> >> >>> > >> > Would we want them to give us the different name segments separately?
>> > >> >> >>> > >> > Would we be performing any actions on this key other than matching?
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Nacho
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <michael.pea...@ig.com>
>> > >> >> >>> > >> > wrote:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > > #jay #jun any concerns on 1 and 2 still?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > @all
>> > >> >> >>> > >> > > To get this moving along a bit more I'd also like to ask to get clarity on
>> > >> >> >>> > >> > > the below last points:
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 3) I believe we're all roughly happy with the header value being a byte[]?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 4) I believe consensus has been for a namespace based int approach
>> > >> >> >>> > >> > > {int,int} for the key. Any objections if this is what we go with?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 5) as we have if assumption in (4) is correct, {int,int} keys.
>> > >> >> >>> > >> > > Should both int's be int16 or int32?
>> > >> >> >>> > >> > > I'm for them being int16(2 bytes) as combined is space of 4bytes as per
>> > >> >> >>> > >> > > original and gives plenty of combinations for the foreseeable, and keeps
>> > >> >> >>> > >> > > the overhead small.
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > Do we see any benefit in another kip call to discuss these at all?
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > Cheers
>> > >> >> >>> > >> > > Mike
>> > >> >> >>> > >> > > ________________________________________
>> > >> >> >>> > >> > > From: K Burstev <k.burs...@yandex.com>
>> > >> >> >>> > >> > > Sent: Friday, November 18, 2016 7:07:07 AM
>> > >> >> >>> > >> > > To: dev@kafka.apache.org
>> > >> >> >>> > >> > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > For what it is worth also i agree. As a user:
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 1) Yes - Headers are worthwhile
>> > >> >> >>> > >> > > 2) Yes - Headers should be a top level option
>> > >> >> >>> > >> > >
>> > >> >> >>> > >> > > 14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:
>> > >> >> >>> > >> > > > 1) Yes - Headers are worthwhile
>> > >> >> >>> > >> > > > 2) Yes - Headers should be a top level option
>> > >> >> >>> > >> > > >
>> > >> >> >>> > >> > > > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <michael.pea...@ig.com>
>> > >> >> >>> > >> > > > wrote:
>> > >> >> >>> > >> > > >
>> > >> >> >>> > >> > > >> Hi Roger,
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> The kip details/examples the original proposal for key spacing, not the
>> > >> >> >>> > >> > > >> new mentioned as per discussion namespace idea.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> We will need to update the kip, when we get agreement this is a better
>> > >> >> >>> > >> > > >> approach (which seems to be the case if I have understood the general
>> > >> >> >>> > >> > > >> feeling in the conversation)
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Re the variable ints, at very early stage we did think about this.
I >> > >> >> >>> > >> > > think >> > >> >> >>> > >> > > >> the added complexity for the saving isn't >> worth >> > it. >> > >> >> I'd >> > >> >> >>> > rather >> > >> >> >>> > >> go >> > >> >> >>> > >> > > with, if >> > >> >> >>> > >> > > >> we want to reduce overheads and size int16 >> > (2bytes) >> > >> >> keys >> > >> >> >>> as >> > >> >> >>> > it >> > >> >> >>> > >> > keeps it >> > >> >> >>> > >> > > >> simple. >> > >> >> >>> > >> > > >> >> > >> >> >>> > >> > > >> On the note of no headers, there is as per the >> > kip >> > >> as >> > >> >> we >> > >> >> >>> > use an >> > >> >> >>> > >> > > attribute >> > >> >> >>> > >> > > >> bit to denote if headers are present or not as >> > such >> > >> >> >>> > provides a >> > >> >> >>> > >> > zero >> > >> >> >>> > >> > > >> overhead currently if headers are not used. >> > >> >> >>> > >> > > >> >> > >> >> >>> > >> > > >> I think as radai mentions would be good first >> > if we >> > >> >> can >> > >> >> >>> get >> > >> >> >>> > >> > clarity if >> > >> >> >>> > >> > > do >> > >> >> >>> > >> > > >> we now have general consensus that (1) headers >> > are >> > >> >> >>> > worthwhile >> > >> >> >>> > >> and >> > >> >> >>> > >> > > useful, >> > >> >> >>> > >> > > >> and (2) we want it as a top level entity. >> > >> >> >>> > >> > > >> >> > >> >> >>> > >> > > >> Just to state the obvious i believe (1) >> headers >> > are >> > >> >> >>> > worthwhile >> > >> >> >>> > >> > and (2) >> > >> >> >>> > >> > > >> agree as a top level entity. 
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Cheers
>> > >> >> >>> > >> > > >> Mike
>> > >> >> >>> > >> > > >> ________________________________________
>> > >> >> >>> > >> > > >> From: Roger Hoover <roger.hoo...@gmail.com>
>> > >> >> >>> > >> > > >> Sent: Wednesday, November 9, 2016 9:10:47 PM
>> > >> >> >>> > >> > > >> To: dev@kafka.apache.org
>> > >> >> >>> > >> > > >> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Sorry for going a little in the weeds but thanks for the replies regarding varint.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Agreed that a prefix and {int, int} can be the same. It doesn't look like that's what the KIP is saying in the "Open" section. The example shows 2100001 for New Relic and 210002 for App Dynamics, implying that the New Relic organization will have only a single header id to work with. Or is 2100001 a prefix? The main point of a namespace or prefix is to reduce the overhead of config mapping or registration depending on how namespaces/prefixes are managed.
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Would love to hear more feedback on the higher-level questions though...
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Cheers,
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> Roger
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > >> > > >>
>> > >> >> >>> > >> > > >> > I think this discussion is getting a bit into the weeds on technical implementation details.
>> > >> >> >>> > >> > > >> > I'd like to step back a minute and try and establish where we are in the larger picture:
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > (re-wording nacho's last paragraph)
>> > >> >> >>> > >> > > >> > 1. are we all in agreement that headers are a worthwhile and useful addition to have? this was contested early on
>> > >> >> >>> > >> > > >> > 2. are we all in agreement on headers as top level entity vs headers squirreled-away in V?
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > are there still concerns around these 2 points (#jay? #jun?)?
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > (and now back to our normal programming ...)
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > varints are nice.
>> > >> >> >>> > >> > > >> > having said that, it's adding complexity (see
>> > >> >> >>> > >> > > >> > https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/util/Varint.java
>> > >> >> >>> > >> > > >> > as 1st google result) and would require anyone writing other clients (C? Python? Go? Bash? ;-) ) to get/implement the same, and for relatively little gain (int vs string is order of magnitude, this isn't).
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > int namespacing vs {int, int} namespacing are basically the same thing - you're just namespacing an int64 and giving people whole 2^32 ranges at a time. the part i like about this is letting people have a large swath of numbers with one registration so they don't have to come back for every single plugin/header they want to "reserve".
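
The two encodings discussed in the message above can be sketched as follows. This is an illustration only, not code from KIP-82 or from any Kafka client: the function names are hypothetical, the varint follows the protobuf base-128 scheme linked in the thread, and the {namespace, id} pair is assumed to be two unsigned 32-bit values packed into one 64-bit key.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# part of KIP-82 or of any Kafka client library.

def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def pack_header_key(namespace: int, header_id: int) -> int:
    """Pack {namespace, id} (assumed unsigned 32-bit each) into one 64-bit key."""
    if not (0 <= namespace < 2**32 and 0 <= header_id < 2**32):
        raise ValueError("namespace and header_id must fit in 32 bits")
    return (namespace << 32) | header_id


def unpack_header_key(key: int) -> tuple:
    """Split a 64-bit key back into (namespace, id)."""
    return key >> 32, key & 0xFFFFFFFF
```

With this layout, one registered namespace value covers a whole range of 2^32 header ids, which is the "one registration" property described above. The varint functions show where the extra client-side complexity comes from: every client language would need an equivalent pair.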
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> >
>> > >> >> >>> > >> > > >> > > Since some of the debate has been about overhead + performance, I'm wondering if we have considered a varint encoding (https://developers.google.com/protocol-buffers/docs/encoding#varints) for the header length field (int32 in the proposal) and for header ids? If you don't use headers, the overhead would be a single byte, and each header id < 128 would also need only a single byte?
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > > On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> > >
>> > >> >> >>> > >> > > >> > > > @magnus - and very dangerous (you're essentially downloading and executing arbitrary code off the internet on your servers ...
>> > >> >> >>> > >> > > >> > > > bad idea without a sandbox, even with)
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > as for it being a purely administrative task - i disagree.
>> > >> >> >>> > >> > > >> > > >
>> > >> >> >>> > >> > > >> > > > i wish it would, really, because then my earlier point on the complexity of the remapping process would be invalid, but at linkedin, for example, we (the team i'm in) run kafka as a service. we don't really know what our users (developing applications that use kafka) are up to at any given moment. it is very possible (given the existence of headers and a corresponding plugin ecosystem) for some application to "equip" their producers and consumers with the required plugin without us knowing.
>> > >> i >> > >> >> dont >> > >> >> >>> > mean >> > >> >> >>> > >> > to imply >> > >> >> >>> > >> > > >> thats >> > >> >> >>> > >> > > >> > > > bad, i just want to make the point that >> > its >> > >> not >> > >> >> as >> > >> >> >>> > simple >> > >> >> >>> > >> > > keeping it >> > >> >> >>> > >> > > >> in >> > >> >> >>> > >> > > >> > > > sync across a large-enough organization. >> > >> >> >>> > >> > > >> > > > >> > >> >> >>> > >> > > >> > > > >> > >> >> >>> > >> > > >> > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus >> > >> Edenhill >> > >> >> < >> > >> >> >>> > >> > > mag...@edenhill.se> >> > >> >> >>> > >> > > >> > > > wrote: >> > >> >> >>> > >> > > >> > > > >> > >> >> >>> > >> > > >> > > > > I think there is a piece missing in >> the >> > >> >> Strings >> > >> >> >>> > >> > discussion, >> > >> >> >>> > >> > > where >> > >> >> >>> > >> > > >> > > > > pro-Stringers >> > >> >> >>> > >> > > >> > > > > reason that by providing unique string >> > >> >> >>> identifiers >> > >> >> >>> > for >> > >> >> >>> > >> > each >> > >> >> >>> > >> > > header >> > >> >> >>> > >> > > >> > > > > everything will just >> > >> >> >>> > >> > > >> > > > > magically work for all parts of the >> > stream >> > >> >> >>> pipeline. 
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > But the strings don't mean anything by themselves, and while we could probably envision some auto plugin loader that downloads, compiles, links and runs plugins on-demand as soon as they're seen by a consumer, I don't really see a use-case for something so dynamic (and fragile) in practice.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > In the real world an application will be configured with a set of plugins to either add (producer) or read (consumer) headers.
>> > >> >> >>> > >> > > >> > > > > This is an administrative task based on what features a client needs/provides and results in some sort of configuration to enable and configure the desired plugins.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > Since this needs to be kept somewhat in sync across an organisation (there is no point in having producers add headers no consumers will read, and vice versa), the added complexity of assigning an id namespace for each plugin as it is being configured should be tolerable.
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > /Magnus
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:
>> > >> >> >>> > >> > > >> > > > >
>> > >> >> >>> > >> > > >> > > > > > Just following/catching up on what seems to be an active night :)
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > @Radai sorry if it may seem obvious but what does MD stand for?
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > My take on String vs Int:
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > I will state first I am pro Int (16 or 32).
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > I do though, playing devil's advocate, see a big plus with the argument for String keys: this is around integrating into an existing eco-system.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > As many other systems use String based headers (Flume, JMS) it makes it much easier for these to be incorporated/integrated into.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > How with Int based headers could we provide a way/guidance to make this integration simple / easy with transition flows over to kafka?
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > * tough luck buddy you're on your own
>> > >> >> >>> > >> > > >> > > > > > * simply hash the string into int code and hope for no collisions (how to convert back though?)
>> > >> >> >>> > >> > > >> > > > > > * http2 style as mentioned by nacho.
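
The second option in the list above (hash the string into an int code) can be sketched as follows. This is only one possible reading of that bullet, not something proposed in the thread: the class name is hypothetical, CRC32 is an arbitrary choice of 32-bit hash, and "converting back" is assumed to need a locally kept reverse table, since a hash alone cannot be inverted.

```python
import zlib


class HashedHeaderRegistry:
    """Hypothetical helper mapping string header names to int32 codes.

    The hash (CRC32 here, chosen only for illustration) gives the int
    code; a reverse table is kept so codes can be turned back into names
    and so collisions are detected instead of silently hoped away.
    """

    def __init__(self):
        self._names_by_code = {}

    def register(self, name: str) -> int:
        code = zlib.crc32(name.encode("utf-8"))  # unsigned 32-bit value
        existing = self._names_by_code.get(code)
        if existing is not None and existing != name:
            raise ValueError(
                f"hash collision: {name!r} and {existing!r} both map to {code}"
            )
        self._names_by_code[code] = name
        return code

    def name_for(self, code: int) -> str:
        """Convert a code back to its name; only works for registered names."""
        return self._names_by_code[code]
```

The reverse lookup only works where the table is available, so a consumer that has never seen the string name still cannot recover it from the int alone. That is the "how to convert back" gap the bullet points at.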
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > cheers,
>> > >> >> >>> > >> > > >> > > > > > Mike
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > ________________________________________
>> > >> >> >>> > >> > > >> > > > > > From: radai <radai.rosenbl...@gmail.com>
>> > >> >> >>> > >> > > >> > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
>> > >> >> >>> > >> > > >> > > > > > To: dev@kafka.apache.org
>> > >> >> >>> > >> > > >> > > > > > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > thinking about it some more, the best way to transmit the header remapping data to consumers would be to put it in the MD response payload, so maybe it should be discussed now.
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> > > > > >
>> > >> >> >>> > >> > > >> > > > > > > i'm not opposed to the idea of namespace mapping.
>> > >> >> >>> > >> > > >> > > > > > > all i'm saying is that it's not part of the "mvp" and, since it requires no wire format change, can always be added later.
>> > >> >> >>> > >> > > >> > > > > > > also, it's not as simple as just configuring MM to do the transform: let's say i've implemented large message support as {666,1} and on some mirror target cluster it's been remapped to {999,1}. the consumer plugin code would also need to be told to look for the large message "part X of Y" header under {999,1}. doable, but tricky.
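
The remapping problem described in the message above can be sketched as follows. Everything here is hypothetical and only illustrates the point being made: headers are modelled as a dict keyed by (namespace, id) tuples, `remap_namespaces` stands in for whatever the mirroring tool would do, and `LargeMessagePlugin` stands in for the consumer-side plugin that has to be told which namespace to read.

```python
from typing import Dict, Optional, Tuple

HeaderKey = Tuple[int, int]  # (namespace, id); this layout is an assumption


def remap_namespaces(
    headers: Dict[HeaderKey, bytes], namespace_map: Dict[int, int]
) -> Dict[HeaderKey, bytes]:
    """Rewrite header namespaces, as a mirroring step might.

    Namespaces missing from namespace_map are passed through unchanged.
    """
    remapped: Dict[HeaderKey, bytes] = {}
    for (namespace, header_id), value in headers.items():
        key = (namespace_map.get(namespace, namespace), header_id)
        if key in remapped:
            raise ValueError(f"remapping collision on {key}")
        remapped[key] = value
    return remapped


class LargeMessagePlugin:
    """Hypothetical consumer plugin reading a "part X of Y" header."""

    PART_HEADER_ID = 1

    def __init__(self, namespace: int):
        # The namespace must match whatever the local cluster uses.
        self.namespace = namespace

    def read_part_header(self, headers: Dict[HeaderKey, bytes]) -> Optional[bytes]:
        return headers.get((self.namespace, self.PART_HEADER_ID))


# Usage: the source cluster writes under 666, the mirror target uses 999.
source_headers = {(666, 1): b"part 2 of 5"}
mirrored = remap_namespaces(source_headers, {666: 999})
```

A plugin still configured with namespace 666 finds nothing in `mirrored`, while one configured with 999 does. That is the extra piece of configuration the message calls "doable, but tricky".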
>> > >> >> >>> > >> > > >> > > > > > >
>> > >> >> >>> > >> > > >> > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen Shapira <g...@confluent.io> wrote:
>> > >> >> >>> > >> > > >> > > > > > >
>> > >> >> >>> > >> > > >> > > > > > >> While you can do whatever you want with a namespace and your code, what I'd expect is for each app to have namespaces configurable...
>> > >> >> >>> > >> > > >> > > > > > >>
>> > >> >> >>> > >> > > >> > > > > > >> So if I accidentally used 666 for my HR department, and still want to run RadaiApp, I can config "namespace=42" for RadaiApp and everything will look normal.
>> > >> >> >>> > >> > > >> > > > > > >>
>> > >> >> >>> > >> > > >> > > > > > >> This means you only need to sync usage inside your own organization.
>> > >> >> >>> > >> > > >> > > > > > >> Still hard, but somewhat easier than syncing with the entire world.
>> > >> >> >>> > >> > > >> > > > > > >>
>> > >> >> >>> > >> > > >> > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> > and we can start with {namespace, id} and no re-mapping support and always add it later on if/when collisions actually happen (i don't think they'd be a problem).
>> > >> >> >>> > >> > > >> > > > > > >> >
>> > >> >> >>> > >> > > >> > > > > > >> > every interested party (so orgs or individuals) could then register a prefix (0 = reserved, 1 = confluent ...
>> > >> >> >>> > >> > > >> > > > > > >> > 666 = me :-) ) and do whatever with the 2nd ID - so once linkedin registers, say 3, then linkedin devs are free to use {3, *} with a reasonable expectation not to collide with anything else. further partitioning of that * becomes linkedin's problem, but the "upstream registration" of a namespace only has to happen once.
>> > >> >> >>> > >> > > >> > > > > > >> >
>> > >> >> >>> > >> > > >> > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM, James Cheng <wushuja...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> >
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen Shapira <g...@confluent.io> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > Thank you so much for this clear and fair summary of the arguments.
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > I'm in favor of ints. Not a deal-breaker, but in favor.
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > Even more in favor of Magnus's decentralized suggestion with Roger's tweak: add a namespace for headers.
>> > >> >> >>> > >> > > >> > > > > > >> >> > This will allow each app to just use whatever IDs it wants internally, and then let the admin deploying the app figure out an available namespace ID for the app to live in.
>> > >> >> >>> > >> > > >> > > > > > >> >> > So io.confluent.schema-registry can be namespace 0x01 on my deployment and 0x57 on yours, and the poor guys developing the app don't need to worry about that.
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> Gwen, if I understand your example right, an application deployer might decide to use 0x01 in one deployment, and that means that once the message is written into the broker, it will be saved on the broker with that specific namespace (0x01).
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> If you were to mirror that message into another cluster, the 0x01 would accompany the message, right? What if the deployers of the same app in the other cluster use 0x57? They won't understand each other?
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> I'm not sure that's an avoidable problem.
>> > >> >> >>> > >> > > >> > > > > > >> >> I think it simply means that in order to share data, you have to also have a shared (agreed upon) understanding of what the namespaces mean. Which I think makes sense, because the alternate (sharing *nothing* at all) would mean that there would be no way to understand each other.
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> -James
>> > >> >> >>> > >> > > >> > > > > > >> >>
>> > >> >> >>> > >> > > >> > > > > > >> >> > Gwen
>> > >> >> >>> > >> > > >> > > > > > >> >> >
>> > >> >> >>> > >> > > >> > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM, radai <radai.rosenbl...@gmail.com> wrote:
>> > >> >> >>> > >> > > >> > > > > > >> >> >> +1 for sean's document.
>> > >> >> >>> > >> > > >> > > > > > >> >> >> it covers pretty much all the trade-offs and provides concrete figures to argue about :-)
>> > >> >> >>> > >> > > >> > > > > > >> >> >> (nit-picking - used the same xkcd twice, also trove has been superseded
>> > >>
>> >
>> > --
>> > Gwen Shapira
>> > Product Manager | Confluent
>> > 650.450.2760 | @gwenshap
>> > Follow us: Twitter | blog
>> >
>>
>> --
>> *Todd Palino*
>> Staff Site Reliability Engineer
>> Data Infrastructure Streaming
>>
>> linkedin.com/in/toddpalino
>>

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog