@rhauch Here is the previous discussion thread. I'm just reigniting it so we can continue the discussion on the original KIP thread.
Cheers
Mike

Sent from my iPhone

> On 5 May 2017, at 02:21, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi Ewen,
>
> Did you get a chance to look at the updated sample showing the idea?
>
> Did it help?
>
> Cheers
> Mike
>
> Sent using OWA for iPhone
> ________________________________________
> From: Michael Pearce <michael.pea...@ig.com>
> Sent: Wednesday, May 3, 2017 10:11:55 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
>
> Hi Ewen,
>
> Code helps, I think, as I don't think I explained what I meant very well.
>
> I have pushed what I was thinking to the branch/PR:
> https://github.com/apache/kafka/pull/2942
>
> The key bits added on top here are:
>
> A new ConnectHeader that holds the header key (as a String), the header value object, and the header value schema.
>
> A new SubjectConverter, which exposes a subject; in this case the subject is the key. This can be used to register the header type in repositories like Schema Registry or, in my case below, in a properties file.
>
> We can default the subject converter to String based or Byte based, where all header values are treated safely as String or byte[].
>
> But this way you could add your own converter, which could be more sophisticated and convert the header based on the key.
>
> The main part is to have access to the key, so you can look up the header value type based on the key from somewhere, e.g. a properties file or some central repo (e.g. a schema repo), where the repo subject could be the topic + key, or just the key if the key type is global, and the schema could be a primitive, String, byte[], or even something more elaborate.
>
> Cheers
> Mike
>
> On 03/05/2017, 06:00, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:
>
> Michael,
>
> Aren't JMS headers an example where the variety is a problem?
> Unless I'm misunderstanding, there's not even a fixed serialization format expected for them, since JMS defines the runtime types, not the wire format. For example, we have JMSCorrelationID (String), JMSExpires (Long), and JMSReplyTo (Destination). These are simply runtime types, so we'd need either (a) a different serializer/deserializer for each or (b) a serializer/deserializer that can handle all of them (e.g. Avro, JSON, etc.).
>
> What is the actual serialized format of the different fields? And if it's not specified anywhere in the KIP, why should using the well-known type for the header key (e.g. StringSerializer, IntSerializer, etc.) be better or worse than using a general serialization format (e.g. Avro, JSON)? And if the latter is the choice, how do you decide on the format?
>
> -Ewen
>
> On Tue, May 2, 2017 at 12:48 PM, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>
>> Hi Ewen,
>>
>> So, on the point of JMS: the predefined/standardised JMS and JMSX headers have predefined types, so these can be serialised/deserialised accordingly.
>>
>> Custom JMS headers, agreed, could be a bit more difficult, but on the 80/20 rule I would agree they're mostly String values, and since you can hold bytes as a String anyhow, defaulting to that wouldn't cause any issue.
>>
>> But I think we may easily be able to do one better.
>>
>> Obviously you can override/configure the headers converter, but could we supply a default converter that takes a config file with a key-to-type mapping?
>>
>> That would allow people to define/declare a header key with the expected type in some properties file, supporting String, byte[] and primitives, while undefined headers just default to either String or byte[].
>>
>> We could also predefine known headers like the JMS ones mentioned above.
>>
>> E.g.:
>>
>>     AwesomeHeader1=boolean
>>     AwesomeHeader2=long
>>     JMSCorrelationID=String
>>     JMSXGroupID=String
>>
>> What do you think?
>>
>> Cheers
>> Mike
>>
>> Sent from my iPhone
>>
>>> On 2 May 2017, at 18:45, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>
>>> A couple of thoughts:
>>>
>>> First, agreed that we definitely want to expose header functionality. Thank you, Mike, for starting the conversation! Even if Connect doesn't do anything special with it, there's value in being able to access/set headers.
>>>
>>> On motivation: I think there are much broader use cases. When thinking about exposing headers, I'd actually use Replicator as only a minor supporting case. The reason is that it is a very uncommon case where there is zero impedance mismatch between the source and sink of the data, since they are both Kafka. This means you don't need to think much about data formats/serialization. I think the JMS use case is a better example, since JMS headers and Kafka headers don't quite match up. Here's a quick list of use cases I can think of off the top of my head:
>>>
>>> 1. Include headers from other systems that support them: JMS (or really any MQ), HTTP.
>>> 2. Other connector-specific headers. For example, from JDBC maybe the table the data comes from is a header; for a CDC connector you might include the binlog offset as a header.
>>> 3. Interceptor/SMT-style use cases for annotating things like provenance of data:
>>>    3a. Generically, with user-supplied data like data center, host, app ID, etc.
>>>    3b. Kafka Connect framework-level info, such as the connector/task generating the data.
>>>
>>> On deviation from Connect's model: to be honest, KIP-82 also deviates quite substantially from how Kafka already handles data, so we may struggle a bit to reconcile the two. (In particular, headers specify some structure and enforce strings specifically for header keys, but then require you to do serialization of header values yourself...)
>>>
>>> I think the use cases I mentioned above may also need different approaches to how the data in headers is handled. As Gwen mentions, if we expose the headers to connectors, they need to have some idea of the format, and the reason for byte[] values in KIP-82 is to leave that decision up to the organization using them. But without knowing the format, connectors can't really do anything with them: if a source connector assumes a format, it may generate data incompatible with the format used by the rest of the organization. On the other hand, I have a feeling most people will just use <String, String> headers, so allowing connectors to embed arbitrarily complex data may not work out well in practice. Or maybe we leave it flexible, most people default to using StringConverter for the serializer, and connectors end up defaulting to that just for compatibility...
>>>
>>> I'm not sure I have a real proposal yet, but I do think understanding the impact of using a Converter for headers would be useful, and we might want to think about how this KIP would fit in with transformations (or whether that is something that can be deferred, handled separately from the existing transformations, etc.).
>>>
>>> -Ewen
>>>
>>> On Mon, May 1, 2017 at 11:52 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>>>
>>>> Hi Gwen,
>>>>
>>>> The intent here was to allow tools that perform a role similar to MirrorMaker, replicating messages from one cluster to another. E.g. MirrorMaker should just take and transfer the headers as is.
>>>>
>>>> We don't actually use this inside our company, so not exposing this isn't an issue for us. I just believe there are companies like Confluent who have tools like Replicator that do.
>>>>
>>>> And as good citizens, we think we should complete the work and expose the headers the same as in the record, to at least allow such tools to replicate the messages as is. Note Steph seems to want it.
>>>>
>>>> Cheers
>>>> Mike
>>>>
>>>> Sent using OWA for iPhone
>>>> ________________________________________
>>>> From: Gwen Shapira <g...@confluent.io>
>>>> Sent: Monday, May 1, 2017 2:36:34 PM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
>>>>
>>>> Hi,
>>>>
>>>> I'm excited to see the community expanding Connect in this direction! Headers + Transforms == Fun message routing.
>>>>
>>>> I like how clean the proposal is, but I'm concerned that it kinda deviates from how Connect handles data elsewhere. Unlike Kafka, Connect doesn't treat all data as byte arrays: we have converters that take data in specific formats (JSON, Avro) and turn it into Connect data types (defined in the data API). I think it will be more consistent for connector developers to also get headers as some kind of structured or semi-structured data (and to expand the converters to handle header conversions as well). This will preserve Connect's separation of concerns: connector developers don't worry about data formats (because they get the internal Connect objects), and converters do all the data format work.
>>>>
>>>> Another thing: in my experience, APIs work better if they are put into use almost immediately, so difficulties in using the APIs surface right away. Are you planning any connectors that will use this feature (not necessarily in Kafka, just in general)? Or perhaps we can think of a way to expand Kafka's file connectors so they'll use headers somehow (can't think of anything, but maybe?).
>>>>
>>>> Gwen
>>>>
>>>> On Sat, Apr 29, 2017 at 12:12 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Now that KIP-82 is committed, I would like to discuss extending the work to expose headers in Kafka Connect. The primary focus is to let connectors that do tasks similar to MirrorMaker, either Kafka->Kafka or JMS->Kafka, replicate the headers.
>>>>> It would be ideal, but not mandatory, for this to go into the 0.11 release so that it is available on day one of headers being available.
>>>>>
>>>>> Please find the KIP here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect
>>>>>
>>>>> Please find an initial implementation as a PR here:
>>>>> https://github.com/apache/kafka/pull/2942
>>>>>
>>>>> Kind Regards
>>>>> Mike
>>>>>
>>>>> The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.
>>>>
>>>> --
>>>> *Gwen Shapira*
>>>> Product Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog <http://www.confluent.io/blog>
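Mike's suggestion of a default header converter driven by a key-to-type properties file, combined with his later idea of looking the type up by subject (topic + key, or just the key when the type is global), can be sketched in plain Java. This is a minimal, hypothetical sketch: `KeyTypedHeaderConverter`, `toValue`, and the `topic/key` naming for topic-scoped entries are illustrative names, not part of Kafka Connect or the linked PR. It also assumes typed values travel as UTF-8 text, which is exactly the open wire-format question Ewen raises.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch (not part of Kafka Connect or PR #2942): a default
// header converter that looks up each header value's type by header key.
public class KeyTypedHeaderConverter {

    private final Properties typesBySubject;

    public KeyTypedHeaderConverter(Properties typesBySubject) {
        this.typesBySubject = typesBySubject;
    }

    // Loads a mapping such as "AwesomeHeader2=long" from properties-file text.
    public static KeyTypedHeaderConverter fromProperties(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return new KeyTypedHeaderConverter(props);
    }

    // Subject lookup: a topic-scoped entry ("topic/key") wins over a global
    // key entry; unmapped keys fall back to String. The "/" separator is an
    // assumption of this sketch.
    String typeFor(String topic, String key) {
        String scoped = typesBySubject.getProperty(topic + "/" + key);
        return scoped != null ? scoped : typesBySubject.getProperty(key, "String");
    }

    // ASSUMPTION: typed values are encoded as UTF-8 text (e.g. "1234" for a
    // long). The thread leaves the real wire format open.
    public Object toValue(String topic, String key, byte[] raw) {
        String type = typeFor(topic, key);
        if (type.equals("byte[]")) {
            return raw; // opaque bytes pass through unchanged
        }
        String text = new String(raw, StandardCharsets.UTF_8);
        switch (type) {
            case "boolean": return Boolean.parseBoolean(text);
            case "int":     return Integer.parseInt(text);
            case "long":    return Long.parseLong(text);
            case "double":  return Double.parseDouble(text);
            case "String":  return text;
            default:
                throw new IllegalArgumentException(
                        "Unsupported type '" + type + "' for header " + key);
        }
    }

    public static void main(String[] args) throws IOException {
        KeyTypedHeaderConverter converter = fromProperties(
                "AwesomeHeader1=boolean\n"
              + "AwesomeHeader2=long\n"
              + "orders/AwesomeHeader2=String\n");

        byte[] raw = "1234".getBytes(StandardCharsets.UTF_8);
        // Global mapping: parsed as a long.
        System.out.println(converter.toValue("payments", "AwesomeHeader2", raw).getClass().getSimpleName());
        // Topic-scoped mapping overrides the global one.
        System.out.println(converter.toValue("orders", "AwesomeHeader2", raw).getClass().getSimpleName());
        // Unmapped keys default to String.
        System.out.println(converter.toValue("orders", "SomeOtherHeader", raw));
    }
}
```

Running `main` prints `Long`, `String`, and `1234`. The UTF-8 text encoding is only one possible choice: Kafka's own `LongSerializer`, for instance, writes eight big-endian bytes, so a converter built this way would have to agree with whatever encoding producers actually use, which is the crux of Ewen's question about per-type serializers versus a general format such as Avro or JSON.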