@rhauch

Here is the previous discussion thread. I am just reigniting it so we can discuss
against the original KIP thread.


Cheers

Mike


> On 5 May 2017, at 02:21, Michael Pearce <michael.pea...@ig.com> wrote:
> 
> Hi Ewen,
> 
> Did you get a chance to look at the updated sample showing the idea?
> 
> Did it help?
> 
> Cheers
> Mike
> 
> ________________________________________
> From: Michael Pearce <michael.pea...@ig.com>
> Sent: Wednesday, May 3, 2017 10:11:55 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
> 
> Hi Ewen,
> 
> I think code helps here, as I don’t think I explained what I meant very well.
> 
> I have pushed what I was thinking to the branch/pr.
> https://github.com/apache/kafka/pull/2942
> 
> The key bits added on top here are:
> a new ConnectHeader that holds the header key (as a String), the header value 
> object and the header value schema
> 
> a new SubjectConverter, which allows exposing a subject; in this case the 
> subject is the key. This can be used to register the header type in repos 
> like schema registry, or in my case below in a property file.
> 
> 
> We can default the subject converter to a String-based or byte-based one, where all 
> header values are treated safely as String or byte[] type.
> 
> But this way you could add in your own converter which could be more 
> sophisticated and convert the header based on the key.
> 
> The main part is to have access to the key, so you can look up the header 
> value type based on the key from somewhere, e.g. a properties file or some 
> central repo (such as a schema repo), where the repo subject could be the topic + 
> key, or just the key if the key type is global, and the schema could be a primitive, 
> String, byte[] or something more elaborate.
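A minimal sketch of the lookup idea described above, under stated assumptions: the class name `SubjectLookupSketch`, its `convert` method, and the `topic + "." + key` subject format are all hypothetical and are not taken from the PR. It only shows how having the header key available lets a converter choose a decoder, first per topic and then globally, and fall back to String otherwise.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names and signatures are illustrative, not the PR's API.
final class SubjectLookupSketch {
    // Registry of decoders keyed by subject: either "topic.headerKey" or just "headerKey".
    private final Map<String, Function<byte[], Object>> decoders;

    SubjectLookupSketch(Map<String, Function<byte[], Object>> decoders) {
        this.decoders = decoders;
    }

    // Resolve the decoder by topic-scoped subject first, then by global key.
    // Headers with no registered subject are treated as UTF-8 Strings.
    Object convert(String topic, String headerKey, byte[] value) {
        Function<byte[], Object> decoder = decoders.get(topic + "." + headerKey);
        if (decoder == null) {
            decoder = decoders.get(headerKey);
        }
        if (decoder == null) {
            return new String(value, StandardCharsets.UTF_8);
        }
        return decoder.apply(value);
    }
}
```

The in-memory map stands in for whatever backs the lookup in practice (a properties file or a schema repository); that choice is left open in the message above.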
> 
> Cheers
> Mike
> 
> On 03/05/2017, 06:00, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:
> 
>    Michael,
> 
>    Aren't JMS headers an example where the variety is a problem? Unless I'm
>    misunderstanding, there's not even a fixed serialization format expected
>    for them since JMS defines the runtime types, not the wire format. For
>    example, we have JMSCorrelationID (String), JMSExpires (Long), and
>    JMSReplyTo (Destination). These are simply runtime types, so we'd need
>    either (a) a different serializer/deserializer for each or (b) a
>    serializer/deserializer that can handle all of them (e.g. Avro, JSON, etc).
> 
>    What is the actual serialized format of the different fields? And if it's
>    not specified anywhere in the KIP, why should using the well-known type for
>    the header key (e.g. use StringSerializer, IntSerializer, etc) be better or
>    worse than using a general serialization format (e.g. Avro, JSON)? And if
>    the latter is the choice, how do you decide on the format?
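To make option (b) above concrete, here is a small sketch of a single codec that handles more than one runtime type by prefixing each value with a one-byte type tag. This is a stand-in for a general format such as Avro or JSON, not either of those; the class `TaggedHeaderCodec` and its tag values are assumptions made purely for illustration, and only String and Long are covered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical self-describing encoding: [1-byte type tag][payload].
final class TaggedHeaderCodec {
    private static final byte TAG_STRING = 'S';
    private static final byte TAG_LONG = 'L';

    // Serialize a value together with a tag naming its runtime type.
    static byte[] encode(Object value) {
        if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + utf8.length).put(TAG_STRING).put(utf8).array();
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(1 + Long.BYTES).put(TAG_LONG).putLong((Long) value).array();
        }
        throw new IllegalArgumentException("Unsupported header value type: " + value.getClass());
    }

    // Read the tag, then decode the payload as the type the tag names.
    static Object decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte tag = buffer.get();
        switch (tag) {
            case TAG_STRING:
                return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
            case TAG_LONG:
                return buffer.getLong();
            default:
                throw new IllegalArgumentException("Unknown type tag: " + tag);
        }
    }
}
```

With option (a), by contrast, the bytes carry no tag and the reader has to know from the header key which deserializer to apply.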
> 
>    -Ewen
> 
>    On Tue, May 2, 2017 at 12:48 PM, Michael André Pearce <
>    michael.andre.pea...@me.com> wrote:
> 
>> Hi Ewen,
>> 
>> So on the point of JMS: the predefined/standardised JMS and JMSX headers
>> have predefined types, so these can be serialised/deserialised accordingly.
>> 
>> Custom JMS headers, agreed, could be a bit more difficult, but on the 80/20
>> rule I would agree they are mostly String values, and as you can hold bytes
>> as a String anyhow, defaulting to that wouldn't cause any issue.
>> 
>> But I think we may easily be able to do one better.
>> 
>> Obviously you can override/configure the headers converter, but could we supply
>> a default converter that takes a config file with a key-to-type mapping?
>> 
>> That would allow people to define/declare a header key with the expected
>> type in some property file, supporting String, byte[] and primitives, with
>> undefined headers just defaulting to either String or byte[].
>> 
>> We could also predefine known headers like the JMS ones mentioned above.
>> 
>> E.g.
>> 
>> AwesomeHeader1=boolean
>> AwesomeHeader2=long
>> JMSCorrelationId=String
>> JMSXGroupId=String
>> 
>> 
>> What do you think?
>> 
>> 
>> Cheers
>> Mike
>> 
>> 
>>> On 2 May 2017, at 18:45, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>> 
>>> A couple of thoughts:
>>> 
>>> First, agreed that we definitely want to expose header functionality. Thank
>>> you Mike for starting the conversation! Even if Connect doesn't do anything
>>> special with it, there's value in being able to access/set headers.
>>> 
>>> On motivation -- I think there are much broader use cases. When thinking
>>> about exposing headers, I'd actually use Replicator as only a minor
>>> supporting case. The reason is that it is a very uncommon case where there
>>> is zero impedance mismatch between the source and sink of the data since
>>> they are both Kafka. This means you don't need to think much about data
>>> formats/serialization. I think the JMS use case is a better example since
>>> JMS headers and Kafka headers don't quite match up. Here's a quick list of
>>> use cases I can think of off the top of my head:
>>> 
>>> 1. Include headers from other systems that support them: JMS (or really any
>>> MQ), HTTP
>>> 2. Other connector-specific headers. For example, from JDBC maybe the table
>>> the data comes from is a header; for a CDC connector you might include the
>>> binlog offset as a header.
>>> 3. Interceptor/SMT-style use cases for annotating things like provenance of
>>> data:
>>> 3a. Generically w/ user-supplied data like data center, host, app ID, etc.
>>> 3b. Kafka Connect framework level info, such as the connector/task
>>> generating the data
>>> 
>>> On deviation from Connect's model -- to be honest, KIP-82 also deviates
>>> quite substantially from how Kafka handles data already, so we may struggle
>>> a bit to reconcile the two. (In particular, headers specify some structure
>>> and enforce strings specifically for header keys, but then require you to
>>> do serialization of header values yourself...).
>>> 
>>> I think the use cases I mentioned above may also need different approaches
>>> to how the data in headers are handled. As Gwen mentions, if we expose the
>>> headers to Connectors, they need to have some idea of the format and the
>>> reason for byte[] values in KIP-82 is to leave that decision up to the
>>> organization using them. But without knowing the format, connectors can't
>>> really do anything with them -- if a source connector assumes a format,
>>> they may generate data incompatible with the format used by the rest of the
>>> organization. On the other hand, I have a feeling most people will just use
>>> <String, String> headers, so allowing connectors to embed arbitrarily
>>> complex data may not work out well in practice. Or maybe we leave it
>>> flexible, most people default to using StringConverter for the serializer
>>> and Connectors will end up defaulting to that just for compatibility...
>>> 
>>> I'm not sure I have a real proposal yet, but I do think understanding the
>>> impact of using a Converter for headers would be useful, and we might want
>>> to think about how this KIP would fit in with transformations (or if that
>>> is something that can be deferred, handled separately from the existing
>>> transformations, etc).
>>> 
>>> -Ewen
>>> 
>>> On Mon, May 1, 2017 at 11:52 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>>> 
>>>> Hi Gwen,
>>>> 
>>>> The intent here was to allow tools that perform a similar role to
>>>> MirrorMaker, replicating messages from one cluster to another. E.g.
>>>> like MirrorMaker, they should just be taking and transferring the headers
>>>> as is.
>>>> 
>>>> We don't actually use this inside our company, so not exposing this isn't
>>>> an issue for us. I just believe there are companies like Confluent who have
>>>> tools like Replicator that do.
>>>> 
>>>> And as good citizens I think we should complete the work and expose the
>>>> headers the same as in the record, to at least allow them to replicate the
>>>> messages as is. Note Steph seems to want it.
>>>> 
>>>> Cheers
>>>> Mike
>>>> 
>>>> ________________________________________
>>>> From: Gwen Shapira <g...@confluent.io>
>>>> Sent: Monday, May 1, 2017 2:36:34 PM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
>>>> 
>>>> Hi,
>>>> 
>>>> I'm excited to see the community expanding Connect in this direction!
>>>> Headers + Transforms == Fun message routing.
>>>> 
>>>> I like how clean the proposal is, but I'm concerned that it kinda deviates
>>>> from how Connect handles data elsewhere.
>>>> Unlike Kafka, Connect doesn't look at all data as byte-arrays, we have
>>>> converters that take data in specific formats (JSON, Avro) and turn it
>>>> into Connect data types (defined in the data api). I think it will be more
>>>> consistent for connector developers to also get headers as some kind of
>>>> structured or semi-structured data (and to expand the converters to handle
>>>> header conversions as well).
>>>> This will allow for Connect's separation of concerns - Connector developers
>>>> don't worry about data formats (because they get the internal connect
>>>> objects) and Converters do all the data format work.
>>>> 
>>>> Another thing, in my experience, APIs work better if they are put into use
>>>> almost immediately - so difficulties in using the APIs are immediately
>>>> surfaced. Are you planning any connectors that will use this feature (not
>>>> necessarily in Kafka, just in general)? Or perhaps we can think of a way to
>>>> expand Kafka's file connectors so they'll use headers somehow (can't think
>>>> of anything, but maybe?).
>>>> 
>>>> Gwen
>>>> 
>>>> On Sat, Apr 29, 2017 at 12:12 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Now that KIP-82 is committed, I would like to discuss extending the work to
>>>>> expose it in Kafka Connect, its primary focus being that connectors that
>>>>> may do similar tasks to MirrorMaker, either Kafka->Kafka or JMS->Kafka,
>>>>> would be able to replicate the headers.
>>>>> It would be ideal but not mandatory for this to go in the 0.11 release so
>>>>> it is available on day one of headers being available.
>>>>> 
>>>>> Please find the KIP here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect
>>>>> 
>>>>> Please find an initial implementation as a PR here:
>>>>> https://github.com/apache/kafka/pull/2942
>>>>> 
>>>>> Kind Regards
>>>>> Mike
>>>>> The information contained in this email is strictly confidential and
>> for
>>>>> the use of the addressee only, unless otherwise indicated. If you are
>> not
>>>>> the intended recipient, please do not read, copy, use or disclose to
>>>> others
>>>>> this message or any attachment. Please also notify the sender by
>> replying
>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the
>>>> email
>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to
>>>> the
>>>>> official business of this company shall be understood as neither given
>>>> nor
>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>>> registered in England and Wales, company number 04008957) and IG Index
>>>>> Limited (a company registered in England and Wales, company number
>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and
>> IG
>>>>> Index Limited (register number 114059) are authorised and regulated by
>>>> the
>>>>> Financial Conduct Authority.
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> *Gwen Shapira*
>>>> Product Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>>>> <http://www.confluent.io/blog>
>>>> 
>> 
> 
> 
