Hi Ewen,

Did you get a chance to look at the updated sample showing the idea?

Did it help?

Cheers
Mike

Sent using OWA for iPhone
________________________________________
From: Michael Pearce <michael.pea...@ig.com>
Sent: Wednesday, May 3, 2017 10:11:55 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

Hi Ewen,

I think code helps here, as I don’t think I explained what I meant very well.

I have pushed what I was thinking to the branch/pr.
https://github.com/apache/kafka/pull/2942

The key bits added on top here are:

A new ConnectHeader that holds the header key (as a String), the header value
object and the header value schema.

A new SubjectConverter which allows exposing a subject; in this case the
subject is the key. This can be used to register the header type in repos like
schema registry, or, in my case below, in a property file.


We can default the subject converter to a String-based or byte-based one, where
all header values are treated safely as String or byte[] type.
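
To make the shape concrete, here is a rough sketch of the idea. It is not the
code in the PR: the names SketchHeader, SketchSubjectConverter,
StringDefaultConverter and BytesDefaultConverter are made up for illustration,
and a plain type name stands in for the value schema.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder for one header: key, converted value, and a type name
// standing in for the value schema. Not the ConnectHeader class from the PR.
final class SketchHeader {
    final String key;
    final Object value;
    final String typeName;

    SketchHeader(String key, Object value, String typeName) {
        this.key = key;
        this.value = value;
        this.typeName = typeName;
    }
}

// Hypothetical converter that is handed the subject (here, the header key)
// together with the raw bytes, so an implementation can vary by key.
interface SketchSubjectConverter {
    SketchHeader toConnectHeader(String subject, byte[] rawValue);
}

// Default: every header value is decoded as a UTF-8 String, whatever the key.
final class StringDefaultConverter implements SketchSubjectConverter {
    @Override
    public SketchHeader toConnectHeader(String subject, byte[] rawValue) {
        String text = rawValue == null
                ? null
                : new String(rawValue, StandardCharsets.UTF_8);
        return new SketchHeader(subject, text, "string");
    }
}

// Alternative default: hand the bytes through untouched.
final class BytesDefaultConverter implements SketchSubjectConverter {
    @Override
    public SketchHeader toConnectHeader(String subject, byte[] rawValue) {
        return new SketchHeader(subject, rawValue, "bytes");
    }
}
```

Either default ignores the subject, which is what makes it safe for unknown
headers; a smarter converter would use the subject to pick the type.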

But this way you could add in your own converter which could be more 
sophisticated and convert the header based on the key.

The main part is to have access to the key, so you can look up the header value
type based on the key from somewhere, e.g. a properties file or some central
repo (e.g. a schema repo), where the repo subject could be the topic + key, or
just the key if the key type is global, and the schema could be a primitive,
String, byte[], or something more elaborate.
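
As a rough illustration of that lookup (again a sketch, not what is in the PR),
the class below reads "key=type" lines from properties text, tries the subject
"topic.key" first, then the key alone, and falls back to String. The class name
KeyTypeLookup, the "topic.key" naming and the wire encodings (big-endian long,
single-byte boolean, UTF-8 text) are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Properties;

// Hypothetical key-aware header decoder; names and encodings are assumptions.
final class KeyTypeLookup {
    private final Properties types = new Properties();

    // Loads "headerKey=type" lines, e.g. "AwesomeHeader2=long".
    KeyTypeLookup(String propertiesText) {
        try {
            types.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolves the declared type: "topic.key" first, then the key alone,
    // then "string" for headers that were never declared.
    String typeFor(String topic, String key) {
        String declared = types.getProperty(topic + "." + key);
        if (declared == null) {
            declared = types.getProperty(key, "string");
        }
        return declared.trim().toLowerCase(Locale.ROOT);
    }

    // Decodes the raw header bytes according to the declared type.
    Object decode(String topic, String key, byte[] raw) {
        switch (typeFor(topic, key)) {
            case "long":
                return ByteBuffer.wrap(raw).getLong();
            case "boolean":
                return raw[0] != 0;
            case "bytes":
            case "byte[]":
                return raw;
            default:
                return new String(raw, StandardCharsets.UTF_8);
        }
    }
}
```

With a mapping such as "AwesomeHeader2=long", an eight-byte value under that
key comes back as a Long, while an undeclared key is simply read as text. A
schema registry could replace the Properties object without changing the shape
of the lookup.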

Cheers
Mike

On 03/05/2017, 06:00, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:

    Michael,

    Aren't JMS headers an example where the variety is a problem? Unless I'm
    misunderstanding, there's not even a fixed serialization format expected
    for them since JMS defines the runtime types, not the wire format. For
    example, we have JMSCorrelationID (String), JMSExpiration (long), and
    JMSReplyTo (Destination). These are simply runtime types, so we'd need
    either (a) a different serializer/deserializer for each or (b) a
    serializer/deserializer that can handle all of them (e.g. Avro, JSON, etc).

    What is the actual serialized format of the different fields? And if it's
    not specified anywhere in the KIP, why should using the well-known type for
    the header key (e.g. use StringSerializer, IntegerSerializer, etc) be better
    or worse than using a general serialization format (e.g. Avro, JSON)? And if
    the latter is the choice, how do you decide on the format?

    -Ewen

    On Tue, May 2, 2017 at 12:48 PM, Michael André Pearce <
    michael.andre.pea...@me.com> wrote:

    > Hi Ewen,
    >
    > So on the point of JMS the predefined/standardised JMS and JMSX headers
    > have predefined types. So these can be serialised/deserialised
    > accordingly.
    >
    > Custom JMS headers, agreed, could be a bit more difficult, but on the
    > 80/20 rule I would agree they're mostly string values, and as you can
    > hold bytes as a string anyhow, defaulting to that wouldn't cause any issue.
    >
    > But I think we may easily be able to do one better.
    >
    > Obviously you can override/configure the headers converter, but we could
    > supply a default converter that takes a config file with a key-to-type
    > mapping?
    >
    > Allowing people to maybe define/declare a header key with the expected
    > type in some property file? To support String, byte[] and primitives? And
    > undefined headers just default to either String or byte[].
    >
    > We could also predefine known headers like the JMS ones mentioned above.
    >
    > E.g
    >
    > AwesomeHeader1=boolean
    > AwesomeHeader2=long
    > JMSCorrelationID=String
    > JMSXGroupID=String
    >
    >
    > What do you think?
    >
    >
    > Cheers
    > Mike
    >
    > Sent from my iPhone
    >
    > > On 2 May 2017, at 18:45, Ewen Cheslack-Postava <e...@confluent.io> wrote:
    > >
    > > A couple of thoughts:
    > >
    > > First, agreed that we definitely want to expose header functionality.
    > > Thank you Mike for starting the conversation! Even if Connect doesn't
    > > do anything special with it, there's value in being able to access/set
    > > headers.
    > >
    > > On motivation -- I think there are much broader use cases. When thinking
    > > about exposing headers, I'd actually use Replicator as only a minor
    > > supporting case. The reason is that it is a very uncommon case where
    > > there is zero impedance mismatch between the source and sink of the data
    > > since they are both Kafka. This means you don't need to think much about
    > > data formats/serialization. I think the JMS use case is a better example
    > > since JMS headers and Kafka headers don't quite match up. Here's a quick
    > > list of use cases I can think of off the top of my head:
    > >
    > > 1. Include headers from other systems that support them: JMS (or really
    > > any MQ), HTTP
    > > 2. Other connector-specific headers. For example, from JDBC maybe the
    > > table the data comes from is a header; for a CDC connector you might
    > > include the binlog offset as a header.
    > > 3. Interceptor/SMT-style use cases for annotating things like provenance
    > > of data:
    > > 3a. Generically w/ user-supplied data like data center, host, app ID,
    > > etc.
    > > 3b. Kafka Connect framework level info, such as the connector/task
    > > generating the data
    > >
    > > On deviation from Connect's model -- to be honest, KIP-82 also deviates
    > > quite substantially from how Kafka handles data already, so we may
    > > struggle a bit to reconcile the two. (In particular, headers specify some
    > > structure and enforce strings specifically for header keys, but then
    > > require you to do serialization of header values yourself...).
    > >
    > > I think the use cases I mentioned above may also need different
    > > approaches to how the data in headers are handled. As Gwen mentions, if
    > > we expose the headers to Connectors, they need to have some idea of the
    > > format, and the reason for byte[] values in KIP-82 is to leave that
    > > decision up to the organization using them. But without knowing the
    > > format, connectors can't really do anything with them -- if a source
    > > connector assumes a format, it may generate data incompatible with the
    > > format used by the rest of the organization. On the other hand, I have a
    > > feeling most people will just use <String, String> headers, so allowing
    > > connectors to embed arbitrarily complex data may not work out well in
    > > practice. Or maybe we leave it flexible, most people default to using
    > > StringConverter for the serializer and Connectors will end up defaulting
    > > to that just for compatibility...
    > >
    > > I'm not sure I have a real proposal yet, but I do think understanding
    > > the impact of using a Converter for headers would be useful, and we might
    > > want to think about how this KIP would fit in with transformations (or if
    > > that is something that can be deferred, handled separately from the
    > > existing transformations, etc).
    > >
    > > -Ewen
    > >
    > > On Mon, May 1, 2017 at 11:52 AM, Michael Pearce <michael.pea...@ig.com>
    > > wrote:
    > >
    > >> Hi Gwen,
    > >>
    > >> The intent here was to allow tools that perform a similar role to
    > >> MirrorMaker, replicating messages from one cluster to another. E.g.
    > >> like MirrorMaker, they should just be taking and transferring the
    > >> headers as is.
    > >>
    > >> We don't actually use this inside our company, so not exposing this
    > >> isn't an issue for us. We just believe there are companies like
    > >> Confluent who have tools like Replicator that do.
    > >>
    > >> And as good citizens we think we should complete the work and expose the
    > >> headers, same as in the record, to at least allow them to replicate the
    > >> messages as is. Note Steph seems to want it.
    > >>
    > >> Cheers
    > >> Mike
    > >>
    > >> Sent using OWA for iPhone
    > >> ________________________________________
    > >> From: Gwen Shapira <g...@confluent.io>
    > >> Sent: Monday, May 1, 2017 2:36:34 PM
    > >> To: dev@kafka.apache.org
    > >> Subject: Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect
    > >>
    > >> Hi,
    > >>
    > >> I'm excited to see the community expanding Connect in this direction!
    > >> Headers + Transforms == Fun message routing.
    > >>
    > >> I like how clean the proposal is, but I'm concerned that it kinda
    > >> deviates from how Connect handles data elsewhere.
    > >> Unlike Kafka, Connect doesn't look at all data as byte-arrays, we have
    > >> converters that take data in specific formats (JSON, Avro) and turn it
    > >> into Connect data types (defined in the data api). I think it will be
    > >> more consistent for connector developers to also get headers as some
    > >> kind of structured or semi-structured data (and to expand the converters
    > >> to handle header conversions as well).
    > >> This will allow for Connect's separation of concerns - Connector
    > >> developers don't worry about data formats (because they get the internal
    > >> connect objects) and Converters do all the data format work.
    > >>
    > >> Another thing, in my experience, APIs work better if they are put into
    > >> use almost immediately - so difficulties in using the APIs are
    > >> immediately surfaced. Are you planning any connectors that will use this
    > >> feature (not necessarily in Kafka, just in general)? Or perhaps we can
    > >> think of a way to expand Kafka's file connectors so they'll use headers
    > >> somehow (can't think of anything, but maybe?).
    > >>
    > >> Gwen
    > >>
    > >> On Sat, Apr 29, 2017 at 12:12 AM, Michael Pearce
    > >> <michael.pea...@ig.com> wrote:
    > >>
    > >>> Hi All,
    > >>>
    > >>> Now that KIP-82 is committed, I would like to discuss extending the
    > >>> work to expose it in Kafka Connect. The primary focus is that
    > >>> connectors that do similar tasks to MirrorMaker, either Kafka->Kafka
    > >>> or JMS->Kafka, would be able to replicate the headers.
    > >>> It would be ideal, but not mandatory, for this to go in the 0.11
    > >>> release so it is available on day one of headers being available.
    > >>>
    > >>> Please find the KIP here:
    > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect
    > >>>
    > >>> Please find an initial implementation as a PR here:
    > >>> https://github.com/apache/kafka/pull/2942
    > >>>
    > >>> Kind Regards
    > >>> Mike
    > >>> The information contained in this email is strictly confidential and
    > >>> for the use of the addressee only, unless otherwise indicated. If you
    > >>> are not the intended recipient, please do not read, copy, use or
    > >>> disclose to others this message or any attachment. Please also notify
    > >>> the sender by replying to this email or by telephone
    > >>> (+44(020 7896 0011) and then delete the email and any copies of it.
    > >>> Opinions, conclusion (etc) that do not relate to the official business
    > >>> of this company shall be understood as neither given nor endorsed by
    > >>> it. IG is a trading name of IG Markets Limited (a company registered in
    > >>> England and Wales, company number 04008957) and IG Index Limited (a
    > >>> company registered in England and Wales, company number 01190902).
    > >>> Registered address at Cannon Bridge House, 25 Dowgate Hill, London
    > >>> EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
    > >>> Index Limited (register number 114059) are authorised and regulated by
    > >>> the Financial Conduct Authority.
    > >>>
    > >>
    > >>
    > >>
    > >> --
    > >> *Gwen Shapira*
    > >> Product Manager | Confluent
    > >> 650.450.2760 | @gwenshap
    > >> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
    > >> <http://www.confluent.io/blog>
    > >>
    >

