here's yet another case of an organization that uses kafka and needs
headers - https://issues.apache.org/jira/browse/AVRO-1704
they appear to be a 100% avro shop, so they can make do with changes within
avro (which in turn didn't mind defining an avro wire format with header
support)
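
As a rough sketch of the trade-off discussed in the quoted thread below (headers packed into the value versus compaction tombstones): the wrapper class and its byte layout here are hypothetical, made up for illustration, and are neither the KIP-82 format nor the Avro wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side wrapper that packs headers and payload into the value.
// Assumed layout: header count, then length-prefixed key/value pairs, then the
// length-prefixed payload (length -1 stands for a null payload).
class HeaderValueWrapper {

    static byte[] wrap(Map<String, byte[]> headers, byte[] payload) {
        // First pass: compute the total buffer size.
        int size = 4;
        for (Map.Entry<String, byte[]> e : headers.entrySet()) {
            size += 4 + e.getKey().getBytes(StandardCharsets.UTF_8).length;
            size += 4 + e.getValue().length;
        }
        size += 4 + (payload == null ? 0 : payload.length);

        // Second pass: write headers, then the payload.
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(headers.size());
        for (Map.Entry<String, byte[]> e : headers.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            buf.putInt(k.length).put(k);
            buf.putInt(e.getValue().length).put(e.getValue());
        }
        if (payload == null) {
            buf.putInt(-1);
        } else {
            buf.putInt(payload.length).put(payload);
        }
        return buf.array();
    }

    // Log compaction treats only a null value as a delete marker (tombstone).
    static boolean isTombstone(byte[] value) {
        return value == null;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("trace-id", "abc123".getBytes(StandardCharsets.UTF_8));

        // A "delete" that still carries headers is no longer a null value.
        byte[] wrappedDelete = wrap(headers, null);
        System.out.println(isTombstone(wrappedDelete)); // prints false
        System.out.println(isTombstone(null));          // prints true
    }
}
```

With a layout like this, a delete either loses its headers (plain null value) or stops being a tombstone (non-null wrapped value), which is the incompatibility raised in the thread.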

On Wed, Oct 12, 2016 at 1:22 AM, Michael Pearce <michael.pea...@ig.com>
wrote:

> @Jay and Dana
>
> We have had a few internal discussions on how we might address this with a
> common Apache Kafka message wrapper for headers that is used client side
> only, and on how it would address the compaction issue.
> I have detailed this solution separately and linked it from the main KIP-82
> wiki.
>
> Here’s a direct link:
> https://cwiki.apache.org/confluence/display/KAFKA/Headers+Value+Message+Wrapper
>
> We feel, though, that this solution still doesn’t address all the use cases
> being mentioned, and it has some compatibility drawbacks, e.g. backwards and
> forwards compatibility, especially across different language clients.
> Also, because this solution still has to address the compaction issue /
> tombstones, we would still need to make server side changes and just as
> many message/record version changes.
>
> We believe the solution proposed in KIP-82 does address all these needs,
> is cleaner, and has more benefits.
> Please have a read and comment. Also, if you have any improvements to the
> proposed KIP-82 or an alternative solution/option, your input is appreciated.
>
> @All
> As Joel has mentioned, to get this moving along and to be able to discuss
> more fluidly, it would be great if we could organize a virtual meetup
> online, e.g. webex or something.
> I am aware that the majority are based in America; I am in the UK.
> @Kostya I assume you’re in Eastern Europe or Russia based on your email
> address (please correct this assumption). I hope the time difference isn’t
> too much, so that the below would suit you if you wish to join.
>
> Can I propose we try to meet up online next Wednesday 19th October at
> 18:30 BST, 10:30 PST, 20:30 MSK?
>
> Would this date/time suit the majority?
> Also, what is the preferred method? I can host via an Adobe Connect style
> webex (which my company uses), but it isn’t the best IMHO, so I am more than
> happy to have someone suggest a better alternative.
>
> Best,
> Mike
>
>
>
>
> On 10/8/16, 7:26 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
>     >> I agree with the critique of compaction not having a value. I think
>     >> we should consider fixing that directly.
>
>     > Agree that the compaction issue is troubling: compacted "null" deletes
>     > are incompatible w/ headers that must be packed into the message
>     > value. Are there any alternatives on compaction delete semantics that
>     > could address this? The KIP wiki discussion I think mostly assumes
>     > that compaction-delete is what it is and can't be changed/fixed.
>
>     This KIP is about dealing with quite a few use cases and issues;
>     please see both the KIP use cases I detailed and the additional use
>     cases wiki added by LinkedIn, linked from the main KIP.
>
>     Compaction is something that happily is addressed with headers, but it
>     most definitely isn't the sole reason or use case for them; headers
>     solve many issues and use cases. Hence their elegance and simplicity,
>     and why they're so common and so successful in transport mechanisms
>     such as http, tcp and jms, as stated.
>
>     ________________________________________
>     From: Dana Powers <dana.pow...@gmail.com>
>     Sent: Friday, October 7, 2016 11:09 PM
>     To: dev@kafka.apache.org
>     Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
>     > I agree with the critique of compaction not having a value. I think
>     > we should consider fixing that directly.
>
>     Agree that the compaction issue is troubling: compacted "null" deletes
>     are incompatible w/ headers that must be packed into the message
>     value. Are there any alternatives on compaction delete semantics that
>     could address this? The KIP wiki discussion I think mostly assumes
>     that compaction-delete is what it is and can't be changed/fixed.
>
>     -Dana
>
>     On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com>
>     wrote:
>     >
>     > Hi Jay,
>     >
>     > Thanks for the comments and feedback.
>     >
>     > I think it's quite clear that if a problem keeps arising, it needs
>     > resolving and addressing properly.
>     >
>     > Fair enough, at LinkedIn, and historically for the very first use
>     > cases, addressing this may not have been a big priority. But as Kafka
>     > is now Apache open source and being picked up by many, including my
>     > company, it is clear and evident that this is a requirement and an
>     > issue that now needs to be addressed.
>     >
>     > The fact that in almost every transport mechanism, including the
>     > networking layers, in the enterprises I've worked in there have always
>     > been headers, I think, clearly shows their need and success for a
>     > transport mechanism.
>     >
>     > I understand some concerns with regard to the impact on others who
>     > don't need it.
>     >
>     > What we are proposing is a flexible solution that adds no overhead on
>     > the storage or network traffic layers if you choose not to use headers,
>     > but does enable those who need or want them to use them.
>     >
>     >
>     > On your response to 1), nothing says that fields should be put in any
>     > faster or without diligence; the same KIP process can still apply for
>     > adding kafka-scope headers. Having headers just makes them easier to
>     > add, without constant message and record changes. Timestamp is a clear
>     > real example of what should actually be in a header (along with other
>     > fields), but as it was, the whole message/record object needed to be
>     > changed to add it, as it will for any further headers deemed needed by
>     > kafka.
>     >
>     > On your response to 2), why, as a platform designer within my company,
>     > should I enforce that all teams use the same serialization for their
>     > payloads? What I do need is some core cross-cutting concerns and
>     > information addressed at my platform level, which I don't want to
>     > impose onto my development teams. This is the same argument for why
>     > byte[] is the exposed value and key: as a messaging platform you don't
>     > want to impose that on my company.
>     >
>     > On your response to 3), actually this isn't true. There are many 3rd
>     > party tools that we need to hook into our messaging flows, and they
>     > only build on standardised interfaces, as obviously the cost of a
>     > custom implementation for every company would be very high.
>     > APM tooling is a clear case in point: every enterprise level APM tool
>     > on the market is able to stitch together a transaction flow end to end
>     > across a platform over http or jms, because they can stitch in some
>     > "magic" data in a uniform/standardised way; for the two mentioned,
>     > they stitch this into the headers. In its current form, they cannot do
>     > this with Kafka. Providing a standardised interface will, I believe,
>     > actually benefit the project, as commercial companies like these will
>     > now be able to plug in their tooling uniformly, making it attractive
>     > and possible.
>     >
>     > Some of your other concerns, as Joel mentions, are more implementation
>     > details that I think should be agreed upon, but I think they can be
>     > addressed.
>     >
>     > e.g. re your concern on the hashmap:
>     > it is more than possible for a record not to have a hashmap unless it
>     > actually has a header (just as we have managed to do for the
>     > serialized message), if there's a concern about the in-memory record
>     > size for those using kafka without headers.
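
A minimal sketch of the lazy-allocation idea in the paragraph above, assuming a hypothetical record class (not Kafka's ProducerRecord or ConsumerRecord) and the integer header keys that the KIP draft discusses:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type for illustration only.
class LazyHeaderRecord {
    private final byte[] value;
    // Stays null until the first header is set, so records without
    // headers carry only one extra (null) reference.
    private Map<Integer, byte[]> headers;

    LazyHeaderRecord(byte[] value) {
        this.value = value;
    }

    byte[] value() {
        return value;
    }

    void setHeader(int headerKey, byte[] headerValue) {
        if (headers == null) {
            headers = new HashMap<>(); // allocated on first use only
        }
        headers.put(headerKey, headerValue);
    }

    // Returns a read-only view; the shared empty map when no header was set.
    Map<Integer, byte[]> headers() {
        if (headers == null) {
            return Collections.<Integer, byte[]>emptyMap();
        }
        return Collections.unmodifiableMap(headers);
    }

    boolean hasAllocatedHeaderMap() {
        return headers != null;
    }

    public static void main(String[] args) {
        LazyHeaderRecord plain = new LazyHeaderRecord(new byte[] {1, 2, 3});
        System.out.println(plain.hasAllocatedHeaderMap()); // prints false

        LazyHeaderRecord tagged = new LazyHeaderRecord(new byte[] {1, 2, 3});
        tagged.setHeader(1, new byte[] {42});
        System.out.println(tagged.hasAllocatedHeaderMap()); // prints true
    }
}
```

The setter in this sketch makes the record mutable, which is the separate concern raised as point 6 further down the thread; a builder or constructor argument would avoid that at the cost of a larger API.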
>     >
>     > On your second to last comment about every team choosing their own
>     > format: actually we do want this a little. As first mentioned, no, we
>     > don't want a free for all, but different serializations have different
>     > benefits and drawbacks across our business, so we want some freedom. I
>     > can enumerate these if needed. One of the use cases for headers
>     > provided by LinkedIn on top of my KIP even shows where headers could
>     > be beneficial here, as a header could be used to detail which data
>     > format the message is serialized in, allowing me to consume different
>     > formats.
>     >
>     > Also, we have some systems we need to integrate where it is pretty
>     > near impossible to wrap or touch their binary payloads, or we’re not
>     > allowed to touch them (historic systems, or inter/intra corporate).
>     >
>     > Headers really give us a solution for providing a pluggable platform,
>     > and a standardisation that allows users to build platforms that adapt
>     > to their needs.
>     >
>     >
>     > Cheers
>     > Mike
>     >
>     >
>     > ________________________________________
>     > From: Jay Kreps <j...@confluent.io>
>     > Sent: Friday, October 7, 2016 4:45 PM
>     > To: dev@kafka.apache.org
>     > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>     >
>     > Hey guys,
>     >
>     > This discussion has come up a number of times and we've always passed.
>     >
>     > One of the things that has helped keep Kafka simple is not adding in new
>     > abstractions and concepts except when the proposal is really elegant and
>     > makes things simpler.
>     >
>     > Consider three use cases for headers:
>     >
>     >    1. Kafka-scope: We want to add a feature to Kafka that needs a
>     >    particular field.
>     >    2. Company-scope: You want to add a header to be shared by everyone in
>     >    your company.
>     >    3. World-wide scope: You are building a third party tool and want to add
>     >    some kind of header.
>     >
>     > For the case of (1) you should not use headers, you should just add a field
>     > to the record format. Having a second way of encoding things doesn't make
>     > sense. Occasionally people have complained that adding to the record format
>     > is hard and it would be nice to just shove lots of things in quickly. I
>     > think a better solution would be to make it easy to add to the record
>     > format, and I think we've made progress on that. I also think we should be
>     > insanely focused on the simplicity of the abstraction and not adding in new
>     > thingies often; we thought about time for years before adding a timestamp
>     > and I guarantee you we would have goofed it up if we'd gone with the
>     > earlier proposals. These things end up being long term commitments so it's
>     > really worth being thoughtful.
>     >
>     > For case (2) just use the body of the message. You don't need a globally
>     > agreed on definition of headers, just standardize on a header you want to
>     > include in the value in your company. Since this is just used by code in
>     > your company having a more standard header format doesn't really help you.
>     > In fact by using something like Avro you can define exactly the types you
>     > want, the required header fields, etc.
>     >
>     > The only case that headers help is (3). This is a bit of a niche case and I
>     > think is easily solved just making the reading and writing of given
>     > required fields pluggable to work with the header you have.
>     >
>     > A couple of specific problems with this proposal:
>     >
>     >    1. A global registry of numeric keys is super super ugly. This seems
>     >    silly compared to the Avro (or whatever) header solution which gives more
>     >    compact encoding, rich types, etc.
>     >    2. Using byte arrays for header values means they aren't really
>     >    interoperable for case (3). E.g. I can't make a UI that displays headers,
>     >    or allow you to set them in config. To work with third party headers, the
>     >    only case I think this really helps, you need the union of all
>     >    serialization schemes people have used for any tool.
>     >    3. For case (2) and (3) your key numbers are going to collide like
>     >    crazy. I don't think a global registry of magic numbers maintained either
>     >    by word of mouth or checking in changes to kafka source is the right thing
>     >    to do.
>     >    4. We are introducing a new serialization primitive which makes fields
>     >    disappear conditional on the contents of other fields. This breaks the
>     >    whole serialization/schema system we have today.
>     >    5. We're adding a hashmap to each record.
>     >    6. This proposes making the ProducerRecord and ConsumerRecord mutable
>     >    and adding setters and getters (which we try to avoid).
>     >
>     > For context on LinkedIn: I set up the system there, but it may have changed
>     > since I left. The header is maintained with the record schemas in the avro
>     > schema registry and is required for all records. Essentially all messages
>     > must have a field named "header" of type EventHeader which is itself a
>     > record schema with a handful of fields (time, host, etc). The header
>     > follows the same compatibility rules as other avro fields, so it can be
>     > evolved in a compatible way gradually across apps. Avro is typed and
>     > doesn't require deserializing the full record to read the header. The
>     > header information (timestamp, host, etc) is important and needs to
>     > propagate into other systems like Hadoop which don't have a concept of
>     > headers for records, so I doubt it could move out of the value in any case.
>     > Not allowing teams to choose a data format other than avro was considered a
>     > feature, not a bug, since the whole point was to be able to share data,
>     > which doesn't work if every team chooses their own format.
>     >
>     > I agree with the critique of compaction not having a value. I think we
>     > should consider fixing that directly.
>     >
>     > -Jay
>     >
>     > On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com>
>     > wrote:
>     >
>     >> Hi All,
>     >>
>     >>
>     >> I would like to discuss the following KIP proposal:
>     >>
>     >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>     >>
>     >>
>     >>
>     >> I have some initial drafts of roughly the changes that would be needed.
>     >> This is nowhere near finalized, and I look forward to the discussion,
>     >> especially as there are some bits I'm personally in two minds about.
>     >>
>     >> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>     >>
>     >>
>     >>
>     >> Here is a link to an alternative option mentioned in the KIP, but one I
>     >> would personally discard (disadvantages mentioned in the KIP):
>     >>
>     >> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>     >>
>     >>
>     >> Thanks
>     >>
>     >> Mike
>     >>
>     >>
>     >>
>     >>
>     >>
>     >> The information contained in this email is strictly confidential
> and for
>     >> the use of the addressee only, unless otherwise indicated. If you
> are not
>     >> the intended recipient, please do not read, copy, use or disclose
> to others
>     >> this message or any attachment. Please also notify the sender by
> replying
>     >> to this email or by telephone (+44(020 7896 0011) and then delete
> the email
>     >> and any copies of it. Opinions, conclusion (etc) that do not relate
> to the
>     >> official business of this company shall be understood as neither
> given nor
>     >> endorsed by it. IG is a trading name of IG Markets Limited (a
> company
>     >> registered in England and Wales, company number 04008957) and IG
> Index
>     >> Limited (a company registered in England and Wales, company number
>     >> 01190902). Registered address at Cannon Bridge House, 25 Dowgate
> Hill,
>     >> London EC4R 2YA. Both IG Markets Limited (register number 195355)
> and IG
>     >> Index Limited (register number 114059) are authorised and regulated
> by the
>     >> Financial Conduct Authority.
>     >>
>
>
