I think we have a situation here much like Amazon SQS had: originally it did 
not support headers, but as time went on and requirements grew the need became 
evident, and finally headers (or whatever name they go by) were added, back in 
2014.
 
A blog post from back when SQS first added support for "header"-style message 
attributes, including some basic use cases for why they decided to add them:
https://aws.amazon.com/blogs/aws/simple-queue-service-message-attributes/
 
I am sure they "passed" on adding them before too, but as the use cases and 
the product mature, it becomes inevitable that they would be added, and they 
were. I think Kafka is now at this stage.
 
The fact that we all have these wrapper workarounds is expensive, and they are 
not solving our problems:
 
* every single company essentially re-implements the wheel just to be able to 
send message metadata
* with no common interface, no ecosystem of plugins/interceptors can evolve 
around them (again, everyone's is custom, but no doubt doing the same thing)
* we cannot convince third-party commercial vendors to invest in adding 
support, as they don't want to write code against one company's custom code 
with no reuse for them
* the workarounds cause production issues (compaction is just one noted point)
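To make the first bullet concrete, here is a minimal sketch (entirely my own illustration; the magic marker, framing and field names are invented, not from the KIP or any real product) of the kind of wrapper envelope every company ends up hand-rolling today:

```python
import json

# Hypothetical sketch: the kind of envelope companies hand-roll today to
# smuggle metadata alongside the real payload, since records have no
# header slot. MAGIC, the framing and the field layout are all invented.
MAGIC = b"\x00\x01"  # ad-hoc marker so consumers recognize wrapped messages

def wrap(payload: bytes, metadata: dict) -> bytes:
    """Prepend a length-delimited JSON metadata blob to the payload."""
    meta = json.dumps(metadata).encode("utf-8")
    return MAGIC + len(meta).to_bytes(4, "big") + meta + payload

def unwrap(raw: bytes) -> tuple:
    """Split a wrapped message back into (metadata, payload)."""
    assert raw[:2] == MAGIC, "not a wrapped message"
    size = int.from_bytes(raw[2:6], "big")
    meta = json.loads(raw[6:6 + size].decode("utf-8"))
    return meta, raw[6 + size:]

meta, payload = unwrap(wrap(b"order-created", {"trace-id": "abc123"}))
```

Because the metadata lives inside the value like this, the broker and any third-party tooling cannot see it without this exact custom code, which is the interoperability problem in a nutshell.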

Headers really are a simple, elegant and common solution in my view; they 
address all of the problems above and, reading the KIP, many more needs and 
use cases.

It is too easy sometimes to simply say no without providing an alternative, or 
to dismiss people's real use cases. At the moment I don't see any sensible 
alternative proposal or commitment.
 
Here we have a person and company addressing a real, common need, and willing 
to implement the solution. The design also seems fairly advanced and simply 
needs the finer details discussed. I'll be honest, I haven't fully reviewed 
the sample code, but so far it seems not very invasive, and it could be in the 
next release.
 
As such this is why I am +1 for the KIP.
 
As for the discussion of the actual implementation details:
 
For our headers in Kafka, maybe everyone could read:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/SQSMessageAttributes.html
 
I quite like that a type for the value is passed along with the key and value, 
which means you don't need to know the value's type ahead of time when 
consuming the header. I'm not saying we have to have typed values, but I think 
it is worth a thought.
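As a rough sketch of what that could look like for a Kafka header (my own illustration, borrowing the SQS data-type names; nothing here is from the KIP):

```python
from dataclasses import dataclass

# Illustration only: SQS-style message attributes carry a data type next
# to each name/value pair, so a consumer can interpret the value without
# out-of-band knowledge. A Kafka header entry could do the same.
@dataclass
class Header:
    key: str
    type: str    # e.g. "String", "Number", "Binary" (the SQS data types)
    value: bytes

def decode(h: Header):
    """Interpret the raw bytes using the type carried in the header itself."""
    if h.type == "String":
        return h.value.decode("utf-8")
    if h.type == "Number":
        return int(h.value)
    return h.value  # "Binary": hand back the raw bytes

retries = decode(Header("retry-count", "Number", b"3"))  # no schema needed
```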
 
Kostya


08.10.2016, 00:37, "Nacho Solis" <nso...@linkedin.com.invalid>:
> On Fri, Oct 7, 2016 at 8:45 AM, Jay Kreps <j...@confluent.io> wrote:
>
>>  This discussion has come up a number of times and we've always passed.
>
> ​Hopefully this time the arguments will be convincing enough that Kafka can
> decide to do something about it.
> ​
>
>>  One of things that has helped keep Kafka simple is not adding in new
>>  abstractions and concepts except when the proposal is really elegant and
>>  makes things simpler.
>
> ​I completely agree that we want things to be simple and elegant. This is
> exactly what headers provide.
>
> Headers are a clean way to extend the system without sacrificing
> performance or elegance. They are modular and backwards compatible.
>
> ​​
>
>>  Consider three use cases for headers:
>>
>>  ​​
>>   1. Kafka-scope: We want to add a feature to Kafka that needs a
>>     particular field.
>
> ​This is a _great_ use case for Kafka headers. Having headers means that
> you can have features that are optional. Features that are slowly deployed
> without needing to move everybody from one protocol version to another
> protocol version. All clients don't have to change and all brokers don't
> have to change.
>
> Without headers you need to parse the messages differently. With headers
> you use the same parser.
> I assume I don't need to get into how this makes the system extensible
> without requiring others to use the same extensions you have.
>
> ​
>
>>     2. Company-scope: You want to add a header to be shared by everyone in
>>     your company.
>
> It is completely true that for client-side things you don't need an
> architectural header system. You could just write a wrapper and
> encapsulate every message you send. You could achieve end-to-end. Even if
> this end-to-end exists, Kafka currently offers no way to identify the type
> of a message (which I wish we could change), so we have to rely on some
> magic number to identify the type. Once we have that we can have a header
> system.
>
> Avro is useful for encoding schema based systems, but it's not as useful
> for modularity and it's not universal. We have a number of use cases that
> don't use avro (and don't want to). They want to send binary data, but from
> an org perspective still need some structure to be added for accounting,
> tracing, auditing, security, etc. There is some of this data that would
> also be useful at the broker side. This is somewhat problematic at this
> point (say, using a client side wrapper).
>
>>     3. World-wide scope: You are building a third party tool and want to add
>>     some kind of header.
>
> ​I understand that you see 3 as a niche case, trying to build a third party
> tool. For us this is being a good community citizen. Let's say that we
> have a plugin for large-message support. If we wanted to make that
> available to the community (as good citizens would), we could make our
> header module open source and others could re-use it. Why have to
> re-implement something? The same is true if some company decided to write
> a geo-location header and we wanted to use it for some mobile product. At
> this point, it seems that at least a few organizations are looking for a
> plugin system and it's likely that they'll have similar requirements. For
> example, it's possible many IoT companies would need similar features, or
> maybe the self-driving cars need similar features, etc. Something that
> would benefit a community at large even if it didn't benefit all users. So
> maybe LinkedIn wouldn't care about the self-driving car style features but
> we could care about the security features being worked on at BBVA.
>
>>   1. A global registry of numeric keys is super super ugly. This seems
>>     silly compared to the Avro (or whatever) header solution which gives
>>  more
>>     compact encoding, rich types, etc.
>
> ​This seems like a perfectly reasonable thing to discuss. I'm in favor of
> this. Avro is problematic for this, it implies you know the schema in
> advance. You can't easily compose things. The richness of the types is a
> matter of serialization, so this would be a moot point. If you really wanted
> avro, you could encode an avro object inside one of the headers and the
> total overhead would be small.
>
> Numeric ints as keys are used by many network protocols as an efficient way
> to define the type of data carried. They have proven themselves.
>
> As for keeping a registry, this is a simple thing. We already keep multiple
> "registries", the Kafka ApiKeys and Error Codes are things we already
> maintain. Not to mention the "registries" of all the config variables.
>
> ​
>
>>     2. Using byte arrays for header values means they aren't really
>>     interoperable for case (3). E.g. I can't make a UI that displays
>>  headers,
>>     or allow you to set them in config. To work with third party headers,
>>  the
>>     only case I think this really helps, you need the union of all
>>     serialization schemes people have used for any tool.
>
> ​Byte arrays are serialized by the plugin in question. If you don't have
> that plugin (or the code to handle that specific header) then you won't
> know what the data is. The same is true for deserializing a Key or a Value
> from a message.
> Having said that, there are TLV (which is what the proposed headers are)
> visualizers. The major network dump visualizers support them (that is
> tcpdump and wireshark).
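(For illustration, a minimal sketch of TLV framing of the kind being discussed; the key numbers and field widths here are invented, not the KIP's actual layout:)

```python
import struct

# Sketch of TLV (type-length-value) header framing: each header is a
# numeric key, a length, then opaque value bytes. The 4-byte key and
# 2-byte length are arbitrary choices for this example.
def encode_headers(headers: list) -> bytes:
    out = b""
    for key, value in headers:
        out += struct.pack(">IH", key, len(value)) + value
    return out

def decode_headers(buf: bytes) -> list:
    headers, i = [], 0
    while i < len(buf):
        key, length = struct.unpack_from(">IH", buf, i)
        i += 6  # skip the fixed key + length prefix
        headers.append((key, buf[i:i + length]))
        i += length
    return headers

sample = [(1001, b"trace-abc"), (2002, b"\x01")]
assert decode_headers(encode_headers(sample)) == sample
```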
>
>>     3. For case (2) and (3) your key numbers are going to collide like
>>     crazy. I don't think a global registry of magic numbers maintained
>>  either
>>     by word of mouth or checking in changes to kafka source is the right
>>  thing
>>     to do.
>
> ​With the current proposal for numbering there is no collision. (2) and (3)
> have different key spaces. However, it's true that there is coordination
> needed if you're going to pull code straight off the web and use it without
> configuring. Even in that case, you could rely on hashing as a starting
> point. This is perfectly workable.​
>
>>     4. We are introducing a new serialization primitive which makes fields
>>     disappear conditional on the contents of other fields. This breaks the
>>     whole serialization/schema system we have today.
>
> ​I'm not sure I understand the comment here. Can you elaborate?
> ​
>
>>     5. We're adding a hashmap to each record
>
> ​Are you talking about the wire representation or the programmatic
> representation?​ On the wire you're just adding some fields, just like you
> have a "key" field. For the programmatic representation it's true that you
> would have a headers system that looks like a map (though I'm not sure that
> it's a hash map specifically). This should be no problem. If you think
> this is too much overhead (which I assume is what your concern is) then you
> don't have to use them. There will be no performance penalty.
>
>>     6. This proposes making the ProducerRecord and ConsumerRecord mutable
>>     and adding setters and getters (which we try to avoid).
>
> ​I'm not sure what you mean by "mutable". I'm going to assume you mean
> that the class has fields that can be changed. This is a matter of
> deciding how you deal with this from an API perspective. You will need a
> way to add headers to a Record, but there are ways to do this at the time
> of constructing it or it can be done in a parallel or wrapper class. We
> can discuss the details.
> ​
>
>>  For context on LinkedIn: I set up the system there, but it may have changed
>>  since i left. The header is maintained with the record schemas in the avro
>>  schema registry and is required for all records. Essentially all messages
>>  must have a field named "header" of type EventHeader which is itself a
>>  record schema with a handful of fields (time, host, etc). The header
>>  follows the same compatibility rules as other avro fields, so it can be
>>  evolved in a compatible way gradually across apps. Avro is typed and
>>  doesn't require deserializing the full record to read the header. The
>>  header information (timestamp, host, etc.) is important and needs to
>>  propagate into other systems like Hadoop which don't have a concept of
>>  headers for records, so I doubt it could move out of the value in any case.
>>  Not allowing teams to chose a data format other than avro was considered a
>>  feature, not a bug, since the whole point was to be able to share data,
>>  which doesn't work if every team chooses their own format.
>
> ​We do have a few cases that do not use avro and would like to keep it that
> way.
>
> What is the current way (or the best way if there are multiple) to enforce
> messages to a topic are avro (or for that matter, any type)?
>
> If you were still here maybe you would also be in favor of headers now :-).
>
> Nacho
>
>>  On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com>
>>  wrote:
>>
>>  > Hi All,
>>  >
>>  >
>>  > I would like to discuss the following KIP proposal:
>>  >
>>  > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>  > 82+-+Add+Record+Headers
>>  >
>>  >
>>  >
>>  > I have some initial drafts of roughly the changes that would be needed.
>>  > This is nowhere near finalized, and I look forward to the discussion,
>>  > especially as some bits I'm personally in two minds about.
>>  >
>>  > https://github.com/michaelandrepearce/kafka/tree/
>>  kafka-headers-properties
>>  >
>>  >
>>  >
>>  > Here is a link to an alternative option mentioned in the KIP, but one I
>>  > would personally discard (disadvantages mentioned in the KIP)
>>  >
>>  > https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full?
>>  >
>>  >
>>  > Thanks
>>  >
>>  > Mike
>>  >
>>  >
>>  >
>>  >
>>  >
>
> --
> Nacho (Ignacio) Solis
> Kafka
> nso...@linkedin.com
