+1 Nacho, Radai.

Ordering the keys would help if we were going to scan the headers linearly,
but given the disadvantage that client implementations would have to know the
order of the headers so that readers downstream in the pipeline don't break,
an unordered list sounds better.

Thanks,

Mayuresh


On Thu, Oct 6, 2016 at 2:46 PM, Nacho Solis <nso...@linkedin.com.invalid>
wrote:

> I'm also
>
> 1. no  (ordered keys)
> 2. yes (propose key space)
>
>
> 1. I don't think there is going to be much savings in ordering the keys.
> I'm assuming some parsing will happen either way. Ordering the keys would
> be useful if we were doing linear search on the headers, and even then, the
> performance difference would be small for any reasonable number of headers
> (even anything that fits in 1 meg).
>
> However, I think it's likely that whoever is looking at the headers is
> going to want to find the plugin for every header that is present, so it
> will have to iterate over the whole header set anyway. That is: for every
> header, look up the plugin, not the other way around.  Even if we did it
> the other way around (for every plugin, check whether its header is
> present), we would still expect an algorithm that is O(n) and iterates
> over the list only once.
>
> Given this, the code overhead of ordering the headers when something is
> inserted and such is a bigger pain than dealing with a potentially
> unordered list.
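
A minimal sketch of the "for every header, look up the plugin" dispatch
described above; the Header and HeaderPlugin types and the int keys are
hypothetical, not anything from the KIP:

    import java.util.List;
    import java.util.Map;

    // Hypothetical types for illustration only.
    record Header(int key, byte[] value) {}

    interface HeaderPlugin {
        void onHeader(Header header);
    }

    final class HeaderDispatcher {
        private final Map<Integer, HeaderPlugin> pluginsByKey;

        HeaderDispatcher(Map<Integer, HeaderPlugin> pluginsByKey) {
            this.pluginsByKey = pluginsByKey;
        }

        // One pass over the (possibly unordered) header set: O(n) with a
        // constant-time map lookup per header, so key ordering buys nothing here.
        void dispatch(List<Header> headers) {
            for (Header h : headers) {
                HeaderPlugin plugin = pluginsByKey.get(h.key());
                if (plugin != null) {
                    plugin.onHeader(h);
                }
            }
        }
    }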
>
>
> 2. I like structure and reducing the play space for potential keys.  This
> will allow us to do filtering and to know when we're testing. At the same
> time, we're reserving a lot of space for future usages. However, if there
> is no agreement on this I don't think it would be a blocker.  I just want
> to make sure we have some order and, if possible, contiguous ranges for
> similar usages.
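
Purely as an illustration of "contiguous ranges for similar usages" - the
range boundaries and names below are made up, nothing like this has been
agreed:

    // Hypothetical partitioning of an int header key space; illustrative only.
    final class HeaderKeyRanges {
        static final int RESERVED_KAFKA_MIN = 0;      // reserved for Kafka itself
        static final int RESERVED_KAFKA_MAX = 999;
        static final int INFRA_PLUGINS_MIN  = 1000;   // auditing, lineage, tracing, ...
        static final int INFRA_PLUGINS_MAX  = 9999;
        static final int VENDOR_PLUGINS_MIN = 10000;  // third-party plugins
        static final int VENDOR_PLUGINS_MAX = 19999;
        static final int LOCAL_USE_MIN      = 20000;  // open for per-deployment use
        static final int LOCAL_USE_MAX      = 29999;

        static boolean isReserved(int key) {
            return key >= RESERVED_KAFKA_MIN && key <= RESERVED_KAFKA_MAX;
        }
    }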
>
> Nacho
>
>
> On Thu, Oct 6, 2016 at 2:26 PM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > 1. Tending towards no, but I don't have any strong opinions on header
> > ordering. It offers a potential speedup for header lookup in a serialized
> > blob (in wire format), but that goes away if the headers are always fully
> > serialized/deserialized. On the downside it's an implementation detail
> > that 3rd-party implementations would need to worry about, and one that
> > would be hard to diagnose if they fail to. It's also less friendly to
> > high-performance I/O (think about appending headers to an existing blob
> > in pass-through components like mirror-maker vs. writing into the middle
> > of it) - it's still possible though. However, the Kafka code base is far
> > from being iovec-friendly anyway.
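
A rough sketch of why appending is friendlier to pass-through components
than writing into the middle of a blob; the [count][key][len][value] layout
below is made up for illustration and is not the proposed wire format:

    import java.nio.ByteBuffer;

    final class HeaderBlob {
        // Assumes 'existing' is positioned at the start of a header block of
        // the form [int count][entries...], each entry [int key][int len][value].
        // Appending bumps the count and copies the old bytes untouched; keeping
        // the block sorted would instead force parsing and rewriting the middle.
        static ByteBuffer append(ByteBuffer existing, int key, byte[] value) {
            int oldCount = existing.getInt(existing.position());
            ByteBuffer out = ByteBuffer.allocate(existing.remaining() + 8 + value.length);
            out.putInt(oldCount + 1);                 // bumped header count
            ByteBuffer oldEntries = existing.duplicate();
            oldEntries.position(oldEntries.position() + 4);
            out.put(oldEntries);                      // old entries copied verbatim
            out.putInt(key);                          // appended entry: key
            out.putInt(value.length);                 //                 length
            out.put(value);                           //                 value
            out.flip();
            return out;
        }
    }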
> >
> > 2. yes.
> >
> >
> >
> >
> >
> > On Thu, Oct 6, 2016 at 8:58 AM, K Burstev <k.burs...@yandex.com> wrote:
> >
> > > @Mayuresh
> > >
> > > Yes, exactly - it is a really nasty race issue.
> > >
> > > This is why I look forward to being able to trash our custom workaround :)
> > >
> > > Kostya
> > >
> > >
> > > 06.10.2016, 02:36, "Mayuresh Gharat" <gharatmayures...@gmail.com>:
> > > > @Kostya
> > > >
> > > > Regarding "To get around this we have an awful *cough* solution
> whereby
> > > we
> > > > have to send our message wrapper with the headers and null content,
> and
> > > > then we have an application that has to consume from all the
> compacted
> > > > topics and when it sees this message it produces back in a null
> payload
> > > > record to make the broker compact it out."
> > > >
> > > >  ---> This has a race condition, right?
> > > >
> > > > Suppose the producer produces a message with headers and null content
> > at
> > > > time To to Kafka.
> > > >
> > > > Then the producer, at time To + 1, sends another message with headers
> > and
> > > > actual content to Kafka.
> > > >
> > > > What we expect is that the application that is consuming and then
> > > producing
> > > > same message with null payload should happen at time To + 0.5, so
> that
> > > the
> > > > message at To + 1 is not deleted.
> > > >
> > > > But there is no guarantee here.
> > > >
> > > > If the null payload goes in to Kafka at time To + 2, then essentially
> > you
> > > > loose the second message produced by the producer at time To + 1.
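
A small sketch of the interleaving described above; the topic, key and
wrapper encoding are hypothetical, and the outcome depends only on which
record with this key is last in the log when the cleaner runs:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    final class CompactionRaceIllustration {

        // Assumes a compacted topic and a wrapper format where "headers + null
        // content" is an application-level marker, not a broker tombstone.
        // Topic and key names are hypothetical.
        static void illustrate(KafkaProducer<String, byte[]> producer,
                               byte[] wrapperHeadersOnly,
                               byte[] wrapperHeadersAndContent) {
            String topic = "some-compacted-topic";
            String key = "some-key";

            // T0: producer sends the wrapper with headers and null content.
            producer.send(new ProducerRecord<>(topic, key, wrapperHeadersOnly));

            // T0 + 1: producer sends the real message (headers + actual content).
            producer.send(new ProducerRecord<>(topic, key, wrapperHeadersAndContent));

            // T0 + 2 (the race): the cleanup application, reacting to the T0
            // marker, produces a true tombstone (null value) for the same key
            // (shown on the same producer only for brevity). Because it lands
            // after T0 + 1, compaction eventually keeps only the tombstone and
            // the real message from T0 + 1 is lost.
            producer.send(new ProducerRecord<>(topic, key, (byte[]) null));
        }
    }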
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > > On Wed, Oct 5, 2016 at 6:13 PM, Joel Koshy <jjkosh...@gmail.com>
> > wrote:
> > > >
> > > >>  @Nacho
> > > >>
> > > >>  > > - Brokers can't see the headers (part of the "V" black box)
> > > >>  > >
> > > >>  > > (Also, it would be nice if we had a way to access the headers from
> > > >>  > > the brokers, something that is not trivial at this time with the
> > > >>  > > current broker architecture).
> > > >>  > >
> > > >>
> > > >>  I think this can be addressed with broker interceptors, which we
> > > >>  touched on in KIP-42
> > > >>  <https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors>.
> > > >>
> > > >>  @Gwen
> > > >>
> > > >>  You are right that the wrapper thingy “works”, but there are some
> > > >>  drawbacks that Nacho and Radai have covered in detail, to which I can
> > > >>  add a few more comments.
> > > >>
> > > >>  At LinkedIn, we *get by* without the proposed Kafka record headers by
> > > >>  dumping such metadata in one or two places:
> > > >>
> > > >>     - Most of our applications use Avro, so for the most part we can
> > > >>     use an explicit header field in the Avro schema. Topic owners are
> > > >>     supposed to include this header in their schemas.
> > > >>     - A prefix to the payload that primarily contains the schema’s ID
> > > >>     so we can deserialize the Avro (a sketch of this layout follows
> > > >>     below). (We could use this for other use-cases as well - i.e., move
> > > >>     some of the above into this prefix blob.)
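
A sketch of what such a payload prefix might look like; the exact layout
(a magic byte followed by a 4-byte schema ID) is assumed here for
illustration and is not spelled out in the thread:

    import java.nio.ByteBuffer;

    final class PrefixedPayload {
        private static final byte MAGIC = 0x0;  // assumed format-version marker

        // Prepend [magic][schemaId] to the serialized Avro bytes.
        static byte[] encode(int schemaId, byte[] avroBytes) {
            return ByteBuffer.allocate(1 + 4 + avroBytes.length)
                    .put(MAGIC)
                    .putInt(schemaId)
                    .put(avroBytes)
                    .array();
        }

        // Read back the schema ID so the right Avro schema can be fetched;
        // the remaining bytes are the Avro-encoded record.
        static int schemaIdOf(byte[] payload) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            buf.get();              // skip magic byte
            return buf.getInt();
        }
    }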
> > > >>
> > > >>  Dumping headers in the Avro schema pollutes the application’s data
> > > >>  model with data/service-infra-related fields that are unrelated to the
> > > >>  underlying topic; and forces the application to deserialize the entire
> > > >>  blob whether or not the headers are actually used. Conversely from an
> > > >>  infrastructure perspective, we would really like to not touch any
> > > >>  application data. Our infiltration of the application’s schema is a
> > > >>  major reason why many at LinkedIn sometimes assume that we (Kafka
> > > >>  folks) are the shepherds for all things Avro :)
> > > >>
> > > >>  Another drawback is that all this only works if everyone in the
> > > >>  organization is a good citizen and includes the header; and uses our
> > > >>  wrapper libraries - which is a good practice IMO - but may not always
> > > >>  be easy for open source projects that wish to directly use the Apache
> > > >>  producer/client. If instead we allow these headers to be inserted via
> > > >>  suitable interceptors outside the application payloads it would remove
> > > >>  such issues of separation in the data model and choice of clients.
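
A rough sketch of header injection via a client-side interceptor (KIP-42),
written against a record-level header API of the kind this thread is
proposing (string keys here); the interceptor class and header name are
hypothetical:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    // Adds an infra-owned header to every outgoing record without touching
    // the application's payload or its Avro schema.
    public class OriginHeaderInterceptor<K, V> implements ProducerInterceptor<K, V> {

        @Override
        public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
            record.headers().add("x-origin-cluster",   // hypothetical header key
                    "dc-1".getBytes(StandardCharsets.UTF_8));
            return record;
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
            // No-op; the header is attached on send.
        }

        @Override
        public void close() {}

        @Override
        public void configure(Map<String, ?> configs) {}
    }

The interceptor would be wired in through the producer's interceptor.classes
config, so applications pick it up without code changes.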
> > > >>
> > > >>  Radai has enumerated a number of use-cases
> > > >>  <https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers>
> > > >>  and I’m sure the broader community will have a lot more to add. The
> > > >>  feature as such would enable an ecosystem of plugins from different
> > > >>  vendors that users can mix and match in their data pipelines without
> > > >>  requiring any specific payload formats or client libraries.
> > > >>
> > > >>  Thanks,
> > > >>
> > > >>  Joel
> > > >>
> > > >>  > >
> > > >>  > >
> > > >>  > > On Wed, Oct 5, 2016 at 2:20 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >>  > >
> > > >>  > > > Since LinkedIn has some kind of wrapper thingy that adds the
> > > >>  > > > headers, where they could have added them to Apache Kafka - I'm
> > > >>  > > > very curious to hear what drove that decision and the pros/cons
> > > >>  > > > of managing the headers outside Kafka itself.
> > > >>  > > >
> > > >>  >
> > > >
> > > > --
> > > > -Regards,
> > > > Mayuresh R. Gharat
> > > > (862) 250-7125
> > >
> >
>
>
>
> --
> Nacho (Ignacio) Solis
> Kafka
> nso...@linkedin.com
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125
