Re: KIP-244: Add Record Header support to Kafka Streams

Guozhang Wang Fri, 11 May 2018 09:27:02 -0700

That's a good question. I think we can manage this in KIP-159. I will go
ahead and try to augment that KIP together with the original author Jeyhun.



Guozhang

On Fri, May 11, 2018 at 12:45 AM, Jorge Esteban Quilcate Otoya <
[email protected]> wrote:

> Thanks Guozhang and Matthias! I do also agree with this way of handling
> headers inheritance. I will add them to the KIP doc.
>
> > We can discuss about extending the current protocol and how to enable
> users
> > override those rule, and how to expose them in the DSL layer in a future
> > KIP.
>
> About this, should this be managed on KIP-159 or a new one?
>
> El jue., 10 may. 2018 a las 17:46, Matthias J. Sax (<[email protected]
> >)
> escribió:
>
> > Thanks Guozhang! Sounds good to me!
> >
> > -Matthias
> >
> > On 5/10/18 7:55 AM, Guozhang Wang wrote:
> > > Thanks for your thoughts Matthias. I think if we do want to bring
> KIP-244
> > > into 2.0 then we need to keep its scope small and well defined. For
> that
> > > I'm proposing:
> > >
> > > 1. Make the inheritance implementation of headers consistent with what
> we
> > > had with other record context fields. I.e. pass through the record
> > context
> > > in `context.forward()`. Note that within a processor node, users can
> > > already manipulate the Headers with the given APIs, so at the time of
> > > forwarding, the library can just copy what-ever is left / updated to
> the
> > > next processor node.
> > >
> > > 2. In the sink node, where a record is being sent to the Kafka topic,
> we
> > > should consider the following:
> > >
> > > a. For sink topics, we will set the headers into the producer record.
> > > b. For repartition topics, we will the headers into the producer
> record.
> > > c. For changelog topics, we will drop the headers in the produce record
> > > since they will not be used in restoration and not stored in the state
> > > store either.
> > >
> > >
> > > We can discuss about extending the current protocol and how to enable
> > users
> > > override those rule, and how to expose them in the DSL layer in a
> future
> > > KIP.
> > >
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, May 7, 2018 at 5:49 PM, Matthias J. Sax <[email protected]
> >
> > > wrote:
> > >
> > >> Guozhang,
> > >>
> > >> if you advocate to forward headers by default, it might be a better
> > >> default strategy do forward the headers for all operators (similar to
> > >> topic/partition/offset metadata). It's usually harder for users to
> > >> reason about different cases and thus I would prefer to have
> consistent
> > >> behavior, ie, only one default strategy instead of introducing
> different
> > >> cases.
> > >>
> > >> Btw: My argument about dropping headers by default only implies, that
> > >> users need to copy the headers explicitly to the output records in
> there
> > >> code of they want to inspect them later -- it does not imply that
> > >> headers cannot be forwarded downstream. (Not sure if this was clear).
> > >>
> > >> I am also ok with copying be default thought (for me, it's a 51/49
> > >> preference for dropping by default only).
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 5/7/18 4:52 PM, Guozhang Wang wrote:
> > >>> Hi Matthias,
> > >>>
> > >>> My concern of setting `null` in all cases is that it would make
> headers
> > >> not
> > >>> very useful in KIP-244 then, because headers will only be available
> at
> > >> the
> > >>> source stream / table, but not in any of the following instances. In
> > >>> practice users may be more likely to look into the headers later in
> the
> > >>> pipeline. Personally I'd suggest we pass the headers for all
> stateless
> > >>> operators in DSL and everywhere in PAPI's context.forward(). For
> > >>> repartition topics and sink topics, we also set them in the produced
> > >>> records accordingly; for changelog topics, we do not set them since
> > they
> > >>> are not going to be used anywhere in the store.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>>
> > >>> On Sun, May 6, 2018 at 9:03 PM, Matthias J. Sax <
> [email protected]
> > >
> > >>> wrote:
> > >>>
> > >>>> I agree, that we should not block this KIP if possible.
> Nevertheless,
> > we
> > >>>> should try to get a reasonable default strategy for inheriting the
> > >>>> headers so we don't need to change it later on.
> > >>>>
> > >>>> Let's see what other think. I still tend slightly to set to `null`
> by
> > >>>> default for all cases. If the default strategy is different for
> > >>>> different operators as you suggest, it might be confusion to users.
> > >>>> IMHO, the default behavior should be as simple as possible.
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>> On 5/6/18 8:53 PM, Guozhang Wang wrote:
> > >>>>> Matthias, thanks for sharing your opinions in the inheritance
> > protocol
> > >> of
> > >>>>> the record context. I'm thinking maybe we should make this
> discussion
> > >> as
> > >>>> a
> > >>>>> separate KIP by itself? If yes, then KIP-244's scope would be
> > smaller,
> > >>>> and
> > >>>>> within KIP-244 we can have a simple inheritance rule that setting
> it
> > to
> > >>>>> null when 1) going through stateful operators and 2) sending to any
> > >>>> topics.
> > >>>>>
> > >>>>>
> > >>>>> Guozhang
> > >>>>>
> > >>>>> On Sun, May 6, 2018 at 10:24 AM, Matthias J. Sax <
> > >> [email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Making the inheritance protocol a public contract seems reasonable
> > to
> > >>>> me.
> > >>>>>>
> > >>>>>> In the current implementation, all output records inherits the
> > offset,
> > >>>>>> timestamp, topic, and partition metadata from the input record. We
> > >>>>>> already added an API to change the timestamp explicitly for the
> > output
> > >>>>>> record thought.
> > >>>>>>
> > >>>>>> I think it make sense to keep the inheritance of offset, topic,
> and
> > >>>>>> partition. For headers, it's worth to discuss. I see arguments for
> > two
> > >>>>>> strategies: (1) inherit by default, (2) set `null` by default.
> > >>>>>> Independent of the default behavior, we should add an API to set
> > >> headers
> > >>>>>> for output records explicitly though (similar to the "set
> timestamp
> > >>>> API").
> > >>>>>>
> > >>>>>> From my point of view, timestamp/headers are a different
> > >>>>>> "class/category" of data/metadata than topic/partition/offset. For
> > the
> > >>>>>> first category, it makes sense to manipulate them and it's more
> than
> > >>>>>> "plain metadata"; especially the timestamp. For the second
> category
> > it
> > >>>>>> does not make sense to manipulate it, and to me
> > topic/partition/offset
> > >>>>>> is pure metadata only---strictly speaking, it's even questionable
> if
> > >>>>>> output records should have any value for topic/partition/offset in
> > the
> > >>>>>> first place, or if they should be `null`, because those attributes
> > do
> > >>>>>> only make sense for source records that are consumed from a topic
> > >>>>>> directly only. On the other hand, if we make this difference
> > explicit,
> > >>>>>> it might be useful information for the use to track the current
> > >>>>>> topic/partition/offset of the original source record.
> > >>>>>>
> > >>>>>> Furthermore, to me, timestamps and headers are somewhat different,
> > >> too.
> > >>>>>> For stream processing it's required that every record has a
> > timestamp;
> > >>>>>> thus, it make sense to inherit the input record timestamp by
> default
> > >> (a
> > >>>>>> timestamp is not really metadata but actually equally important to
> > key
> > >>>>>> and value from my point of view). Header however are optional, and
> > >> thus
> > >>>>>> inheriting them is not really required. It might be convenient
> > though:
> > >>>>>> for example, imagine a simple "filter-only" application -- it
> would
> > be
> > >>>>>> cumbersome for users to explicitly copy the headers from the input
> > >>>>>> records to the output records -- it seems to be unnecessary
> > >> boilerplate
> > >>>>>> code. On the other hand, for any other more complex use case, it's
> > >>>>>> questionable to inherit headers---note, that headers would be
> > written
> > >> to
> > >>>>>> the output topics increasing the size of the messages. Overall, I
> am
> > >> not
> > >>>>>> sure which default strategy might be the better one for headers.
> Is
> > >>>>>> there a convincing argument for either one of them? I slightly
> tend
> > to
> > >>>>>> think that using `null` as default might be better.
> > >>>>>>
> > >>>>>> Last, we could also make the default behavior configurable.
> > Something
> > >>>>>> like `inherit.record.headers=true/false` with default value
> "false".
> > >>>>>> This would allow people to opt-in for auto-header-inheritance.
> Just
> > an
> > >>>>>> idea I wanted to add to the discussion---not sure if it's a good
> > one.
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>> On 5/4/18 3:13 PM, Guozhang Wang wrote:
> > >>>>>>> Hello Jorge,
> > >>>>>>>
> > >>>>>>>> Agree. Probably point 3 handles this. `Headers` been part of
> > >>>>>> `RecordContext`
> > >>>>>>> would be handled the same way as other attributes.
> > >>>>>>>
> > >>>>>>> Today we do not have a clear inheritance protocol for other
> fields
> > of
> > >>>>>>> RecordContext yet: although internally we do have some criterion
> on
> > >>>>>>> topic/partition/offset and timestamp, they are not explicitly
> > exposed
> > >>>> to
> > >>>>>>> users.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> I think we still need to have a defined protocol for headers
> > itself,
> > >>>> but
> > >>>>>> I
> > >>>>>>> agree that it better to be scoped out side of this KIP, since
> this
> > >>>>>>> inheritance protocol itself for all the fields of RecordContext
> > would
> > >>>>>>> better be a separate KIP. We can document this clearly in the
> wiki
> > >>>> page.
> > >>>>>>>
> > >>>>>>> Guozhang
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, May 4, 2018 at 5:26 AM, Florian Garcia <
> > >>>>>>> [email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> For me this is a great first step to have Headers in streaming.
> > >>>>>>>> My current use case is about distributed tracing (Zipkin) and
> with
> > >> the
> > >>>>>>>> headers in the processorContext() I'll be able to manage that
> for
> > >> the
> > >>>>>> most
> > >>>>>>>> cases.
> > >>>>>>>> The KIP-159 should follow after this but this is where all the
> > major
> > >>>>>>>> questions will arise for stateful operations (as Guozhang said).
> > >>>>>>>>
> > >>>>>>>> Thanks for the work on this Jorge.
> > >>>>>>>>
> > >>>>>>>> Le ven. 4 mai 2018 à 01:04, Jorge Esteban Quilcate Otoya <
> > >>>>>>>> [email protected]> a écrit :
> > >>>>>>>>
> > >>>>>>>>> Thanks Guozhang and John for your feedback.
> > >>>>>>>>>
> > >>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in
> > our
> > >>>>>>>>> topology:
> > >>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be
> > >>>> straight-forward.
> > >>>>>>>>>> 1.b. In DSL stateless operators, it should be
> straight-forward.
> > >>>>>>>>>> 1.c. What about in stateful operators like aggregates and
> joins?
> > >>>>>>>>>
> > >>>>>>>>> Agree. Probably point 3 handles this. `Headers` been part of
> > >>>>>>>>> `RecordContext` would be handled the same way as other
> > attributes.
> > >>>>>>>>>
> > >>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
> > >>>>>>>>> filter/map/branch",
> > >>>>>>>>> it may well be covered in KIP-159; worth taking a look at that
> > KIP.
> > >>>>>>>>>
> > >>>>>>>>> Yes, I will point to it.
> > >>>>>>>>>
> > >>>>>>>>>> 2. In terms of internal implementations, should the state
> store
> > >>>>>>>>> cache include the headers then in order to be sent downstreams?
> > >>>>>>>>>
> > >>>>>>>>> Good question. As `LRUCacheEntry` extends `RecordContext`, I
> > thinks
> > >>>>>> this
> > >>>>>>>> is
> > >>>>>>>>> already supported. I will detail this on the KIP.
> > >>>>>>>>>
> > >>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)",
> this
> > >>>> should
> > >>>>>>>> be
> > >>>>>>>>> removed?
> > >>>>>>>>>
> > >>>>>>>>> Fixed, thanks.
> > >>>>>>>>>
> > >>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope
> is
> > >>>> only
> > >>>>>>>>> for exposing
> > >>>>>>>>> the headers for reading, and not allowing users to add / modify
> > >>>>>> headers,
> > >>>>>>>>> right? If yes, we'd better state it clearly at the "Proposed
> > >> Changes"
> > >>>>>>>>> section.
> > >>>>>>>>>
> > >>>>>>>>> As headers is exposed in the `ProcessContext`, and headers will
> > be
> > >>>> send
> > >>>>>>>>> downstream, it can be mutated (add/remove headers).
> > >>>>>>>>>
> > >>>>>>>>>  > Also, despite the decreased scope in this KIP, I think it
> > might
> > >> be
> > >>>>>>>>> valuable to define what will happen to headers once this change
> > is
> > >>>>>>>>> implemented. For example, I think a minimal groundwork-level
> > change
> > >>>>>> might
> > >>>>>>>>> be to make the API changes, while promising to drop all headers
> > >> from
> > >>>>>>>> input
> > >>>>>>>>> records.
> > >>>>>>>>>
> > >>>>>>>>> I will suggest to pass headers to downstream nodes, and don't
> > drop
> > >>>>>> yhrm.
> > >>>>>>>>> Clients will have to drop `Headers` if they have used them.
> > >>>>>>>>> Or it could be something like a boolean config property that
> > manage
> > >>>>>> this.
> > >>>>>>>>> I would like to hear feedback here.
> > >>>>>>>>>
> > >>>>>>>>>> A maximal groundwork change would be to forward the headers
> > >> through
> > >>>>>> all
> > >>>>>>>>> operators
> > >>>>>>>>> in
> > >>>>>>>>>
> > >>>>>>>>> Streams. But I think there are some unresolved questions about
> > >>>>>>>> forwarding,
> > >>>>>>>>> like "what happens to the headers in a join?"
> > >>>>>>>>> Probably this would be solve once KIP-159 is implemented and
> > >>>> supporting
> > >>>>>>>>> Headers.
> > >>>>>>>>>
> > >>>>>>>>>> There's of course some middle ground, but instinctively, I
> think
> > >> I'd
> > >>>>>>>>> prefer to have a clear definition that headers are currently
> > *not*
> > >>>>>>>>> forwarded, rather than having a complex list of operators that
> do
> > >> or
> > >>>>>>>> don't
> > >>>>>>>>> forward them. Plus, I think it might be tricky to define this
> > >>>> behavior
> > >>>>>>>>> while not allowing the scope to return to that of your original
> > >>>>>> proposal!
> > >>>>>>>>>
> > >>>>>>>>> Agree. But `Headers` were forwarded *explicitly* in the
> original
> > >>>>>>>> proposal.
> > >>>>>>>>> The current one pass it as part of `RecordContext`, so if it's
> > >>>> forward
> > >>>>>> it
> > >>>>>>>>> or not is as the same as `RecordContext`.
> > >>>>>>>>> On top of this implementation, we can design how
> filter/map/join
> > >> will
> > >>>>>> be
> > >>>>>>>>> handled. Probably following KIP-159 approach.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Jorge.
> > >>>>>>>>>
> > >>>>>>>>> El mié., 2 may. 2018 a las 22:56, Guozhang Wang (<
> > >> [email protected]
> > >>>>> )
> > >>>>>>>>> escribió:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Jorge,
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks for the written KIP! Made a pass over it and left some
> > >>>> comments
> > >>>>>>>>>> (some of them overlapped with John's):
> > >>>>>>>>>>
> > >>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in
> > our
> > >>>>>>>>> topology:
> > >>>>>>>>>>
> > >>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be
> > >>>> straight-forward.
> > >>>>>>>>>> 1.b. In DSL stateless operators, it should be
> straight-forward.
> > >>>>>>>>>> 1.c. What about in stateful operators like aggregates and
> joins?
> > >>>>>>>>>>
> > >>>>>>>>>> 2. In terms of internal implementations, should the state
> store
> > >>>> cache
> > >>>>>>>>>> include the headers then in order to be sent downstreams?
> > >>>>>>>>>>
> > >>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
> > >>>>>>>>>> filter/map/branch", it may well be covered in KIP-159; worth
> > >> taking
> > >>>> a
> > >>>>>>>>> look
> > >>>>>>>>>> at that KIP.
> > >>>>>>>>>>
> > >>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)",
> this
> > >>>> should
> > >>>>>>>> be
> > >>>>>>>>>> removed?
> > >>>>>>>>>>
> > >>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope
> is
> > >>>> only
> > >>>>>>>> for
> > >>>>>>>>>> exposing the headers for reading, and not allowing users to
> add
> > /
> > >>>>>>>> modify
> > >>>>>>>>>> headers, right? If yes, we'd better state it clearly at the
> > >>>> "Proposed
> > >>>>>>>>>> Changes" section.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Guozhang
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, May 2, 2018 at 8:42 AM, John Roesler <
> [email protected]
> > >
> > >>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Jorge,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks for the design work.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I agree that de-scoping the work to just the Processor API
> will
> > >>>> help
> > >>>>>>>>>>> contain the design and implementation complexity.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In the KIP, it mentions that the headers would be available
> in
> > >> the
> > >>>>>>>>>>> ProcessorContext, (like "context.headers()"). It also says
> that
> > >>>>>>>>>>> implementers would need to implement the method "void
> process(K
> > >>>> key,
> > >>>>>>>> V
> > >>>>>>>>>>> value, Headers headers);". I think maybe you meant to remove
> > the
> > >>>>>>>>> proposal
> > >>>>>>>>>>> to modify "process", since it wouldn't be necessary in
> > >> conjunction
> > >>>>>>>> with
> > >>>>>>>>>> the
> > >>>>>>>>>>> ProcessorContext change, and it's not represented in your PR.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Also, despite the decreased scope in this KIP, I think it
> might
> > >> be
> > >>>>>>>>>> valuable
> > >>>>>>>>>>> to define what will happen to headers once this change is
> > >>>>>>>> implemented.
> > >>>>>>>>>> For
> > >>>>>>>>>>> example, I think a minimal groundwork-level change might be
> to
> > >> make
> > >>>>>>>> the
> > >>>>>>>>>> API
> > >>>>>>>>>>> changes, while promising to drop all headers from input
> > records.
> > >>>>>>>>>>>
> > >>>>>>>>>>> A maximal groundwork change would be to forward the headers
> > >> through
> > >>>>>>>> all
> > >>>>>>>>>>> operators in Streams. But I think there are some unresolved
> > >>>> questions
> > >>>>>>>>>> about
> > >>>>>>>>>>> forwarding, like "what happens to the headers in a join?"
> > >>>>>>>>>>>
> > >>>>>>>>>>> There's of course some middle ground, but instinctively, I
> > think
> > >>>> I'd
> > >>>>>>>>>> prefer
> > >>>>>>>>>>> to have a clear definition that headers are currently *not*
> > >>>>>>>> forwarded,
> > >>>>>>>>>>> rather than having a complex list of operators that do or
> don't
> > >>>>>>>> forward
> > >>>>>>>>>>> them. Plus, I think it might be tricky to define this
> behavior
> > >>>> while
> > >>>>>>>>> not
> > >>>>>>>>>>> allowing the scope to return to that of your original
> proposal!
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks again for the KIP,
> > >>>>>>>>>>> -John
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, May 2, 2018 at 8:05 AM, Jorge Esteban Quilcate Otoya
> <
> > >>>>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Matthias,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I've created a new JIRA to track this, updated the KIP and
> > >> create
> > >>>> a
> > >>>>>>>>> PR.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Looking forward to your feedback,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Jorge.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> El mar., 13 feb. 2018 a las 22:43, Matthias J. Sax (<
> > >>>>>>>>>>> [email protected]
> > >>>>>>>>>>>>> )
> > >>>>>>>>>>>> escribió:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Jorge,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I would like to unblock this KIP to make some progress. The
> > >>>>>>>> tricky
> > >>>>>>>>>>>>> question of this work, seems to be how to expose headers at
> > DSL
> > >>>>>>>>>> level.
> > >>>>>>>>>>>>> This related to KIP-149 and KIP-159. However, for Processor
> > >> API,
> > >>>>>>>> it
> > >>>>>>>>>>>>> seems to be rather straight forward to add headers to the
> > API.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thus, I would suggest to de-scope this KIP and add header
> > >> support
> > >>>>>>>>> for
> > >>>>>>>>>>>>> Processor API only as a first step. If this is done, we can
> > see
> > >>>>>>>> in
> > >>>>>>>>> a
> > >>>>>>>>>>>>> second step, how to add headers at DSL level.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> WDYT about this proposal?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> If you agree, please update the JIRA and KIP accordingly.
> > Note,
> > >>>>>>>>> that
> > >>>>>>>>>> we
> > >>>>>>>>>>>>> have two JIRA that are duplicates atm. We can scope them
> > >>>>>>>>> accordingly:
> > >>>>>>>>>>>>> one for PAPI only, and second as a dependent JIRA for DSL.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 12/30/17 3:11 PM, Jorge Esteban Quilcate Otoya wrote:
> > >>>>>>>>>>>>>> Thanks for your feedback!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 1. I was adding headers to KeyValue to support groupBy,
> but
> > I
> > >>>>>>>>> think
> > >>>>>>>>>>> it
> > >>>>>>>>>>>> is
> > >>>>>>>>>>>>>> not necessary. It should be enough with mapping headers to
> > >>>>>>>>>> key/value
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>>> then group using current KeyValue structure.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 2. Yes. IMO key/value stores, like RocksDB, rely on KV as
> > >>>>>>>>>> structure,
> > >>>>>>>>>>>>> hence
> > >>>>>>>>>>>>>> considering headers as part of stateful operations will
> not
> > >> fit
> > >>>>>>>>> in
> > >>>>>>>>>>> this
> > >>>>>>>>>>>>>> approach and increase complexity (I cannot think in a
> > use-case
> > >>>>>>>>> that
> > >>>>>>>>>>>> need
> > >>>>>>>>>>>>>> this).
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 3. and 4. Changes on 1. will solve this issue.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Probably I rush a bit proposing this change, I was not
> aware
> > >> of
> > >>>>>>>>>>> KIP-159
> > >>>>>>>>>>>>> or
> > >>>>>>>>>>>>>> KAFKA-5632.
> > >>>>>>>>>>>>>> If KIP-159 is adopted and we reduce this KIP to add
> Headers
> > to
> > >>>>>>>>>>>>>> RecordContext will be enough, but I'm not sure about the
> > scope
> > >>>>>>>> of
> > >>>>>>>>>>>>> KIP-159.
> > >>>>>>>>>>>>>> If it includes stateful operations will be difficult to
> > >>>>>>>>> implemented
> > >>>>>>>>>>> as
> > >>>>>>>>>>>>>> stated in 2.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>> Jorge.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> El mar., 26 dic. 2017 a las 20:04, Matthias J. Sax (<
> > >>>>>>>>>>>>> [email protected]>)
> > >>>>>>>>>>>>>> escribió:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks for the KIP Jorge,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> As Bill pointed out already, we should be careful with
> > adding
> > >>>>>>>>> new
> > >>>>>>>>>>>>>>> overloads as this contradicts the work done via KIP-182.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> This KIP also seems to be related to KIP-149 and KIP-159.
> > Are
> > >>>>>>>>> you
> > >>>>>>>>>>>> aware
> > >>>>>>>>>>>>>>> of them? Both have quite long DISCUSS threads, but it
> might
> > >> be
> > >>>>>>>>>> worth
> > >>>>>>>>>>>>>>> browsing through them.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> A few further questions:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>  - why do you want to add the headers to `KeyValue`? I am
> > not
> > >>>>>>>>> sure
> > >>>>>>>>>>> if
> > >>>>>>>>>>>> we
> > >>>>>>>>>>>>>>> should consider headers as optional metadata and add it
> to
> > >>>>>>>>>>>>>>> `RecordContext` similar to timestamp, offset, etc. only
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>  - You only include stateless single-record
> transformations
> > >> at
> > >>>>>>>>> the
> > >>>>>>>>>>> DSL
> > >>>>>>>>>>>>>>> level. Do you suggest that all other operator just drop
> > >>>>>>>> headers
> > >>>>>>>>> on
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>> floor?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>  - Why do you only want to put headers into in-memory and
> > >>>>>>>> cache
> > >>>>>>>>>> but
> > >>>>>>>>>>>> not
> > >>>>>>>>>>>>>>> RocksDB store? What do you mean by "pass through"? IMHO,
> > all
> > >>>>>>>>>> stores
> > >>>>>>>>>>>>>>> should behave the same at DSL level.
> > >>>>>>>>>>>>>>>    -> if we store the headers in the state stores, what
> is
> > >> the
> > >>>>>>>>>>> upgrade
> > >>>>>>>>>>>>>>> path?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>  - Why do we need to store record header in state in the
> > >> first
> > >>>>>>>>>>> place,
> > >>>>>>>>>>>> if
> > >>>>>>>>>>>>>>> we exclude stateful operator at DSL level?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What is the motivation for the "border lines" you choose?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On 12/21/17 8:18 AM, Bill Bejeck wrote:
> > >>>>>>>>>>>>>>>> Jorge,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks for the KIP, I know this is a feature others in
> the
> > >>>>>>>>>>> community
> > >>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>>> been interested in getting into Kafka Streams.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I took a quick pass over it, and I have one initial
> > >> question.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> We recently reduced overloads with KIP-182, and in this
> > KIP
> > >>>>>>>> we
> > >>>>>>>>>> are
> > >>>>>>>>>>>>>>>> increasing them again.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I can see from the KIP why they are necessary, but I'm
> > >>>>>>>>> wondering
> > >>>>>>>>>> if
> > >>>>>>>>>>>>> there
> > >>>>>>>>>>>>>>>> is something else we can do to cut down on the overloads
> > >>>>>>>>>>>> introduced.  I
> > >>>>>>>>>>>>>>>> don't have any sound suggestions ATM, so I'll have to
> > think
> > >>>>>>>>> about
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>> more, but I wanted to put the thought out there.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>> Bill
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Dec 21, 2017 at 9:06 AM, Jorge Esteban Quilcate
> > >>>>>>>> Otoya <
> > >>>>>>>>>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I have created a KIP to add Record Headers support to
> > Kafka
> > >>>>>>>>>>> Streams
> > >>>>>>>>>>>>> API:
> > >>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>>>>>>> 244%3A+Add+Record+Header+support+to+Kafka+Streams
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The main goal is to be able to use headers to filter,
> map
> > >>>>>>>> and
> > >>>>>>>>>>>> process
> > >>>>>>>>>>>>>>>>> records as streams. Stateful processing (joins,
> windows)
> > >> are
> > >>>>>>>>> not
> > >>>>>>>>>>>>>>>>> considered.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Proposed changes/Draft:
> > >>>>>>>>>>>>>>>>> https://github.com/apache/kafka/compare/trunk...jeqo:
> > >>>>>>>>>>>> streams-headers
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Feedback and suggestions are more than welcome.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Jorge.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> -- Guozhang
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> >
> >
>



-- 
-- Guozhang

Re: KIP-244: Add Record Header support to Kafka Streams

Reply via email to