Re: KIP-244: Add Record Header support to Kafka Streams

Jorge Esteban Quilcate Otoya Fri, 11 May 2018 00:46:20 -0700

Thanks Guozhang and Matthias! I do also agree with this way of handling
headers inheritance. I will add them to the KIP doc.


> We can discuss about extending the current protocol and how to enable
users
> override those rule, and how to expose them in the DSL layer in a future
> KIP.

About this, should this be managed on KIP-159 or a new one?

El jue., 10 may. 2018 a las 17:46, Matthias J. Sax (<[email protected]>)
escribió:

> Thanks Guozhang! Sounds good to me!
>
> -Matthias
>
> On 5/10/18 7:55 AM, Guozhang Wang wrote:
> > Thanks for your thoughts Matthias. I think if we do want to bring KIP-244
> > into 2.0 then we need to keep its scope small and well defined. For that
> > I'm proposing:
> >
> > 1. Make the inheritance implementation of headers consistent with what we
> > had with other record context fields. I.e. pass through the record
> context
> > in `context.forward()`. Note that within a processor node, users can
> > already manipulate the Headers with the given APIs, so at the time of
> > forwarding, the library can just copy what-ever is left / updated to the
> > next processor node.
> >
> > 2. In the sink node, where a record is being sent to the Kafka topic, we
> > should consider the following:
> >
> > a. For sink topics, we will set the headers into the producer record.
> > b. For repartition topics, we will the headers into the producer record.
> > c. For changelog topics, we will drop the headers in the produce record
> > since they will not be used in restoration and not stored in the state
> > store either.
> >
> >
> > We can discuss about extending the current protocol and how to enable
> users
> > override those rule, and how to expose them in the DSL layer in a future
> > KIP.
> >
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 7, 2018 at 5:49 PM, Matthias J. Sax <[email protected]>
> > wrote:
> >
> >> Guozhang,
> >>
> >> if you advocate to forward headers by default, it might be a better
> >> default strategy do forward the headers for all operators (similar to
> >> topic/partition/offset metadata). It's usually harder for users to
> >> reason about different cases and thus I would prefer to have consistent
> >> behavior, ie, only one default strategy instead of introducing different
> >> cases.
> >>
> >> Btw: My argument about dropping headers by default only implies, that
> >> users need to copy the headers explicitly to the output records in there
> >> code of they want to inspect them later -- it does not imply that
> >> headers cannot be forwarded downstream. (Not sure if this was clear).
> >>
> >> I am also ok with copying be default thought (for me, it's a 51/49
> >> preference for dropping by default only).
> >>
> >>
> >> -Matthias
> >>
> >> On 5/7/18 4:52 PM, Guozhang Wang wrote:
> >>> Hi Matthias,
> >>>
> >>> My concern of setting `null` in all cases is that it would make headers
> >> not
> >>> very useful in KIP-244 then, because headers will only be available at
> >> the
> >>> source stream / table, but not in any of the following instances. In
> >>> practice users may be more likely to look into the headers later in the
> >>> pipeline. Personally I'd suggest we pass the headers for all stateless
> >>> operators in DSL and everywhere in PAPI's context.forward(). For
> >>> repartition topics and sink topics, we also set them in the produced
> >>> records accordingly; for changelog topics, we do not set them since
> they
> >>> are not going to be used anywhere in the store.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Sun, May 6, 2018 at 9:03 PM, Matthias J. Sax <[email protected]
> >
> >>> wrote:
> >>>
> >>>> I agree, that we should not block this KIP if possible. Nevertheless,
> we
> >>>> should try to get a reasonable default strategy for inheriting the
> >>>> headers so we don't need to change it later on.
> >>>>
> >>>> Let's see what other think. I still tend slightly to set to `null` by
> >>>> default for all cases. If the default strategy is different for
> >>>> different operators as you suggest, it might be confusion to users.
> >>>> IMHO, the default behavior should be as simple as possible.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 5/6/18 8:53 PM, Guozhang Wang wrote:
> >>>>> Matthias, thanks for sharing your opinions in the inheritance
> protocol
> >> of
> >>>>> the record context. I'm thinking maybe we should make this discussion
> >> as
> >>>> a
> >>>>> separate KIP by itself? If yes, then KIP-244's scope would be
> smaller,
> >>>> and
> >>>>> within KIP-244 we can have a simple inheritance rule that setting it
> to
> >>>>> null when 1) going through stateful operators and 2) sending to any
> >>>> topics.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Sun, May 6, 2018 at 10:24 AM, Matthias J. Sax <
> >> [email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Making the inheritance protocol a public contract seems reasonable
> to
> >>>> me.
> >>>>>>
> >>>>>> In the current implementation, all output records inherits the
> offset,
> >>>>>> timestamp, topic, and partition metadata from the input record. We
> >>>>>> already added an API to change the timestamp explicitly for the
> output
> >>>>>> record thought.
> >>>>>>
> >>>>>> I think it make sense to keep the inheritance of offset, topic, and
> >>>>>> partition. For headers, it's worth to discuss. I see arguments for
> two
> >>>>>> strategies: (1) inherit by default, (2) set `null` by default.
> >>>>>> Independent of the default behavior, we should add an API to set
> >> headers
> >>>>>> for output records explicitly though (similar to the "set timestamp
> >>>> API").
> >>>>>>
> >>>>>> From my point of view, timestamp/headers are a different
> >>>>>> "class/category" of data/metadata than topic/partition/offset. For
> the
> >>>>>> first category, it makes sense to manipulate them and it's more than
> >>>>>> "plain metadata"; especially the timestamp. For the second category
> it
> >>>>>> does not make sense to manipulate it, and to me
> topic/partition/offset
> >>>>>> is pure metadata only---strictly speaking, it's even questionable if
> >>>>>> output records should have any value for topic/partition/offset in
> the
> >>>>>> first place, or if they should be `null`, because those attributes
> do
> >>>>>> only make sense for source records that are consumed from a topic
> >>>>>> directly only. On the other hand, if we make this difference
> explicit,
> >>>>>> it might be useful information for the use to track the current
> >>>>>> topic/partition/offset of the original source record.
> >>>>>>
> >>>>>> Furthermore, to me, timestamps and headers are somewhat different,
> >> too.
> >>>>>> For stream processing it's required that every record has a
> timestamp;
> >>>>>> thus, it make sense to inherit the input record timestamp by default
> >> (a
> >>>>>> timestamp is not really metadata but actually equally important to
> key
> >>>>>> and value from my point of view). Header however are optional, and
> >> thus
> >>>>>> inheriting them is not really required. It might be convenient
> though:
> >>>>>> for example, imagine a simple "filter-only" application -- it would
> be
> >>>>>> cumbersome for users to explicitly copy the headers from the input
> >>>>>> records to the output records -- it seems to be unnecessary
> >> boilerplate
> >>>>>> code. On the other hand, for any other more complex use case, it's
> >>>>>> questionable to inherit headers---note, that headers would be
> written
> >> to
> >>>>>> the output topics increasing the size of the messages. Overall, I am
> >> not
> >>>>>> sure which default strategy might be the better one for headers. Is
> >>>>>> there a convincing argument for either one of them? I slightly tend
> to
> >>>>>> think that using `null` as default might be better.
> >>>>>>
> >>>>>> Last, we could also make the default behavior configurable.
> Something
> >>>>>> like `inherit.record.headers=true/false` with default value "false".
> >>>>>> This would allow people to opt-in for auto-header-inheritance. Just
> an
> >>>>>> idea I wanted to add to the discussion---not sure if it's a good
> one.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>> On 5/4/18 3:13 PM, Guozhang Wang wrote:
> >>>>>>> Hello Jorge,
> >>>>>>>
> >>>>>>>> Agree. Probably point 3 handles this. `Headers` been part of
> >>>>>> `RecordContext`
> >>>>>>> would be handled the same way as other attributes.
> >>>>>>>
> >>>>>>> Today we do not have a clear inheritance protocol for other fields
> of
> >>>>>>> RecordContext yet: although internally we do have some criterion on
> >>>>>>> topic/partition/offset and timestamp, they are not explicitly
> exposed
> >>>> to
> >>>>>>> users.
> >>>>>>>
> >>>>>>>
> >>>>>>> I think we still need to have a defined protocol for headers
> itself,
> >>>> but
> >>>>>> I
> >>>>>>> agree that it better to be scoped out side of this KIP, since this
> >>>>>>> inheritance protocol itself for all the fields of RecordContext
> would
> >>>>>>> better be a separate KIP. We can document this clearly in the wiki
> >>>> page.
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 4, 2018 at 5:26 AM, Florian Garcia <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> For me this is a great first step to have Headers in streaming.
> >>>>>>>> My current use case is about distributed tracing (Zipkin) and with
> >> the
> >>>>>>>> headers in the processorContext() I'll be able to manage that for
> >> the
> >>>>>> most
> >>>>>>>> cases.
> >>>>>>>> The KIP-159 should follow after this but this is where all the
> major
> >>>>>>>> questions will arise for stateful operations (as Guozhang said).
> >>>>>>>>
> >>>>>>>> Thanks for the work on this Jorge.
> >>>>>>>>
> >>>>>>>> Le ven. 4 mai 2018 à 01:04, Jorge Esteban Quilcate Otoya <
> >>>>>>>> [email protected]> a écrit :
> >>>>>>>>
> >>>>>>>>> Thanks Guozhang and John for your feedback.
> >>>>>>>>>
> >>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in
> our
> >>>>>>>>> topology:
> >>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be
> >>>> straight-forward.
> >>>>>>>>>> 1.b. In DSL stateless operators, it should be straight-forward.
> >>>>>>>>>> 1.c. What about in stateful operators like aggregates and joins?
> >>>>>>>>>
> >>>>>>>>> Agree. Probably point 3 handles this. `Headers` been part of
> >>>>>>>>> `RecordContext` would be handled the same way as other
> attributes.
> >>>>>>>>>
> >>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
> >>>>>>>>> filter/map/branch",
> >>>>>>>>> it may well be covered in KIP-159; worth taking a look at that
> KIP.
> >>>>>>>>>
> >>>>>>>>> Yes, I will point to it.
> >>>>>>>>>
> >>>>>>>>>> 2. In terms of internal implementations, should the state store
> >>>>>>>>> cache include the headers then in order to be sent downstreams?
> >>>>>>>>>
> >>>>>>>>> Good question. As `LRUCacheEntry` extends `RecordContext`, I
> thinks
> >>>>>> this
> >>>>>>>> is
> >>>>>>>>> already supported. I will detail this on the KIP.
> >>>>>>>>>
> >>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)", this
> >>>> should
> >>>>>>>> be
> >>>>>>>>> removed?
> >>>>>>>>>
> >>>>>>>>> Fixed, thanks.
> >>>>>>>>>
> >>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope is
> >>>> only
> >>>>>>>>> for exposing
> >>>>>>>>> the headers for reading, and not allowing users to add / modify
> >>>>>> headers,
> >>>>>>>>> right? If yes, we'd better state it clearly at the "Proposed
> >> Changes"
> >>>>>>>>> section.
> >>>>>>>>>
> >>>>>>>>> As headers is exposed in the `ProcessContext`, and headers will
> be
> >>>> send
> >>>>>>>>> downstream, it can be mutated (add/remove headers).
> >>>>>>>>>
> >>>>>>>>>  > Also, despite the decreased scope in this KIP, I think it
> might
> >> be
> >>>>>>>>> valuable to define what will happen to headers once this change
> is
> >>>>>>>>> implemented. For example, I think a minimal groundwork-level
> change
> >>>>>> might
> >>>>>>>>> be to make the API changes, while promising to drop all headers
> >> from
> >>>>>>>> input
> >>>>>>>>> records.
> >>>>>>>>>
> >>>>>>>>> I will suggest to pass headers to downstream nodes, and don't
> drop
> >>>>>> yhrm.
> >>>>>>>>> Clients will have to drop `Headers` if they have used them.
> >>>>>>>>> Or it could be something like a boolean config property that
> manage
> >>>>>> this.
> >>>>>>>>> I would like to hear feedback here.
> >>>>>>>>>
> >>>>>>>>>> A maximal groundwork change would be to forward the headers
> >> through
> >>>>>> all
> >>>>>>>>> operators
> >>>>>>>>> in
> >>>>>>>>>
> >>>>>>>>> Streams. But I think there are some unresolved questions about
> >>>>>>>> forwarding,
> >>>>>>>>> like "what happens to the headers in a join?"
> >>>>>>>>> Probably this would be solve once KIP-159 is implemented and
> >>>> supporting
> >>>>>>>>> Headers.
> >>>>>>>>>
> >>>>>>>>>> There's of course some middle ground, but instinctively, I think
> >> I'd
> >>>>>>>>> prefer to have a clear definition that headers are currently
> *not*
> >>>>>>>>> forwarded, rather than having a complex list of operators that do
> >> or
> >>>>>>>> don't
> >>>>>>>>> forward them. Plus, I think it might be tricky to define this
> >>>> behavior
> >>>>>>>>> while not allowing the scope to return to that of your original
> >>>>>> proposal!
> >>>>>>>>>
> >>>>>>>>> Agree. But `Headers` were forwarded *explicitly* in the original
> >>>>>>>> proposal.
> >>>>>>>>> The current one pass it as part of `RecordContext`, so if it's
> >>>> forward
> >>>>>> it
> >>>>>>>>> or not is as the same as `RecordContext`.
> >>>>>>>>> On top of this implementation, we can design how filter/map/join
> >> will
> >>>>>> be
> >>>>>>>>> handled. Probably following KIP-159 approach.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Jorge.
> >>>>>>>>>
> >>>>>>>>> El mié., 2 may. 2018 a las 22:56, Guozhang Wang (<
> >> [email protected]
> >>>>> )
> >>>>>>>>> escribió:
> >>>>>>>>>
> >>>>>>>>>> Hi Jorge,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the written KIP! Made a pass over it and left some
> >>>> comments
> >>>>>>>>>> (some of them overlapped with John's):
> >>>>>>>>>>
> >>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in
> our
> >>>>>>>>> topology:
> >>>>>>>>>>
> >>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be
> >>>> straight-forward.
> >>>>>>>>>> 1.b. In DSL stateless operators, it should be straight-forward.
> >>>>>>>>>> 1.c. What about in stateful operators like aggregates and joins?
> >>>>>>>>>>
> >>>>>>>>>> 2. In terms of internal implementations, should the state store
> >>>> cache
> >>>>>>>>>> include the headers then in order to be sent downstreams?
> >>>>>>>>>>
> >>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
> >>>>>>>>>> filter/map/branch", it may well be covered in KIP-159; worth
> >> taking
> >>>> a
> >>>>>>>>> look
> >>>>>>>>>> at that KIP.
> >>>>>>>>>>
> >>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)", this
> >>>> should
> >>>>>>>> be
> >>>>>>>>>> removed?
> >>>>>>>>>>
> >>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope is
> >>>> only
> >>>>>>>> for
> >>>>>>>>>> exposing the headers for reading, and not allowing users to add
> /
> >>>>>>>> modify
> >>>>>>>>>> headers, right? If yes, we'd better state it clearly at the
> >>>> "Proposed
> >>>>>>>>>> Changes" section.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, May 2, 2018 at 8:42 AM, John Roesler <[email protected]
> >
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Jorge,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the design work.
> >>>>>>>>>>>
> >>>>>>>>>>> I agree that de-scoping the work to just the Processor API will
> >>>> help
> >>>>>>>>>>> contain the design and implementation complexity.
> >>>>>>>>>>>
> >>>>>>>>>>> In the KIP, it mentions that the headers would be available in
> >> the
> >>>>>>>>>>> ProcessorContext, (like "context.headers()"). It also says that
> >>>>>>>>>>> implementers would need to implement the method "void process(K
> >>>> key,
> >>>>>>>> V
> >>>>>>>>>>> value, Headers headers);". I think maybe you meant to remove
> the
> >>>>>>>>> proposal
> >>>>>>>>>>> to modify "process", since it wouldn't be necessary in
> >> conjunction
> >>>>>>>> with
> >>>>>>>>>> the
> >>>>>>>>>>> ProcessorContext change, and it's not represented in your PR.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, despite the decreased scope in this KIP, I think it might
> >> be
> >>>>>>>>>> valuable
> >>>>>>>>>>> to define what will happen to headers once this change is
> >>>>>>>> implemented.
> >>>>>>>>>> For
> >>>>>>>>>>> example, I think a minimal groundwork-level change might be to
> >> make
> >>>>>>>> the
> >>>>>>>>>> API
> >>>>>>>>>>> changes, while promising to drop all headers from input
> records.
> >>>>>>>>>>>
> >>>>>>>>>>> A maximal groundwork change would be to forward the headers
> >> through
> >>>>>>>> all
> >>>>>>>>>>> operators in Streams. But I think there are some unresolved
> >>>> questions
> >>>>>>>>>> about
> >>>>>>>>>>> forwarding, like "what happens to the headers in a join?"
> >>>>>>>>>>>
> >>>>>>>>>>> There's of course some middle ground, but instinctively, I
> think
> >>>> I'd
> >>>>>>>>>> prefer
> >>>>>>>>>>> to have a clear definition that headers are currently *not*
> >>>>>>>> forwarded,
> >>>>>>>>>>> rather than having a complex list of operators that do or don't
> >>>>>>>> forward
> >>>>>>>>>>> them. Plus, I think it might be tricky to define this behavior
> >>>> while
> >>>>>>>>> not
> >>>>>>>>>>> allowing the scope to return to that of your original proposal!
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks again for the KIP,
> >>>>>>>>>>> -John
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 2, 2018 at 8:05 AM, Jorge Esteban Quilcate Otoya <
> >>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've created a new JIRA to track this, updated the KIP and
> >> create
> >>>> a
> >>>>>>>>> PR.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Looking forward to your feedback,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jorge.
> >>>>>>>>>>>>
> >>>>>>>>>>>> El mar., 13 feb. 2018 a las 22:43, Matthias J. Sax (<
> >>>>>>>>>>> [email protected]
> >>>>>>>>>>>>> )
> >>>>>>>>>>>> escribió:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Jorge,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would like to unblock this KIP to make some progress. The
> >>>>>>>> tricky
> >>>>>>>>>>>>> question of this work, seems to be how to expose headers at
> DSL
> >>>>>>>>>> level.
> >>>>>>>>>>>>> This related to KIP-149 and KIP-159. However, for Processor
> >> API,
> >>>>>>>> it
> >>>>>>>>>>>>> seems to be rather straight forward to add headers to the
> API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thus, I would suggest to de-scope this KIP and add header
> >> support
> >>>>>>>>> for
> >>>>>>>>>>>>> Processor API only as a first step. If this is done, we can
> see
> >>>>>>>> in
> >>>>>>>>> a
> >>>>>>>>>>>>> second step, how to add headers at DSL level.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> WDYT about this proposal?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you agree, please update the JIRA and KIP accordingly.
> Note,
> >>>>>>>>> that
> >>>>>>>>>> we
> >>>>>>>>>>>>> have two JIRA that are duplicates atm. We can scope them
> >>>>>>>>> accordingly:
> >>>>>>>>>>>>> one for PAPI only, and second as a dependent JIRA for DSL.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 12/30/17 3:11 PM, Jorge Esteban Quilcate Otoya wrote:
> >>>>>>>>>>>>>> Thanks for your feedback!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. I was adding headers to KeyValue to support groupBy, but
> I
> >>>>>>>>> think
> >>>>>>>>>>> it
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> not necessary. It should be enough with mapping headers to
> >>>>>>>>>> key/value
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> then group using current KeyValue structure.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. Yes. IMO key/value stores, like RocksDB, rely on KV as
> >>>>>>>>>> structure,
> >>>>>>>>>>>>> hence
> >>>>>>>>>>>>>> considering headers as part of stateful operations will not
> >> fit
> >>>>>>>>> in
> >>>>>>>>>>> this
> >>>>>>>>>>>>>> approach and increase complexity (I cannot think in a
> use-case
> >>>>>>>>> that
> >>>>>>>>>>>> need
> >>>>>>>>>>>>>> this).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3. and 4. Changes on 1. will solve this issue.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Probably I rush a bit proposing this change, I was not aware
> >> of
> >>>>>>>>>>> KIP-159
> >>>>>>>>>>>>> or
> >>>>>>>>>>>>>> KAFKA-5632.
> >>>>>>>>>>>>>> If KIP-159 is adopted and we reduce this KIP to add Headers
> to
> >>>>>>>>>>>>>> RecordContext will be enough, but I'm not sure about the
> scope
> >>>>>>>> of
> >>>>>>>>>>>>> KIP-159.
> >>>>>>>>>>>>>> If it includes stateful operations will be difficult to
> >>>>>>>>> implemented
> >>>>>>>>>>> as
> >>>>>>>>>>>>>> stated in 2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Jorge.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> El mar., 26 dic. 2017 a las 20:04, Matthias J. Sax (<
> >>>>>>>>>>>>> [email protected]>)
> >>>>>>>>>>>>>> escribió:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the KIP Jorge,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As Bill pointed out already, we should be careful with
> adding
> >>>>>>>>> new
> >>>>>>>>>>>>>>> overloads as this contradicts the work done via KIP-182.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This KIP also seems to be related to KIP-149 and KIP-159.
> Are
> >>>>>>>>> you
> >>>>>>>>>>>> aware
> >>>>>>>>>>>>>>> of them? Both have quite long DISCUSS threads, but it might
> >> be
> >>>>>>>>>> worth
> >>>>>>>>>>>>>>> browsing through them.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> A few further questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  - why do you want to add the headers to `KeyValue`? I am
> not
> >>>>>>>>> sure
> >>>>>>>>>>> if
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>> should consider headers as optional metadata and add it to
> >>>>>>>>>>>>>>> `RecordContext` similar to timestamp, offset, etc. only
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  - You only include stateless single-record transformations
> >> at
> >>>>>>>>> the
> >>>>>>>>>>> DSL
> >>>>>>>>>>>>>>> level. Do you suggest that all other operator just drop
> >>>>>>>> headers
> >>>>>>>>> on
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> floor?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  - Why do you only want to put headers into in-memory and
> >>>>>>>> cache
> >>>>>>>>>> but
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>>> RocksDB store? What do you mean by "pass through"? IMHO,
> all
> >>>>>>>>>> stores
> >>>>>>>>>>>>>>> should behave the same at DSL level.
> >>>>>>>>>>>>>>>    -> if we store the headers in the state stores, what is
> >> the
> >>>>>>>>>>> upgrade
> >>>>>>>>>>>>>>> path?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  - Why do we need to store record header in state in the
> >> first
> >>>>>>>>>>> place,
> >>>>>>>>>>>> if
> >>>>>>>>>>>>>>> we exclude stateful operator at DSL level?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What is the motivation for the "border lines" you choose?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 12/21/17 8:18 AM, Bill Bejeck wrote:
> >>>>>>>>>>>>>>>> Jorge,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the KIP, I know this is a feature others in the
> >>>>>>>>>>> community
> >>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>> been interested in getting into Kafka Streams.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I took a quick pass over it, and I have one initial
> >> question.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We recently reduced overloads with KIP-182, and in this
> KIP
> >>>>>>>> we
> >>>>>>>>>> are
> >>>>>>>>>>>>>>>> increasing them again.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I can see from the KIP why they are necessary, but I'm
> >>>>>>>>> wondering
> >>>>>>>>>> if
> >>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>> is something else we can do to cut down on the overloads
> >>>>>>>>>>>> introduced.  I
> >>>>>>>>>>>>>>>> don't have any sound suggestions ATM, so I'll have to
> think
> >>>>>>>>> about
> >>>>>>>>>>> it
> >>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>> more, but I wanted to put the thought out there.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Bill
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Dec 21, 2017 at 9:06 AM, Jorge Esteban Quilcate
> >>>>>>>> Otoya <
> >>>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have created a KIP to add Record Headers support to
> Kafka
> >>>>>>>>>>> Streams
> >>>>>>>>>>>>> API:
> >>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>> 244%3A+Add+Record+Header+support+to+Kafka+Streams
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The main goal is to be able to use headers to filter, map
> >>>>>>>> and
> >>>>>>>>>>>> process
> >>>>>>>>>>>>>>>>> records as streams. Stateful processing (joins, windows)
> >> are
> >>>>>>>>> not
> >>>>>>>>>>>>>>>>> considered.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Proposed changes/Draft:
> >>>>>>>>>>>>>>>>> https://github.com/apache/kafka/compare/trunk...jeqo:
> >>>>>>>>>>>> streams-headers
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Feedback and suggestions are more than welcome.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jorge.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: KIP-244: Add Record Header support to Kafka Streams

Reply via email to