I am actually not sure about this. Because this is about the semantics at the PAPI level, while KIP-159 targets the DSL, it might be better to have a separate KIP.
-Matthias

On 5/11/18 9:26 AM, Guozhang Wang wrote:
> That's a good question. I think we can manage this in KIP-159. I will go
> ahead and try to augment that KIP together with the original author Jeyhun.
>
>
> Guozhang
>
> On Fri, May 11, 2018 at 12:45 AM, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
>> Thanks Guozhang and Matthias! I also agree with this way of handling
>> header inheritance. I will add these rules to the KIP doc.
>>
>>> We can discuss extending the current protocol, how to enable users to
>>> override those rules, and how to expose them in the DSL layer in a future
>>> KIP.
>>
>> About this, should this be managed on KIP-159 or a new one?
>>
>> On Thu, May 10, 2018 at 17:46, Matthias J. Sax (<matth...@confluent.io>)
>> wrote:
>>
>>> Thanks Guozhang! Sounds good to me!
>>>
>>> -Matthias
>>>
>>> On 5/10/18 7:55 AM, Guozhang Wang wrote:
>>>> Thanks for your thoughts Matthias. I think if we do want to bring KIP-244
>>>> into 2.0 then we need to keep its scope small and well defined. For that
>>>> I'm proposing:
>>>>
>>>> 1. Make the inheritance implementation of headers consistent with what we
>>>> had with other record context fields. I.e. pass through the record context
>>>> in `context.forward()`. Note that within a processor node, users can
>>>> already manipulate the Headers with the given APIs, so at the time of
>>>> forwarding, the library can just copy whatever is left / updated to the
>>>> next processor node.
>>>>
>>>> 2. In the sink node, where a record is being sent to the Kafka topic, we
>>>> should consider the following:
>>>>
>>>> a. For sink topics, we will set the headers into the producer record.
>>>> b. For repartition topics, we will set the headers into the producer record.
>>>> c. For changelog topics, we will drop the headers in the producer record
>>>> since they will not be used in restoration and not stored in the state
>>>> store either.
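As a rough illustration of rules 1 and 2 quoted above, here is a minimal, self-contained Java sketch. It does not use the Kafka Streams API: `HeaderInheritanceSketch`, `TopicKind`, `forward`, and `headersForProduce` are hypothetical names made up for this example, and headers are modeled as a plain `Map<String, String>` rather than Kafka's `Headers` type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the proposed default rules; this is not Kafka Streams code.
class HeaderInheritanceSketch {

    // The three kinds of topics a sink node may write to (names are assumptions).
    enum TopicKind { SINK, REPARTITION, CHANGELOG }

    // Rule 1: on forward, the downstream node receives a copy of whatever
    // headers are left after the current processor added or removed entries.
    static Map<String, String> forward(Map<String, String> currentHeaders) {
        return new LinkedHashMap<>(currentHeaders);
    }

    // Rule 2: sink and repartition topics keep the headers (2.a, 2.b);
    // changelog topics drop them (2.c).
    static Map<String, String> headersForProduce(TopicKind kind, Map<String, String> headers) {
        if (kind == TopicKind.CHANGELOG) {
            return new LinkedHashMap<>();
        }
        return new LinkedHashMap<>(headers);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("trace-id", "abc");
        headers.put("tmp", "x");
        headers.remove("tmp"); // a processor mutates the headers before forwarding

        Map<String, String> downstream = forward(headers);
        for (TopicKind kind : TopicKind.values()) {
            System.out.println(kind + " -> " + headersForProduce(kind, downstream));
        }
    }
}
```

Keeping the changelog case separate mirrors the reasoning in 2.c: headers are not stored in the state store, so they would play no role during restoration.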
>>>>
>>>> We can discuss extending the current protocol, how to enable users to
>>>> override those rules, and how to expose them in the DSL layer in a future
>>>> KIP.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Mon, May 7, 2018 at 5:49 PM, Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> Guozhang,
>>>>>
>>>>> if you advocate to forward headers by default, it might be a better
>>>>> default strategy to forward the headers for all operators (similar to
>>>>> topic/partition/offset metadata). It's usually harder for users to
>>>>> reason about different cases and thus I would prefer to have consistent
>>>>> behavior, i.e., only one default strategy instead of introducing different
>>>>> cases.
>>>>>
>>>>> Btw: My argument about dropping headers by default only implies that
>>>>> users need to copy the headers explicitly to the output records in their
>>>>> code if they want to inspect them later -- it does not imply that
>>>>> headers cannot be forwarded downstream. (Not sure if this was clear).
>>>>>
>>>>> I am also ok with copying by default though (for me, it's a 51/49
>>>>> preference for dropping by default only).
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 5/7/18 4:52 PM, Guozhang Wang wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> My concern with setting `null` in all cases is that it would make headers
>>>>>> not very useful in KIP-244 then, because headers will only be available at
>>>>>> the source stream / table, but not in any of the following instances. In
>>>>>> practice users may be more likely to look into the headers later in the
>>>>>> pipeline. Personally I'd suggest we pass the headers for all stateless
>>>>>> operators in DSL and everywhere in PAPI's context.forward(). For
>>>>>> repartition topics and sink topics, we also set them in the produced
>>>>>> records accordingly; for changelog topics, we do not set them since they
>>>>>> are not going to be used anywhere in the store.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>>
>>>>>> On Sun, May 6, 2018 at 9:03 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I agree that we should not block this KIP if possible. Nevertheless, we
>>>>>>> should try to get a reasonable default strategy for inheriting the
>>>>>>> headers so we don't need to change it later on.
>>>>>>>
>>>>>>> Let's see what others think. I still tend slightly to set to `null` by
>>>>>>> default for all cases. If the default strategy is different for
>>>>>>> different operators as you suggest, it might be confusing to users.
>>>>>>> IMHO, the default behavior should be as simple as possible.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 5/6/18 8:53 PM, Guozhang Wang wrote:
>>>>>>>> Matthias, thanks for sharing your opinions on the inheritance protocol of
>>>>>>>> the record context. I'm thinking maybe we should make this discussion
>>>>>>>> a separate KIP by itself? If yes, then KIP-244's scope would be smaller,
>>>>>>>> and within KIP-244 we can have a simple inheritance rule of setting it to
>>>>>>>> null when 1) going through stateful operators and 2) sending to any
>>>>>>>> topics.
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Sun, May 6, 2018 at 10:24 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Making the inheritance protocol a public contract seems reasonable to
>>>>>>>>> me.
>>>>>>>>>
>>>>>>>>> In the current implementation, all output records inherit the offset,
>>>>>>>>> timestamp, topic, and partition metadata from the input record. We
>>>>>>>>> already added an API to change the timestamp explicitly for the output
>>>>>>>>> record though.
>>>>>>>>>
>>>>>>>>> I think it makes sense to keep the inheritance of offset, topic, and
>>>>>>>>> partition. For headers, it's worth discussing.
>>>>>>>>> I see arguments for two
>>>>>>>>> strategies: (1) inherit by default, (2) set `null` by default.
>>>>>>>>> Independent of the default behavior, we should add an API to set headers
>>>>>>>>> for output records explicitly though (similar to the "set timestamp
>>>>>>>>> API").
>>>>>>>>>
>>>>>>>>> From my point of view, timestamp/headers are a different
>>>>>>>>> "class/category" of data/metadata than topic/partition/offset. For the
>>>>>>>>> first category, it makes sense to manipulate them and it's more than
>>>>>>>>> "plain metadata"; especially the timestamp. For the second category it
>>>>>>>>> does not make sense to manipulate it, and to me topic/partition/offset
>>>>>>>>> is pure metadata only---strictly speaking, it's even questionable if
>>>>>>>>> output records should have any value for topic/partition/offset in the
>>>>>>>>> first place, or if they should be `null`, because those attributes
>>>>>>>>> only make sense for source records that are consumed from a topic
>>>>>>>>> directly. On the other hand, if we make this difference explicit,
>>>>>>>>> it might be useful information for the user to track the current
>>>>>>>>> topic/partition/offset of the original source record.
>>>>>>>>>
>>>>>>>>> Furthermore, to me, timestamps and headers are somewhat different, too.
>>>>>>>>> For stream processing it's required that every record has a timestamp;
>>>>>>>>> thus, it makes sense to inherit the input record timestamp by default
>>>>>>>>> (a timestamp is not really metadata but actually equally important to
>>>>>>>>> key and value from my point of view). Headers however are optional, and
>>>>>>>>> thus inheriting them is not really required.
>>>>>>>>> It might be convenient though:
>>>>>>>>> for example, imagine a simple "filter-only" application -- it would be
>>>>>>>>> cumbersome for users to explicitly copy the headers from the input
>>>>>>>>> records to the output records -- it seems to be unnecessary boilerplate
>>>>>>>>> code. On the other hand, for any other more complex use case, it's
>>>>>>>>> questionable to inherit headers---note that headers would be written to
>>>>>>>>> the output topics, increasing the size of the messages. Overall, I am not
>>>>>>>>> sure which default strategy might be the better one for headers. Is
>>>>>>>>> there a convincing argument for either one of them? I slightly tend to
>>>>>>>>> think that using `null` as default might be better.
>>>>>>>>>
>>>>>>>>> Last, we could also make the default behavior configurable. Something
>>>>>>>>> like `inherit.record.headers=true/false` with default value "false".
>>>>>>>>> This would allow people to opt-in for auto-header-inheritance. Just an
>>>>>>>>> idea I wanted to add to the discussion---not sure if it's a good one.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 5/4/18 3:13 PM, Guozhang Wang wrote:
>>>>>>>>>> Hello Jorge,
>>>>>>>>>>
>>>>>>>>>>> Agree. Probably point 3 handles this. `Headers` being part of
>>>>>>>>>>> `RecordContext` would be handled the same way as other attributes.
>>>>>>>>>>
>>>>>>>>>> Today we do not have a clear inheritance protocol for other fields of
>>>>>>>>>> RecordContext yet: although internally we do have some criterion on
>>>>>>>>>> topic/partition/offset and timestamp, they are not explicitly exposed to
>>>>>>>>>> users.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think we still need to have a defined protocol for headers itself,
>>>>>>>>>> but I agree that it is better scoped outside of this KIP, since this
>>>>>>>>>> inheritance protocol itself for all the fields of RecordContext would
>>>>>>>>>> better be a separate KIP. We can document this clearly in the wiki
>>>>>>>>>> page.
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 4, 2018 at 5:26 AM, Florian Garcia <
>>>>>>>>>> garcia.florian.pe...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> For me this is a great first step to have Headers in streaming.
>>>>>>>>>>> My current use case is about distributed tracing (Zipkin) and with the
>>>>>>>>>>> headers in the processorContext() I'll be able to manage that for
>>>>>>>>>>> most cases.
>>>>>>>>>>> KIP-159 should follow after this, but this is where all the major
>>>>>>>>>>> questions will arise for stateful operations (as Guozhang said).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the work on this Jorge.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 4, 2018 at 01:04, Jorge Esteban Quilcate Otoya <
>>>>>>>>>>> quilcate.jo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Guozhang and John for your feedback.
>>>>>>>>>>>>
>>>>>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in our
>>>>>>>>>>>>> topology:
>>>>>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be straight-forward.
>>>>>>>>>>>>> 1.b. In DSL stateless operators, it should be straight-forward.
>>>>>>>>>>>>> 1.c. What about in stateful operators like aggregates and joins?
>>>>>>>>>>>>
>>>>>>>>>>>> Agree. Probably point 3 handles this. `Headers` being part of
>>>>>>>>>>>> `RecordContext` would be handled the same way as other attributes.
>>>>>>>>>>>>
>>>>>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
>>>>>>>>>>>>> filter/map/branch", it may well be covered in KIP-159; worth taking
>>>>>>>>>>>>> a look at that KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I will point to it.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In terms of internal implementations, should the state store
>>>>>>>>>>>>> cache include the headers then in order to be sent downstream?
>>>>>>>>>>>>
>>>>>>>>>>>> Good question. As `LRUCacheEntry` extends `RecordContext`, I think
>>>>>>>>>>>> this is already supported. I will detail this in the KIP.
>>>>>>>>>>>>
>>>>>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)", this
>>>>>>>>>>>>> should be removed?
>>>>>>>>>>>>
>>>>>>>>>>>> Fixed, thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope is
>>>>>>>>>>>>> only for exposing the headers for reading, and not allowing users to
>>>>>>>>>>>>> add / modify headers, right? If yes, we'd better state it clearly at
>>>>>>>>>>>>> the "Proposed Changes" section.
>>>>>>>>>>>>
>>>>>>>>>>>> As headers are exposed in the `ProcessorContext`, and headers will be
>>>>>>>>>>>> sent downstream, they can be mutated (add/remove headers).
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, despite the decreased scope in this KIP, I think it might be
>>>>>>>>>>>>> valuable to define what will happen to headers once this change is
>>>>>>>>>>>>> implemented. For example, I think a minimal groundwork-level change
>>>>>>>>>>>>> might be to make the API changes, while promising to drop all headers
>>>>>>>>>>>>> from input records.
>>>>>>>>>>>>
>>>>>>>>>>>> I would suggest passing headers to downstream nodes, and not dropping
>>>>>>>>>>>> them. Clients will have to drop `Headers` if they have used them.
>>>>>>>>>>>> Or it could be something like a boolean config property that manages
>>>>>>>>>>>> this.
>>>>>>>>>>>> I would like to hear feedback here.
>>>>>>>>>>>>
>>>>>>>>>>>>> A maximal groundwork change would be to forward the headers through
>>>>>>>>>>>>> all operators in Streams. But I think there are some unresolved
>>>>>>>>>>>>> questions about forwarding, like "what happens to the headers in a
>>>>>>>>>>>>> join?"
>>>>>>>>>>>>
>>>>>>>>>>>> Probably this would be solved once KIP-159 is implemented and
>>>>>>>>>>>> supporting Headers.
>>>>>>>>>>>>
>>>>>>>>>>>>> There's of course some middle ground, but instinctively, I think I'd
>>>>>>>>>>>>> prefer to have a clear definition that headers are currently *not*
>>>>>>>>>>>>> forwarded, rather than having a complex list of operators that do or
>>>>>>>>>>>>> don't forward them. Plus, I think it might be tricky to define this
>>>>>>>>>>>>> behavior while not allowing the scope to return to that of your
>>>>>>>>>>>>> original proposal!
>>>>>>>>>>>>
>>>>>>>>>>>> Agree. But `Headers` were forwarded *explicitly* in the original
>>>>>>>>>>>> proposal. The current one passes them as part of `RecordContext`, so
>>>>>>>>>>>> whether they are forwarded or not is the same as for `RecordContext`.
>>>>>>>>>>>> On top of this implementation, we can design how filter/map/join will
>>>>>>>>>>>> be handled. Probably following the KIP-159 approach.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 2, 2018 at 22:56, Guozhang Wang (<wangg...@gmail.com>)
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the written KIP! Made a pass over it and left some comments
>>>>>>>>>>>>> (some of them overlapped with John's):
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. We need to have a clear inheritance protocol of headers in our
>>>>>>>>>>>>> topology:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.a. In PAPI's context.forward() call, it should be straight-forward.
>>>>>>>>>>>>> 1.b. In DSL stateless operators, it should be straight-forward.
>>>>>>>>>>>>> 1.c. What about in stateful operators like aggregates and joins?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In terms of internal implementations, should the state store cache
>>>>>>>>>>>>> include the headers then in order to be sent downstream?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. In future work "Adding DSL Processors to use Headers to
>>>>>>>>>>>>> filter/map/branch", it may well be covered in KIP-159; worth taking a
>>>>>>>>>>>>> look at that KIP.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4. MINOR: "void process(K key, V value, Headers headers)", this should
>>>>>>>>>>>>> be removed?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 5. MINOR: it seems to be the case that in this KIP, our scope is only
>>>>>>>>>>>>> for exposing the headers for reading, and not allowing users to add /
>>>>>>>>>>>>> modify headers, right? If yes, we'd better state it clearly at the
>>>>>>>>>>>>> "Proposed Changes" section.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 2, 2018 at 8:42 AM, John Roesler <j...@confluent.io>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the design work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree that de-scoping the work to just the Processor API will help
>>>>>>>>>>>>>> contain the design and implementation complexity.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the KIP, it mentions that the headers would be available in the
>>>>>>>>>>>>>> ProcessorContext, (like "context.headers()"). It also says that
>>>>>>>>>>>>>> implementers would need to implement the method "void process(K key,
>>>>>>>>>>>>>> V value, Headers headers);".
>>>>>>>>>>>>>> I think maybe you meant to remove the proposal
>>>>>>>>>>>>>> to modify "process", since it wouldn't be necessary in conjunction with
>>>>>>>>>>>>>> the ProcessorContext change, and it's not represented in your PR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, despite the decreased scope in this KIP, I think it might be valuable
>>>>>>>>>>>>>> to define what will happen to headers once this change is implemented. For
>>>>>>>>>>>>>> example, I think a minimal groundwork-level change might be to make the API
>>>>>>>>>>>>>> changes, while promising to drop all headers from input records.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A maximal groundwork change would be to forward the headers through all
>>>>>>>>>>>>>> operators in Streams. But I think there are some unresolved questions about
>>>>>>>>>>>>>> forwarding, like "what happens to the headers in a join?"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There's of course some middle ground, but instinctively, I think I'd prefer
>>>>>>>>>>>>>> to have a clear definition that headers are currently *not* forwarded,
>>>>>>>>>>>>>> rather than having a complex list of operators that do or don't forward
>>>>>>>>>>>>>> them. Plus, I think it might be tricky to define this behavior while not
>>>>>>>>>>>>>> allowing the scope to return to that of your original proposal!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again for the KIP,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 2, 2018 at 8:05 AM, Jorge Esteban Quilcate Otoya <
>>>>>>>>>>>>>> quilcate.jo...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've created a new JIRA to track this, updated the KIP and created a PR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Feb 13, 2018 at 22:43, Matthias J. Sax (<matth...@confluent.io>)
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would like to unblock this KIP to make some progress. The tricky
>>>>>>>>>>>>>>>> question of this work seems to be how to expose headers at DSL level.
>>>>>>>>>>>>>>>> This is related to KIP-149 and KIP-159. However, for Processor API, it
>>>>>>>>>>>>>>>> seems to be rather straightforward to add headers to the API.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thus, I would suggest to de-scope this KIP and add header support for
>>>>>>>>>>>>>>>> Processor API only as a first step. If this is done, we can see in a
>>>>>>>>>>>>>>>> second step, how to add headers at DSL level.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WDYT about this proposal?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you agree, please update the JIRA and KIP accordingly. Note that we
>>>>>>>>>>>>>>>> have two JIRAs that are duplicates atm. We can scope them accordingly:
>>>>>>>>>>>>>>>> one for PAPI only, and second as a dependent JIRA for DSL.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 12/30/17 3:11 PM, Jorge Esteban Quilcate Otoya wrote:
>>>>>>>>>>>>>>>>> Thanks for your feedback!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I was adding headers to KeyValue to support groupBy, but I think it
>>>>>>>>>>>>>>>>> is not necessary. It should be enough to map headers to key/value and
>>>>>>>>>>>>>>>>> then group using the current KeyValue structure.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Yes.
>>>>>>>>>>>>>>>>> IMO key/value stores, like RocksDB, rely on KV as structure, hence
>>>>>>>>>>>>>>>>> considering headers as part of stateful operations will not fit in this
>>>>>>>>>>>>>>>>> approach and will increase complexity (I cannot think of a use case that
>>>>>>>>>>>>>>>>> needs this).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3. and 4. Changes on 1. will solve this issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Probably I rushed a bit proposing this change; I was not aware of KIP-159
>>>>>>>>>>>>>>>>> or KAFKA-5632.
>>>>>>>>>>>>>>>>> If KIP-159 is adopted, reducing this KIP to adding Headers to
>>>>>>>>>>>>>>>>> RecordContext will be enough, but I'm not sure about the scope of
>>>>>>>>>>>>>>>>> KIP-159. If it includes stateful operations, it will be difficult to
>>>>>>>>>>>>>>>>> implement, as stated in 2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Dec 26, 2017 at 20:04, Matthias J. Sax (<matth...@confluent.io>)
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the KIP Jorge,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As Bill pointed out already, we should be careful with adding new
>>>>>>>>>>>>>>>>>> overloads as this contradicts the work done via KIP-182.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This KIP also seems to be related to KIP-149 and KIP-159. Are you aware
>>>>>>>>>>>>>>>>>> of them? Both have quite long DISCUSS threads, but it might be worth
>>>>>>>>>>>>>>>>>> browsing through them.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A few further questions:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - why do you want to add the headers to `KeyValue`?
>>>>>>>>>>>>>>>>>> I am not sure if we
>>>>>>>>>>>>>>>>>> should consider headers as optional metadata and add it to
>>>>>>>>>>>>>>>>>> `RecordContext` similar to timestamp, offset, etc. only
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - You only include stateless single-record transformations at the DSL
>>>>>>>>>>>>>>>>>> level. Do you suggest that all other operators just drop headers on the
>>>>>>>>>>>>>>>>>> floor?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Why do you only want to put headers into in-memory and cache but not
>>>>>>>>>>>>>>>>>> RocksDB store? What do you mean by "pass through"? IMHO, all stores
>>>>>>>>>>>>>>>>>> should behave the same at DSL level.
>>>>>>>>>>>>>>>>>> -> if we store the headers in the state stores, what is the upgrade
>>>>>>>>>>>>>>>>>> path?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Why do we need to store record headers in state in the first place, if
>>>>>>>>>>>>>>>>>> we exclude stateful operators at DSL level?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What is the motivation for the "border lines" you choose?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 12/21/17 8:18 AM, Bill Bejeck wrote:
>>>>>>>>>>>>>>>>>>> Jorge,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the KIP, I know this is a feature others in the community
>>>>>>>>>>>>>>>>>>> have been interested in getting into Kafka Streams.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I took a quick pass over it, and I have one initial question.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We recently reduced overloads with KIP-182, and in this KIP we are
>>>>>>>>>>>>>>>>>>> increasing them again.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can see from the KIP why they are necessary, but I'm wondering if there
>>>>>>>>>>>>>>>>>>> is something else we can do to cut down on the overloads introduced. I
>>>>>>>>>>>>>>>>>>> don't have any sound suggestions ATM, so I'll have to think about it some
>>>>>>>>>>>>>>>>>>> more, but I wanted to put the thought out there.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Dec 21, 2017 at 9:06 AM, Jorge Esteban Quilcate Otoya <
>>>>>>>>>>>>>>>>>>> quilcate.jo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have created a KIP to add Record Headers support to Kafka Streams
>>>>>>>>>>>>>>>>>>>> API:
>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The main goal is to be able to use headers to filter, map and process
>>>>>>>>>>>>>>>>>>>> records as streams. Stateful processing (joins, windows) are not
>>>>>>>>>>>>>>>>>>>> considered.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Proposed changes/Draft:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/compare/trunk...jeqo:streams-headers
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Feedback and suggestions are more than welcome.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang