Hi Guozhang et al.,

I was just reading the 2.0 release notes and noticed a section on Record Headers:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API
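If headers do survive end to end, the wrapper-to-header swap I have in mind would look roughly like this toy, Kafka-free sketch (the Record, markPropagate, and shouldPropagate names here are made up for illustration; they are not Kafka's classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a record with headers; NOT the real Kafka classes.
public class HeaderPropagationSketch {

    static final class Record<K, V> {
        final K key;
        final V value;                      // null value represents a tombstone
        final Map<String, byte[]> headers;  // headers travel with the record

        Record(K key, V value, Map<String, byte[]> headers) {
            this.key = key;
            this.value = value;
            this.headers = headers;
        }
    }

    // The flag currently carried in the propagationWrapper could instead ride
    // in a header, leaving the value itself untouched (even when it is null).
    static <K, V> Record<K, V> markPropagate(Record<K, V> r, boolean propagate) {
        Map<String, byte[]> h = new HashMap<>(r.headers);
        h.put("propagate", new byte[] { (byte) (propagate ? 1 : 0) });
        return new Record<>(r.key, r.value, h);
    }

    static boolean shouldPropagate(Record<?, ?> r) {
        byte[] flag = r.headers.get("propagate");
        return flag != null && flag[0] == (byte) 1;
    }

    public static void main(String[] args) {
        // A tombstone (null value) still carries its header downstream.
        Record<String, String> tombstone =
                markPropagate(new Record<String, String>("pk1", null, new HashMap<>()), true);
        System.out.println(tombstone.value == null && shouldPropagate(tombstone));  // true
    }
}
```

The point of the sketch is only that the propagation flag would no longer live inside the value, so it could survive a null value; whether Kafka actually keeps headers attached through Sinks and Sources is exactly the open question.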
I am not yet sure whether the contents of a RecordHeader are propagated all the way through the Sinks and Sources, but if they are, and if they remain attached to the record (including null records), I may be able to ditch the propagationWrapper in favour of an implementation using RecordHeader. I am not yet sure if this is doable, so if anyone understands the RecordHeader implementation better than I do, I would be happy to hear from you.

In the meantime, let me know of any questions. I believe this PR has a lot of potential to solve problems for other people, as I have encountered a number of other companies in the wild, each home-brewing its own solution for handling relational data in streams.

Adam

On Fri, Jul 27, 2018 at 1:45 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Adam,
>
> Thanks for rebooting the discussion of this KIP! Let me finish my pass on
> the wiki and get back to you soon. Sorry for the delays.
>
> Guozhang
>
> On Tue, Jul 24, 2018 at 6:08 AM, Adam Bellemare <adam.bellem...@gmail.com>
> wrote:
>
>> Let me kick this off with a few starting points that I would like to
>> generate some discussion on.
>>
>> 1) It seems to me that I will need to repartition the data twice: once on
>> the foreign key, and once back to the primary key. Is there anything I am
>> missing here?
>>
>> 2) I believe I will also need to materialize 3 state stores: the
>> prefixScan SS, the highwater mark SS (for out-of-order resolution), and
>> the final state store, due to the workflow I have laid out. I have not
>> thought of a better way yet, but would appreciate any input on this
>> matter. I have gone back through the mailing list for the previous
>> discussions on this KIP, and I did not see anything relating to resolving
>> out-of-order computation. I cannot see a way around the current three-SS
>> structure that I have.
>>
>> 3) Caching is disabled on the prefixScan SS, as I do not know how to
>> reconcile the iterator obtained from RocksDB with that of the cache.
>> In addition, I must ensure everything is flushed before scanning. Since
>> the materialized prefixScan SS is under the "control" of the function, I
>> do not anticipate this being a problem. Performance throughput will need
>> to be tested, but as Jan observed in his initial overview of this issue,
>> it is generally the surge of output events that affects performance more
>> than the flush or the prefixScan itself.
>>
>> Thoughts on any of these are greatly appreciated, since these elements
>> are really the cornerstone of the whole design. I can put up the code I
>> have written against 1.0.2 if we so desire, but first I was hoping to
>> tackle some of the fundamental design proposals.
>>
>> Thanks,
>> Adam
>>
>> On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <adam.bellem...@gmail.com>
>> wrote:
>>
>> > Here is the new discussion thread for KIP-213. I picked the KIP back
>> > up, as this is something that we too at Flipp are now running in
>> > production. Jan started this last year, and I know that Trivago is also
>> > using something similar in production, at least in terms of APIs and
>> > functionality.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable
>> >
>> > I do have an implementation of the code for Kafka 1.0.2 (our local
>> > production version), but I won't post it yet, as I would like to focus
>> > on the workflow and design first. That being said, I also need to add
>> > some clearer integration tests (I did a lot of testing using a
>> > non-Kafka Streams framework) and clean up the code a bit more before
>> > putting it in a PR against trunk (I can likely do so later this week).
>> >
>> > Please take a look.
>> >
>> > Thanks,
>> >
>> > Adam Bellemare
>
> --
> -- Guozhang
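For anyone who wants something concrete for the prefixScan SS discussed in points 2 and 3 above, here is a toy, Kafka-free sketch of the combined-key range scan; the TreeMap and the "fk|pk" string layout are stand-ins for RocksDB and the real serialized key format, not what the PR actually uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of the prefixScan state store idea: left-table records are re-keyed
// as (foreignKey, primaryKey) so that all rows referencing one foreign key sit
// contiguously and can be fetched with a single range scan. A TreeMap stands
// in for RocksDB, and the "fk|pk" string layout is illustrative only.
public class PrefixScanSketch {

    private final TreeMap<String, String> store = new TreeMap<>();

    // Assumes '|' occurs in neither key; a real serde needs length-prefixing.
    static String combinedKey(String foreignKey, String primaryKey) {
        return foreignKey + "|" + primaryKey;
    }

    void put(String foreignKey, String primaryKey, String value) {
        store.put(combinedKey(foreignKey, primaryKey), value);
    }

    // Range scan over every entry whose key starts with foreignKey + "|".
    List<String> prefixScan(String foreignKey) {
        String from = foreignKey + "|";
        String to = foreignKey + "}";  // '}' is the next char after '|' in ASCII
        return new ArrayList<>(store.subMap(from, to).values());
    }

    public static void main(String[] args) {
        PrefixScanSketch scan = new PrefixScanSketch();
        scan.put("fk1", "pkA", "a");
        scan.put("fk1", "pkB", "b");
        scan.put("fk2", "pkC", "c");
        System.out.println(scan.prefixScan("fk1"));  // [a, b]
    }
}
```

This also shows why the flush question in point 3 matters: the range scan only sees what has actually reached the store.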
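On point 2, here is a toy sketch of how I picture the highwater mark SS resolving out-of-order updates; the HashMap and plain long offsets are simplified stand-ins, not the actual types or store:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the highwater mark store's role in out-of-order resolution: after
// the second repartition, two updates for the same primary key can arrive out
// of order. Recording the highest source offset seen per key lets any update
// bearing a lower (or repeated) offset be discarded as stale. A HashMap stands
// in for the state store.
public class HighwaterMarkSketch {

    private final Map<String, Long> highwater = new HashMap<>();

    // Returns true if the update should be emitted, false if it is stale.
    boolean accept(String primaryKey, long sourceOffset) {
        Long seen = highwater.get(primaryKey);
        if (seen != null && sourceOffset <= seen) {
            return false;  // an equal or newer update was already emitted
        }
        highwater.put(primaryKey, sourceOffset);
        return true;
    }

    public static void main(String[] args) {
        HighwaterMarkSketch hw = new HighwaterMarkSketch();
        System.out.println(hw.accept("pk1", 5));  // true:  first update for pk1
        System.out.println(hw.accept("pk1", 3));  // false: offset 3 arrived late
        System.out.println(hw.accept("pk1", 7));  // true:  newer than offset 5
    }
}
```

If anyone sees a way to fold this into one of the other two stores, that would remove a third of the materialization cost.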