Let me kick this off with a few starting points that I would like to generate some discussion on.
1) It seems to me that I will need to repartition the data twice - once on the foreign key, and once back to the primary key. Is there anything I am missing here?

2) I believe I will also need to materialize 3 state stores: the prefixScan SS, the highwater mark SS (for out-of-order resolution) and the final state store, due to the workflow I have laid out. I have not thought of a better way yet, but would appreciate any input on this matter. I have gone back through the mailing list for the previous discussions on this KIP, and I did not see anything relating to resolving out-of-order compute. I cannot see a way around the current three-SS structure that I have.

3) Caching is disabled on the prefixScan SS, as I do not know how to reconcile the iterator obtained from RocksDB with that of the cache. In addition, I must ensure everything is flushed before scanning. Since the materialized prefixScan SS is under the "control" of the function, I do not anticipate this to be a problem. Throughput will need to be tested, but as Jan observed in his initial overview of this issue, it is generally the surge of output events that affects performance more than the flush or the prefixScan itself.

Thoughts on any of these are greatly appreciated, since these elements are really the cornerstone of the whole design. I can put up the code I have written against 1.0.2 if desired, but first I was hoping to tackle some of the fundamental design proposals.

Thanks,
Adam

On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <[email protected]> wrote:

> Here is the new discussion thread for KIP-213. I picked back up on the KIP
> as this is something that we too at Flipp are now running in production.
> Jan started this last year, and I know that Trivago is also using something
> similar in production, at least in terms of APIs and functionality.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable
>
> I do have an implementation of the code for Kafka 1.0.2 (our local
> production version) but I won't post it yet as I would like to focus on the
> workflow and design first. That being said, I also need to add some clearer
> integration tests (I did a lot of testing using a non-Kafka Streams
> framework) and clean up the code a bit more before putting it in a PR
> against trunk (I can do so later this week likely).
>
> Please take a look,
>
> Thanks
>
> Adam Bellemare
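To make point 2 above concrete: the highwater-mark SS is essentially a per-key monotonic offset check - an event is dropped if a later-offset result for the same key has already been emitted. A minimal sketch of that check, with an in-memory map standing in for the materialized state store and all names being illustrative rather than the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of out-of-order resolution via a per-key highwater mark.
// A HashMap stands in for the materialized highwater-mark state store;
// the class and method names here are hypothetical.
public class HighwaterMarkResolver {
    private final Map<String, Long> highwater = new HashMap<>();

    // Returns true if the event should be propagated, false if it is
    // stale (its offset is below the highest offset already processed
    // for this key, i.e. a newer result was already emitted).
    public boolean accept(String key, long offset) {
        Long seen = highwater.get(key);
        if (seen != null && offset < seen) {
            return false; // out-of-order: suppress the stale event
        }
        highwater.put(key, offset);
        return true;
    }

    public static void main(String[] args) {
        HighwaterMarkResolver r = new HighwaterMarkResolver();
        System.out.println(r.accept("a", 5)); // true
        System.out.println(r.accept("a", 3)); // false (stale)
        System.out.println(r.accept("a", 7)); // true
    }
}
```

In the real topology the map would be a changelog-backed state store so the highwater marks survive restarts, but the comparison logic is the same.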
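And for point 3: the prefixScan relies on the store keeping composite keys in lexicographic order, so that all entries for one foreign key are contiguous. A sketch of that access pattern, with a TreeMap standing in for the RocksDB-backed store and a hypothetical "<foreignKey>|<primaryKey>" key layout (not the actual key format):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a prefix scan over lexicographically ordered composite keys,
// the way a RocksDB-backed store lays them out. A TreeMap stands in for
// the prefixScan SS; the "<foreignKey>|<primaryKey>" layout is illustrative.
public class PrefixScanSketch {

    // Return all keys beginning with the given prefix by scanning the
    // ordered map from the prefix up to (excluding) the next possible key.
    static List<String> scan(NavigableMap<String, String> store, String prefix) {
        return new ArrayList<>(
            store.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("fk1|pkA", "v1");
        store.put("fk1|pkB", "v2");
        store.put("fk2|pkC", "v3");
        System.out.println(scan(store, "fk1|")); // [fk1|pkA, fk1|pkB]
    }
}
```

This also shows why the cache is a problem for this path: the cache holds dirty entries outside the ordered structure, so the store must be flushed before the range iteration or the scan can miss them.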
