Let me kick this off with a few starting points that I would like to generate some discussion on.
1) It seems to me that I will need to repartition the data twice - once on the foreign key, and once back to the primary key. Is there anything I am missing here?

2) I believe I will also need to materialize 3 state stores: the prefixScan SS, the highwater mark SS (for out-of-order resolution) and the final state store, due to the workflow I have laid out. I have not thought of a better way yet, but would appreciate any input on this matter. I have gone back through the mailing list for the previous discussions on this KIP, and I did not see anything relating to resolving out-of-order compute. I cannot see a way around the current three-SS structure that I have.

3) Caching is disabled on the prefixScan SS, as I do not know how to reconcile the iterator obtained from RocksDB with that of the cache. In addition, I must ensure everything is flushed before scanning. Since the materialized prefixScan SS is under the "control" of the function, I do not anticipate this to be a problem. Throughput will need to be tested, but as Jan observed in his initial overview of this issue, it is generally the surge of output events that affects performance more than the flush or the prefixScan itself.

Thoughts on any of these are greatly appreciated, since these elements are really the cornerstone of the whole design. I can put up the code I have written against 1.0.2 if desired, but first I was hoping to tackle some of the fundamental design proposals.

Thanks,
Adam

On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <[email protected]> wrote:

> Here is the new discussion thread for KIP-213. I picked back up on the KIP
> as this is something that we too at Flipp are now running in production.
> Jan started this last year, and I know that Trivago is also using something
> similar in production, at least in terms of APIs and functionality.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable
>
> I do have an implementation of the code for Kafka 1.0.2 (our local
> production version) but I won't post it yet as I would like to focus on the
> workflow and design first. That being said, I also need to add some clearer
> integration tests (I did a lot of testing using a non-Kafka Streams
> framework) and clean up the code a bit more before putting it in a PR
> against trunk (I can do so later this week likely).
>
> Please take a look,
>
> Thanks
>
> Adam Bellemare
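To make point 2 above concrete: the highwater-mark SS is essentially a per-key monotonic offset check - an event is dropped if a later-offset result for the same key has already been emitted. A minimal sketch of that check, with an in-memory map standing in for the materialized state store and all names being illustrative rather than the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of out-of-order resolution via a per-key highwater mark.
// A HashMap stands in for the materialized highwater-mark state store;
// the class and method names here are hypothetical.
public class HighwaterMarkResolver {
    private final Map<String, Long> highwater = new HashMap<>();

    // Returns true if the event should be propagated, false if it is
    // stale (its offset is below the highest offset already processed
    // for this key, i.e. a newer result was already emitted).
    public boolean accept(String key, long offset) {
        Long seen = highwater.get(key);
        if (seen != null && offset < seen) {
            return false; // out-of-order: suppress the stale event
        }
        highwater.put(key, offset);
        return true;
    }

    public static void main(String[] args) {
        HighwaterMarkResolver r = new HighwaterMarkResolver();
        System.out.println(r.accept("a", 5)); // true
        System.out.println(r.accept("a", 3)); // false (stale)
        System.out.println(r.accept("a", 7)); // true
    }
}
```

In the real topology the map would be a changelog-backed state store so the highwater marks survive restarts, but the comparison logic is the same.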
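And for point 3: the prefixScan relies on the store keeping composite keys in lexicographic order, so that all entries for one foreign key are contiguous. A sketch of that access pattern, with a TreeMap standing in for the RocksDB-backed store and a hypothetical "<foreignKey>|<primaryKey>" key layout (not the actual key format):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a prefix scan over lexicographically ordered composite keys,
// the way a RocksDB-backed store lays them out. A TreeMap stands in for
// the prefixScan SS; the "<foreignKey>|<primaryKey>" layout is illustrative.
public class PrefixScanSketch {

    // Return all keys beginning with the given prefix by scanning the
    // ordered map from the prefix up to (excluding) the next possible key.
    static List<String> scan(NavigableMap<String, String> store, String prefix) {
        return new ArrayList<>(
            store.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("fk1|pkA", "v1");
        store.put("fk1|pkB", "v2");
        store.put("fk2|pkC", "v3");
        System.out.println(scan(store, "fk1|")); // [fk1|pkA, fk1|pkB]
    }
}
```

This also shows why the cache is a problem for this path: the cache holds dirty entries outside the ordered structure, so the store must be flushed before the range iteration or the scan can miss them.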
