Re: [DISCUSS] KIP-955: Add stream-table join on foreign key

Igor Fomenko Wed, 26 Jul 2023 07:13:25 -0700

Hello Matthias,

Thank you for this response. It provides the context for a good discussion
related to the need for this new interface.

The use case I have in mind is not really stream enrichment, which usually
implies that the event carries a primary key to some external information
and that this information can simply be looked up in another data source.

The pattern this KIP proposes is more akin to the data entity assembly
pattern from the persistence layer, so it is not purely an integration
pattern but rather a pattern that enables an event stream from the
persistence layer of a data source application. The main driver here is the
ability to stream a data entity of any complexity (complexity in terms of
the relational model) from an application database to some data consumers.
The technical precondition here is of course that the data has already been
extracted from the relational database with something like Change Data
Capture (CDC) and placed into Kafka topics. Also, due to CDC limitations,
each database table that belongs to the entity's relational data model is
extracted to a separate Kafka topic.

So, to answer your first question: in a very common use case, the entity
that needs to be "assembled" from Kafka topics has 1:n relations, where 1
corresponds to the triggering event enriched with data from the main (or
parent) table of the data entity (for example, a purchase order completion
event plus the order data from the order table) and n corresponds to the
many children that need to be joined with the order table to obtain the
full data entity (for example, the multiple line items of the purchase
order that need to be added from the line items child table).

It is not possible to use a table-table join in this case because the
triggering events are supplied separately from the actual data entity that
needs to be "assembled", and these events can only be represented as a
KStream due to their nature. Also, the FK table in the current table-table
join is on the "wrong" side of the join.
It is possible to use the existing stream-table join only to get data from
the parent entity table (order table), because the event-to-order relation
is 1:1. After that, the "children" tables of the order must be added to
complete the entity assembly; these children are related to the order as
1:n through a foreign key field in each child table (the order ID).
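
The two steps above (a 1:1 lookup of the parent, then a 1:n attachment of
the children by foreign key) can be sketched in plain Java. This is only an
illustration of the assembly logic under assumed data shapes; it does not
use the Kafka Streams API or the API proposed in the KIP, and the class,
field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no Kafka Streams API is used here.
class EntityAssemblySketch {

    // Hypothetical child-table row: its own primary key plus the foreign key.
    static final class LineItem {
        final String lineItemId; // primary key of the child row
        final String orderId;    // foreign key to the parent order
        final String product;

        LineItem(String lineItemId, String orderId, String product) {
            this.lineItemId = lineItemId;
            this.orderId = orderId;
            this.product = product;
        }
    }

    // Index child rows by foreign key so that one order ID resolves to all
    // of its line items (the 1:n side of the relation).
    static Map<String, List<LineItem>> indexByForeignKey(List<LineItem> items) {
        Map<String, List<LineItem>> index = new HashMap<>();
        for (LineItem item : items) {
            index.computeIfAbsent(item.orderId, k -> new ArrayList<>()).add(item);
        }
        return index;
    }

    // Step 1 (1:1): look up the parent order by the order ID in the event.
    // Step 2 (1:n): attach every child whose foreign key equals that ID.
    static String assemble(String eventOrderId,
                           Map<String, String> orders,
                           Map<String, List<LineItem>> itemsByOrder) {
        String order = orders.get(eventOrderId);
        if (order == null) {
            return null; // no parent row, so there is nothing to assemble
        }
        List<String> products = new ArrayList<>();
        for (LineItem item : itemsByOrder.getOrDefault(eventOrderId, List.of())) {
            products.add(item.product);
        }
        return order + " " + products;
    }
}
```

Note that the child rows are keyed by their own primary key (line item ID)
and only carry the order ID as a field, which is why the foreign key has to
be extracted from the table side rather than from the event.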

This use case is typically implemented with some sort of ESB (like
Mulesoft), where the ESB receives an event and then uses a JDBC adapter to
issue an SQL query with a left join on the foreign key for the child
tables. The ESB then loops through the returned record set to assemble the
full data entity. However, in many cases, for various architectural
reasons, there is a desire to remove JDBC queries from the data source and
replace them with CDC streaming data to Kafka. In that case, assembling
data entities from Kafka topics instead of via JDBC would be beneficial.
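
For comparison, the record-set loop in the ESB approach can be sketched as
follows. This is a plain-Java illustration under assumed data shapes, with
no JDBC involved: each row stands in for one record of a hypothetical
"orders LEFT JOIN line_items" result, and the class and method names are
made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; the rows stand in for a JDBC result set.
class LeftJoinRowGrouping {

    // row[0] = order ID (repeated once per child row)
    // row[1] = line item, or null when the order has no children, which is
    //          what a left join yields for a childless parent
    static Map<String, List<String>> groupRows(List<String[]> rows) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Create the entity on the first row seen for this order ID.
            List<String> children =
                entities.computeIfAbsent(row[0], k -> new ArrayList<>());
            if (row[1] != null) {
                children.add(row[1]);
            }
        }
        return entities;
    }
}
```

Every row of the flat result repeats the parent columns, so the loop has to
fold the rows back into one entity per order ID.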

Please let me know what you think.

Regards,

Igor

On Tue, Jul 25, 2023 at 5:53 PM Matthias J. Sax <mj...@apache.org> wrote:

> Igor,
>
> thanks for the KIP. Interesting proposal. I am wondering a little bit
> about the use-case and semantics, and if it's really required to add
> what you propose? Please correct me if I am wrong.
>
> In the end, a stream-table join is a "stream enrichment" (via a table
> lookup). Thus, it's inherently a 1:1 join (in contrast to a FK
> table-table join which is a n:1 join).
>
> If this assumption is correct, and you have data for which the table
> side join attribute is in the value, you could actually repartition the
> table data using the join attribute as the PK of the table.
>
> If my assumption is incorrect, and you say you want to have a 1:n join
> (note that I intentionally reversed from n:1 to 1:n), I would rather
> object, because it seems to violate the idea to "enrich" a stream, which
> means that each input record produces one output record, not multiple?
>
> Also note: for a FK table-table join, we use the foreignKeyExtractor to
> get the join attribute from the left input table (which corresponds to
> the KStream in your KIP; ie, it's a n:1 join), while you propose to use
> the foreignKeyExtractor to be applied to the KTable (which is the right
> input, and thus it would be a 1:n join).
>
> Maybe you can clarify the use case a little bit. For the current KIP
> description I only see the 1:1 join case, which would mean we might not
> need such a feature?
>
>
> -Matthias
>
>
> On 7/24/23 11:36 AM, Igor Fomenko wrote:
> > Hello developers of the Kafka Streams,
> >
> > I would like to start discussion on KIP-955: Add stream-table join on
> > foreign key
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-955%3A+Add+stream-table+join+on+foreign+key>
> > This KIP proposes a new API to join a KStream with a KTable based on a
> > foreign key relation.
> > This KIP was inspired by one of my former projects to integrate RDBMS
> > databases with data consumers using Change Data Capture and Kafka.
> > If we had the capability in Kafka Streams to join a KStream with a KTable
> > on a foreign key, this would simplify our implementation significantly.
> >
> > Looking forward to your feedback and discussion.
> >
> > Regards,
> >
> > Igor
> >
>
