Hello,

The KTable join semantics are not exactly the same as those of an RDBMS. You
can find the detailed semantics in the web docs (search for Joining Streams):

http://docs.confluent.io/3.0.0/streams/developer-guide.html#kafka-streams-dsl

In a nutshell, when a record arrives on either side, whether the joiner is
triggered depends on the join type:

 - inner join: only if both tables have a record matching the key of the
   incoming record, so neither input value of the joiner can be null;
 - left join: only if the left table has a matching record, so only the
   other (right) value can be null;
 - outer join: if either table has a matching record, so either value can
   be null, but not both.

Otherwise a pair of {join-key, null} is output. We chose this design
deliberately to make sure that "table-table joins are eventually
consistent". It gives some resilience to late-arriving records: a late
arrival in either stream can "update" the join result.
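
To make the rule above concrete, here is a small toy model in plain Java. It
is only a sketch of the behaviour as described in this mail, not the Kafka
Streams implementation: the class TableJoinSketch, its methods, and the
in-memory "changelog" list are all made up for illustration, and a null value
stands in for "no record for this key".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Toy model of the table-table join rule described above; NOT Kafka Streams code. */
final class TableJoinSketch {
    enum JoinType { INNER, LEFT, OUTER }

    private final JoinType type;
    private final BiFunction<String, String, String> joiner;
    private final Map<String, String> left = new HashMap<>();   // hypothetical table tA
    private final Map<String, String> right = new HashMap<>();  // hypothetical table tB
    // Simulated changelog of the result table tC: each entry is {key, value-or-null}.
    final List<String[]> changelog = new ArrayList<>();

    TableJoinSketch(JoinType type, BiFunction<String, String, String> joiner) {
        this.type = type;
        this.joiner = joiner;
    }

    void updateLeft(String key, String value) {
        left.put(key, value);
        emit(key);
    }

    void updateRight(String key, String value) {
        right.put(key, value);
        emit(key);
    }

    // Every incoming record produces one output: the joined value if the
    // joiner is triggered for this join type, otherwise {key, null}.
    private void emit(String key) {
        String l = left.get(key);
        String r = right.get(key);
        boolean triggered;
        switch (type) {
            case INNER: triggered = l != null && r != null; break;
            case LEFT:  triggered = l != null; break;
            default:    triggered = l != null || r != null; break; // OUTER
        }
        changelog.add(new String[] { key, triggered ? joiner.apply(l, r) : null });
    }

    public static void main(String[] args) {
        TableJoinSketch tC = new TableJoinSketch(JoinType.LEFT, (a, b) -> a + "+" + b);
        tC.updateRight("kB", "b1"); // key seen only in B so far: joiner not called
        tC.updateLeft("kB", "a1");  // late arrival on A "updates" the result
        for (String[] e : tC.changelog) {
            System.out.println(e[0] + " -> " + e[1]);
        }
        // prints:
        // kB -> null
        // kB -> a1+b1
    }
}
```

In this model the first output for kB is the {kB, null} pair Phil observed,
and the later record on the left side replaces it with a real join result,
which is the "eventually consistent" behaviour we were after.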


Guozhang

On Mon, Jul 4, 2016 at 6:10 PM, Philippe Derome <phder...@gmail.com> wrote:

> Same happens for regular join, keys that appear only in one stream will
> make it to output KTable tC with a null for either input stream. I guess
> it's related to Kafka-3911 Enforce ktable Materialization or umbrella JIRA
> 3909, Queryable state for Kafka Streams?
>
> On Mon, Jul 4, 2016 at 8:45 PM, Philippe Derome <phder...@gmail.com>
> wrote:
>
> > If we have two streams A and B for which we associate tables tA and tB,
> > then create a table tC as tA.leftJoin(tB, <some value joiner>) and then
> > we have a key kB in stream B but never made it to tA nor tC, do we need
> > to inject a pair (k,v) of (kB, null) into resulting change log for tC ?
> >
> > It sounds like it is definitely necessary if key kB is present in table
> > tC but if not, why add it?
> >
> > I have an example that reproduces this and would like to know if it is
> > considered normal, sub-optimal, or a defect. I don't view it as normal
> > for time being, particularly considering stream A as having very few
> > keys and B as having many, which could lead to an unnecessary large
> > change log for C.
> >
> > Phil
> >
>



-- 
-- Guozhang
