[ 
https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817833#comment-17817833
 ] 

Matthias J. Sax commented on KAFKA-12317:
-----------------------------------------

[~aki] – I was just looking into 
[https://kafka.apache.org/documentation/streams/developer-guide/dsl-api.html#joining]
 and it seems we need to update this a little bit. E.g., there is
{quote}Input records with a {{null}} key or a {{null}} value are ignored and do 
not trigger the join.
{quote}
for left/outer stream-stream joins, which is no longer correct.

The "table" that explains join semantics also assumes that all keys are the 
same and that the key is never null. In general, this table view is hard to 
read anyway, and it might be good to replace it with something better.

Would you be interested in doing a follow-up PR to update the docs? I assume 
that other sections (not just stream-stream join) need an update.

> Relax non-null key requirement for left/outer KStream joins
> -----------------------------------------------------------
>
>                 Key: KAFKA-12317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Florin Akermann
>            Priority: Major
>              Labels: kip
>
> Currently, for stream-stream and stream-table/globalTable joins, 
> KafkaStreams drops all stream records with a `null`-key (`null`-join-key 
> for stream-globalTable), because for a `null`-(join)key the join is 
> undefined: i.e., we don't have an attribute to do the table lookup (we 
> consider the stream record malformed). Note that we define the semantics 
> of _left/outer_ join as: keep the stream record if no matching join record 
> was found.
> We could relax the definition of _left_ stream-table/globalTable and 
> _left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
> records, and call the ValueJoiner with a `null` "other-side" value instead: 
> if the stream record key (or join-key) is `null`, we could treat it as a 
> "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users who want to keep the current behavior can add 
> a `filter()` before the join to drop `null`-(join)key records from the stream 
> explicitly.
> Note that this change also requires changing the behavior if we insert a 
> repartition topic before the join: currently, we drop `null`-key records 
> before writing into the repartition topic (as we know they would be dropped 
> later anyway). We need to relax this behavior for left stream-table and 
> left/outer stream-stream joins. Users need to be aware (i.e., we might need to 
> put this into the docs and JavaDocs) that records with a `null`-key would be 
> partitioned randomly.
> KIP-962: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-962%3A+Relax+non-null+key+requirement+in+Kafka+Streams]
>  
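
To make the proposed semantics concrete, here is a rough plain-Java model of a left stream-table join. It does not use the Kafka Streams API; the names `NullKeyLeftJoinSketch`, `Rec`, `leftJoin`, and `dropNullKeys` are made up for illustration, and a `Map` stands in for the table. It only sketches the idea from the description above: a `null` key is treated as a failed lookup (the joiner is called with a `null` other-side value) instead of the record being dropped, and a filter applied before the join restores the old behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

public class NullKeyLeftJoinSketch {

    /** Hypothetical stand-in for a keyed stream record (not a Kafka Streams type). */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Models the explicit filter users could add before the join. */
    static List<Rec> dropNullKeys(List<Rec> stream) {
        return stream.stream()
                .filter(r -> r.key != null)
                .collect(Collectors.toList());
    }

    /**
     * Models the relaxed left join: a null key is a "failed lookup",
     * so the joiner is called with null as the other-side value.
     */
    static List<String> leftJoin(List<Rec> stream,
                                 Map<String, String> table,
                                 BiFunction<String, String, String> joiner) {
        List<String> out = new ArrayList<>();
        for (Rec r : stream) {
            // No lookup is possible for a null key; use null as the table value.
            String other = (r.key == null) ? null : table.get(r.key);
            out.add(joiner.apply(r.value, other));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> stream = List.of(
                new Rec("k1", "a"), new Rec(null, "b"), new Rec("k2", "c"));
        Map<String, String> table = Map.of("k1", "X");
        BiFunction<String, String, String> joiner = (left, right) -> left + "/" + right;

        // Relaxed semantics: the null-key record is kept.
        System.out.println(leftJoin(stream, table, joiner));
        // Old semantics, restored by filtering before the join.
        System.out.println(leftJoin(dropNullKeys(stream), table, joiner));
    }
}
```

In this model the unfiltered join emits one result per input record, including the `null`-key one, while the filtered variant emits results only for the records with non-null keys.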



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
