[ 
https://issues.apache.org/jira/browse/KAFKA-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521985#comment-17521985
 ] 

Matthias J. Sax commented on KAFKA-12909:
-----------------------------------------

This ticket is about left/outer joins in particular, and the "emit at window 
close" strategy is only applied to non-matching records. I.e., even if you have 
a left/outer join, all _inner_ join results of the operation are emitted right 
away. However, it's not "safe" to emit a left/outer join result eagerly, as 
the record might actually find a join partner later. Thus, we need to delay 
emitting "un-joined" records until the grace period has passed, to ensure that 
we compute the right result.
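
To make the timing concrete, here is a minimal sketch in plain Java. It is a toy 
model of the behavior described above, not the Kafka Streams implementation: the 
class `DelayedLeftJoinSketch`, its methods, and the millisecond values are all 
made up for illustration, and it ignores details such as dropping late records 
or expiring the right-hand buffer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a left stream-stream join (hypothetical, not Kafka Streams code).
final class DelayedLeftJoinSketch {
    // A buffered record; `joined` marks left records that already found a partner.
    private static final class Rec {
        final String key;
        final String value;
        final long ts;
        boolean joined;

        Rec(String key, String value, long ts) {
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    private final long windowMs;
    private final long graceMs;
    private final List<Rec> left = new ArrayList<>();
    private final List<Rec> right = new ArrayList<>();
    private long streamTime = Long.MIN_VALUE;
    final List<String> output = new ArrayList<>();

    DelayedLeftJoinSketch(long windowMs, long graceMs) {
        this.windowMs = windowMs;
        this.graceMs = graceMs;
    }

    void onLeft(String key, String value, long ts) {
        Rec l = new Rec(key, value, ts);
        for (Rec r : right) {
            if (matches(l, r)) {
                l.joined = true;
                output.add(format(l, r.value)); // inner result: emitted right away
            }
        }
        left.add(l); // keep it buffered; a partner may still arrive
        advance(ts);
    }

    void onRight(String key, String value, long ts) {
        Rec r = new Rec(key, value, ts);
        for (Rec l : left) {
            if (matches(l, r)) {
                l.joined = true;
                output.add(format(l, r.value)); // inner result: emitted right away
            }
        }
        right.add(r); // simplification: the right buffer is never expired
        advance(ts);
    }

    private boolean matches(Rec l, Rec r) {
        return l.key.equals(r.key) && Math.abs(l.ts - r.ts) <= windowMs;
    }

    // Advance stream time; emit <k,<v,null>> only for left records whose
    // window end plus grace period has passed without any join partner.
    private void advance(long ts) {
        streamTime = Math.max(streamTime, ts);
        Iterator<Rec> it = left.iterator();
        while (it.hasNext()) {
            Rec l = it.next();
            if (l.ts + windowMs + graceMs < streamTime) {
                if (!l.joined) {
                    output.add(format(l, null));
                }
                it.remove();
            }
        }
    }

    private static String format(Rec l, String rightValue) {
        return "<" + l.key + ",<" + l.value + "," + rightValue + ">>";
    }

    public static void main(String[] args) {
        DelayedLeftJoinSketch join = new DelayedLeftJoinSketch(10L, 5L);
        join.onLeft("k", "a", 0L);    // nothing emitted yet
        join.onRight("k", "x", 5L);   // inner result for a and x
        join.onLeft("k", "b", 100L);  // no partner so far, held back
        join.onRight("k", "y", 200L); // b's window plus grace has passed
        System.out.println(join.output);
    }
}
```

In this model the un-joined result for `b` only shows up once stream time has 
moved past `b`'s window end plus the grace period, while the result for `a` and 
`x` is emitted as soon as both records are present.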

In the old implementation, we basically did not compute the correct left/outer 
join result, but a superset of it. Your DB argument does not really apply, 
because the result is a KStream and thus we should only emit _final_ results. If 
we emit a <k,<v1,null>> eagerly and a second <k,<v1,v2>> later, the second one 
is _not_ an update to the first one (a KStream has no update semantics). 
Otherwise we would need to treat all results with the same key as _updates_, but 
if a record joins twice, the second join result is also not an update to the 
first one.
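
A small plain-Java sketch of that last point (all names here are hypothetical 
and only for illustration; this does not use any Kafka API): if results with the 
same key were read as updates, a record that legitimately joins twice would lose 
its first join result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of record-stream vs. update semantics.
final class StreamVsUpdateSketch {
    // Record-stream reading: every emitted record is an independent fact.
    static List<String> asStream(List<String[]> results) {
        List<String> facts = new ArrayList<>();
        for (String[] kv : results) {
            facts.add(kv[0] + "=" + kv[1]);
        }
        return facts;
    }

    // Update reading: a later record with the same key replaces the earlier one.
    static Map<String, String> asUpdates(List<String[]> results) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] kv : results) {
            latest.put(kv[0], kv[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        // v1 joins twice: once with v2 and once with v3. Both results are valid.
        List<String[]> joinedTwice = List.of(
                new String[] {"k", "<v1,v2>"},
                new String[] {"k", "<v1,v3>"});
        System.out.println(asStream(joinedTwice));  // both results survive
        System.out.println(asUpdates(joinedTwice)); // the first result is overwritten

        // Eager emission: as a record stream, the spurious <v1,null> stays in
        // the output next to the real join result.
        List<String[]> eager = List.of(
                new String[] {"k", "<v1,null>"},
                new String[] {"k", "<v1,v2>"});
        System.out.println(asStream(eager));
    }
}
```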

Does this make sense?

> Allow users to opt-into spurious left/outer stream-stream join improvement
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-12909
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12909
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Blocker
>             Fix For: 3.1.0
>
>
> https://issues.apache.org/jira/browse/KAFKA-10847 improves left/outer 
> stream-stream joins by not emitting left/outer results eagerly, but only 
> after the grace period has passed.
> While this change is desired, there is an issue with regard to upgrades: if 
> users don't specify a grace period, we fall back to a 24h default. Thus, 
> left/outer join results would only be emitted 24h after the join window end. 
> This change in behavior could break existing applications when upgrading to 
> the 3.0.0 release. And even if users do set a grace period explicitly, it's 
> still unclear if the new delayed output behavior would work for them.
> Thus, we propose to disable the fix of KAFKA-10847 by default, and let users 
> opt into the fix explicitly instead.
> To allow users to enable the fix, we want to piggy-back on KIP-633 
> (https://issues.apache.org/jira/browse/KAFKA-8613) that deprecated the 
> existing `JoinWindows.of()` and `JoinWindows#grace()` methods in favor of 
> `JoinWindows.ofSizeAndGrace()` – if users don't update their code, we would 
> keep the fix disabled, and thus, if users upgrade their app nothing changes. 
> Only if users switch to the new `ofSizeAndGrace()` API, we enable the fix and 
> thus give users the opportunity to opt in explicitly and pick an appropriate 
> grace period for their application.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)