[ 
https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17560812#comment-17560812
 ] 

Guozhang Wang commented on KAFKA-13386:
---------------------------------------

Hi [~nikuis], I agree with the same FK1 it will be sent to the same node and 
hence won't have out of ordering data, but you'd still end up with two 
duplicated join result since the second subscription would come back and join 
with the current value again (assuming it's still the same).

I think by the time we have multi-versioned join, we can relax on the ordering 
of the emit since different versioned join result would not rely on the emit 
ordering any more.

> Foreign Key Join filtering out valid records after a code change / schema 
> evolved
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13386
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13386
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.2
>            Reporter: Sergio Duran Vegas
>            Priority: Major
>
> The join optimization assumes the serializer is deterministic and invariant 
> across upgrades. So in case of changes this opimitzation will drop 
> invalid/intermediate records. In other situations we have relied on the same 
> property, for example when computing whether an update is a duplicate result 
> or not.
>  
> The problem is that some serializers are sadly not deterministic.
>  
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java]
>  
> {code:java}
> //If this value doesn't match the current value from the original table, it 
> is stale and should be discarded.
>  if (java.util.Arrays.equals(messageHash, currentHash)) {{code}
>  
> A solution for this problem would be that the comparison use foreign-key 
> reference itself instead of the whole message hash.
>  
> The bug fix proposal is to be allow the user to choose between one method of 
> comparison or another (whole hash or Fk reference). This would fix the 
> problem of dropping valid records on certain cases and allow the user to also 
> choose the current optimized way of checking valid records and intermediate 
> results dropping.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to