[
https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431420#comment-17431420
]
Guozhang Wang commented on KAFKA-13386:
---------------------------------------
Hello [~sduran], thanks for reporting your issue. My understanding is that when
the serializer's behavior changes upon an upgrade, records may then be sent
to a different partition after the upgrade. Is that what you're suggesting? I'm
not sure, though, how that would cause valid records to be filtered. Could you
elaborate a bit?
> Foreign Key Join filtering out valid records after a code change / schema
> evolved
> ---------------------------------------------------------------------------------
>
> Key: KAFKA-13386
> URL: https://issues.apache.org/jira/browse/KAFKA-13386
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.6.2
> Reporter: Sergio Duran Vegas
> Priority: Major
>
> The join optimization assumes the serializer is deterministic and invariant
> across upgrades. So in case of changes, this optimization will drop
> invalid/intermediate records. In other situations we have relied on the same
> property, for example when computing whether an update is a duplicate result
> or not.
>
> The problem is that some serializers are sadly not deterministic.
>
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java]
>
> {code:java}
> // If this value doesn't match the current value from the original table, it
> // is stale and should be discarded.
> if (java.util.Arrays.equals(messageHash, currentHash)) {
> {code}
>
> A solution for this problem would be for the comparison to use the foreign-key
> reference itself instead of the whole message hash.
>
> The bug fix proposal is to allow the user to choose between one method of
> comparison or the other (whole hash or FK reference). This would fix the
> problem of dropping valid records in certain cases while still letting the
> user choose the current optimized way of checking valid records and dropping
> intermediate results.
>
>
>
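To make the scenario in the description above concrete, here is a minimal,
self-contained sketch of how a byte-level hash comparison can reject a record
whose logical content has not changed. It is not the Kafka Streams
implementation: the Order class, the two hand-written serializers, the use of
SHA-256 as the hash, and the matchesByForeignKey helper are all assumptions
made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Objects;

final class StaleCheckSketch {

    // Hypothetical left-table value that references a customer by foreign key.
    static final class Order {
        final String orderId;
        final String customerId; // the foreign key
        final int amount;

        Order(String orderId, String customerId, int amount) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Hypothetical serializer before the upgrade: orderId is written first.
    static byte[] serializeV1(Order o) {
        String s = "orderId=" + o.orderId + ";customerId=" + o.customerId + ";amount=" + o.amount;
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical serializer after the upgrade: same fields, different byte layout.
    static byte[] serializeV2(Order o) {
        String s = "customerId=" + o.customerId + ";orderId=" + o.orderId + ";amount=" + o.amount;
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in hash over the serialized bytes (SHA-256 is an assumption of this sketch).
    static byte[] hash(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same shape as the check quoted in the description: compare whole-value hashes.
    static boolean matchesByHash(byte[] messageHash, byte[] currentHash) {
        return Arrays.equals(messageHash, currentHash);
    }

    // Sketch of the proposed alternative: compare only the foreign-key reference.
    static boolean matchesByForeignKey(String subscribedForeignKey, Order current) {
        return current != null && Objects.equals(subscribedForeignKey, current.customerId);
    }

    public static void main(String[] args) {
        Order order = new Order("o-1", "c-42", 10);

        // Hash recorded when the subscription was sent, before the upgrade.
        byte[] messageHash = hash(serializeV1(order));
        // Hash recomputed when the response arrives, after the upgrade.
        byte[] currentHash = hash(serializeV2(order));

        System.out.println("hash comparison accepts record: "
                + matchesByHash(messageHash, currentHash));
        System.out.println("foreign-key comparison accepts record: "
                + matchesByForeignKey("c-42", order));
    }
}
```

In this sketch the order itself never changes between the two hashes; only
the byte layout produced by the serializer does. The hash comparison therefore
treats the response as stale, while the foreign-key comparison still accepts
it. The trade-off is that a foreign-key comparison cannot detect changes to
other fields of the value, which is presumably why the description proposes
making the comparison method a user choice.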
--
This message was sent by Atlassian Jira
(v8.3.4#803005)