[ https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431420#comment-17431420 ]
Guozhang Wang commented on KAFKA-13386: --------------------------------------- Hello [~sduran] thanks for reporting your issue. My understanding is that when the serializer's behavior changes upon an upgrade, records may then be sending to a different partition after the upgrade, is that what you're suggesting? I'm not sure though how that would cause valid records to be filtered, could you elaborate a bit? > Foreign Key Join filtering out valid records after a code change / schema > evolved > --------------------------------------------------------------------------------- > > Key: KAFKA-13386 > URL: https://issues.apache.org/jira/browse/KAFKA-13386 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 2.6.2 > Reporter: Sergio Duran Vegas > Priority: Major > > The join optimization assumes the serializer is deterministic and invariant > across upgrades. So in case of changes this opimitzation will drop > invalid/intermediate records. In other situations we have relied on the same > property, for example when computing whether an update is a duplicate result > or not. > > The problem is that some serializers are sadly not deterministic. > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java] > > {code:java} > //If this value doesn't match the current value from the original table, it > is stale and should be discarded. > if (java.util.Arrays.equals(messageHash, currentHash)) {{code} > > A solution for this problem would be that the comparison use foreign-key > reference itself instead of the whole message hash. > > The bug fix proposal is to be allow the user to choose between one method of > comparison or another (whole hash or Fk reference). This would fix the > problem of dropping valid records on certain cases and allow the user to also > choose the current optimized way of checking valid records and intermediate > results dropping. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)