Sergio Duran Vegas created KAFKA-13386:
------------------------------------------
Summary: Foreign Key Join filtering valid records after a code
change / schema evolved
Key: KAFKA-13386
URL: https://issues.apache.org/jira/browse/KAFKA-13386
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.6.2
Reporter: Sergio Duran Vegas
The join optimization assumes the serializer is deterministic and invariant
across upgrades. I can recall other discussions in which we wanted to rely on
the same property, for example when computing whether an update is a duplicate
result or not.
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java]
{code:java}
// If this value doesn't match the current value from the original table, it is
// stale and should be discarded.
if (java.util.Arrays.equals(messageHash, currentHash)) {
{code}
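To illustrate how this check can discard a valid record, here is a minimal sketch using only the JDK. The two serializers, the record layout, and the use of SHA-256 as the hash are assumptions made for illustration; they are not the Kafka Streams implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class HashStalenessSketch {
    // Stand-in for the value hash (assumption: SHA-256, chosen only for illustration).
    static byte[] hash(byte[] serialized) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(serialized);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical serializer before the upgrade: "id,foreignKey".
    static byte[] serializeV1(String id, String foreignKey) {
        return (id + "," + foreignKey).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical serializer after the schema evolved: a trailing empty field is added.
    static byte[] serializeV2(String id, String foreignKey) {
        return (id + "," + foreignKey + ",").getBytes(StandardCharsets.UTF_8);
    }

    // Same comparison as in the snippet quoted in this issue.
    static boolean isCurrent(byte[] messageHash, byte[] currentHash) {
        return Arrays.equals(messageHash, currentHash);
    }

    public static void main(String[] args) {
        // Hash computed when the subscription was sent, before the upgrade.
        byte[] messageHash = hash(serializeV1("order-1", "customer-7"));
        // Hash recomputed from the same logical record after the upgrade.
        byte[] currentHash = hash(serializeV2("order-1", "customer-7"));
        // The record and its foreign key are unchanged, yet the hashes differ.
        System.out.println(isCurrent(messageHash, currentHash));
    }
}
```

In this sketch the logical record never changes; only its serialized form does, and that alone makes the comparison fail.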
One solution would be for the comparison to use the foreign-key reference
itself instead of the hash of the whole message.
The proposed fix is to let the user choose between the two comparison methods
(whole-message hash or FK reference). This would stop valid records from being
dropped in certain cases, while still letting the user keep the current
optimized check, which also drops intermediate results.
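
A rough sketch of what such a choice could look like follows. The `ComparisonMode` enum, its values, and the assumption that the response carries the foreign key it was resolved for are all hypothetical; none of them exist in Kafka Streams today.

```java
import java.util.Arrays;
import java.util.Objects;

public class ResolverComparisonSketch {
    // Hypothetical option; not an existing Kafka Streams configuration.
    enum ComparisonMode { VALUE_HASH, FOREIGN_KEY }

    // Returns true if the join response still applies to the current record
    // of the original table, according to the selected comparison mode.
    static boolean isCurrent(ComparisonMode mode,
                             byte[] messageHash, byte[] currentHash,
                             String messageForeignKey, String currentForeignKey) {
        switch (mode) {
            case VALUE_HASH:
                // Current behaviour: compare hashes of the whole serialized value.
                return Arrays.equals(messageHash, currentHash);
            case FOREIGN_KEY:
                // Proposed alternative: compare only the foreign-key reference.
                return Objects.equals(messageForeignKey, currentForeignKey);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Placeholder hashes that differ only because the serialized form changed.
        byte[] hashBeforeUpgrade = {1, 2, 3};
        byte[] hashAfterUpgrade = {4, 5, 6};
        System.out.println(isCurrent(ComparisonMode.VALUE_HASH,
                hashBeforeUpgrade, hashAfterUpgrade, "customer-7", "customer-7"));
        System.out.println(isCurrent(ComparisonMode.FOREIGN_KEY,
                hashBeforeUpgrade, hashAfterUpgrade, "customer-7", "customer-7"));
    }
}
```

The trade-off in this sketch is that the FK mode no longer depends on the serialized bytes, but it would also accept responses for a record whose value changed while its foreign key stayed the same.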