[
https://issues.apache.org/jira/browse/KAFKA-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924238#comment-15924238
]
Damian Guy commented on KAFKA-4609:
-----------------------------------
[~miguno] It is because the caches are flushed independently and both KTables
trigger the join. For example, assume you have {table1.join(table2)} and, within
a single commit interval, you receive:
table1 A:1
table2 A:A
When the stores are flushed on the commit interval, we first flush the store
for table1, which triggers the join and produces A:1:A. We then flush the store
for table2, which triggers the join again and also produces A:1:A, so the same
result is emitted twice.
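
The flush sequence described above can be modelled in a few lines of plain
Java. This is only a rough sketch, not the Kafka Streams implementation: the
class FlushJoinSketch, its helper methods, and the maps standing in for the
stores and caches are all hypothetical, and it assumes that a write is visible
to join lookups immediately while the join itself only runs when a table's
dirty entries are flushed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two cached tables feeding an inner join.
public class FlushJoinSketch {

    // Records an update: the value is visible to lookups right away,
    // and the key stays marked dirty until the next flush.
    private static void put(Map<String, String> store, Map<String, String> dirty,
                            String key, String value) {
        store.put(key, value);
        dirty.put(key, value);
    }

    // Flushes one table's dirty entries. Each entry triggers the join
    // against the current contents of the other table.
    private static void flush(Map<String, String> dirty, Map<String, String> other,
                              boolean dirtyIsLeft, List<String> emitted) {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            String otherValue = other.get(e.getKey());
            if (otherValue == null) {
                continue; // inner join: no match on the other side, no output
            }
            String left = dirtyIsLeft ? e.getValue() : otherValue;
            String right = dirtyIsLeft ? otherValue : e.getValue();
            emitted.add(e.getKey() + ":" + left + ":" + right);
        }
        dirty.clear();
    }

    // Simulates one commit interval and returns the join results emitted.
    public static List<String> simulate() {
        Map<String, String> store1 = new HashMap<>();
        Map<String, String> store2 = new HashMap<>();
        Map<String, String> dirty1 = new HashMap<>();
        Map<String, String> dirty2 = new HashMap<>();
        List<String> emitted = new ArrayList<>();

        // Both updates arrive within the same commit interval.
        put(store1, dirty1, "A", "1"); // table1 A:1
        put(store2, dirty2, "A", "A"); // table2 A:A

        // On commit, each table is flushed independently.
        flush(dirty1, store2, true, emitted);  // table1 flush triggers the join
        flush(dirty2, store1, false, emitted); // table2 flush triggers it again
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [A:1:A, A:1:A]
    }
}
```

In this model the second flush has no way of knowing that the first one
already emitted A:1:A, which is why the same result appears twice.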
> KTable/KTable join followed by groupBy and aggregate/count can result in
> incorrect results
> ------------------------------------------------------------------------------------------
>
> Key: KAFKA-4609
> URL: https://issues.apache.org/jira/browse/KAFKA-4609
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.1.1, 0.10.2.0
> Reporter: Damian Guy
> Assignee: Damian Guy
> Labels: architecture
>
> When caching is enabled, KTable/KTable joins can result in duplicate values
> being emitted. This will occur if there were updates to the same key in both
> tables. Each table is flushed independently, and each table will trigger the
> join, so you get two results for the same key.
> If we subsequently perform a groupBy followed by an aggregate operation, we
> will process these duplicates, resulting in incorrect aggregated values. For
> example, count will be double the value it should be.