[
https://issues.apache.org/jira/browse/KAFKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468302#comment-15468302
]
Guozhang Wang commented on KAFKA-3779:
--------------------------------------
Do all KTable changelog streams contain deduped data even after KAFKA-3776?
Generally speaking, a KTable changelog will only be deduped if the KTable has
a corresponding state store upon creation. Currently we have the following
scenarios for creating a KTable.
1. {{builder.table()}} to read from a source topic.
2. aggregation operators that generate a windowed / non-windowed KTable.
3. KTable's non-stateful operators such as {{filter}} that generate a new
KTable.
4. KTable-KTable join that generates a new KTable.
Today 1) and 2) above have a state store for the generated KTable, and hence it
is deduped; for 3), as long as the original KTable is deduped, the result will
be deduped as well; for 4) the resulting KTable is not backed by a state store,
so it may not be deduped.
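
To make the four cases easier to compare, here is a minimal sketch that
restates them as a lookup. It is a hypothetical model, not the Kafka Streams
API: {{KTableOrigin}}, {{DedupModel}}, {{hasStateStore}} and
{{dedupGuaranteed}} are names invented for illustration, and treating case 3
as having no store of its own is my reading of the paragraph above.

```java
// Hypothetical model, not Kafka Streams code: maps each way of creating a
// KTable to the state-store / dedup behavior described in the comment.
enum KTableOrigin {
    SOURCE_TOPIC,   // 1. builder.table()
    AGGREGATION,    // 2. windowed / non-windowed aggregation
    STATELESS_OP,   // 3. e.g. filter on an existing KTable
    KTABLE_JOIN     // 4. KTable-KTable join
}

final class DedupModel {
    // True if the KTable created this way has its own state store (cases 1 and 2).
    static boolean hasStateStore(KTableOrigin origin) {
        return origin == KTableOrigin.SOURCE_TOPIC
            || origin == KTableOrigin.AGGREGATION;
    }

    // True if the changelog can be assumed deduped.
    // parentDeduped only matters for case 3, which inherits from its parent.
    static boolean dedupGuaranteed(KTableOrigin origin, boolean parentDeduped) {
        switch (origin) {
            case SOURCE_TOPIC:
            case AGGREGATION:
                return true;
            case STATELESS_OP:
                return parentDeduped;
            default:
                return false; // KTABLE_JOIN: no backing store, so no guarantee
        }
    }
}
```

In this model, case 4 is the one that has neither a store nor a dedup
guarantee.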
Hence the new function {{KTable.getStoreName()}} may still return a null value;
in this case does it still make sense to add the cache for its {{KTable.to()}}
function?
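
For reference while discussing this, below is a rough sketch of what an LRU
cache in front of a sink would do to the outgoing records. It is not the
proposed implementation and does not use any Kafka Streams classes:
{{DedupingLruCache}}, {{put}}, {{flush}} and {{sent}} are hypothetical names,
and an in-memory list stands in for the output topic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Kafka Streams code: an LRU cache placed in front
// of a sink that forwards only the latest buffered value per key.
final class DedupingLruCache {
    private final int capacity;
    private final List<String> sink = new ArrayList<>(); // stands in for the output topic
    // accessOrder = true keeps entries ordered least recently used first.
    private final LinkedHashMap<String, String> cache =
        new LinkedHashMap<>(16, 0.75f, true);

    DedupingLruCache(int capacity) {
        this.capacity = capacity;
    }

    // Buffer an update. An older buffered value for the same key is
    // overwritten and never reaches the sink.
    void put(String key, String value) {
        cache.put(key, value);
        if (cache.size() > capacity) {
            // Over capacity: forward the least recently used entry and drop it.
            Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
            Map.Entry<String, String> eldest = it.next();
            sink.add(eldest.getKey() + "=" + eldest.getValue());
            it.remove();
        }
    }

    // Forward everything still buffered, e.g. at commit time.
    void flush() {
        for (Map.Entry<String, String> e : cache.entrySet()) {
            sink.add(e.getKey() + "=" + e.getValue());
        }
        cache.clear();
    }

    List<String> sent() {
        return sink;
    }
}
```

With a capacity of 2, the updates a=1, a=2, b=1, a=3 followed by a flush
send only b=1 and a=3, so the reduction in traffic does not depend on whether
the upstream KTable is backed by a state store.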
> Add the LRU cache for KTable.to() operator
> ------------------------------------------
>
> Key: KAFKA-3779
> URL: https://issues.apache.org/jira/browse/KAFKA-3779
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Affects Versions: 0.10.1.0
> Reporter: Eno Thereska
> Fix For: 0.10.1.0
>
>
> The KTable.to operator currently does not use a cache. We can add a cache to
> this operator to deduplicate and reduce data traffic as well. This is to be
> done after KAFKA-3777.