[
https://issues.apache.org/jira/browse/KAFKA-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax reassigned KAFKA-7821:
--------------------------------------
Assignee: Guozhang Wang
> Streams: default cache size can lose session windows in high-throughput
> deployment
> ----------------------------------------------------------------------------------
>
> Key: KAFKA-7821
> URL: https://issues.apache.org/jira/browse/KAFKA-7821
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.1, 2.1.0
> Reporter: Matthew Jarvie
> Assignee: Guozhang Wang
> Priority: Major
>
> We have observed that with the default cache size, a Streams aggregator will
> sometimes fail to find existing, open session windows while handling records.
> As a result, it starts a new session that overwrites the old one, and
> events that belong together fail to aggregate.
> Our topology is fairly simple: We consume from a Kafka topic, group by keys,
> aggregate, then produce to another topic. Our aggregator is configured to use
> a session window strategy with an inactivity gap of 10 minutes and a
> retention period of 10 minutes. The system is deployed in production and
> handles about 250k messages per thread per minute (4 threads per
> application). The cache size is left default (10 MB).
> We worked around the issue by enlarging the cache (cache.max.bytes.buffering
> configuration parameter from 10 MB to 100 MB) and no longer observe the issue
> at all. While troubleshooting, we noticed that older sessions would be the
> ones lost, so it seems like the cache is an LRU cache and is evicting windows
> before their inactivity time is up.
> This was originally observed in 0.10.2.1. We completed an upgrade to 2.1.0 and
> still observed the issue.
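
For readers who want to try the workaround described in the report, here is a minimal sketch in Java. It is not taken from the reporter's code. It uses plain java.util.Properties with string keys so that it compiles without the Kafka jars; the class name, method names, application id and broker address are hypothetical placeholders. Only the key cache.max.bytes.buffering and the figures (10 MB, 100 MB, 4 threads, 250k messages per thread per minute, 10 minute gap) come from the report. The per-thread figure assumes the cache budget is divided evenly among stream threads, which is my understanding of the Streams documentation and should be checked against the version in use.

```java
import java.util.Properties;

// Sketch only: names in this class are hypothetical, not part of Kafka.
class StreamsCacheSketch {

    // Config key quoted in the report.
    static final String CACHE_KEY = "cache.max.bytes.buffering";

    // Builds the properties that would be handed to the Streams application.
    static Properties withCacheBytes(long cacheBytes) {
        Properties props = new Properties();
        props.setProperty("application.id", "session-aggregator"); // placeholder value
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder value
        props.setProperty(CACHE_KEY, Long.toString(cacheBytes));
        return props;
    }

    // Assumption: the total cache budget is split evenly across stream threads.
    static long cacheBytesPerThread(long totalCacheBytes, int threads) {
        return totalCacheBytes / threads;
    }

    // Number of records one thread sees within a single inactivity gap.
    static long recordsPerGap(long recordsPerMinute, long gapMinutes) {
        return recordsPerMinute * gapMinutes;
    }

    public static void main(String[] args) {
        long defaultCache = 10L * 1024 * 1024;   // 10 MB, the default in the report
        long enlargedCache = 100L * 1024 * 1024; // 100 MB, the reported workaround

        Properties props = withCacheBytes(enlargedCache);
        System.out.println(CACHE_KEY + "=" + props.getProperty(CACHE_KEY));

        // Rough sizing using the workload figures from the report.
        System.out.println("default bytes per thread:  " + cacheBytesPerThread(defaultCache, 4));
        System.out.println("enlarged bytes per thread: " + cacheBytesPerThread(enlargedCache, 4));
        System.out.println("records per thread per gap: " + recordsPerGap(250_000L, 10L));
    }
}
```

The arithmetic is only a back-of-envelope comparison of cache budget against record volume within one inactivity gap. It does not by itself explain why sessions are lost, and the enlarged cache remains a workaround rather than a fix.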