[ https://issues.apache.org/jira/browse/KAFKA-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang reassigned KAFKA-7821:
------------------------------------

    Assignee:     (was: Guozhang Wang)

> Streams: default cache size can lose session windows in high-throughput deployment
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-7821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7821
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1, 2.1.0
>            Reporter: Matthew Jarvie
>            Priority: Major
>
> We have observed that with the default cache size, a Streams aggregator will
> sometimes fail to find existing, open session windows while handling records.
> As a result, it starts a new session that overwrites the old one, and events
> that belong to the same session are not aggregated together.
>
> Our topology is fairly simple: we consume from a Kafka topic, group by key,
> aggregate, then produce to another topic. The aggregator uses session windows
> with an inactivity gap of 10 minutes and a retention period of 10 minutes.
> The system is deployed in production and handles about 250k messages per
> thread per minute (4 threads per application). The cache size is left at the
> default (10 MB).
>
> We worked around the issue by enlarging the cache (the
> cache.max.bytes.buffering configuration parameter, from 10 MB to 100 MB) and
> no longer observe the issue at all. While troubleshooting, we noticed that
> older sessions were the ones lost, so it seems that the cache is an LRU cache
> and is evicting windows before their inactivity gap has elapsed.
>
> This was originally observed in 0.10.2.1. After upgrading to 2.1.0, we still
> observe the issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
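
For reference, the workaround described in the report (raising cache.max.bytes.buffering from 10 MB to 100 MB) could be expressed roughly as below. This is only a sketch using plain java.util.Properties: the class name, application id, and broker address are placeholders that do not come from the report, and it assumes the resulting properties are later passed to the Streams application in the usual way.

```java
import java.util.Properties;

// Hypothetical helper; the class name and placeholder values are illustrative only.
final class CacheConfigSketch {

    // 100 MB in bytes, the enlarged cache size mentioned in the report.
    static final long ENLARGED_CACHE_BYTES = 100L * 1024 * 1024;

    // Builds Streams properties with the enlarged record cache.
    static Properties workaroundProps() {
        Properties props = new Properties();
        // Placeholder values, not taken from the report.
        props.setProperty("application.id", "session-aggregator-example");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // The setting named in the report; the value is given in bytes.
        props.setProperty("cache.max.bytes.buffering",
                Long.toString(ENLARGED_CACHE_BYTES));
        return props;
    }

    public static void main(String[] args) {
        // Print the configured cache size in bytes.
        System.out.println(
                workaroundProps().getProperty("cache.max.bytes.buffering"));
    }
}
```

Note that this only reproduces the reporter's mitigation. It does not address why open sessions stop being found once the cache is under pressure.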