Matthew Jarvie created KAFKA-7821:
-------------------------------------

             Summary: Default cache size can lose session windows in 
high-throughput deployment
                 Key: KAFKA-7821
                 URL: https://issues.apache.org/jira/browse/KAFKA-7821
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.1.0, 0.10.2.1
            Reporter: Matthew Jarvie


We have observed that, with the default cache size, a Streams aggregator will 
sometimes fail to find an existing, open session window while handling a record. 
It then starts a new session that overwrites the old one, so events that should 
have been aggregated together end up split across sessions.

Our topology is fairly simple: we consume from a Kafka topic, group by key, 
aggregate, then produce to another topic. The aggregator uses session windows 
with an inactivity gap of 10 minutes and a retention period of 10 minutes. The 
system is deployed in production and handles roughly 250k messages per thread 
per minute (4 threads per application instance). The cache size is left at the 
default (10 MB).
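
For reference, a minimal sketch of the topology follows. Topic names, serdes, 
the aggregation logic, and the broker address are placeholders rather than our 
production code; only the session-window configuration (10-minute inactivity 
gap, 4 stream threads) mirrors the deployment, and retention tuning is omitted 
for brevity.

{code:java}
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.state.SessionStore;

public class SessionAggregatorSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
        // cache.max.bytes.buffering is left at its 10 MB default here

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               // Session windows with a 10-minute inactivity gap, as in our deployment
               .windowedBy(SessionWindows.with(Duration.ofMinutes(10)))
               .aggregate(
                   () -> "",                              // initializer (placeholder)
                   (key, value, agg) -> agg + value,      // adder (placeholder)
                   (key, agg1, agg2) -> agg1 + agg2,      // session merger (placeholder)
                   Materialized.<String, String, SessionStore<Bytes, byte[]>>with(
                       Serdes.String(), Serdes.String()))
               .toStream()
               // Strip the window from the key before producing downstream
               .map((windowedKey, agg) -> KeyValue.pair(windowedKey.key(), agg))
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
{code}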

We worked around the issue by enlarging the cache (raising the 
cache.max.bytes.buffering configuration parameter from 10 MB to 100 MB) and no 
longer observe the problem at all. While troubleshooting, we noticed that the 
older sessions were the ones being lost, which suggests the cache behaves as an 
LRU cache and evicts windows before their inactivity gap has elapsed.
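
Concretely, in the sketch above the workaround amounts to one extra line in the 
streams configuration (value in bytes):

{code:java}
// Workaround: raise the record cache from the default 10 MB to 100 MB
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 100 * 1024 * 1024L);
{code}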

This was originally observed in 0.10.2.1. We completed an upgrade to 2.1.0 and 
still observed the issue.


