[ https://issues.apache.org/jira/browse/KAFKA-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472257#comment-16472257 ]

Matthias J. Sax commented on KAFKA-6892:
----------------------------------------

Thanks for the details. Hard to say. But as you use RocksDB, it should just 
spill to disk and keep only part of the state in memory. You do have many stores 
though (pipeline A: 2 aggregations plus 3 stream-stream joins --2 stores each-- 
plus another aggregation; pipeline B: 2 aggregations; thus overall you have 11 
stores: 5 aggregation stores plus 6 join stores – note that windowed 
stream-stream joins need to store all raw records).
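
For a very rough back-of-the-envelope, see the sketch below. All numbers in it 
are assumptions, not measured values: the Streams 1.1 RocksDB defaults of a 
~50 MB block cache and 3 x 16 MB write buffers per store, and an even spread of 
the 100 partitions over your 4 instances. Even so, the native side alone can 
dwarf the heap:

{code:java}
public class MemoryEstimate {
    public static void main(String[] args) {
        int stores = 11;          // 5 aggregation stores + 6 join stores
        int partitions = 100;     // per source topic
        int appInstances = 4;
        int rocksDbsPerPod = stores * partitions / appInstances;      // ~275
        long bytesPerStore = 50L * 1024 * 1024                        // block cache (assumed default)
                           + 3L * 16 * 1024 * 1024;                   // write buffers (assumed default)
        System.out.printf("~%d GB native memory worst case per pod%n",
                rocksDbsPerPod * bytesPerStore / (1L << 30));         // ~26 GB
    }
}
{code}

Windowed stores are additionally split into segments, each its own RocksDB 
instance, so the real worst case can be even higher.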

But as you mention that the OS kills the app, it seems there is no OOM 
exception from the JVM. Since RocksDB allocates its memory off-heap, the heap 
and GC can look healthy while the overall process memory keeps growing. And 
RocksDB should also spill to disk... 

Configuring RocksDB is a "black art" – maybe it helps to run parts of the 
pipeline in isolation to see how much memory the individual parts need. One 
more follow-up: do you use Interactive Queries? If yes, you need to make sure 
to close all iterators – otherwise they leak memory (see the sketch below).
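
A minimal sketch (the store name and types are made up for illustration): 
KeyValueIterator is Closeable and holds native RocksDB resources, so 
try-with-resources guarantees it is released even if processing throws:

{code:java}
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class IqExample {
    // assumes a running KafkaStreams instance and a store named "counts-store"
    static void dumpStore(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store("counts-store", QueryableStoreTypes.keyValueStore());
        try (KeyValueIterator<String, Long> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, Long> entry = it.next();
                // process entry.key / entry.value
            }
        } // iterator closed here; a leaked iterator leaks native memory
    }
}
{code}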

> Kafka Streams memory usage grows over the time till OOM
> -------------------------------------------------------
>
>                 Key: KAFKA-6892
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6892
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.1.0
>            Reporter: Dawid Kulig
>            Priority: Minor
>         Attachments: kafka-streams-per-pod-resources-usage.png
>
>
> Hi. I am observing unbounded memory growth in my Kafka Streams application. 
> It gets killed by the OS when it reaches the memory limit (10 GB). 
> It runs two unrelated pipelines (reading from 4 source topics -- 100 
> partitions each -- aggregating data, and writing to two destination topics). 
> My environment: 
>  * Kubernetes cluster
>  * 4 app instances
>  * 10GB memory limit per pod (instance)
>  * JRE 8
> JVM / Streams app:
>  * -Xms2g
>  * -Xmx4g
>  * num.stream.threads = 4
>  * commit.interval.ms = 1000
>  * linger.ms = 1000
>  
> After running for 24 hours, the app reaches the 10 GB memory limit. Heap and 
> GC look good; average non-heap memory usage is 120 MB. I've read this might be 
> related to RocksDB, which Kafka Streams uses underneath, so I tried to tune it 
> using the [confluent 
> doc|https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config],
>  unfortunately with no luck. 
> RocksDB config #1:
> {code:java}
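> // 16 MB block cache, 16 KB blocks, index/filter blocks kept in the cache, at most 2 memtables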
> tableConfig.setBlockCacheSize(16 * 1024 * 1024L);
> tableConfig.setBlockSize(16 * 1024L);
> tableConfig.setCacheIndexAndFilterBlocks(true);
> options.setTableFormatConfig(tableConfig);
> options.setMaxWriteBufferNumber(2);{code}
> RocksDB config #2:
> {code:java}
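> // more aggressive: 1 MB block cache and tiny 8 KB write buffers (memtables)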
> tableConfig.setBlockCacheSize(1024 * 1024L);
> tableConfig.setBlockSize(16 * 1024L);
> tableConfig.setCacheIndexAndFilterBlocks(true);
> options.setTableFormatConfig(tableConfig);
> options.setMaxWriteBufferNumber(2);
> options.setWriteBufferSize(8 * 1024L);{code}
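> For reference, both snippets live inside a custom RocksDBConfigSetter as 
> described in the doc above (the class name here is illustrative), registered 
> via the rocksdb.config.setter property:
> {code:java}
> import java.util.Map;
> import org.apache.kafka.streams.state.RocksDBConfigSetter;
> import org.rocksdb.BlockBasedTableConfig;
> import org.rocksdb.Options;
>
> public class CustomRocksDBConfig implements RocksDBConfigSetter {
>     @Override
>     public void setConfig(String storeName, Options options, Map<String, Object> configs) {
>         BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
>         tableConfig.setBlockCacheSize(16 * 1024 * 1024L);
>         tableConfig.setBlockSize(16 * 1024L);
>         tableConfig.setCacheIndexAndFilterBlocks(true);
>         options.setTableFormatConfig(tableConfig);
>         options.setMaxWriteBufferNumber(2);
>     }
> }
> // registered with:
> // props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);
> {code}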
>  
> This behavior has only been observed with our production traffic, where the 
> per-topic input rate is about 10 msg/sec and pretty much constant (no peaks). 
> I am attaching the cluster resource usage from the last 24 hours.
> Any help or advice would be much appreciated. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
