[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax reassigned KAFKA-10396:
---------------------------------------

    Assignee: Rohan Desai

> Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10396
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10396
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.3.1, 2.5.0
>            Reporter: Vagesh Mathapati
>            Assignee: Rohan Desai
>            Priority: Critical
>         Attachments: CustomRocksDBConfig.java, MyStreamProcessor.java, kafkaStreamConfig.java
>
> We are observing that the overall memory of our container keeps growing and never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of memory off-heap and never releases them. This causes an OOM kill once memory reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
>
> Below is our environment:
> * Kubernetes cluster
> * openjdk 11.0.7 2020-04-14 LTS
> * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
> * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
> * Springboot 2.3
> * spring-kafka-2.5.0
> * kafka-streams-2.5.0
> * kafka-streams-avro-serde-5.4.0
> * rocksdbjni-5.18.3
>
> We observed the same result with Kafka 2.3.
>
> Below is a snippet of our analysis.
>
> From the pmap output we took the addresses of these 64M allocations (RSS):
>
> Address           Kbytes  RSS    Dirty  Mode   Mapping
> 00007f3ce8000000  65536   65532  65532  rw---  [ anon ]
> 00007f3cf4000000  65536   65536  65536  rw---  [ anon ]
> 00007f3d64000000  65536   65536  65536  rw---  [ anon ]
>
> We tried to match these with memory allocation logs, enabled with the help of the Azul Systems team:
>
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff7ca0
> @ /tmp/librocksdbjni6564497922441568920.so: _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] - 0x7f3ce8ff9780
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ff9750
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff97c0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
>
> We also identified that the content of these 64M regions is all zeros, with no data present in them.
> I tried to tune the RocksDB configuration as described in the Kafka Streams developer guide, but it did not help:
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>
> Please let me know if you need any more details.
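
Editor's note for context on the tuning referenced above: the guide linked by the reporter bounds RocksDB's off-heap usage through a custom RocksDBConfigSetter registered in the Streams configuration (the attached CustomRocksDBConfig.java presumably plays this role, but its contents are not shown in this issue). The following is only a minimal sketch along the lines of the bounded-memory example in the Kafka Streams memory-management documentation; the class name and memory budgets are placeholders, not the reporter's actual settings.

import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

/**
 * Minimal sketch of a bounded-memory RocksDB config setter, following the
 * pattern in the Kafka Streams memory-management documentation. The budget
 * values below are placeholders and must be sized to the container limit.
 */
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // Placeholder budgets in bytes -- not taken from the reporter's environment.
    private static final long TOTAL_OFF_HEAP_MEMORY = 128 * 1024 * 1024L;
    private static final long TOTAL_MEMTABLE_MEMORY = 32 * 1024 * 1024L;

    // Shared by every store instance so the cap is global rather than per store.
    private static final Cache CACHE =
            new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, 0.1);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
                (BlockBasedTableConfig) options.tableFormatConfig();

        // Serve data, index and filter blocks from the shared, size-limited cache.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);

        // Count memtable (write buffer) memory against the same cache.
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);

        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Do not close CACHE or WRITE_BUFFER_MANAGER here: they are shared
        // across all store instances of this Streams application.
    }
}

Such a setter is registered through the rocksdb.config.setter property, e.g. props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class), which applies it to every RocksDB store of the application, including GlobalKTable stores.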