[jira] [Commented] (FLINK-38212) OOM during savepoint caused by potential memory leak issue in RocksDB related to jemalloc

Grzegorz Liter (Jira) Thu, 07 Aug 2025 08:14:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-38212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012612#comment-18012612
 ]


Grzegorz Liter commented on FLINK-38212:
----------------------------------------

Pod memory usage during Savepoints taken without both options enabled:

!image-2025-08-07-17-14-03-023.png|width=611,height=401!

> OOM during savepoint caused by potential memory leak issue in RocksDB related 
> to jemalloc
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-38212
>                 URL: https://issues.apache.org/jira/browse/FLINK-38212
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.20.2, 2.1.0
>         Environment: Flink 2.1.0 running in Application mode with Flink 
> Operator 1.12.1.
> Memory and savepoint related settings:
> {code:java}
> env.java.opts.taskmanager: ' -XX:+UnlockExperimentalVMOptions 
> -XX:+UseStringDeduplication
>       -XX:+AlwaysPreTouch  -XX:G1HeapRegionSize=16m 
> -Xlog:gc*:file=/tmp/gc.log:time,uptime,level,tags
>       -XX:SurvivorRatio=6 -XX:G1NewSizePercent=40'
> execution.checkpointing.max-concurrent-checkpoints: "1"
> execution.checkpointing.snapshot-compression: "true"
> fs.s3a.aws.credentials.provider: 
> com.amazonaws.auth.WebIdentityTokenCredentialsProvider
> fs.s3a.block.size: 
> fs.s3a.experimental.input.fadvise: sequential
> fs.s3a.path.style.access: "true" 
> state.backend.incremental: "true"
> state.backend.type: rocksdb 
> state.checkpoints.dir: s3p://bucket/checkpoints
> state.savepoints.dir: s3p://bucket/savepoints 
> taskmanager.memory.jvm-overhead.fraction: "0.1"
> taskmanager.memory.jvm-overhead.max: 6g
> taskmanager.memory.managed.fraction: "0.4"
> taskmanager.memory.network.fraction: "0.05"
> taskmanager.network.memory.buffer-debloat.enabled: "true"
> taskmanager.numberOfTaskSlots: "12"
> ...
> resource:
>   memory: 16g{code}
>  
>            Reporter: Grzegorz Liter
>            Priority: Major
>         Attachments: image-2025-08-07-17-13-33-041.png, 
> image-2025-08-07-17-14-03-023.png
>
>
> I am running a job with a snapshot size of about 17 GB with compression 
> enabled. I have observed that savepoints often fail because the TM is killed 
> by Kubernetes for exceeding the memory limit on a pod that had a 30 GB memory 
> limit assigned.
> Neither Flink metrics nor detailed VM metrics taken with `jcmd <PID> 
> VM.native_memory detail` indicate any unusual memory increase. The consumed 
> memory is visible only in Kubernetes metrics and RSS.
> When enough memory is set (plus potentially a large enough JVM overhead) to 
> leave some breathing room, one snapshot can be taken, but taking subsequent 
> full snapshots reliably leads to OOM.
> This documentation: 
> [switching-the-memory-allocator|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator]
>  led me to try 
> {code:java}
> MALLOC_ARENA_MAX=1
> DISABLE_JEMALLOC=true {code}
> This configuration made savepoints reliably pass without OOM. I tried 
> setting only one of the options at a time, but that did not fix the issue.
> I also tried downscaling the pod to 16 GB of memory, and with these options 
> savepoints were reliably created without any issue. Without them, every 
> savepoint fails.
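
The mismatch described in the report (RSS growing while the `jcmd` native memory tracking output stays flat) can be put into numbers. The following is a minimal sketch, not taken from the issue itself. It assumes a Linux TaskManager pod, a JVM started with `-XX:NativeMemoryTracking=summary` or `detail`, and NMT output in the default KB scale containing a `Total: reserved=...KB, committed=...KB` line. The function names are hypothetical.

```python
import re


def rss_kib_from_proc_status(status_text: str) -> int:
    """Return the resident set size in KiB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found in status text")
    return int(match.group(1))


def nmt_committed_kib(nmt_text: str) -> int:
    """Return the JVM-tracked committed memory in KiB from NMT output.

    Assumes the default KB scale of `jcmd <PID> VM.native_memory summary`.
    """
    match = re.search(
        r"^Total:\s+reserved=\d+KB,\s+committed=(\d+)KB", nmt_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Total:' line found in NMT output")
    return int(match.group(1))


def untracked_kib(status_text: str, nmt_text: str) -> int:
    """Return RSS minus NMT committed memory, in KiB.

    A positive value is resident memory the JVM's own tracking does not explain.
    """
    return rss_kib_from_proc_status(status_text) - nmt_committed_kib(nmt_text)
```

To use it, capture `/proc/<PID>/status` and the `jcmd <PID> VM.native_memory summary` output before and after a savepoint and pass both texts to `untracked_kib`. A gap that grows with each savepoint would be consistent with allocations made outside the JVM's tracking, for example by native libraries calling `malloc` directly, which NMT does not account for.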



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
