[
https://issues.apache.org/jira/browse/FLINK-38212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012612#comment-18012612
]
Grzegorz Liter commented on FLINK-38212:
----------------------------------------
Pod memory usage during Savepoints taken without both options enabled:
!image-2025-08-07-17-14-03-023.png|width=611,height=401!
> OOM during savepoint caused by potential memory leak issue in RocksDB related
> to jemalloc
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-38212
> URL: https://issues.apache.org/jira/browse/FLINK-38212
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.20.2, 2.1.0
> Environment: Flink 2.1.0 running in Application mode with Flink
> Operator 1.12.1.
> Memory and savepoint related settings:
> {code:java}
> env.java.opts.taskmanager: ' -XX:+UnlockExperimentalVMOptions
> -XX:+UseStringDeduplication
> -XX:+AlwaysPreTouch -XX:G1HeapRegionSize=16m
> -Xlog:gc*:file=/tmp/gc.log:time,uptime,level,tags
> -XX:SurvivorRatio=6 -XX:G1NewSizePercent=40'
> execution.checkpointing.max-concurrent-checkpoints: "1"
> execution.checkpointing.snapshot-compression: "true"
> fs.s3a.aws.credentials.provider:
> com.amazonaws.auth.WebIdentityTokenCredentialsProvider
> fs.s3a.block.size:
> fs.s3a.experimental.input.fadvise: sequential
> fs.s3a.path.style.access: "true"
> state.backend.incremental: "true"
> state.backend.type: rocksdb
> state.checkpoints.dir: s3p://bucket/checkpoints
> state.savepoints.dir: s3p://bucket/savepoints
> taskmanager.memory.jvm-overhead.fraction: "0.1"
> taskmanager.memory.jvm-overhead.max: 6g
> taskmanager.memory.managed.fraction: "0.4"
> taskmanager.memory.network.fraction: "0.05"
> taskmanager.network.memory.buffer-debloat.enabled: "true"
> taskmanager.numberOfTaskSlots: "12"
> ...
> resource:
> memory: 16g{code}
>
> Reporter: Grzegorz Liter
> Priority: Major
> Attachments: image-2025-08-07-17-13-33-041.png,
> image-2025-08-07-17-14-03-023.png
>
>
> I am running a job with a snapshot size of about 17 GB with compression enabled.
> I have observed that savepoints often fail because the TM gets killed by
> Kubernetes for exceeding the memory limit on a pod that had a 30 GB memory
> limit assigned.
> Neither Flink metrics nor detailed VM metrics taken with `jcmd <PID> VM.native_memory
> detail` indicate any unusual memory increase. The consumed memory is
> visible only in Kubernetes metrics and RSS.
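The gap between what the JVM reports and what the kernel charges to the process can be checked by hand. The following is a minimal sketch, assuming a Linux TaskManager container where `/proc/<pid>/status` is readable; the helper names are hypothetical, and the NMT committed total has to be copied manually from the `jcmd` output. NMT only accounts for memory the JVM itself allocates, so native allocations made by JNI libraries such as RocksDB would show up in RSS but not in NMT.

```python
import re
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def read_vm_rss_kib(pid: int) -> int:
    """Read the resident set size of a live process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def untracked_kib(rss_kib: int, nmt_committed_kib: int) -> int:
    """Resident memory not covered by the NMT committed total.

    The result can be negative when the JVM has committed pages it has
    not touched yet; a large positive value that keeps growing points at
    allocations made outside the JVM's own bookkeeping.
    """
    return rss_kib - nmt_committed_kib
```

Sampling `untracked_kib(read_vm_rss_kib(pid), committed)` before, during, and after a savepoint would show whether the growth happens outside the JVM-tracked categories.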
> When enough memory is set (and potentially enough JVM overhead) to leave
> some breathing room, one snapshot can be taken, but taking subsequent full
> snapshots reliably leads to OOM.
> This documentation:
> [switching-the-memory-allocator|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator]
> led me to try
> {code:java}
> MALLOC_ARENA_MAX=1
> DISABLE_JEMALLOC=true {code}
> This configuration made savepoints reliably pass without OOM. I also
> tried setting only one of the two options at a time, but that did not fix the
> issue.
> I also tried downscaling the pod to 16 GB of memory, and with these options
> the savepoint was reliably created without any issue. Without them, every
> savepoint fails.
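For context on how little headroom the 16 GB pod leaves for untracked native allocations, the budget Flink derives from the settings in the Environment section can be approximated as below. This is a rough sketch, not Flink's own code: it assumes `resource.memory: 16g` is used as the total process size, and it assumes the documented defaults of 256 MiB for JVM metaspace, 192 MiB for the JVM overhead minimum, and 64 MiB for the network minimum, none of which are stated in the report. The function name is hypothetical.

```python
def taskmanager_budget_mib(process_mib, overhead_fraction, overhead_max_mib,
                           managed_fraction, network_fraction,
                           metaspace_mib=256, overhead_min_mib=192,
                           network_min_mib=64):
    """Approximate the TaskManager memory split, all values in MiB."""
    # JVM overhead is a fraction of total process memory, clamped to [min, max].
    overhead = max(overhead_min_mib,
                   min(process_mib * overhead_fraction, overhead_max_mib))
    # Total Flink memory is what remains after metaspace and JVM overhead.
    total_flink = process_mib - metaspace_mib - overhead
    # Managed and network memory are fractions of total Flink memory.
    managed = total_flink * managed_fraction
    network = max(network_min_mib, total_flink * network_fraction)
    return {
        "jvm_overhead": overhead,
        "total_flink": total_flink,
        "managed": managed,
        "network": network,
        # Heap plus framework/task off-heap share whatever is left.
        "heap_and_off_heap": total_flink - managed - network,
    }


# Values taken from the reported configuration (16g pod, 6g overhead cap).
budget = taskmanager_budget_mib(
    process_mib=16 * 1024,
    overhead_fraction=0.1,
    overhead_max_mib=6 * 1024,
    managed_fraction=0.4,
    network_fraction=0.05,
)
```

Under these assumptions the JVM overhead budget comes to about 1.6 GiB and managed memory to about 5.7 GiB, so any native allocation that is not charged to managed memory has to fit in that overhead share before the pod limit is exceeded.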