Hi Chen,

With version 1.10, Flink changed RocksDB to use Flink's managed memory [1]. This should prevent RocksDB from exceeding the memory limits of a process/container. Unfortunately, this is not yet watertight due to a problem in RocksDB itself [2], so RocksDB can still exceed the managed memory budget. What you could do is configure a higher off-heap size for your tasks via taskmanager.memory.task.off-heap.size to compensate for this.
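As a minimal sketch, the compensation could look like this in flink-conf.yaml (the 512m value is an illustrative assumption, not a recommendation; size it to the overshoot you actually observe in your containers):

```
# flink-conf.yaml (sketch)
state.backend: rocksdb

# Keep RocksDB on managed memory (the default since Flink 1.10),
# but reserve extra off-heap headroom per task manager to absorb
# RocksDB exceeding its managed budget.
# 512m is a placeholder value; tune it to your observed overshoot.
taskmanager.memory.task.off-heap.size: 512m
```

Increasing the off-heap size raises the total process memory accordingly, so make sure your YARN container size accounts for it.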
I am also pulling in Yu Li, who can tell you more about the current limitations of the memory control for RocksDB.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/state/state_backends.html#memory-management
[2] https://issues.apache.org/jira/browse/FLINK-15532

Cheers,
Till

On Tue, Mar 30, 2021 at 7:36 PM chenqin <qinnc...@gmail.com> wrote:
> Hi Till,
>
> We did some investigation and found that this memory usage points to the
> RocksDB state backend running on managed memory. So far we have only seen
> this bug with the RocksDB state backend on managed memory. We followed the
> suggestion in [1] and disabled managed memory; so far we are not seeing the
> issue.
>
> I feel this might be a major bug, since we run Flink 1.11.2 with the
> managed RocksDB state backend in multiple large production jobs and
> consistently reproduce the YARN kill after a job runs for a period of time.
>
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-quot-Container-is-running-beyond-physical-memory-limits-quot-on-YARN-for-a-long-running-stb-td38227.html
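For reference, the workaround chenqin describes (disabling managed memory for RocksDB) boils down to one setting; a sketch in flink-conf.yaml:

```
# flink-conf.yaml (sketch)
state.backend: rocksdb

# Opt RocksDB out of Flink's managed memory (the default is true
# since Flink 1.10). With this disabled, RocksDB's memory is no
# longer budgeted by Flink, so its consumption must be bounded via
# RocksDB's own options or extra container headroom.
state.backend.rocksdb.memory.managed: false
```

Note that this trades one risk for another: with managed memory disabled, nothing caps RocksDB's native allocations, so container kills can still occur unless RocksDB is tuned separately.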