guanghua pi created FLINK-36655:
-----------------------------------
Summary: Using the Flink State Processor API to process large state in
RocksDB is very slow
Key: FLINK-36655
URL: https://issues.apache.org/jira/browse/FLINK-36655
Project: Flink
Issue Type: Technical Debt
Components: API / DataStream
Affects Versions: 1.14.6, 1.13.6, 1.12.7
Reporter: guanghua pi
Attachments: image-2024-11-04-17-06-24-614.png
My streaming job's state backend is RocksDB, and a savepoint produces about 65 GB
of data. I am using the State Processor API to read the state out of RocksDB. My
demo program is very simple: it reads the original state and writes it to another
HDFS directory (see the sketch after the configuration below). For RocksDB I use
the predefined options SPINNING_DISK_OPTIMIZED_HIGH_MEM. The relevant settings in
my flink-conf.yaml are as follows:
{noformat}
taskmanager.memory.managed.fraction: 0.1
taskmanager.memory.jvm-overhead.fraction: 0.05
taskmanager.memory.jvm-overhead.max: 128mb
taskmanager.memory.jvm-overhead.min: 64mb
taskmanager.memory.framework.off-heap.size: 64mb
taskmanager.memory.jvm-metaspace.size: 128m
taskmanager.memory.network.max: 128mb
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.managed.size: 32mb
taskmanager.memory.task.off-heap.size: 2253mb
state.backend.rocksdb.memory.managed: false
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.bloom-filter-full-positive: true
state.backend.rocksdb.memory.write-buffer-ratio: 0.5
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.2
state.backend.rocksdb.memory.fixed-per-slot: 1024mb
{noformat}
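For reference, here is a minimal sketch of the kind of demo program I am running
(the savepoint path, output path, operator uid, and the state descriptor name and
type are placeholders, not the real ones from my job):
{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ReadSavepointDemo {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend with the predefined spinning-disk profile
        EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);

        // Load the existing ~65 GB savepoint (path is a placeholder)
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs:///flink/savepoints/savepoint-xxxx", backend);

        // Read one keyed state and dump it to another HDFS directory
        // (operator uid and output path are placeholders)
        DataSet<String> rows =
                savepoint.readKeyedState("my-operator-uid", new MyReaderFunction());
        rows.writeAsText("hdfs:///tmp/state-dump");

        env.execute("read-savepoint-demo");
    }

    /** Reads a ValueState<Long> named "my-state" for every key and emits it as text. */
    public static class MyReaderFunction extends KeyedStateReaderFunction<String, String> {

        private transient ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext()
                    .getState(new ValueStateDescriptor<>("my-state", Types.LONG));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            out.collect(key + "," + state.value());
        }
    }
}
{code}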
This is my TaskManager memory figure:
!image-2024-11-04-17-06-24-614.png!
The job is submitted with -yjm 1G -ytm 3G (1 GB JobManager, 3 GB TaskManager).
My current problems are as follows:
1. After running the program for about 4 hours, the container is killed by YARN with:
{noformat}
Diagnostics: [2024-11-04 03:00:48.539] Container
[pid=8166,containerID=container_1728961635507_3104_01_000007] is running
765952B beyond the 'PHYSICAL' memory limit. Current usage: 3.0 GB of 3 GB
physical memory used; 10.1 GB of 6.2 GB virtual memory used. Killing container.
{noformat}
2. The RocksDB read throughput keeps dropping over time. For example, about
600,000 records are read in the first hour, but only about 500,000 in the next
hour, and the rate continues to decline.
3. In the log file I find: "Obtained shared RocksDB cache of size 67108864
bytes" (i.e. 64 MB), even though I set
state.backend.rocksdb.memory.fixed-per-slot: 1024mb. The two values do not match.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)