guanghua pi created FLINK-36655:
-----------------------------------
Summary: Using the Flink State Processor API to process large state in
RocksDB is very slow
Key: FLINK-36655
URL: https://issues.apache.org/jira/browse/FLINK-36655
Project: Flink
Issue Type: Technical Debt
Components: API / DataStream
Affects Versions: 1.14.6, 1.13.6, 1.12.7
Reporter: guanghua pi
Attachments: image-2024-11-04-17-06-24-614.png
My streaming job's state backend is RocksDB, and a savepoint produces about 65 GB
of data. I am using the State Processor API to read the state out of RocksDB. My
demo program is very simple: it reads the original state and writes it to another
HDFS directory (see the sketch after the configuration below). For RocksDB I use
the predefined options SPINNING_DISK_OPTIMIZED_HIGH_MEM. The relevant settings in
my flink-conf.yaml are as follows:
{noformat}
taskmanager.memory.managed.fraction: 0.1
taskmanager.memory.jvm-overhead.fraction: 0.05
taskmanager.memory.jvm-overhead.max: 128mb
taskmanager.memory.jvm-overhead.min: 64mb
taskmanager.memory.framework.off-heap.size: 64mb
taskmanager.memory.jvm-metaspace.size: 128m
taskmanager.memory.network.max: 128mb
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.managed.size: 32mb
taskmanager.memory.task.off-heap.size: 2253mb
state.backend.rocksdb.memory.managed: false
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.bloom-filter-full-positive: true
state.backend.rocksdb.memory.write-buffer-ratio: 0.5
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.2
state.backend.rocksdb.memory.fixed-per-slot: 1024mb
{noformat}
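For reference, here is a minimal sketch of the kind of demo program I am running
(the savepoint path, output path, operator uid, and the state descriptor name and
type are placeholders, not the real ones from my job):
{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ReadSavepointDemo {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // RocksDB backend with the predefined spinning-disk profile
        EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);

        // Load the existing ~65 GB savepoint (path is a placeholder)
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs:///flink/savepoints/savepoint-xxxx", backend);

        // Read one keyed state and dump it to another HDFS directory
        // (operator uid and output path are placeholders)
        DataSet<String> rows =
                savepoint.readKeyedState("my-operator-uid", new MyReaderFunction());
        rows.writeAsText("hdfs:///tmp/state-dump");

        env.execute("read-savepoint-demo");
    }

    /** Reads a ValueState<Long> named "my-state" for every key and emits it as text. */
    public static class MyReaderFunction extends KeyedStateReaderFunction<String, String> {

        private transient ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext()
                    .getState(new ValueStateDescriptor<>("my-state", Types.LONG));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            out.collect(key + "," + state.value());
        }
    }
}
{code}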
This is my TaskManager memory figure:
!image-2024-11-04-17-06-24-614.png!
The job is submitted with -yjm 1G -ytm 3G (1 GB JobManager, 3 GB TaskManager).
My current problems are as follows:
1. After running the program for about 4 hours, the container is killed by YARN with:
{noformat}
Diagnostics: [2024-11-04 03:00:48.539] Container
[pid=8166,containerID=container_1728961635507_3104_01_000007] is running
765952B beyond the 'PHYSICAL' memory limit. Current usage: 3.0 GB of 3 GB
physical memory used; 10.1 GB of 6.2 GB virtual memory used. Killing container.
{noformat}
2. The RocksDB read throughput keeps dropping over time. For example, about
600,000 records are read in the first hour, but only about 500,000 in the next
hour, and the rate continues to decline.
3. In the log file I find: "Obtained shared RocksDB cache of size 67108864
bytes" (i.e. 64 MB), even though I set
state.backend.rocksdb.memory.fixed-per-slot: 1024mb. The two values do not match.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)