[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609377#comment-17609377
 ] 

Donatien commented on FLINK-29402:
----------------------------------

I added an image showing the behaviour of an identical job under different 
configurations.

The job is the one used in the e2e test of RocksDB memory control: 
[https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-rocksdb-state-memory-control-test/src/main/java/org/apache/flink/streaming/tests/RocksDBStateMemoryControlTestProgram.java]

The parameters used are as follows:
 * keyspace: 1,000,000
 * payload size: 5,000 bytes
 * The workload stops after sending 3 GB worth of records.
 * The mapper used is the ValueStateMapper.
 * The payload is not appended to the existing key but replaces the previous value.

The first column shows the job running on a 2 GB TM with a container limit of 
2 GB. The first graph shows the backpressure experienced by the upstream 
operator (Source/split). The second graph shows that the TM is using around 
1 GB of memory (managed + heap + network + ...), but the pod is effectively 
using all of the available memory for the Linux Page Cache. The third graph 
shows the ingestion rate in MB/s. The last two graphs are RocksDB metrics: 
cache hits + misses, and cache usage.

The second column shows the same workload with the same amount of TM memory but 
with a container limit of 20 GB. The second graph shows that the container 
memory rises above the TM specification (to around 5 GB, which is the estimated 
state size: 1,000,000 keys multiplied by 5,000 bytes). Backpressure is strongly 
reduced, and the third graph shows better throughput.

The last column shows the same configuration as the second but with DirectIO 
enabled, thus bypassing the Linux Page Cache. As expected, the graphs look 
similar to those of the first column.
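
For reference, DirectIO reads can already be switched on manually through a 
custom RocksDBOptionsFactory; below is a minimal sketch of how the DirectIO 
run could be configured (the factory class name and the static register helper 
are illustrative, not part of Flink):

{code:java}
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

/** Illustrative options factory that turns on RocksDB's use_direct_reads flag. */
public class DirectReadOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Bypass the Linux Page Cache for SST file reads.
        return currentOptions.setUseDirectReads(true);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Column family options are left untouched.
        return currentOptions;
    }

    /** Registers the factory on an embedded RocksDB state backend. */
    public static void register(StreamExecutionEnvironment env) {
        EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
        backend.setRocksDBOptions(new DirectReadOptionsFactory());
        env.setStateBackend(backend);
    }
}
{code}

The same factory can also be plugged in without code changes via the existing 
state.backend.rocksdb.options-factory configuration key.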

> Add USE_DIRECT_READ configuration parameter for RocksDB
> -------------------------------------------------------
>
>                 Key: FLINK-29402
>                 URL: https://issues.apache.org/jira/browse/FLINK-29402
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.15.2
>            Reporter: Donatien
>            Priority: Not a Priority
>              Labels: Enhancement, rocksdb
>             Fix For: 1.15.2
>
>         Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of the Linux Page Cache on performance, 
> one can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performance, but 
> with the host memory exceeding the TM requirement.
> The Linux Page Cache is of course useful, but it can give misleading results 
> when benchmarking the managed memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|https://arxiv.org/abs/1702.04323].
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads through the RocksDB API. This configuration would be 
> disabled by default.
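
For illustration, a rough sketch of what such an option could look like; the 
key name, default, and wiring below are assumptions, not an existing Flink 
option:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class RocksDBDirectReadOptions {

    // Hypothetical option key; disabled by default, as proposed.
    public static final ConfigOption<Boolean> USE_DIRECT_READS =
            ConfigOptions.key("state.backend.rocksdb.use-direct-reads")
                    .booleanType()
                    .defaultValue(false)
                    .withDescription(
                            "If enabled, RocksDB read operations bypass the "
                                    + "Linux Page Cache via DirectIO.");

    // The state backend would then forward the flag when building DBOptions, e.g.:
    // dbOptions.setUseDirectReads(config.get(USE_DIRECT_READS));
}
{code}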


