[ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610081#comment-17610081 ]
Yun Tang commented on FLINK-29402:
----------------------------------

Thanks for creating this ticket.

From my understanding, this option would not be used in production environments. As for benchmarking, I believe several streaming-system benchmarks do not enable direct IO, for example https://github.com/nexmark/nexmark , https://www.databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html , and https://github.com/Klarrio/open-stream-processing-benchmark .

Moreover, these options can still be enabled via code. Considering how many options we already have, I don't think introducing these two is that useful.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> -------------------------------------------------------
>
>                 Key: FLINK-29402
>                 URL: https://issues.apache.org/jira/browse/FLINK-29402
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.16.0
>            Reporter: Donatien
>            Priority: Not a Priority
>              Labels: Enhancement, pull-request-available, rocksdb
>             Fix For: 1.16.0
>
>         Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux
> Page Cache. To understand the impact of the Linux Page Cache on performance,
> one can run a heavy workload on a single-tasked Task Manager with a container
> memory limit identical to the TM process memory. Running this same workload
> on a TM with no container memory limit will result in better performance, but
> with the host memory usage exceeding the TM requirement.
> The Linux Page Cache is of course useful, but it can give false results when
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically
> enabled for benchmarks on working set estimation [Zwaenepoel et
> al.|https://arxiv.org/abs/1702.04323].
> I propose adding a configuration key that lets users enable DirectIO for
> reads through the RocksDB API. This option would be disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)