[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610081#comment-17610081
 ] 

Yun Tang commented on FLINK-29402:
----------------------------------

Thanks for creating this ticket. As far as I understand, this option would not 
be used in production environments. As for benchmarking, I believe several 
streaming system benchmarks do not enable direct IO, such as 
https://github.com/nexmark/nexmark , 
https://www.databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html ,
and https://github.com/Klarrio/open-stream-processing-benchmark . 
Moreover, these options can still be enabled via code, so I don't think it is 
very useful to introduce these two options, considering how many options we 
already have.
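
For reference, a minimal sketch of what "enabled via code" could look like, 
assuming Flink's RocksDBOptionsFactory interface and RocksJava's 
DBOptions#setUseDirectReads. The class name DirectReadOptionsFactory is 
hypothetical, and the snippet needs the Flink RocksDB state backend and 
RocksJava on the classpath.

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

/** Hypothetical options factory that only switches on direct reads. */
public class DirectReadOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Ask RocksDB to bypass the OS page cache for read operations.
        return currentOptions.setUseDirectReads(true);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Column family options are left unchanged in this sketch.
        return currentOptions;
    }
}
```

Such a factory could then be registered with 
EmbeddedRocksDBStateBackend#setRocksDBOptions, or by setting 
state.backend.rocksdb.options-factory to the fully qualified class name.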

> Add USE_DIRECT_READ configuration parameter for RocksDB
> -------------------------------------------------------
>
>                 Key: FLINK-29402
>                 URL: https://issues.apache.org/jira/browse/FLINK-29402
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.16.0
>            Reporter: Donatien
>            Priority: Not a Priority
>              Labels: Enhancement, pull-request-available, rocksdb
>             Fix For: 1.16.0
>
>         Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performance, but 
> with host memory usage exceeding the TM requirement.
> The Linux Page Cache is of course useful but can give misleading results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|https://arxiv.org/abs/1702.04323].
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads through the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
