[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-09-25 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609277#comment-17609277
 ] 

Yanfei Lei commented on FLINK-29402:


This is a very interesting proposal, I think this is not hard to implement in 
Flink. From the [wiki|https://github.com/facebook/rocksdb/wiki/Direct-IO] there 
are two options to control the DirectIO: {{use_direct_reads}} and 
{{use_direct_io_for_flush_and_compaction, }}and these two options are supported 
by current {{{}frocksdb-jni(6.20.3){}}}.  

BTW, do you have quantitative benchmark results about DirectIO *ON* vs DirectIO 
{*}OFF{*}?

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.15.2
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, rocksdb
> Fix For: 1.15.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-09-25 Thread Donatien (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609298#comment-17609298
 ] 

Donatien commented on FLINK-29402:
--

Indeed, with RocksDB API it is easy to add this new option (two new options 
with {{use_direct_io_for_flush_and_compaction). I am not quite familiar with 
the process of adding a new option, e.g adding it to the doc, localization, 
..., but if you give me some resources about guidelines I'll be happy to create 
a PR.}}

 

I will edit my ticket later to add some examples with Grafana of the impact of 
DirectIO on performance.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.15.2
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, rocksdb
> Fix For: 1.15.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-09-26 Thread Donatien (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609377#comment-17609377
 ] 

Donatien commented on FLINK-29402:
--

I added an image showing the behaviour of an identical job under different 
configuration.

The job is the one used in the e2e tess of rocksDB memory control: 
[https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-rocksdb-state-memory-control-test/src/main/java/org/apache/flink/streaming/tests/RocksDBStateMemoryControlTestProgram.java]

The parameters used are as follows:
 * keyspace: 1.000.000
 * payload size: 5.000
 * The workload stops after sending 3Gb worth of records.
 * The mapper used is the ValueStateMapper
 * The payload is not appended to existing key but replacing the previous one.

The first column shows the job running on a 2Gb TM with a container limit of 
2Gb. On the first graph we have the backpressure endured by the previous 
operator (Source/split). The second graph shows that the TM is using Around 1Gb 
of memory (managed + heap + network + ...) but the pod is effectively using all 
of the available memory for Linux Page Cache. The third graph shows the 
ingestion in Mb/s). The last two graphs are RocksDB metrics: cache hit + miss, 
and cache usage.

The second colunm shows the same workload with the same amount of TM memory but 
with a container limit of 20Gb. The second graph shows that the container 
memory raises above the TM specification (around 5Gb, which is the size of the 
estimated state: 1.000.000 key multiplied by 5.000 bytes). We can see that 
there is a strongly decreased backpressure as well a better performances on the 
third graph.

The last column shows the same configuration as the second but with directIO 
enabled, thus not using Linux Page Cache. The graphs look similar to the first 
column as expected.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.15.2
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, rocksdb
> Fix For: 1.15.2
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-09-26 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609421#comment-17609421
 ] 

Yanfei Lei commented on FLINK-29402:


Thanks for your benchmark results. Comparing the second colunm and the third 
colunm, “num bytes in per second” becomes lower, maybe we should point this out 
in the documentation.

AFAIK, there isn't a guideline about adding a new option, but I think you can 
refer to this ticket to start: https://issues.apache.org/jira/browse/FLINK-20496

CC [~yunta], [~yuanmei] 

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.15.2
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, rocksdb
> Fix For: 1.15.2
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-09-27 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610081#comment-17610081
 ] 

Yun Tang commented on FLINK-29402:
--

Thanks for creating this ticket. From my understanding, this option would not 
be used in production environments. For benchmarking cases, I believe some 
streaming systems benchmarks would not enable direct IO, such as 
https://github.com/nexmark/nexmark , 
https://www.databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html,
 and https://github.com/Klarrio/open-stream-processing-benchmark . 
Moreover, we could still let these options enabled via code, I don't think it's 
so useful to introduce these two options considering we already have so many 
options.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.16.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-10-03 Thread Donatien (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612208#comment-17612208
 ] 

Donatien commented on FLINK-29402:
--

Thanks for your comment! Considering that Facebook uses DirectIO for reads and 
writes when performing benchmarks 
([https://www.usenix.org/system/files/fast20-cao_zhichao.pdf)] on RocksDB, I 
would say it is best practice to also enable DirectIO for Flink benchmarks 
using RocksDB. Disabling DirectIO can lead to unpredictable experiments 
depending on 1. the container memory limit 2. the amount of free heap memory 
used by the Page Cache. Again I understand that it is only for research 
purposes and agree that this could be done programmatically.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.17.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-10-20 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620861#comment-17620861
 ] 

Yuan Mei commented on FLINK-29402:
--

[~donaschmi] 
 * I understand enabling/disabling DirectIO leads to different performance 
results. That seems obvious because of page caching.
 * Wondering whether this option is introduced purely for benchmarking or 
research performance testing?
 * If yes, I would be hesitant to introduce a new option purely for testing 
purposes. I share the same concern as Yun Tang. 
 * Rocksdb Configuration options have already been complicated, and we should 
not introduce more to confuse normal users if not having to.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.17.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-10-20 Thread Donatien (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620901#comment-17620901
 ] 

Donatien commented on FLINK-29402:
--

Hey Yuan! This concern of enabling DirectIO for benchmarking came from a 
Facebook paper 
([https://www.usenix.org/system/files/fast20-cao_zhichao.pdf)|https://www.usenix.org/system/files/fast20-cao_zhichao.pdf]
 where they disabled page caching for their benchmark. It makes sense to avoid 
any external phenomenon that could interfere with the actual performance of a 
k-v store. 
In the case of benchmarking Flink stateful stream processing, I would say that 
this observation also stands. But I 100% agree that it is only for research 
purposes and should not be introduced to normal users.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.17.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-10-20 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620916#comment-17620916
 ] 

Yuan Mei commented on FLINK-29402:
--

# benchmarking for a pure k-v store ruling out other factors (like page cache) 
totally makes sense.
 # However, from Flink perspective, it is more reasonable to take Flink engine 
as an entire piece. In this case, most likely we should and need to use Page 
Cache. Benchmarking with page cache aligns better with a real-world use case. 
But that's a different topic, I would say.

Since you agree as well that we do not introduce a new config purely for 
benchmarking purposes, I am going to close this ticket.

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.17.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> Page Cache. To understand the impact of Linux Page Cache on performance, one 
> can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performances but 
> with the host memory exceeding the TM requirement.
> Linux Page Cache are of course useful but can give false results when 
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation [Zwaenepoel et 
> al.|[https://arxiv.org/abs/1702.04323].]
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads thanks to the RocksDB API. This configuration would be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)