[ 
https://issues.apache.org/jira/browse/CASSANDRA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498703#comment-16498703
 ] 

Mulugeta Mammo commented on CASSANDRA-14466:
--------------------------------------------

[~aweisberg]

Thanks for the feedback. Yes, we are well aware of the traditional 
recommendation against mixing O_DIRECT with buffered I/O, as described in 
detail in the Linux manual pages here: 
[http://man7.org/linux/man-pages/man2/open.2.html]. However, our understanding 
is that recent kernel versions do a pretty good job of handling any 
page-cache coherence issues. We have confirmed this by running multiple 
experiments and by running Cassandra stress tests with a mixed read/write 
workload that uses O_DIRECT for reads and buffered I/O for writes. 

Regarding the general question on the performance benefit of O_DIRECT over 
buffered I/O:
 * O_DIRECT reduces the overhead of file system cache pollution. Our 
experiments show that, when running a pure read workload under heavy load, 
system performance is dominated by the kernel work needed to manage memory. 
For example, we see very high kernel CPU utilization (close to 50%), vmstat 
shows high memory paging activity, and the perf profile shows 
try_to_unmap_one as one of the hottest kernel functions. This bottleneck is 
reproducible on various server systems with multiple high-bandwidth flash 
storage devices (i.e. NVMe devices).
 * O_DIRECT avoids the double caching problem, which allows better use of the 
Cassandra-specific caching options.
 * O_DIRECT offers more predictable latency, independent of other processes 
using the file system cache. 

*Details:*

Cassandra does not have a way to disable OS file system caching on Linux. 
The option to bypass the file system cache was first introduced in JDK 10, 
with the addition of Direct IO support. We modified Cassandra to incorporate 
Direct IO support in both the read and write IO paths. The write IO path did 
not provide performance improvements; hence, to simplify the patch, only the 
read IO path is being proposed here. 
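
For readers unfamiliar with the JDK 10 API, the following is a minimal sketch of an aligned Direct IO read. It is not taken from direct_io.patch: the class DirectReadSketch and its methods are hypothetical names used only for illustration. It assumes the file store reports a power-of-two block size and that the underlying file system accepts O_DIRECT (some, such as tmpfs, may reject it). Direct IO requires the file position, the buffer address, and the transfer size to all be multiples of the block size, so the requested range is widened to block boundaries and then trimmed after the read.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of Cassandra or of direct_io.patch. */
final class DirectReadSketch {
    private DirectReadSketch() {}

    /** Rounds value down to the nearest multiple of blockSize. */
    static long alignDown(long value, int blockSize) {
        return value - (value % blockSize);
    }

    /** Rounds value up to the nearest multiple of blockSize. */
    static long alignUp(long value, int blockSize) {
        return alignDown(value + blockSize - 1, blockSize);
    }

    /**
     * Reads up to length bytes starting at offset, bypassing the page cache.
     * The returned buffer may be shorter than length if the range runs past
     * the end of the file.
     */
    static ByteBuffer read(Path file, long offset, int length) throws IOException {
        // FileStore.getBlockSize() was added in JDK 10 alongside Direct IO support.
        int blockSize = Math.toIntExact(Files.getFileStore(file).getBlockSize());

        // Widen the requested range to block boundaries.
        long start = alignDown(offset, blockSize);
        int span = Math.toIntExact(alignUp(offset + length, blockSize) - start);

        // Over-allocate so alignedSlice can find a block-aligned start address
        // while still leaving at least span bytes of capacity.
        ByteBuffer aligned = ByteBuffer.allocateDirect(span + blockSize - 1)
                                       .alignedSlice(blockSize);
        aligned.limit(span);

        int bytesRead;
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
            // Single positional read; a short result is treated as end of file.
            bytesRead = Math.max(0, channel.read(aligned, start));
        }

        // Trim the padding added before and after the requested range.
        int skip = (int) (offset - start);
        int available = Math.max(0, bytesRead - skip);
        aligned.flip();
        aligned.position(Math.min(skip, aligned.limit()));
        aligned.limit(aligned.position() + Math.min(length, available));
        return aligned.slice();
    }
}
```

With the default 64KB compression chunk size, a chunk read at an arbitrary file offset would typically touch one extra block at either end of the range; the sketch pays that cost in exchange for skipping the page cache.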

This Direct IO patch produces the following relative performance improvements 
over baseline Cassandra:
 * Kernel CPU utilization is reduced by 90%.
 * Cassandra ops/sec throughput increases by up to 80%.
 * Mean, 95th-percentile, and 99th-percentile latencies are reduced by 40-50%.

General Hardware Configuration:
 * high core count servers (40+ cores)
 * 128GB+ DRAM
 * Dataset:DRAM ratio of 8:1 or greater
 * multiple NVMe devices

General Software Configuration:
 * Cassandra versions: 3.x and trunk (4.0)
 * Cassandra process running: one
 * Schema compression settings: default LZ4 64KB chunk size
 * Schema: cqlstress-insanity-example.yaml
 * Benchmark: Cassandra-Stress
 * Runtime: 5 minutes or more depending on how long DRAM takes to reach steady 
state
 * Workload: 100% read
 * Heavily loaded system (number of client threads >200)
 * Kernel: version 4.0 and above
 * Other settings: DataStax production settings 
([https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html])


Thanks,

Mulugeta

> Enable Direct I/O 
> ------------------
>
>                 Key: CASSANDRA-14466
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14466
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Local Write-Read Paths
>            Reporter: Mulugeta Mammo
>            Priority: Major
>         Attachments: direct_io.patch
>
>
> Hi,
> JDK 10 introduced a new API for Direct IO that enables applications to bypass 
> the file system cache and potentially improve performance. Details of this 
> feature can be found at [https://bugs.openjdk.java.net/browse/JDK-8164900].
> This patch uses the JDK 10 API to enable Direct IO for the Cassandra read 
> path. This feature is disabled by default, but it can be enabled using 
> a new configuration parameter, enable_direct_io_for_read_path. We have 
> conducted a Cassandra read-only stress test and measured a throughput gain of 
> up to 60% on flash drives.
> The patch requires JDK 10 Cassandra Support - 
> https://issues.apache.org/jira/browse/CASSANDRA-9608 
> Please review the patch and let us know your feedback.
> Thanks,
> [^direct_io.patch]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

