[ https://issues.apache.org/jira/browse/CASSANDRA-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844835#comment-17844835 ]

Jon Haddad edited comment on CASSANDRA-15452 at 5/9/24 1:42 AM:
----------------------------------------------------------------

I should share some additional information about the EBS test and point out 
some things that might not be immediately obvious in the graphs.
 * The GP3 EBS volume was configured for 3K IOPS and 256 MB/s throughput.
 * All tests were run with an empty page cache, by running 
{{echo 3 | sudo tee /proc/sys/vm/drop_caches}} followed by a {{nodetool compact}}.
 * Compaction was not throttled.
 * Read ahead was set to 4KB.
 * The unpatched version was, as expected, completely IOPS-limited on the EBS 
volume: it peaked right at 3K IOPS, with read throughput on the drive around 
18-20 MB/s.
 * The spikes in writes, presumably from fsync, cause the dips in read 
performance.
 * The patched version showed a significant reduction in IOPS usage, needing 
only ~500 IOPS to achieve around 65 MB/s: *more than a 3x* improvement in 
throughput and a {*}6x improvement in IOPS usage{*}. That works out to roughly 
130 KB per read, versus roughly 6-7 KB per read at head. Since compaction I/O 
competes with reads for disk resources, in a production environment this 
should lead to more predictable read latencies (a sketch of the large-block 
read idea follows this list).
 * Multiple compaction threads should be able to achieve an even bigger 
improvement, since we haven't yet reached the resource limits of the 
underlying device.
 * This patch should significantly improve node density when using EBS, as 
compaction is a significant bottleneck there.
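
For context, the core idea being tested can be sketched in a few lines: issue 
large sequential reads (here 256 KB, the maximum size of a single EBS I/O) 
instead of many small, read-ahead-sized ones. This is a minimal illustration 
under those assumptions, not the actual patch; the class name and buffer size 
are made up.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of large-block sequential reads; not the actual patch.
// Reading 256 KB per request means one EBS I/O can serve dozens of
// 4-8 KB logical reads, cutting IOPS roughly in proportion.
public final class LargeBlockReader
{
    private static final int BLOCK_SIZE = 256 * 1024; // EBS max I/O size

    public static long scan(Path sstable) throws IOException
    {
        long total = 0;
        try (FileChannel channel = FileChannel.open(sstable, StandardOpenOption.READ))
        {
            ByteBuffer buffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
            while (channel.read(buffer) != -1)
            {
                buffer.flip();
                total += buffer.remaining(); // hand the block to the consumer here
                buffer.clear();
            }
        }
        return total;
    }
}
{code}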

Some other general information:
 * The patch does not yet take advantage of fadvise 
({{POSIX_FADV_DONTNEED}}), which should help significantly in production by 
preventing compaction from polluting the page cache with pages that are about 
to be removed from the filesystem; a sketch of the call follows below.
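
For anyone unfamiliar with the call, a minimal sketch of what it could look 
like from Java via JNA is below. The class name is hypothetical, the off_t 
mapping assumes a 64-bit platform, and obtaining the raw fd from a 
FileChannel is elided.

{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;

// Hypothetical illustration of POSIX_FADV_DONTNEED via JNA; not the actual patch.
public final class PageCacheAdvice
{
    // Linux value of POSIX_FADV_DONTNEED (see linux/fadvise.h)
    private static final int POSIX_FADV_DONTNEED = 4;

    private interface CLib extends Library
    {
        CLib INSTANCE = Native.load("c", CLib.class);

        // int posix_fadvise(int fd, off_t offset, off_t len, int advice)
        // long maps off_t on 64-bit platforms
        int posix_fadvise(int fd, long offset, long len, int advice);
    }

    /**
     * Tells the kernel the given range won't be needed again, so pages read
     * during compaction are dropped instead of evicting hot read-path pages.
     */
    public static void dontNeed(int fd, long offset, long len)
    {
        CLib.INSTANCE.posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    }
}
{code}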

Overall I am very excited by the results we're seeing here and will follow up 
with additional testing.


> Improve disk access patterns during compaction and streaming
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-15452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15452
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Local Write-Read Paths, Local/Compaction
>            Reporter: Jon Haddad
>            Assignee: Jordan West
>            Priority: Normal
>         Attachments: everyfs.txt, iostat-5.0-head.output, 
> iostat-5.0-patched.output, iostat-ebs-15452.png, iostat-ebs-head.png, 
> iostat-instance-15452.png, iostat-instance-head.png, results.txt, 
> sequential.fio
>
>
> On read-heavy workloads Cassandra performs much better when using a low read 
> ahead setting.  In my tests I've seen a 5x improvement in throughput and 
> more than a 50% reduction in latency.  However, I've also observed that it 
> can have a negative impact on compaction and streaming throughput. It 
> especially hurts cloud environments, where small reads incur high IOPS costs 
> due to the many tiny requests.
>  # We should investigate using POSIX_FADV_DONTNEED on files we're compacting 
> to see if we can improve performance and reduce page faults. 
>  # This should be combined with an internal read-ahead-style buffer that 
> Cassandra manages, similar to a BufferedInputStream but with our own 
> machinery.  This buffer should read fairly large blocks of data off disk at 
> a time.  EBS, for example, allows a single I/O to be up to 256KB.  A 
> considerable amount of time is spent in blocking I/O during compaction and 
> streaming, so reducing how frequently we read from disk should speed up all 
> sequential I/O operations.
>  # We can reduce system calls by buffering writes as well, but I think it 
> will have less of an impact than the reads do.
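
As a footnote to point 3 above, buffering writes so that many small writes 
are coalesced into one large {{write()}} syscall is the simplest piece. A 
minimal sketch is below; the class name and buffer size are assumptions, not 
the patch.

{code:java}
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of write buffering; not the actual patch.
// A 256 KB buffer turns many small write() syscalls into one large one.
public final class BufferedWriterExample
{
    private static final int BUFFER_SIZE = 256 * 1024;

    public static OutputStream open(Path path) throws IOException
    {
        return new BufferedOutputStream(Files.newOutputStream(path), BUFFER_SIZE);
    }
}
{code}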


