[ https://issues.apache.org/jira/browse/CASSANDRA-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823391#comment-17823391 ]
Jon Haddad commented on CASSANDRA-15452:
----------------------------------------

I took another look at this. This lets us extract every read operation against a single data file:

{noformat}
awk '$4 == "R" { print $0 }' everyfs.txt | grep '30-bti-Data.db' > 30-bti-data.txt
{noformat}

If you glance at the end of the data, the last entry is this:

{noformat}
23:47:12 CompactionExec 44651 R 2699 12483 0.00 da-30-bti-Data.db
{noformat}

The data file is only 15KB, but we're doing over 6,000 reads:

{noformat}
wc -l ../research/30-bti-data.txt
6420 ../research/30-bti-data.txt
{noformat}

The 5th column is the number of bytes read. Summing this:

{noformat}
awk '{ sum += $5; } END {print sum}' ../research/30-bti-data.txt
25571844
{noformat}

That's roughly 25MB, which is a lot to pull through the filesystem when, in an optimal situation, we would have done a single 16KB read.

> Improve disk access patterns during compaction and streaming
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-15452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15452
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Local Write-Read Paths, Local/Compaction
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: everyfs.txt, results.txt, sequential.fio
>
>
> On read-heavy workloads Cassandra performs much better when using a low
> read-ahead setting. In my tests I've seen a 5x improvement in throughput and
> more than a 50% reduction in latency. However, I've also observed that it
> can have a negative impact on compaction and streaming throughput. It
> especially hurts cloud environments, where small reads incur high costs in
> IOPS.
> # We should investigate using POSIX_FADV_DONTNEED on files we're compacting
> to see if we can improve performance and reduce page faults.
> # This should be combined with an internal read-ahead style buffer that
> Cassandra manages, similar to a BufferedInputStream but with our own
> machinery. This buffer should read fairly large blocks of data off disk at
> a time. EBS, for example, allows 1 IOP to be up to 256KB. A considerable
> amount of time is spent in blocking I/O during compaction and streaming.
> Reducing how often we read from disk should speed up all sequential I/O
> operations.
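The per-file steps in the comment (filter the reads, count them, sum column 5) can be folded into one pass over the whole trace. This is a minimal sketch, not part of Cassandra or of the original analysis. The name `summarize_reads` is hypothetical. It assumes the column layout seen in the sample trace line: column 4 is the operation type, column 5 is the number of bytes read, and the last column is the file name, which must not contain spaces.

```shell
# Hypothetical helper: per-file read count, total bytes read, and average
# bytes per read, computed from a trace in the layout shown in the comment.
# Assumed columns: $4 = op type ("R" for reads), $5 = bytes, $NF = file name.
summarize_reads() {
  awk '
    $4 == "R" {              # keep read operations only
      f = $NF                # file name is the last column
      reads[f]++
      bytes[f] += $5         # column 5 is the number of bytes read
    }
    END {
      # %.0f rather than %d so large byte totals print safely in any awk
      for (f in reads)
        printf "%s reads=%d bytes=%.0f avg=%.0f\n", f, reads[f], bytes[f], bytes[f] / reads[f]
    }
  ' "$@" | sort              # awk iterates arrays in no fixed order
}
```

Running `summarize_reads everyfs.txt` should give the same count and byte total for `da-30-bti-Data.db` as the `wc` and `awk` commands in the comment, assuming the grep pattern there matched only that file. Those totals (25571844 bytes over 6420 reads) average about 3,983 bytes, or roughly 4KB per read.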
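The case for a large internal buffer is that fewer, larger requests mean fewer trips to disk. You can see the request-count side of this from the shell by reading the same file with `dd` at two block sizes and counting the reads it reports. This is only an illustration and is not the proposed Cassandra buffer. `count_reads` is a hypothetical helper. The counts are read() calls issued by `dd`, not device I/O operations, which also depend on kernel read-ahead and the page cache.

```shell
# Hypothetical helper: print how many non-empty read() calls dd makes when
# reading a file with a given request size. dd reports "F+P records in",
# where F counts full-size reads and P counts short (partial) reads.
count_reads() {  # $1 = file, $2 = block size in bytes
  # LC_ALL=C keeps dd's summary untranslated so the awk pattern matches
  LC_ALL=C dd if="$1" of=/dev/null bs="$2" 2>&1 |
    awk -F'+' '/records in/ { print $1 + $2 }'
}
```

For a 15000-byte file, `count_reads FILE 4096` reports 4 reads (three full, one partial). `count_reads FILE 262144` covers the file with a single 256KB request, the per-IOP maximum cited for EBS.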