Jon Haddad created CASSANDRA-19607: -------------------------------------- Summary: Compaction double reads every chunk Key: CASSANDRA-19607 URL: https://issues.apache.org/jira/browse/CASSANDRA-19607 Project: Cassandra Issue Type: Bug Reporter: Jon Haddad Attachments: channelproxy-read-da8.txt, channelproxy-read-da9-rev.html
I was taking a look at the I/O operations performed during compaction and noticed it looks like we're double reading every chunk of the filesystem, meaning we perform 2x the system calls that we need to. The second read will come from page cache, but this isn't free, so we should try to avoid it. Here's the steps to reproduce. First, wrote a small amount of data to a table and flush. I have a single SSTable that's 151KB here: {noformat} -rw-r--r-- 1 cassandra 127 May 1 21:00 da-8-bti-CompressionInfo.db -rw-r--r-- 1 cassandra 151K May 1 21:00 da-8-bti-Data.db -rw-r--r-- 1 cassandra 10 May 1 21:00 da-8-bti-Digest.crc32 -rw-r--r-- 1 cassandra 136 May 1 21:00 da-8-bti-Filter.db -rw-r--r-- 1 cassandra 829 May 1 21:00 da-8-bti-Partitions.db -rw-r--r-- 1 cassandra 0 May 1 21:00 da-8-bti-Rows.db -rw-r--r-- 1 cassandra 5.0K May 1 21:00 da-8-bti-Statistics.db -rw-r--r-- 1 cassandra 94 May 1 21:00 da-8-bti-TOC.txt {noformat} Then watch all reads by doing the following: {noformat} $ xfsslower -p $PID 0 {noformat} Next, start a event profiling session on ChannelProxy.read by doing the following with async-profiler: {noformat} $ asprof -e org.apache.cassandra.io.util.ChannelProxy.read -d 10 $(cassandra-pid) -f /mnt/cassandra/artifacts/channelproxy-read.txt {noformat} Then I started a compaction on the table which lasted roughly a second. I took the log of filesystem operations and pulled out all the accesses to `da-8-bti-Data.db` using ripgrep: {noformat} $ rg da-8-bti-Data.db artifacts/cassandra1/xfsslower-da8.txt {noformat} The headings are: {noformat} TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME {noformat} The data: {noformat} 27:21:07:05 CompactionExec 30550 R 16387 0 0.02 da-8-bti-Data.db 28:21:07:05 CompactionExec 30550 R 16387 0 0.00 da-8-bti-Data.db 40:21:07:05 CompactionExec 30550 R 16436 16 0.01 da-8-bti-Data.db 41:21:07:05 CompactionExec 30550 R 16436 16 0.01 da-8-bti-Data.db 44:21:07:05 CompactionExec 30550 R 16419 32 0.01 da-8-bti-Data.db 45:21:07:05 CompactionExec 30550 R 16419 32 0.01 da-8-bti-Data.db 48:21:07:05 CompactionExec 30550 R 16441 48 0.01 da-8-bti-Data.db 49:21:07:05 CompactionExec 30550 R 16441 48 0.00 da-8-bti-Data.db 52:21:07:05 CompactionExec 30550 R 16421 64 0.01 da-8-bti-Data.db 53:21:07:05 CompactionExec 30550 R 16421 64 0.01 da-8-bti-Data.db 56:21:07:05 CompactionExec 30550 R 16415 80 0.01 da-8-bti-Data.db 57:21:07:05 CompactionExec 30550 R 16415 80 0.01 da-8-bti-Data.db 60:21:07:05 CompactionExec 30550 R 16439 96 0.01 da-8-bti-Data.db 61:21:07:05 CompactionExec 30550 R 16439 96 0.00 da-8-bti-Data.db 64:21:07:05 CompactionExec 30550 R 16390 112 0.01 da-8-bti-Data.db 65:21:07:05 CompactionExec 30550 R 16390 112 0.00 da-8-bti-Data.db 68:21:07:05 CompactionExec 30550 R 16384 128 0.01 da-8-bti-Data.db 69:21:07:05 CompactionExec 30550 R 16384 128 0.01 da-8-bti-Data.db 72:21:07:05 CompactionExec 30550 R 6019 144 0.00 da-8-bti-Data.db 73:21:07:05 CompactionExec 30550 R 6019 144 0.00 da-8-bti-Data.db {noformat} There are 20 IO operations reported here, each of which occurs twice. I was originally thinking this might be a bug with the reporting, but other files operations are listed only once. For example: {noformat} 21:07:05 CompactionExec 30550 W 16386 112 0.09 da-9-bti-Data.db 21:07:05 CompactionExec 30550 W 4 128 0.00 da-9-bti-Data.db {noformat} Looking at the output, the last read is approximately 6KB and starts at the 144KB offset which matches up to the total length of the file. The profiler output indicates there's 2 code paths that arrive at the FS operation 9 times. I've also attached the profiler output, which shows 22 calls to org.apache.cassandra.io.util.ChannelProxy.read. Two of these originate in the Stats file, leaving 9 chunks in this file, read twice, plus 2 additional calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org