[ https://issues.apache.org/jira/browse/CASSANDRA-19607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jon Haddad updated CASSANDRA-19607: ----------------------------------- Resolution: Fixed Status: Resolved (was: Triage Needed) Technically not a duplicate, but the same solution will resolve this problem as well. > Compaction double reads every chunk > ----------------------------------- > > Key: CASSANDRA-19607 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19607 > Project: Cassandra > Issue Type: Bug > Reporter: Jon Haddad > Priority: Normal > Attachments: channelproxy-read-da8.txt, channelproxy-read-da9-rev.html > > > I was taking a look at the I/O operations performed during compaction and > noticed it looks like we're double reading every chunk of the filesystem, > meaning we perform 2x the system calls that we need to. The second read will > come from page cache, but this isn't free, so we should try to avoid it. > Here's the steps to reproduce. > First, wrote a small amount of data to a table and flush. I have a single > SSTable that's 151KB here: > {noformat} > -rw-r--r-- 1 cassandra 127 May 1 21:00 da-8-bti-CompressionInfo.db > -rw-r--r-- 1 cassandra 151K May 1 21:00 da-8-bti-Data.db > -rw-r--r-- 1 cassandra 10 May 1 21:00 da-8-bti-Digest.crc32 > -rw-r--r-- 1 cassandra 136 May 1 21:00 da-8-bti-Filter.db > -rw-r--r-- 1 cassandra 829 May 1 21:00 da-8-bti-Partitions.db > -rw-r--r-- 1 cassandra 0 May 1 21:00 da-8-bti-Rows.db > -rw-r--r-- 1 cassandra 5.0K May 1 21:00 da-8-bti-Statistics.db > -rw-r--r-- 1 cassandra 94 May 1 21:00 da-8-bti-TOC.txt > {noformat} > Then watch all reads by doing the following: > {noformat} > $ xfsslower -p $PID 0 > {noformat} > Next, start a event profiling session on ChannelProxy.read by doing the > following with async-profiler: > {noformat} > $ asprof -e org.apache.cassandra.io.util.ChannelProxy.read -d 10 > $(cassandra-pid) -f /mnt/cassandra/artifacts/channelproxy-read.txt > {noformat} > Then I started a compaction on the table which lasted roughly a second. > I took the log of filesystem operations and pulled out all the accesses to > `da-8-bti-Data.db` using ripgrep: > {noformat} > $ rg da-8-bti-Data.db artifacts/cassandra1/xfsslower-da8.txt > {noformat} > The headings are: > {noformat} > TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME > {noformat} > The data: > {noformat} > 27:21:07:05 CompactionExec 30550 R 16387 0 0.02 da-8-bti-Data.db > 28:21:07:05 CompactionExec 30550 R 16387 0 0.00 da-8-bti-Data.db > 40:21:07:05 CompactionExec 30550 R 16436 16 0.01 da-8-bti-Data.db > 41:21:07:05 CompactionExec 30550 R 16436 16 0.01 da-8-bti-Data.db > 44:21:07:05 CompactionExec 30550 R 16419 32 0.01 da-8-bti-Data.db > 45:21:07:05 CompactionExec 30550 R 16419 32 0.01 da-8-bti-Data.db > 48:21:07:05 CompactionExec 30550 R 16441 48 0.01 da-8-bti-Data.db > 49:21:07:05 CompactionExec 30550 R 16441 48 0.00 da-8-bti-Data.db > 52:21:07:05 CompactionExec 30550 R 16421 64 0.01 da-8-bti-Data.db > 53:21:07:05 CompactionExec 30550 R 16421 64 0.01 da-8-bti-Data.db > 56:21:07:05 CompactionExec 30550 R 16415 80 0.01 da-8-bti-Data.db > 57:21:07:05 CompactionExec 30550 R 16415 80 0.01 da-8-bti-Data.db > 60:21:07:05 CompactionExec 30550 R 16439 96 0.01 da-8-bti-Data.db > 61:21:07:05 CompactionExec 30550 R 16439 96 0.00 da-8-bti-Data.db > 64:21:07:05 CompactionExec 30550 R 16390 112 0.01 da-8-bti-Data.db > 65:21:07:05 CompactionExec 30550 R 16390 112 0.00 da-8-bti-Data.db > 68:21:07:05 CompactionExec 30550 R 16384 128 0.01 da-8-bti-Data.db > 69:21:07:05 CompactionExec 30550 R 16384 128 0.01 da-8-bti-Data.db > 72:21:07:05 CompactionExec 30550 R 6019 144 0.00 da-8-bti-Data.db > 73:21:07:05 CompactionExec 30550 R 6019 144 0.00 da-8-bti-Data.db > {noformat} > There are 20 IO operations reported here, each of which occurs twice. I was > originally thinking this might be a bug with the reporting, but other files > operations are listed only once. For example: > {noformat} > 21:07:05 CompactionExec 30550 W 16386 112 0.09 da-9-bti-Data.db > 21:07:05 CompactionExec 30550 W 4 128 0.00 da-9-bti-Data.db > {noformat} > Looking at the output, the last read is approximately 6KB and starts at the > 144KB offset which matches up to the total length of the file. > The profiler output indicates there's 2 code paths that arrive at the FS > operation 9 times. > I've also attached the profiler output, which shows 22 calls to > org.apache.cassandra.io.util.ChannelProxy.read. Two of these originate in > the Stats file, leaving 9 chunks in this file, read twice, plus 2 additional > calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org