Jon Haddad created CASSANDRA-19607:
--------------------------------------

             Summary: Compaction double reads every chunk
                 Key: CASSANDRA-19607
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19607
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jon Haddad
         Attachments: channelproxy-read-da8.txt, channelproxy-read-da9-rev.html

I was taking a look at the I/O operations performed during compaction and 
noticed it looks like we're double reading every chunk of the filesystem, 
meaning we perform 2x the system calls that we need to.  The second read will 
come from page cache, but this isn't free, so we should try to avoid it.  
Here's the steps to reproduce.

First, wrote a small amount of data to a table and flush.  I have a single 
SSTable that's 151KB here:

{noformat}
-rw-r--r-- 1 cassandra  127 May  1 21:00 da-8-bti-CompressionInfo.db
-rw-r--r-- 1 cassandra 151K May  1 21:00 da-8-bti-Data.db
-rw-r--r-- 1 cassandra   10 May  1 21:00 da-8-bti-Digest.crc32
-rw-r--r-- 1 cassandra  136 May  1 21:00 da-8-bti-Filter.db
-rw-r--r-- 1 cassandra  829 May  1 21:00 da-8-bti-Partitions.db
-rw-r--r-- 1 cassandra    0 May  1 21:00 da-8-bti-Rows.db
-rw-r--r-- 1 cassandra 5.0K May  1 21:00 da-8-bti-Statistics.db
-rw-r--r-- 1 cassandra   94 May  1 21:00 da-8-bti-TOC.txt
{noformat}

Then watch all reads by doing the following:

{noformat}
$ xfsslower -p $PID 0
{noformat}

Next, start a event profiling session on ChannelProxy.read by doing the 
following with async-profiler:

{noformat}
$ asprof -e org.apache.cassandra.io.util.ChannelProxy.read -d 10 
$(cassandra-pid) -f /mnt/cassandra/artifacts/channelproxy-read.txt
{noformat}

Then I started a compaction on the table which lasted roughly a second.

I took the log of filesystem operations and pulled out all the accesses to 
`da-8-bti-Data.db` using ripgrep:

{noformat}
$ rg da-8-bti-Data.db artifacts/cassandra1/xfsslower-da8.txt
{noformat}

The headings are:

{noformat}
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
{noformat}

The data:

{noformat}
27:21:07:05 CompactionExec 30550  R 16387   0           0.02 da-8-bti-Data.db
28:21:07:05 CompactionExec 30550  R 16387   0           0.00 da-8-bti-Data.db
40:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
41:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
44:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
45:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
48:21:07:05 CompactionExec 30550  R 16441   48          0.01 da-8-bti-Data.db
49:21:07:05 CompactionExec 30550  R 16441   48          0.00 da-8-bti-Data.db
52:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
53:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
56:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
57:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
60:21:07:05 CompactionExec 30550  R 16439   96          0.01 da-8-bti-Data.db
61:21:07:05 CompactionExec 30550  R 16439   96          0.00 da-8-bti-Data.db
64:21:07:05 CompactionExec 30550  R 16390   112         0.01 da-8-bti-Data.db
65:21:07:05 CompactionExec 30550  R 16390   112         0.00 da-8-bti-Data.db
68:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
69:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
72:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db
73:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db
{noformat}

There are 20 IO operations reported here, each of which occurs twice.  I was 
originally thinking this might be a bug with the reporting, but other files 
operations are listed only once.  For example:

{noformat}
21:07:05 CompactionExec 30550  W 16386   112         0.09 da-9-bti-Data.db
21:07:05 CompactionExec 30550  W 4       128         0.00 da-9-bti-Data.db
{noformat}

Looking at the output, the last read is approximately 6KB and starts at the 
144KB offset which matches up to the total length of the file.

The profiler output indicates there's 2 code paths that arrive at the FS 
operation 9 times.

I've also attached the profiler output, which shows 22 calls to 
org.apache.cassandra.io.util.ChannelProxy.read.  Two of these originate in the 
Stats file, leaving 9 chunks in this file, read twice, plus 2 additional calls. 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to