[ 
https://issues.apache.org/jira/browse/CASSANDRA-19607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19607:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Triage Needed)

Technically not a duplicate, but the same solution will resolve this problem as 
well.

> Compaction double reads every chunk
> -----------------------------------
>
>                 Key: CASSANDRA-19607
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19607
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: channelproxy-read-da8.txt, channelproxy-read-da9-rev.html
>
>
> I was taking a look at the I/O operations performed during compaction and 
> noticed it looks like we're double reading every chunk of the filesystem, 
> meaning we perform 2x the system calls that we need to.  The second read will 
> come from page cache, but this isn't free, so we should try to avoid it.  
> Here's the steps to reproduce.
> First, wrote a small amount of data to a table and flush.  I have a single 
> SSTable that's 151KB here:
> {noformat}
> -rw-r--r-- 1 cassandra  127 May  1 21:00 da-8-bti-CompressionInfo.db
> -rw-r--r-- 1 cassandra 151K May  1 21:00 da-8-bti-Data.db
> -rw-r--r-- 1 cassandra   10 May  1 21:00 da-8-bti-Digest.crc32
> -rw-r--r-- 1 cassandra  136 May  1 21:00 da-8-bti-Filter.db
> -rw-r--r-- 1 cassandra  829 May  1 21:00 da-8-bti-Partitions.db
> -rw-r--r-- 1 cassandra    0 May  1 21:00 da-8-bti-Rows.db
> -rw-r--r-- 1 cassandra 5.0K May  1 21:00 da-8-bti-Statistics.db
> -rw-r--r-- 1 cassandra   94 May  1 21:00 da-8-bti-TOC.txt
> {noformat}
> Then watch all reads by doing the following:
> {noformat}
> $ xfsslower -p $PID 0
> {noformat}
> Next, start a event profiling session on ChannelProxy.read by doing the 
> following with async-profiler:
> {noformat}
> $ asprof -e org.apache.cassandra.io.util.ChannelProxy.read -d 10 
> $(cassandra-pid) -f /mnt/cassandra/artifacts/channelproxy-read.txt
> {noformat}
> Then I started a compaction on the table which lasted roughly a second.
> I took the log of filesystem operations and pulled out all the accesses to 
> `da-8-bti-Data.db` using ripgrep:
> {noformat}
> $ rg da-8-bti-Data.db artifacts/cassandra1/xfsslower-da8.txt
> {noformat}
> The headings are:
> {noformat}
> TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
> {noformat}
> The data:
> {noformat}
> 27:21:07:05 CompactionExec 30550  R 16387   0           0.02 da-8-bti-Data.db
> 28:21:07:05 CompactionExec 30550  R 16387   0           0.00 da-8-bti-Data.db
> 40:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
> 41:21:07:05 CompactionExec 30550  R 16436   16          0.01 da-8-bti-Data.db
> 44:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
> 45:21:07:05 CompactionExec 30550  R 16419   32          0.01 da-8-bti-Data.db
> 48:21:07:05 CompactionExec 30550  R 16441   48          0.01 da-8-bti-Data.db
> 49:21:07:05 CompactionExec 30550  R 16441   48          0.00 da-8-bti-Data.db
> 52:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
> 53:21:07:05 CompactionExec 30550  R 16421   64          0.01 da-8-bti-Data.db
> 56:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
> 57:21:07:05 CompactionExec 30550  R 16415   80          0.01 da-8-bti-Data.db
> 60:21:07:05 CompactionExec 30550  R 16439   96          0.01 da-8-bti-Data.db
> 61:21:07:05 CompactionExec 30550  R 16439   96          0.00 da-8-bti-Data.db
> 64:21:07:05 CompactionExec 30550  R 16390   112         0.01 da-8-bti-Data.db
> 65:21:07:05 CompactionExec 30550  R 16390   112         0.00 da-8-bti-Data.db
> 68:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
> 69:21:07:05 CompactionExec 30550  R 16384   128         0.01 da-8-bti-Data.db
> 72:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db
> 73:21:07:05 CompactionExec 30550  R 6019    144         0.00 da-8-bti-Data.db
> {noformat}
> There are 20 IO operations reported here, each of which occurs twice.  I was 
> originally thinking this might be a bug with the reporting, but other files 
> operations are listed only once.  For example:
> {noformat}
> 21:07:05 CompactionExec 30550  W 16386   112         0.09 da-9-bti-Data.db
> 21:07:05 CompactionExec 30550  W 4       128         0.00 da-9-bti-Data.db
> {noformat}
> Looking at the output, the last read is approximately 6KB and starts at the 
> 144KB offset which matches up to the total length of the file.
> The profiler output indicates there's 2 code paths that arrive at the FS 
> operation 9 times.
> I've also attached the profiler output, which shows 22 calls to 
> org.apache.cassandra.io.util.ChannelProxy.read.  Two of these originate in 
> the Stats file, leaving 9 chunks in this file, read twice, plus 2 additional 
> calls.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to