[ https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000258#comment-14000258 ]
Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
------------------------------------------------------

Below I present some live-cluster analysis, taken about 10 days after deploying the original patch that entirely removes the check for range overlap in worthDroppingTombstones().

*Analysis Description*

In our dataset we use both LCS and STCS, but most of the CFs are STCS. A significant portion of our dataset is append-only TTL-ed data, so a good match for tombstone compaction. Most of our large CFs with a high droppable tombstone ratio use STCS, but a few that use LCS also benefited from the patch.

I deployed the patch in 2 different ranges with similar results. The metrics were collected between the 1st of May and the 16th of May; the nodes were patched on the 7th of May. The Cassandra version used was 1.2.16.

In the analysis I compare the total space used (Cassandra load), tombstone ratio, disk utilization (system disk xvbd util), total bytes compacted and system load (linux cpu). For the last three metrics I also calculate the integral of the metric to make it easier to compare the totals over the period.

*Analysis*

Graphs: https://issues.apache.org/jira/secure/attachment/12645241/patch-v1-range1.png

Each graph compares the metrics of the patched node with its previous and next neighbors; vnodes are not used. So the first row in the figure is node N-1, the second row is node N (the patched node, marked with an asterisk), and the third row is node N+1.

* *Cassandra load*: On the patched node there is a sudden 7% drop in disk space when the patch was applied, due to the execution of single-SSTable compactions. The growth rate of disk usage also decreases after the patch, since tombstones are cleared more often. Over the whole period, disk space grew 1.2% on the patched node, against about 10% on the unpatched nodes.
* *Tombstone ratio*: After the patch is applied the droppable tombstone ratio drops, and then hovers around the default threshold of 20%. The droppable tombstone ratio of the unpatched nodes remains high for most CFs, which indicates that tombstone compactions are not being triggered at all.
* *Disk utilization*: No change in the disk utilization pattern is detectable after the patch is applied, which suggests that I/O is not affected by the patch, at least for our mixed dataset. I double-checked the IOPS graph for the period and there was not even a slight sign of change in the I/O pattern after the patch was applied (https://issues.apache.org/jira/secure/attachment/12645312/patch-v1-iostat.png).
* *Total bytes compacted*: The number of compacted bytes on the patched node was about 17% higher over the period: about 7% from the initial tombstones that were cleared, and another 7% from tombstones cleared after the patch was applied (the difference between the two nodes' sizes). The remaining 3% can be attributed to unnecessary compactions plus normal variation between node ranges.
* *System CPU load*: Not affected by the patch.

*Alternative Patch*

I implemented another version of the patch (v2), as suggested by [~krummas], that instead of dropping the overlap check entirely only performs the check against SSTables containing rows with a smaller timestamp than the candidate SSTable (https://issues.apache.org/jira/secure/attachment/12645316/1.2.16-CASSANDRA-6563-v2.txt).
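To make the difference between the variants concrete, here is a minimal, self-contained sketch. It is not the actual Cassandra source: the SSTableMeta type and its field names are illustrative stand-ins, and the real check in AbstractCompactionStrategy.worthDroppingTombstones() is more involved. It only shows the shape of the stock single-SSTable check, the v1 patch that drops the overlap check, and the v2 patch that restricts the check to overlapping SSTables holding data at least as old as the candidate:

{noformat}
import java.util.List;
import java.util.stream.Collectors;

// Illustrative stand-in for the per-sstable metadata the real check consults.
class SSTableMeta
{
    final String name;
    final long firstToken, lastToken;       // token range covered by the sstable
    final long minTimestamp, maxTimestamp;  // min/max cell timestamps in the sstable
    final double droppableTombstoneRatio;   // "estimated droppable tombstones"

    SSTableMeta(String name, long firstToken, long lastToken,
                long minTimestamp, long maxTimestamp, double droppableTombstoneRatio)
    {
        this.name = name;
        this.firstToken = firstToken;
        this.lastToken = lastToken;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
        this.droppableTombstoneRatio = droppableTombstoneRatio;
    }

    boolean overlaps(SSTableMeta other)
    {
        // With RandomPartitioner, tokens are hashes spread over the whole ring, so the
        // [first, last] token ranges of any two sizable sstables practically always intersect.
        return firstToken <= other.lastToken && other.firstToken <= lastToken;
    }
}

public class TombstoneCompactionSketch
{
    static final double TOMBSTONE_THRESHOLD = 0.2; // default tombstone_threshold

    // Stock behaviour (simplified): the ratio must exceed the threshold AND no other
    // sstable may overlap the candidate's token range.
    static boolean worthDroppingStock(SSTableMeta candidate, List<SSTableMeta> others)
    {
        if (candidate.droppableTombstoneRatio <= TOMBSTONE_THRESHOLD)
            return false;
        return others.stream().noneMatch(candidate::overlaps);
    }

    // v1 (the original patch on this ticket): drop the overlap check entirely.
    static boolean worthDroppingV1(SSTableMeta candidate)
    {
        return candidate.droppableTombstoneRatio > TOMBSTONE_THRESHOLD;
    }

    // v2: only consider overlapping sstables that contain data at least as old as the
    // candidate, since only those can shadow the tombstones we want to purge.
    static boolean worthDroppingV2(SSTableMeta candidate, List<SSTableMeta> others)
    {
        if (candidate.droppableTombstoneRatio <= TOMBSTONE_THRESHOLD)
            return false;
        List<SSTableMeta> olderOverlaps = others.stream()
                .filter(s -> s.minTimestamp <= candidate.maxTimestamp)
                .filter(candidate::overlaps)
                .collect(Collectors.toList());
        return olderOverlaps.isEmpty();
    }

    public static void main(String[] args)
    {
        // Two large TTLed sstables whose token ranges span almost the whole ring,
        // as happens with RandomPartitioner: only v1 triggers the compaction.
        SSTableMeta candidate = new SSTableMeta("ks-Cf-ic-295562", 1, 999, 100, 200, 0.89);
        SSTableMeta neighbour = new SSTableMeta("ks-Cf-ic-296121", 2, 998, 150, 250, 0.91);
        List<SSTableMeta> others = List.of(neighbour);

        System.out.println("stock: " + worthDroppingStock(candidate, others)); // false
        System.out.println("v1:    " + worthDroppingV1(candidate));            // true
        System.out.println("v2:    " + worthDroppingV2(candidate, others));    // false
    }
}
{noformat}

The main() example mirrors the situation described next: the candidate's droppable ratio is ~0.9, yet both the stock check and v2 refuse to compact it, because the neighbouring sstable's token range overlaps and it contains equally old data.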
One week ago I deployed this alternative patch on 2 of our production nodes, and unfortunately loosening the checks did not achieve significant results. I added some debug logging to the code, and what I verified is that despite reducing the number of sstables to compare against, even when only one SSTable has a column with a timestamp equal to or lower than the candidate SSTable's, the token ranges of these sstables always overlap because of the RandomPartitioner. So this supports the claim that, even with loosened checks, the single-sstable tombstone compaction is almost never triggered, at least for the use cases that could benefit from it.

The graphs for the alternative patch analysis can be found here: https://issues.apache.org/jira/secure/attachment/12645240/patch-v2-range3.png

> TTL histogram compactions not triggered at high "Estimated droppable tombstones" rate
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6563
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.12ish
>            Reporter: Chris Burroughs
>            Assignee: Paulo Ricardo Motta Gomes
>             Fix For: 1.2.17, 2.0.8
>
>         Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 1.2.16-CASSANDRA-6563.txt, 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, patched1-compacted-bytes.png, patched2-compacted-bytes.png, unpatched-droppable-ratio.png, unpatched-storage-load.png, unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png
>
>
> I have several column families in a largish cluster where virtually all columns are written with a (usually the same) TTL. My understanding of CASSANDRA-3442 is that sstables that have a high (> 20%) estimated percentage of droppable tombstones should be individually compacted. This does not appear to be occurring with size tiered compaction.
> Example from one node:
> {noformat}
> $ ll /data/sstables/data/ks/Cf/*Data.db
> -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
> -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
> -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
> -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
> -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
> -rw-rw-r--  1 cassandra cassandra      336908 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
> -rw-rw-r--  1 cassandra cassandra     2063935 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
> -rw-rw-r--  1 cassandra cassandra         409 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
> -rw-rw-r--  1 cassandra cassandra    31180007 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
> -rw-rw-r--  1 cassandra cassandra     2398345 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
> -rw-rw-r--  1 cassandra cassandra       21095 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
> -rw-rw-r--  1 cassandra cassandra       81454 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
> -rw-rw-r--  1 cassandra cassandra     1063718 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
> -rw-rw-r--  1 cassandra cassandra      127004 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
> -rw-rw-r--  1 cassandra cassandra      146785 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
> -rw-rw-r--  1 cassandra cassandra      697338 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
> -rw-rw-r--  1 cassandra cassandra     3921428 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
> -rw-rw-r--  1 cassandra cassandra      240332 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
> -rw-rw-r--  1 cassandra cassandra       45669 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
> -rw-rw-r--  1 cassandra cassandra    53127549 Dec 12 12:03 /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
> -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
> -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
> -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
> -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
> -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
> -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
> -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
> -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
> -rw-rw-r--  1 cassandra cassandra    73126319 Jan  9 11:28 /data/sstables/data/ks/Cf/ks-Cf-ic-450307-Data.db
> -rw-rw-r--  1 cassandra cassandra    40921916 Jan  9 12:08 /data/sstables/data/ks/Cf/ks-Cf-ic-450336-Data.db
> -rw-rw-r--  1 cassandra cassandra    60881193 Jan  9 12:23 /data/sstables/data/ks/Cf/ks-Cf-ic-450341-Data.db
> -rw-rw-r--  1 cassandra cassandra        4746 Jan  9 12:23 /data/sstables/data/ks/Cf/ks-Cf-ic-450350-Data.db
> -rw-rw-r--  1 cassandra cassandra        5769 Jan  9 12:23 /data/sstables/data/ks/Cf/ks-Cf-ic-450352-Data.db
> {noformat}
> {noformat}
> 295562: Estimated droppable tombstones: 0.899035828535183
> 296121: Estimated droppable tombstones: 0.9135080937806197
> 320449: Estimated droppable tombstones: 0.8916766879896414
> {noformat}
> I've checked in on this example node several times and compactionstats has not shown any other activity that would be blocking the tombstone based compaction. The TTL is in the 15-20 day range so an sstable from November should have had ample opportunities by January.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
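As a side note on the "Estimated droppable tombstones" values quoted above: that figure is derived from per-sstable metadata and is, roughly, the fraction of cells whose tombstones (including expired TTLed cells) would already be purgeable at gcBefore = now - gc_grace_seconds. The sketch below only approximates that idea; the histogram layout and all names are invented for illustration and do not match the real SSTableMetadata API:

{noformat}
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: approximates how an "estimated droppable tombstones" ratio
// can be derived from a per-sstable histogram of tombstone drop times.
public class DroppableTombstoneRatioSketch
{
    // Histogram: tombstone / TTL-expiration drop time (epoch seconds) -> cell count.
    private final NavigableMap<Long, Long> dropTimeHistogram = new TreeMap<>();
    private long totalCells;

    void addExpiringCell(long dropTimeSeconds)
    {
        dropTimeHistogram.merge(dropTimeSeconds, 1L, Long::sum);
        totalCells++;
    }

    void addLiveCell()
    {
        totalCells++; // non-TTLed live cell: contributes to the denominator only
    }

    // Fraction of cells already droppable at gcBefore = now - gc_grace_seconds.
    double estimatedDroppableRatio(long gcBefore)
    {
        long droppable = dropTimeHistogram.headMap(gcBefore, true).values()
                                          .stream().mapToLong(Long::longValue).sum();
        return totalCells == 0 ? 0.0 : (double) droppable / totalCells;
    }

    public static void main(String[] args)
    {
        // Append-only, fully TTLed data: with a 15-20 day TTL plus gc_grace, a
        // November sstable is almost entirely droppable by January, which is
        // consistent with the ~0.9 ratios reported in the issue.
        DroppableTombstoneRatioSketch sstable = new DroppableTombstoneRatioSketch();
        long now = 1_389_225_600L;          // ~Jan 9, 2014
        long gcGrace = 10 * 24 * 3600L;
        for (int i = 0; i < 90; i++)
            sstable.addExpiringCell(now - 45 * 24 * 3600L); // expired ~45 days ago
        for (int i = 0; i < 10; i++)
            sstable.addExpiringCell(now + 5 * 24 * 3600L);  // still within TTL
        System.out.printf("Estimated droppable tombstones: %.2f%n",
                          sstable.estimatedDroppableRatio(now - gcGrace)); // ~0.90
    }
}
{noformat}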