[ https://issues.apache.org/jira/browse/CASSANDRA-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000258#comment-14000258 ]

Paulo Ricardo Motta Gomes commented on CASSANDRA-6563:
------------------------------------------------------

Below is an analysis of a live cluster about 10 days after deploying the 
original patch, which entirely removes the range-overlap check in 
worthDroppingTombstones(). 
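
For reference, a minimal self-contained sketch of what the v1 change amounts to. The stand-in types and field names below are simplifications for illustration, not the actual Cassandra code: the real worthDroppingTombstones() also estimates how much data could actually be dropped across the overlapping SSTables, but the essence is that v1 keeps only the droppable-ratio check and drops the overlap condition.

{noformat}
import java.util.List;

// Simplified stand-in for an SSTable (illustration only, not Cassandra's SSTableReader).
class SSTableStub {
    final long minToken, maxToken;         // token range covered by this SSTable
    final double droppableTombstoneRatio;  // estimated droppable tombstone ratio

    SSTableStub(long minToken, long maxToken, double ratio) {
        this.minToken = minToken;
        this.maxToken = maxToken;
        this.droppableTombstoneRatio = ratio;
    }

    boolean overlaps(SSTableStub other) {
        return this.minToken <= other.maxToken && other.minToken <= this.maxToken;
    }
}

public class TombstoneCompactionSketch {
    static final double TOMBSTONE_THRESHOLD = 0.2; // default tombstone_threshold

    // Roughly what the unpatched check does: require a high droppable ratio AND
    // no overlap with other SSTables before scheduling a single-SSTable compaction.
    static boolean worthDroppingTombstonesOriginal(SSTableStub candidate, List<SSTableStub> others) {
        if (candidate.droppableTombstoneRatio <= TOMBSTONE_THRESHOLD)
            return false;
        for (SSTableStub other : others)
            if (candidate.overlaps(other))
                return false; // the overlap check that v1 removes
        return true;
    }

    // The v1 patch in essence: keep only the ratio check.
    static boolean worthDroppingTombstonesV1(SSTableStub candidate) {
        return candidate.droppableTombstoneRatio > TOMBSTONE_THRESHOLD;
    }
}
{noformat}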

*Analysis Description*

Our dataset uses both LCS and STCS, but most of the CFs use STCS. A significant 
portion of the dataset consists of append-only TTL-ed data, which is a good 
match for tombstone compaction. Most of our large CFs with a high droppable 
tombstone ratio use STCS, but a few LCS CFs also benefited from the patch. 

I deployed the patch on 2 different ranges with similar results. The metrics 
were collected between the 1st and the 16th of May; the nodes were patched on 
the 7th of May. The Cassandra version used was 1.2.16.
  
In the analysis I compare total space used (Cassandra load), droppable 
tombstone ratio, disk utilization (system disk xvbd util), total bytes 
compacted and system load (Linux CPU). For the last three metrics I also 
compute the integral of the metric over the period, to make it easier to 
compare totals.
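
The integrals are just numerical sums of the sampled values over time. A minimal sketch of that kind of computation (the sample values below are made up for illustration, not taken from the monitoring data):

{noformat}
public class MetricIntegral {
    /**
     * Approximate the integral of a sampled metric over time with the
     * trapezoidal rule. Timestamps are in seconds, values in the metric's
     * unit, so the result is in unit-seconds.
     */
    static double integrate(long[] timestamps, double[] values) {
        double total = 0.0;
        for (int i = 1; i < values.length; i++) {
            long dt = timestamps[i] - timestamps[i - 1];
            total += 0.5 * (values[i] + values[i - 1]) * dt;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical samples: disk utilization (%) collected every 60 seconds.
        long[] ts = {0, 60, 120, 180};
        double[] util = {30.0, 35.0, 33.0, 31.0};
        System.out.println("utilization integral (%-seconds): " + integrate(ts, util));
    }
}
{noformat}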

*Analysis*

Graphs: 
https://issues.apache.org/jira/secure/attachment/12645241/patch-v1-range1.png

Each graph compares the metrics of the patched node with its previous and next 
neighbors; vnodes are not used. The first row in the figure is node N-1, the 
second row is node N (the patched node, marked with an asterisk), and the 
third row is node N+1.

* *Cassandra load*: On the patched node there is a sudden 7% decrease in disk 
space when the patch is applied, due to the single-SSTable compactions it 
triggers. The growth rate of disk usage also decreases after the patch, since 
tombstones are cleared more often. Over the whole period, disk space grew 1.2% 
on the patched node, against about 10% on the unpatched nodes.

* *Tombstone ratio*: After the patch is applied, the droppable tombstone ratio 
drops and then hovers around the default threshold of 20% (a sketch of how this 
ratio is estimated follows after this list). On the unpatched nodes the ratio 
remains high for most CFs, which indicates that tombstone compactions are not 
being triggered at all.

* *Disk utilization*: There is no detectable change in the disk utilization 
pattern after the patch is applied, which suggests I/O is not affected by the 
patch, at least for our mixed dataset. I double-checked the IOPS graph for the 
period and there was no sign of change in the I/O pattern after the patch was 
applied. 
(https://issues.apache.org/jira/secure/attachment/12645312/patch-v1-iostat.png)

* *Total bytes compacted*: The number of compacted bytes on the patched node 
was about 17% higher over the period: about 7% from the initial tombstones 
cleared when the patch was applied, another 7% from tombstones cleared 
afterwards (the difference between the two nodes' sizes), and the remaining 3% 
attributable to unnecessary compactions plus normal variation between node 
ranges.

* *System CPU Load*: Was not affected by the patch.
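
As referenced in the tombstone ratio item above, the 20% figure is the default tombstone_threshold, compared against a per-SSTable estimate of the droppable tombstone ratio. A minimal illustrative sketch of that kind of estimate, using a simplified histogram of tombstone expiry times (the histogram layout here is an assumption for illustration; Cassandra keeps a streaming histogram per SSTable, not this exact structure):

{noformat}
import java.util.NavigableMap;
import java.util.TreeMap;

public class DroppableTombstoneEstimate {
    // "local deletion time (epoch seconds) -> number of tombstoned cells";
    // simplified stand-in for the per-SSTable tombstone histogram.
    private final NavigableMap<Integer, Long> tombstonesByExpiry = new TreeMap<Integer, Long>();
    private long totalCells;

    void addLiveCells(long count) { totalCells += count; }

    void addTombstones(int localDeletionTime, long count) {
        Long prev = tombstonesByExpiry.get(localDeletionTime);
        tombstonesByExpiry.put(localDeletionTime, prev == null ? count : prev + count);
        totalCells += count;
    }

    /** Fraction of cells whose tombstones expired before gcBefore and are thus droppable. */
    double estimatedDroppableRatio(int gcBefore) {
        long droppable = 0;
        for (long c : tombstonesByExpiry.headMap(gcBefore, false).values())
            droppable += c;
        return totalCells == 0 ? 0.0 : (double) droppable / totalCells;
    }

    public static void main(String[] args) {
        DroppableTombstoneEstimate est = new DroppableTombstoneEstimate();
        est.addLiveCells(1000);              // live cells
        est.addTombstones(1400000000, 300);  // tombstones already past gc_grace
        est.addTombstones(2000000000, 100);  // tombstones not yet purgeable
        // ~0.21 with gcBefore = 1500000000, i.e. above the 0.2 default threshold
        System.out.println(est.estimatedDroppableRatio(1500000000));
    }
}
{noformat}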

*Alternative Patch*

I implemented another version of the patch (v2), as suggested by [~krummas], 
which instead of dropping the overlap check entirely only performs the check 
against SSTables containing rows with timestamps equal to or lower than the 
candidate SSTable's 
(https://issues.apache.org/jira/secure/attachment/12645316/1.2.16-CASSANDRA-6563-v2.txt).
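
In essence (a simplified, self-contained sketch with stand-in types; the exact predicate in the real patch may differ), v2 restricts the overlap check to SSTables that could actually be shadowed by the candidate's tombstones, i.e. those that may contain rows with timestamps equal to or lower than data in the candidate:

{noformat}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in: only the fields the v2 idea needs (illustration only).
class TableStub {
    final long minToken, maxToken;          // token range covered
    final long minTimestamp, maxTimestamp;  // min/max cell timestamps in the SSTable

    TableStub(long minToken, long maxToken, long minTimestamp, long maxTimestamp) {
        this.minToken = minToken;
        this.maxToken = maxToken;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }

    boolean tokensOverlap(TableStub other) {
        return this.minToken <= other.maxToken && other.minToken <= this.maxToken;
    }
}

public class TombstoneCompactionV2Sketch {
    // v2 idea: only SSTables that pass the timestamp filter are subjected to
    // the range-overlap check; the rest can be ignored when deciding whether a
    // single-SSTable tombstone compaction is worthwhile.
    static List<TableStub> overlapCheckCandidates(TableStub candidate, List<TableStub> others) {
        List<TableStub> result = new ArrayList<TableStub>();
        for (TableStub other : others) {
            boolean mayShadowCandidate = other.minTimestamp <= candidate.maxTimestamp; // timestamp filter added by v2
            if (mayShadowCandidate && candidate.tokensOverlap(other))                  // original overlap check
                result.add(other);
        }
        return result;
    }
}
{noformat}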
 

One week ago I deployed this alternative patch on 2 of our production nodes, 
and unfortunately loosening the check did not achieve significant results. I 
added some debug logging to the code and verified that, even though the filter 
reduces the number of SSTables to compare against, as soon as a single SSTable 
has a column with a timestamp equal to or lower than the candidate SSTable's, 
the token ranges of these SSTables always overlap because of the 
RandomPartitioner. So this supports the claim that even with the loosened 
check, single-SSTable tombstone compactions are almost never triggered, at 
least on the use cases that could benefit from them.
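
A quick way to see why: under RandomPartitioner each row key hashes to a token spread roughly uniformly over the full token space, so any SSTable with more than a handful of rows covers almost the entire token range, and the (minToken, maxToken) intervals of two non-trivial SSTables virtually always intersect. A small simulation sketch (uniform random longs stand in for hashed tokens; the numbers are illustrative, not measured on the cluster):

{noformat}
import java.util.Random;

public class TokenOverlapSimulation {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int trials = 10000, rowsPerSSTable = 1000, overlapping = 0;

        for (int t = 0; t < trials; t++) {
            long[] a = minMaxOfRandomTokens(rnd, rowsPerSSTable);
            long[] b = minMaxOfRandomTokens(rnd, rowsPerSSTable);
            if (a[0] <= b[1] && b[0] <= a[1]) // token ranges intersect
                overlapping++;
        }
        // With uniformly hashed tokens this prints ~100%: the ranges essentially always overlap.
        System.out.printf("overlapping pairs: %.2f%%%n", 100.0 * overlapping / trials);
    }

    // min/max of n tokens drawn uniformly at random, mimicking hashed row keys.
    static long[] minMaxOfRandomTokens(Random rnd, int n) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (int i = 0; i < n; i++) {
            long token = rnd.nextLong();
            min = Math.min(min, token);
            max = Math.max(max, token);
        }
        return new long[] { min, max };
    }
}
{noformat}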

The graphs for the alternative patch analysis can be found here: 
https://issues.apache.org/jira/secure/attachment/12645240/patch-v2-range3.png

> TTL histogram compactions not triggered at high "Estimated droppable 
> tombstones" rate
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6563
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6563
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.12ish
>            Reporter: Chris Burroughs
>            Assignee: Paulo Ricardo Motta Gomes
>             Fix For: 1.2.17, 2.0.8
>
>         Attachments: 1.2.16-CASSANDRA-6563-v2.txt, 1.2.16-CASSANDRA-6563.txt, 
> 2.0.7-CASSANDRA-6563.txt, patch-v1-iostat.png, patch-v1-range1.png, 
> patch-v2-range3.png, patched-droppadble-ratio.png, patched-storage-load.png, 
> patched1-compacted-bytes.png, patched2-compacted-bytes.png, 
> unpatched-droppable-ratio.png, unpatched-storage-load.png, 
> unpatched1-compacted-bytes.png, unpatched2-compacted-bytes.png
>
>
> I have several column families in a largish cluster where virtually all 
> columns are written with a (usually the same) TTL.  My understanding of 
> CASSANDRA-3442 is that sstables that have a high ( > 20%) estimated 
> percentage of droppable tombstones should be individually compacted.  This 
> does not appear to be occurring with size tiered compaction.
> Example from one node:
> {noformat}
> $ ll /data/sstables/data/ks/Cf/*Data.db
> -rw-rw-r-- 31 cassandra cassandra 26651211757 Nov 26 22:59 
> /data/sstables/data/ks/Cf/ks-Cf-ic-295562-Data.db
> -rw-rw-r-- 31 cassandra cassandra  6272641818 Nov 27 02:51 
> /data/sstables/data/ks/Cf/ks-Cf-ic-296121-Data.db
> -rw-rw-r-- 31 cassandra cassandra  1814691996 Dec  4 21:50 
> /data/sstables/data/ks/Cf/ks-Cf-ic-320449-Data.db
> -rw-rw-r-- 30 cassandra cassandra 10909061157 Dec 11 17:31 
> /data/sstables/data/ks/Cf/ks-Cf-ic-340318-Data.db
> -rw-rw-r-- 29 cassandra cassandra   459508942 Dec 12 10:37 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342259-Data.db
> -rw-rw-r--  1 cassandra cassandra      336908 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342307-Data.db
> -rw-rw-r--  1 cassandra cassandra     2063935 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342309-Data.db
> -rw-rw-r--  1 cassandra cassandra         409 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342314-Data.db
> -rw-rw-r--  1 cassandra cassandra    31180007 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342319-Data.db
> -rw-rw-r--  1 cassandra cassandra     2398345 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342322-Data.db
> -rw-rw-r--  1 cassandra cassandra       21095 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342331-Data.db
> -rw-rw-r--  1 cassandra cassandra       81454 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342335-Data.db
> -rw-rw-r--  1 cassandra cassandra     1063718 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342339-Data.db
> -rw-rw-r--  1 cassandra cassandra      127004 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342344-Data.db
> -rw-rw-r--  1 cassandra cassandra      146785 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342346-Data.db
> -rw-rw-r--  1 cassandra cassandra      697338 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342351-Data.db
> -rw-rw-r--  1 cassandra cassandra     3921428 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342367-Data.db
> -rw-rw-r--  1 cassandra cassandra      240332 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342370-Data.db
> -rw-rw-r--  1 cassandra cassandra       45669 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342374-Data.db
> -rw-rw-r--  1 cassandra cassandra    53127549 Dec 12 12:03 
> /data/sstables/data/ks/Cf/ks-Cf-ic-342375-Data.db
> -rw-rw-r-- 16 cassandra cassandra 12466853166 Dec 25 22:40 
> /data/sstables/data/ks/Cf/ks-Cf-ic-396473-Data.db
> -rw-rw-r-- 12 cassandra cassandra  3903237198 Dec 29 19:42 
> /data/sstables/data/ks/Cf/ks-Cf-ic-408926-Data.db
> -rw-rw-r--  7 cassandra cassandra  3692260987 Jan  3 08:25 
> /data/sstables/data/ks/Cf/ks-Cf-ic-427733-Data.db
> -rw-rw-r--  4 cassandra cassandra  3971403602 Jan  6 20:50 
> /data/sstables/data/ks/Cf/ks-Cf-ic-437537-Data.db
> -rw-rw-r--  3 cassandra cassandra  1007832224 Jan  7 15:19 
> /data/sstables/data/ks/Cf/ks-Cf-ic-440331-Data.db
> -rw-rw-r--  2 cassandra cassandra   896132537 Jan  8 11:05 
> /data/sstables/data/ks/Cf/ks-Cf-ic-447740-Data.db
> -rw-rw-r--  1 cassandra cassandra   963039096 Jan  9 04:59 
> /data/sstables/data/ks/Cf/ks-Cf-ic-449425-Data.db
> -rw-rw-r--  1 cassandra cassandra   232168351 Jan  9 10:14 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450287-Data.db
> -rw-rw-r--  1 cassandra cassandra    73126319 Jan  9 11:28 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450307-Data.db
> -rw-rw-r--  1 cassandra cassandra    40921916 Jan  9 12:08 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450336-Data.db
> -rw-rw-r--  1 cassandra cassandra    60881193 Jan  9 12:23 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450341-Data.db
> -rw-rw-r--  1 cassandra cassandra        4746 Jan  9 12:23 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450350-Data.db
> -rw-rw-r--  1 cassandra cassandra        5769 Jan  9 12:23 
> /data/sstables/data/ks/Cf/ks-Cf-ic-450352-Data.db
> {noformat}
> {noformat}
> 295562: Estimated droppable tombstones: 0.899035828535183
> 296121: Estimated droppable tombstones: 0.9135080937806197
> 320449: Estimated droppable tombstones: 0.8916766879896414
> {noformat}
> I've checked in on this example node several times and compactionstats has 
> not shown any other activity that would be blocking the tombstone based 
> compaction.  The TTL is in the 15-20 day range so an sstable from November 
> should have had ample opportunities by January.


