[ https://issues.apache.org/jira/browse/CASSANDRA-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186945#comment-14186945 ]

Jonathan Ellis commented on CASSANDRA-7331:
-------------------------------------------

Eventually we will want to perform tombstone-compaction on all the candidates 
(that have large amounts of tombstones) even if they have a low value, because 
they just haven't been compacted with their peers yet.

If I understand correctly, your goal here is, given a bunch of candidates to 
perform tombstone-compaction on, let's order them by which is likely to clean 
up the most.  Right?

If that's the case, I don't think it's worth the complexity, since it's only 
really beneficial if you're super behind on compaction with no hope of ever 
catching up.  And the right fix there is to add more capacity or make 
compaction faster in some form or another.

> Improve Droppable Tombstone compaction
> --------------------------------------
>
>                 Key: CASSANDRA-7331
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7331
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Priority: Minor
>              Labels: compaction
>
> I was thinking about this idea so creating a JIRA to discuss it. 
> Currently we do compaction for sstables which have more than a configurable 
> number of droppable tombstones. 
> There is also another JIRA, CASSANDRA-7019, to do compactions involving 
> multiple sstables from different levels, which will be triggered based on the 
> same threshold. 
> One area of improvement here, to pick better candidates, would be to 
> find out whether a tombstone can actually get rid of data in other sstables. 
> We can add a byte to each tombstone to keep track of whether it has knocked 
> off the actual data (for which it exists) or not. 
> All tombstones will start out with 0 as the value. When a tombstone compacts 
> with other sstables and causes data to be deleted, the value is incremented. 
> For cases where there are multiple updates and then a delete, this value can 
> be more than 1, depending on how many updates came in before the delete. 
> If we have this, by looking at these numbers in tombstones, we can find the 
> sstable whose compaction will get rid of the most data. We can also add 
> a global number per sstable which sums up these numbers. 
> I am not sure how this will work with range tombstones and whether this will 
> be useful.   
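The counter mechanism quoted above can be sketched roughly as below. This is a minimal illustration only, not Cassandra's actual code: the class and field names (Tombstone, knockedOff, SSTableStats) are hypothetical, and real tombstone metadata lives in the sstable format rather than in plain Java objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-tombstone counter: starts at 0, incremented each time a
// compaction finds this tombstone shadowing (and dropping) an older cell.
class Tombstone {
    int knockedOff = 0;
}

// Hypothetical per-sstable stats: the "global number per sstable" from the
// proposal is just the sum of the tombstone counters.
class SSTableStats {
    final String name;
    final List<Tombstone> tombstones = new ArrayList<>();

    SSTableStats(String name) { this.name = name; }

    int totalKnockedOff() {
        int sum = 0;
        for (Tombstone t : tombstones)
            sum += t.knockedOff;
        return sum;
    }
}

public class TombstonePicker {
    // Pick the candidate whose compaction is expected to reclaim the most
    // data, i.e. the sstable with the largest summed counter.
    static SSTableStats bestCandidate(List<SSTableStats> tables) {
        return Collections.max(tables,
                Comparator.comparingInt(SSTableStats::totalKnockedOff));
    }

    public static void main(String[] args) {
        SSTableStats a = new SSTableStats("a");
        Tombstone t = new Tombstone();
        t.knockedOff = 3; // a delete that shadowed three earlier updates
        a.tombstones.add(t);

        SSTableStats b = new SSTableStats("b");
        b.tombstones.add(new Tombstone()); // hasn't dropped any data yet

        System.out.println(bestCandidate(Arrays.asList(a, b)).name); // prints "a"
    }
}
```

Note that an sstable full of zero-valued counters (like "b" here) is exactly the low-value candidate the comment says will still need tombstone-compaction eventually, since its tombstones simply haven't met their peers yet.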



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
