[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

Dominic Letz (JIRA) Wed, 31 Dec 2014 00:02:44 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dominic Letz updated CASSANDRA-8547:
------------------------------------
    Attachment: cassandra-2.1-8547.txt
                cassandra-2.0.11-8547.txt

The fix for this turned out to be much more trivial than I thought. I've 
attached it for both 2.0.11 and 2.1.

There seems to be a silent assumption in RangeTombstone.Tracker that ranges 
with the same .max value do not repeat. For example, the existing maxOrderingSet 
stores only one element per .max value, so a second maxOrderingSet.add(t) with 
the same .max is silently discarded, and there is no check for that case.
I've left that logic as it is, but I suspect there might be an issue with a 
second delete for the same range value.
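
To illustrate the concern, here is a minimal standalone sketch, not the Tracker code itself. The Range class and the max-only comparator are hypothetical stand-ins, and it assumes maxOrderingSet behaves like a java.util.TreeSet ordered by .max. It shows that a second add() with an equal .max returns false and the first element is kept:

```java
import java.util.Comparator;
import java.util.TreeSet;

class MaxOrderingDemo {
    // Hypothetical stand-in for a range tombstone (not Cassandra's class).
    static final class Range {
        final int min;
        final int max;
        final long timestamp;

        Range(int min, int max, long timestamp) {
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Assumption: the set orders ranges by their max bound only,
    // so two ranges with an equal max count as the same element.
    static TreeSet<Range> newMaxOrderingSet() {
        return new TreeSet<>(Comparator.comparingInt((Range r) -> r.max));
    }

    public static void main(String[] args) {
        TreeSet<Range> maxOrderingSet = newMaxOrderingSet();
        boolean first = maxOrderingSet.add(new Range(0, 10, 100L));
        // Same max, newer timestamp: add() returns false and the set is unchanged.
        boolean second = maxOrderingSet.add(new Range(5, 10, 200L));
        System.out.println(first + " " + second
                + " size=" + maxOrderingSet.size()
                + " keptTimestamp=" + maxOrderingSet.first().timestamp);
        // prints "true false size=1 keptTimestamp=100"
    }
}
```

Checking the boolean returned by add() would be one way to detect this case; whether it matters for the Tracker depends on whether such duplicates can actually occur there.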

> Make RangeTombstone.Tracker.isDeleted() faster
> ----------------------------------------------
>
>                 Key: CASSANDRA-8547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11
>            Reporter: Dominic Letz
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8547.txt, cassandra-2.1-8547.txt, 
> rangetombstone.tracker.txt
>
>
> During compaction and repairs with many tombstones, an exorbitant amount of 
> time is spent in RangeTombstone.Tracker.isDeleted().
> The amount of time spent there can be so large that compactions and repairs 
> look "stalled", with the estimated time remaining frozen at the same value 
> for days.
> Using visualvm I sample-profiled the code during execution and found this 
> both during compaction and during repairs (point-in-time backtraces 
> attached).
> Looking at the code, the problem is clearly the linear scan:
> {code}
>         public boolean isDeleted(Column column)
>         {
>             for (RangeTombstone tombstone : ranges)
>             {
>                 if (comparator.compare(column.name(), tombstone.min) >= 0
>                     && comparator.compare(column.name(), tombstone.max) <= 0
>                     && tombstone.maxTimestamp() >= column.timestamp())
>                 {
>                     return true;
>                 }
>             }
>             return false;
>         }
> {code}
> I would like to propose changing this to use a sorted list (e.g. 
> RangeTombstoneList) instead.
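
As a rough illustration of the sorted-lookup idea (this is neither the attached patch nor RangeTombstoneList's actual code), the sketch below replaces the linear scan with a floor lookup in a TreeMap keyed by each range's min. The class and field names are hypothetical, column names are simplified to int, and it assumes the caller keeps the stored ranges non-overlapping, so the range with the greatest min <= name is the only one that can cover the name:

```java
import java.util.Map;
import java.util.TreeMap;

class SortedTombstones {
    // Hypothetical stand-in for a range tombstone (not Cassandra's class).
    static final class Range {
        final int min;
        final int max;
        final long maxTimestamp;

        Range(int min, int max, long maxTimestamp) {
            this.min = min;
            this.max = max;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Ranges keyed by min. Assumption: ranges never overlap.
    private final TreeMap<Integer, Range> byMin = new TreeMap<>();

    void add(int min, int max, long maxTimestamp) {
        byMin.put(min, new Range(min, max, maxTimestamp));
    }

    // O(log n) lookup instead of scanning every range.
    boolean isDeleted(int name, long columnTimestamp) {
        // Greatest min that is <= name; null if every range starts after name.
        Map.Entry<Integer, Range> candidate = byMin.floorEntry(name);
        if (candidate == null) {
            return false;
        }
        Range r = candidate.getValue();
        // Same condition as the linear version, applied to one candidate.
        return name <= r.max && r.maxTimestamp >= columnTimestamp;
    }
}
```

If ranges can overlap, a single floor lookup is not sufficient and the structure would first have to merge or split overlapping ranges on insert.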



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
