[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8547:
-----------------------------------------
    Fix Version/s:     (was: 2.1.x)

> Make RangeTombstone.Tracker.isDeleted() faster
> ----------------------------------------------
>
>                 Key: CASSANDRA-8547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11
>            Reporter: Dominic Letz
>            Assignee: Dominic Letz
>              Labels: tombstone
>         Attachments: Selection_044.png, cassandra-2.0.11-8547.txt,
>                      cassandra-2.1-8547.txt, rangetombstone.tracker.txt
>
>
> During compactions and repairs with many tombstones, an exorbitant amount of
> time is spent in RangeTombstone.Tracker.isDeleted().
> The time spent there can be so large that compactions and repairs look
> "stalled", with the estimated time remaining frozen at the same value for days.
> Sample profiling the code with visualvm during execution, I found this both in
> compactions and in repairs (point-in-time backtraces attached).
> Looking at the code, the problem is obviously the linear scanning:
> {code}
>     public boolean isDeleted(Column column)
>     {
>         for (RangeTombstone tombstone : ranges)
>         {
>             if (comparator.compare(column.name(), tombstone.min) >= 0
>                 && comparator.compare(column.name(), tombstone.max) <= 0
>                 && tombstone.maxTimestamp() >= column.timestamp())
>             {
>                 return true;
>             }
>         }
>         return false;
>     }
> {code}
> I would like to propose changing this to use a sorted list (e.g.
> RangeTombstoneList) instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
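For readers following the proposal in the description, here is a minimal sketch of how a sorted structure can replace the linear scan. It is an illustration only, not the attached patch and not Cassandra's RangeTombstoneList: the class SortedTombstoneIndex and its Range type are hypothetical, integer names stand in for column names, and it assumes the stored ranges never overlap.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted tombstone structure.
// Not Cassandra's RangeTombstoneList and not the attached patch.
final class SortedTombstoneIndex
{
    // One deleted range ending at max (inclusive) with its deletion timestamp.
    private static final class Range
    {
        final int max;
        final long timestamp;

        Range(int max, long timestamp)
        {
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Ranges keyed by their start. Assumption: ranges never overlap.
    private final TreeMap<Integer, Range> byMin = new TreeMap<>();

    void add(int min, int max, long timestamp)
    {
        byMin.put(min, new Range(max, timestamp));
    }

    // O(log n) lookup: with non-overlapping ranges, only the range with the
    // greatest start <= name can possibly cover name.
    boolean isDeleted(int name, long columnTimestamp)
    {
        Map.Entry<Integer, Range> candidate = byMin.floorEntry(name);
        return candidate != null
            && name <= candidate.getValue().max
            && candidate.getValue().timestamp >= columnTimestamp;
    }

    public static void main(String[] args)
    {
        SortedTombstoneIndex index = new SortedTombstoneIndex();
        index.add(10, 20, 100L);
        index.add(30, 40, 50L);
        System.out.println(index.isDeleted(15, 90L)); // true: covered by a newer tombstone
        System.out.println(index.isDeleted(25, 90L)); // false: falls between the ranges
        System.out.println(index.isDeleted(35, 90L)); // false: the tombstone is older than the column
    }
}
```

The lookup cost drops from one comparison pass over every tombstone to a single floorEntry() call per column. Overlapping ranges would need to be merged or split on insert for this to stay correct, which the sketch does not attempt.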
[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Letz updated CASSANDRA-8547:
------------------------------------
    Labels: tombstone  (was: )
[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8547:
--------------------------------
    Reviewer: Sylvain Lebresne  (was: Benedict)
[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Letz updated CASSANDRA-8547:
------------------------------------
    Attachment: Selection_044.png

Attaching a screenshot of visualvm showing where a typical compaction job is spending its time for us without the patch.
[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Letz updated CASSANDRA-8547:
------------------------------------
    Attachment: cassandra-2.1-8547.txt
                cassandra-2.0.11-8547.txt

The fix for this turned out to be much more trivial than I thought. I've attached it for both 2.0.11 and 2.1.

There seems to be a silent assumption in RangeTombstone.Tracker that ranges with the same .max value do not repeat. For example, the existing maxOrderingSet stores only one value per .max, so a second maxOrderingSet.add(t) for the same .max will be discarded, but there is no check for that case. I've left the logic as it is, but I feel there might be an issue around second deletes for the same range value.
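The concern about maxOrderingSet in the comment above can be illustrated with a small sketch. It assumes, as the comment describes, a sorted set ordered only by .max. The class MaxOrderingDemo and its Tombstone type are hypothetical simplifications with integer bounds, not the Tracker code itself.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo of a sorted set ordered only by a range's max bound.
final class MaxOrderingDemo
{
    // Simplified tombstone: integer bounds plus a deletion timestamp.
    static final class Tombstone
    {
        final int min;
        final int max;
        final long timestamp;

        Tombstone(int min, int max, long timestamp)
        {
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Assumption: the set compares elements by max alone, so two tombstones
    // with the same max count as the same element.
    static TreeSet<Tombstone> newMaxOrderedSet()
    {
        return new TreeSet<>(Comparator.comparingInt((Tombstone t) -> t.max));
    }

    public static void main(String[] args)
    {
        TreeSet<Tombstone> set = newMaxOrderedSet();
        boolean first = set.add(new Tombstone(0, 10, 1L));
        // Same max as the first tombstone: add() returns false and the set
        // keeps the original element, so the newer delete is dropped.
        boolean second = set.add(new Tombstone(5, 10, 2L));
        System.out.println(first + " " + second + " " + set.size()); // true false 1
    }
}
```

TreeSet.add() reports the rejected insert through its boolean return value, so a caller that wants to notice the second delete has to check that result explicitly.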
[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster
     [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-8547:
---------------------------------------
    Fix Version/s: 2.1.3