[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2015-06-04 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-8547:
-
Fix Version/s: (was: 2.1.x)

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
>Assignee: Dominic Letz
>  Labels: tombstone
> Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, 
> cassandra-2.1-8547.txt, rangetombstone.tracker.txt
>
>
> During compaction and repairs with many tombstones, an exorbitant amount of 
> time is spent in RangeTombstone.Tracker.isDeleted().
> The amount of time spent there can be so large that compactions and repairs 
> look "stalled", with the estimated time remaining frozen at the same value 
> for days.
> Using visualvm, I sample-profiled the code during execution and found this 
> both during compaction and during repairs (point-in-time backtraces 
> attached).
> Looking at the code, the problem is obviously the linear scanning:
> {code}
> public boolean isDeleted(Column column)
> {
>     for (RangeTombstone tombstone : ranges)
>     {
>         if (comparator.compare(column.name(), tombstone.min) >= 0
>             && comparator.compare(column.name(), tombstone.max) <= 0
>             && tombstone.maxTimestamp() >= column.timestamp())
>         {
>             return true;
>         }
>     }
>     return false;
> }
> {code}
> I would like to propose changing this to use a sorted list (e.g. 
> RangeTombstoneList) instead.
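
For illustration, the sorted-lookup idea proposed in the description might look roughly like the sketch below. This is not the attached patch and not Cassandra's RangeTombstoneList: the class SortedTombstoneSketch, its Range type, and the use of String names and long timestamps are hypothetical simplifications, and the sketch assumes the stored ranges never overlap.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a tombstone tracker; not Cassandra code.
class SortedTombstoneSketch {
    // One deleted range [min, max] (inclusive) with its deletion timestamp.
    static final class Range {
        final String min;
        final String max;
        final long timestamp;

        Range(String min, String max, long timestamp) {
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Ranges keyed by their start. Assumption: ranges do not overlap,
    // so at most one range can cover any given name.
    private final TreeMap<String, Range> byMin = new TreeMap<>();

    public void add(String min, String max, long timestamp) {
        byMin.put(min, new Range(min, max, timestamp));
    }

    // O(log n) lookup instead of a linear scan: the only candidate is the
    // range with the greatest start that is <= name.
    public boolean isDeleted(String name, long columnTimestamp) {
        Map.Entry<String, Range> candidate = byMin.floorEntry(name);
        if (candidate == null) {
            return false; // name sorts before every range
        }
        Range r = candidate.getValue();
        return name.compareTo(r.max) <= 0 && r.timestamp >= columnTimestamp;
    }

    public static void main(String[] args) {
        SortedTombstoneSketch tracker = new SortedTombstoneSketch();
        tracker.add("b", "d", 10L);
        tracker.add("f", "h", 20L);
        System.out.println(tracker.isDeleted("c", 5L));  // true: inside [b, d], older than the tombstone
        System.out.println(tracker.isDeleted("e", 5L));  // false: falls between the two ranges
        System.out.println(tracker.isDeleted("g", 25L)); // false: inside [f, h] but newer than the tombstone
    }
}
```

With overlapping ranges a single floorEntry lookup is no longer sufficient, which is one reason a purpose-built structure would be needed in practice.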



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2015-01-13 Thread Dominic Letz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Letz updated CASSANDRA-8547:

Labels: tombstone  (was: )

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
>Assignee: Dominic Letz
>  Labels: tombstone
> Fix For: 2.1.3
>
> Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, 
> cassandra-2.1-8547.txt, rangetombstone.tracker.txt





[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2015-01-06 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-8547:

Reviewer: Sylvain Lebresne  (was: Benedict)

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
>Assignee: Dominic Letz
> Fix For: 2.1.3
>
> Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, 
> cassandra-2.1-8547.txt, rangetombstone.tracker.txt





[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2015-01-05 Thread Dominic Letz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Letz updated CASSANDRA-8547:

Attachment: Selection_044.png

Attaching a screenshot of visualvm showing where a typical compaction job is 
spending its time for us without the patch.

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
>Assignee: Dominic Letz
> Fix For: 2.1.3
>
> Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, 
> cassandra-2.1-8547.txt, rangetombstone.tracker.txt





[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2014-12-31 Thread Dominic Letz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Letz updated CASSANDRA-8547:

Attachment: cassandra-2.1-8547.txt
cassandra-2.0.11-8547.txt

The fix for this turned out to be much more trivial than I thought. I've 
attached it for both 2.0.11 and 2.1.

There seems to be a silent assumption in RangeTombstone.Tracker that ranges 
with the same .max value do not repeat; e.g. the existing maxOrderingSet 
stores only one value per .max, so a second maxOrderingSet.add(t) with the 
same .max will be discarded, but there is no check for that case.
I've left the logic as it is, but I feel there might be an issue around second 
deletes for the same range value.
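
The discarded-add behaviour described above can be reproduced with a plain java.util.TreeSet. The sketch below is only an illustration: MaxOrderingSketch and its Tombstone type are hypothetical, and it assumes that maxOrderingSet is a sorted set whose comparator looks at .max alone.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration; not Cassandra's RangeTombstone.Tracker.
class MaxOrderingSketch {
    // Simplified tombstone: a range [min, max] plus a deletion timestamp.
    static final class Tombstone {
        final String min;
        final String max;
        final long timestamp;

        Tombstone(String min, String max, long timestamp) {
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Assumption: the set orders tombstones by max only, so two tombstones
    // with the same max compare as equal.
    static TreeSet<Tombstone> newMaxOrderingSet() {
        return new TreeSet<>(Comparator.comparing((Tombstone t) -> t.max));
    }

    public static void main(String[] args) {
        TreeSet<Tombstone> maxOrderingSet = newMaxOrderingSet();
        boolean first = maxOrderingSet.add(new Tombstone("a", "m", 1L));
        boolean second = maxOrderingSet.add(new Tombstone("c", "m", 2L));
        System.out.println(first);                            // true
        System.out.println(second);                           // false: same max, silently not added
        System.out.println(maxOrderingSet.size());            // 1
        System.out.println(maxOrderingSet.first().timestamp); // 1: the original element is retained
    }
}
```

TreeSet.add() signals the rejected insert only through its boolean return value, so a caller that ignores the result never notices the second tombstone was dropped.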

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
> Fix For: 2.1.3
>
> Attachments: cassandra-2.0.11-8547.txt, cassandra-2.1-8547.txt, 
> rangetombstone.tracker.txt





[jira] [Updated] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

2014-12-29 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-8547:
---
Fix Version/s: 2.1.3

> Make RangeTombstone.Tracker.isDeleted() faster
> --
>
> Key: CASSANDRA-8547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: 2.0.11
>Reporter: Dominic Letz
> Fix For: 2.1.3
>
> Attachments: rangetombstone.tracker.txt


