[jira] [Comment Edited] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks

Dominic Letz (JIRA) Mon, 05 Jan 2015 04:01:13 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264520#comment-14264520
 ]


Dominic Letz edited comment on CASSANDRA-8546 at 1/5/15 12:00 PM:
------------------------------------------------------------------

Sorry for the assignment, Tyler mentioned you to look into it but did assign 
back to me. Since he didn't mention anything for me to be done I assumed that 
was an error and he meant to assign to you, as he mentioned you.

Good call on BTree vs. GapList indeed the BTree does guarantee O(log N) but you 
will get in all cases. While GapList gives O(1) on front and back insert, and 
O(1) for middle insert with the same locality - and insertFrom() is exactly 
doing that on overlapping ranges. But it will degrade to the mentioned O(N) on 
random non-overlapping tombstones.

The true reason I went for GapList though was that it allowed me to keep most 
of the code unchanged as it also supports index based access. Now though I 
think I know the code good enough to replace it with TreeMap or similar.

Let me know whether such a changed patch would be helpful.



was (Author: dominicletz):
Sorry for the assignment, Tyler mentioned you to look into it but did assign 
back to me. Since he didn't mention anything for me to be done I assumed that 
was an error and he meant to assign to you, as he mentioned you.

Good call on BTree vs. GapList indeed the BTree does guarantee O(log N) but you 
will get in all cases. While GapList gives O(1) on front and back insert, and 
O(1) for middle insert with the same locality - and insertFrom() which is 
exactly doing that on overlapping ranges.

The true reason I went for GapList though was that it allowed me to keep most 
of the code unchanged as it also supports index based access. Now though I 
think I know the code good enough to replace it with TreeMap or similar.

Let me know whether such a changed patch would be helpful.


> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11 / 2.1
>            Reporter: Dominic Letz
>            Assignee: Benedict
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, 
> tombstone_test.tgz
>
>
> I would like to propose a change of the data structure used in the 
> RangeTombstoneList to store and insert tombstone ranges to something with at 
> least O(log N) insert in the middle and at near O(1) and start AND end. Here 
> is why:
> When having tombstone heavy work-loads the current implementation of 
> RangeTombstoneList becomes a bottleneck with slice queries.
> Scanning the number of tombstones up to the default maximum (100k) can take 
> up to 3 minutes of how addInternal() scales on insertion of middle and start 
> elements.
> The attached test shows that with 50k deletes from both sides of a range.
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
> While one direction performs ok (~400ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> The other direction underperforms (~7seconds on my notebook)
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks

Reply via email to