[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264185#comment-14264185 ]
Dominic Letz edited comment on CASSANDRA-8546 at 1/5/15 6:05 AM:
-----------------------------------------------------------------

Seems I'm the only one having fun with this one. I've done some more profiling and found that when middle-inserts and searches alternate, GapList's own binarySearch does not perform well, as it "normalizes" the GapList on each search. In this update I'm using the built-in Collections.binarySearch instead.

was (Author: dominicletz):
Seems I'm the only one having fun with this one. I've done some more profiling and found that when there is a pendulum between middle-inserts and searching, GapList's own binarySearch does not perform well, as it "normalizes" the GapList on each search. In this update I'm using the built-in Collections.binarySearch instead.

> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>        Environment: 2.0.11 / 2.1
>            Reporter: Dominic Letz
>            Assignee: Dominic Letz
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz
>
>
> I would like to propose changing the data structure used in the RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insert in the middle and near O(1) insert at start AND end. Here is why:
> Under tombstone-heavy workloads the current implementation of RangeTombstoneList becomes a bottleneck with slice queries.
> Scanning the number of tombstones up to the default maximum (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements.
> The attached test shows this with 50k deletes from both sides of a range:
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
>
> While one direction performs OK (~400 ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> the other direction underperforms (~7 seconds on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
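For illustration, a minimal sketch of the pattern the comment describes: keeping a sorted list under alternating middle-inserts and searches via the built-in Collections.binarySearch, which works on any random-access List and never mutates it during the search. This is not the attached patch; the class and helper below are hypothetical stand-ins (a plain ArrayList replaces GapList) to show the insertion-point arithmetic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BinarySearchInsert {
    // Insert 'value' into an already-sorted list, keeping it sorted.
    // Collections.binarySearch returns the index if found, or
    // (-(insertionPoint) - 1) if not; either way the list itself is
    // untouched by the search, unlike a search that "normalizes"
    // its backing store on every call.
    static void sortedInsert(List<Integer> list, int value) {
        int idx = Collections.binarySearch(list, value);
        if (idx < 0) {
            idx = -idx - 1; // convert to insertion point
        }
        list.add(idx, value);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        // Alternating middle-inserts and searches, the workload
        // described in the comment above.
        for (int v : new int[] {10, 2, 7, 5, 8}) {
            sortedInsert(list, v);
        }
        System.out.println(list); // [2, 5, 7, 8, 10]
    }
}
```

Each insert here costs one O(log N) search plus an O(N) element shift; the proposal in the issue is about a structure that makes the middle shift cheaper as well.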