[ 
https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859581#comment-13859581
 ] 

Benedict commented on CASSANDRA-6271:
-------------------------------------

Ah, and I just realised I made a major mistake in my parent-pointer approach, 
which further damages the idea, probably a show stopper:

We can't re-use *any* sub-trees if we use parent pointers when modifying a 
btree, as the old btree needs to retain its prior parent pointers. So a single 
column modification will need to rewrite the entire btree. We could reduce the 
impact of this by allocating wrapper objects that just contain the array and 
the parent pointer, but then we're getting into larger memory costs and more 
dereference costs when traversing. Also, further polluting the code that 
navigates with dealing with this extra level of indirection. Now that I think 
about it, I think this is why I ruled them out in the first place.

> Replace SnapTree in AtomicSortedColumns
> ---------------------------------------
>
>                 Key: CASSANDRA-6271
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>         Attachments: oprate.svg
>
>
> On the write path a huge percentage of time is spent in GC (>50% in my tests, 
> if accounting for slow down due to parallel marking). SnapTrees are both GC 
> unfriendly due to their structure and also very expensive to keep around - 
> each column name in AtomicSortedColumns uses > 100 bytes on average 
> (excluding the actual ByteBuffer).
> I suggest using a sorted array; changes are supplied at-once, as opposed to 
> one at a time, and if < 10% of the keys in the array change (and data equal 
> to < 10% of the size of the key array) we simply overlay a new array of 
> changes only over the top. Otherwise we rewrite the array. This method should 
> ensure much less GC overhead, and also save approximately 80% of the current 
> memory overhead.
> TreeMap is similarly difficult object for the GC, and a related task might be 
> to remove it where not strictly necessary, even though we don't keep them 
> hanging around for long. TreeMapBackedSortedColumns, for instance, seems to 
> be used in a lot of places where we could simply sort the columns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to