[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Attachment: tmp.patch Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: oprate.svg, tmp.patch On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-6271: Attachment: 0001-Always-call-ReplaceFunction.txt Another problem is that the ColumnUpdater is not called for the first insert in a partition. This breaks some dtests (and is the reason for the last comments on CASSANDRA-4511). Attaching a simple patch that fixes that. I'll note that this patch also include the tmp.patch fix as it happens. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 0001-Always-call-ReplaceFunction.txt, oprate.svg, tmp.patch On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Attachment: tmp2.patch Added a unit test, which also caught that we had missed the apply(null, insert) in NodeBuilder, so fixed that. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 0001-Always-call-ReplaceFunction.txt, oprate.svg, tmp.patch, tmp2.patch On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Attachment: tmp3.patch Fixed patch generation and added extra test line to catch the original bug case. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 0001-Always-call-ReplaceFunction.txt, oprate.svg, tmp.patch, tmp2.patch, tmp3.patch On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Attachment: oprate.svg I've uploaded a patch to [6271|https://github.com/belliottsmith/cassandra/tree/iss-6271]. This patch replaces AtomicSortedColumns with AtomicBTreeColumns in Memtable. The BTree is a custom CoW job, that minimises memory utilisation without sacrificing performance. Modifications are performed in an actual batch (as opposed to a simulated batch as currently the case), using a builder to construct a new tree from as many parts of the old tree as possible. The fan-factor is currently set to 32, which was both experimentally and logically a good choice (I was expecting 16-32, to ensure we don't have too many cache lines for search in a given node, nor high merge costs). Each node, and the BTree itself, are represented as just an Object[], with utility functions to operate on the root node. This is a little anti-OOP, but I don't think a full fledged Set object is either called for or helpful here, as we have only a small number of operations we perform on the trees, and we'll only use it in fairly controlled circumstances; and this offers us some further memory savings. In synthetic benchmarks, I found this BTree to be slower than TreeSet by around 25-50%, but found that SnapTreeMap (when used in the same manner we use it in the code) was a similar degree slower again than the BTree, which was a good start. This is despite trying my best to induce worst-case behaviour from the BTree, and was persistent across all the tests I performed. The stress benchmarks are pretty promising too. I've attached graphs of this patch against trunk, and against this patch combined with my patch for 3578. The result is roughly 50% greater throughput for writes, with both patches. In fact, it got to the point where a node was flat out just running stress against the 4x cluster. I think we can push this a little further with a patch for switch lock removal, but we'll probably only see a small uptick from that. I also tested the patch for highly contended writes of a small number of rows, and found no measureable difference in performance. Note that this patch does not currently implement iterator removal for the BTree. I'm not sure yet how is best to do it, and am leaning towards a simple replacement of the Column that's deleted with a special DeletedColumn, that is filtered out on iteration. This would mean .size() would either have to be slower or inaccurate, so I need to do some analysis to determine if this is okay, or if I should bite the bullet and implement a full batch delete operation on the BTree, but this is a bit of a pig to do whilst ensuring a balanced tree, and sometimes wasteful. I could also not balance the tree, or only make a 'best effort' that avoids re-allocating the same node multiple times even if it would result in imbalance. Suggestions welcome. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Attachments: oprate.svg On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Description: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. was: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Description: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. was: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Description: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. was: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the values in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Labels: performance (was: ) Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6271) Replace SnapTree in AtomicSortedColumns
[ https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-6271: Description: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. was: On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average. I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. Replace SnapTree in AtomicSortedColumns --- Key: CASSANDRA-6271 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance On the write path a huge percentage of time is spent in GC (50% in my tests, if accounting for slow down due to parallel marking). SnapTrees are both GC unfriendly due to their structure and also very expensive to keep around - each column name in AtomicSortedColumns uses 100 bytes on average (excluding the actual ByteBuffer). I suggest using a sorted array; changes are supplied at-once, as opposed to one at a time, and if 10% of the keys in the array change (and data equal to 10% of the size of the key array) we simply overlay a new array of changes only over the top. Otherwise we rewrite the array. This method should ensure much less GC overhead, and also save approximately 80% of the current memory overhead. TreeMap is similarly difficult object for the GC, and a related task might be to remove it where not strictly necessary, even though we don't keep them hanging around for long. TreeMapBackedSortedColumns, for instance, seems to be used in a lot of places where we could simply sort the columns. -- This message was sent by Atlassian JIRA (v6.1#6144)