[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070533#comment-14070533 ]
graham sanderson edited comment on CASSANDRA-7546 at 7/22/14 5:25 PM:
----------------------------------------------------------------------

Well that makes sense, I hadn't checked if there was a limit on mutator threads - we didn't change it... this probably explains the hard upper bound in my synthetic test (which incidentally does not do the transformation).

I agree with you on SnapTreeMap, once I see the "essentially" free clone operation has to acquire a lock (or at least wait for no mutations)... I surmised there were probably dragons there that might cause all kinds of nastiness, whether it be pain from concurrent updates to a horribly unbalanced tree, or dragging huge amounts of stale data (garbage-to-be) with it due to overly lazy copy-on-write (again, I didn't look too closely). BTree looks much better (and probably does less rebalancing since it has wider nodes, I think), though as discussed it doesn't prevent the underlying race we also want to avoid.

So, I'll see if I have time to work on this later today, but the plan is... for 2.0.x (just checking):

a) Move the transformation.apply out of the loop, or do it at most once (I prefer the former since it makes the race window smaller).

b) Add a one-way flip flag per AtomicSortedColumns instance, which is flipped when a cost reaches a certain value. I was going to calculate the delta in each mutator thread (probably adding a log-like measure, e.g. using Integer.numberOfLeadingZeros(tree.size()) per failing CAS), though looking (ugh) at SnapTreeMap again, it seems that tree.size() is not a good method to call in the presence of mutations, so I guess Holder can just track the tree size itself.

c) Given this is possibly a temporary solution, is it worth exposing the "cut-off" value, even undocumented, such that it could be overridden in cassandra.yaml?
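The flip-flag idea in (b) could be sketched roughly as below. This is not Cassandra's actual code: the class name, the CUT_OFF threshold, the locked fallback path, and the use of a plain int[] in place of the real column holder are all hypothetical; the cost heuristic interprets the "log-like measure" as 32 - Integer.numberOfLeadingZeros(size), i.e. roughly log2 of the tracked size, charged per failed CAS.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a one-way "contended" flip flag per instance:
// each failed CAS charges a log-sized cost; once the accumulated cost
// crosses a cut-off, the flag flips (and never flips back) and updates
// fall through to a locked path instead of re-cloning on every retry.
public class ContendedPartition {
    static final int CUT_OFF = 64; // hypothetical; per (c), could be overridable via cassandra.yaml

    final AtomicReference<int[]> holder = new AtomicReference<>(new int[0]);
    final AtomicInteger cost = new AtomicInteger();
    volatile boolean contended; // one-way flip flag: false -> true only

    // Returns true if the update went through the lock-free CAS path.
    boolean addAll(int value) {
        while (!contended) {
            int[] cur = holder.get();
            int[] next = Arrays.copyOf(cur, cur.length + 1); // the clone/update allocation
            next[cur.length] = value;
            if (holder.compareAndSet(cur, next)) return true;
            // Failed CAS: 'next' is wasted garbage. Charge a cost that grows
            // with the (locally tracked) size, roughly log2(size + 1).
            int sizeLog = 32 - Integer.numberOfLeadingZeros(cur.length + 1);
            if (cost.addAndGet(sizeLog) >= CUT_OFF) contended = true;
        }
        // Locked fallback once the flag has flipped: no wasted allocations.
        synchronized (this) {
            int[] cur = holder.get();
            int[] next = Arrays.copyOf(cur, cur.length + 1);
            next[cur.length] = value;
            holder.set(next);
        }
        return false;
    }
}
```

In the uncontended common case this costs exactly one volatile read of the flag beyond the existing CAS loop, which is the property we want to preserve.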
Note the default should be such that most AtomicSortedColumns instances never get cut off, since they are not heavily contended and large (indicating contended inserts, not updates). We can re-circle the wagons for 2.1, which needs to be a separate patch anyway.

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own) - see CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.

--
This message was sent by Atlassian JIRA (v6.2#6252)
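For reference, the read / clone-update / CAS spin loop the ticket describes can be sketched generically as below. The names are hypothetical (this is not the addAllWithSizeDelta source); it just shows why allocation rate scales with contention: every losing thread's freshly built copy becomes garbage and the work repeats.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Generic sketch of the allocating spin-loop pattern: read the current
// state, build an updated copy (an allocation), then try to CAS it in.
// Under contention, each failed CAS discards the copy and retries, so
// more contending cores means more wasted allocations per update.
public class SpinLoopDemo {
    static <T> T updateAndGet(AtomicReference<T> ref, UnaryOperator<T> transform) {
        while (true) {
            T current = ref.get();
            T updated = transform.apply(current); // allocation happens here, on every retry
            if (ref.compareAndSet(current, updated)) return updated;
            // CAS failed: 'updated' is now garbage; spin and rebuild.
        }
    }
}
```

Plan item (a) above targets exactly the per-retry transform.apply call: moving it out of the loop (or doing it at most once) stops re-running the transformation on every spin, though retries still re-clone.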