[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070533#comment-14070533
 ] 

graham sanderson edited comment on CASSANDRA-7546 at 7/22/14 5:25 PM:
----------------------------------------------------------------------

Well that makes sense, I hadn't checked if there was a limit on mutator threads 
- we didn't change it... this probably explains the hard upper bound in my 
synthetic test (which incidentally does not do the transformation)

I agree with you on SnapTreeMap; once I saw that the "essentially" free clone 
operation has to acquire a lock (or at least wait for no mutations)... I 
surmised there were probably dragons there that might cause all kinds of 
nastiness, whether it be pain on concurrent updates to a horribly unbalanced 
tree, or dragging huge amounts of stale data (garbage to be) with it due to 
overly lazy copy on write (again I didn't look too closely). BTree looks much 
better (and probably does less rebalancing since it has wider nodes, I think), 
though as discussed it doesn't prevent the underlying race we also want to 
avoid.
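
For reference, a minimal sketch of the underlying pattern (hypothetical code, 
not the actual AtomicSortedColumns implementation): every failed CAS discards a 
freshly cloned-and-modified map, which is exactly the allocation churn under 
contention.

{code:java}
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

final class SpinLoopSketch
{
    private final AtomicReference<SortedMap<String, String>> ref =
            new AtomicReference<SortedMap<String, String>>(new TreeMap<String, String>());

    void put(String name, String value)
    {
        while (true)
        {
            SortedMap<String, String> current = ref.get();
            // clone + modify allocates a whole new map on every attempt...
            SortedMap<String, String> modified = new TreeMap<String, String>(current);
            modified.put(name, value);
            // ...and a losing CAS turns all of that work into instant garbage
            if (ref.compareAndSet(current, modified))
                return;
        }
    }
}
{code}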

So, I'll see if I have time to work on this later today, but the plan is... for 
2.0.x (just checking):

a) move the transformation.apply call out of the loop, or do it at most once (I 
prefer the former since it makes the race window smaller)
b) add a one-way flip flag per AtomicSortedColumns instance, which is flipped 
when an accumulated cost reaches a certain value. I was going to calculate the 
delta in each mutator thread (probably adding a log-like measure, e.g. using 
Integer.numberOfLeadingZeros(tree.size()) per failing CAS), though looking at 
SnapTreeMap again (ugh), it seems that tree.size() is not a safe method to call 
in the presence of mutations, so I guess Holder can just track the tree size 
itself (a rough sketch of (a) and (b) follows this list)
c) given this is possibly a temporary solution, is it worth exposing the 
"cut-off" value, even undocumented, such that it could be overridden in 
cassandra.yaml? Note the default should be such that most AtomicSortedColumns 
instances never hit the cut-off, since they are not both heavily contended and 
large (which indicates contended inserts, not updates)
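
For concreteness, here is a rough sketch of (a) and (b) combined; 
SWITCH_THRESHOLD, logLikeCost and putLocked are illustrative names of mine, not 
anything from the attached patches, and the cut-off value is arbitrary:

{code:java}
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

final class ContentionFlipSketch
{
    private static final int SWITCH_THRESHOLD = 1024; // illustrative cut-off; per (c) this could be a yaml knob

    // Holder tracks the size itself, since SnapTreeMap.size() is not safe under mutation
    static final class Holder
    {
        final SortedMap<String, String> map;
        final int size;
        Holder(SortedMap<String, String> map, int size) { this.map = map; this.size = size; }
    }

    private final AtomicReference<Holder> ref =
            new AtomicReference<Holder>(new Holder(new TreeMap<String, String>(), 0));
    private volatile boolean contended; // the one-way flip flag
    private int cost;                   // approximate; racy increments are acceptable here

    // log-like measure (~log2) of the work wasted rebuilding a tree of this size
    private static int logLikeCost(int size)
    {
        return 32 - Integer.numberOfLeadingZeros(size | 1);
    }

    void put(String name, String value)
    {
        // (a): any per-mutation transformation of the input would be applied
        // once here, before entering the retry loop
        if (contended)
        {
            putLocked(name, value);
            return;
        }
        while (true)
        {
            Holder current = ref.get();
            SortedMap<String, String> modified = new TreeMap<String, String>(current.map);
            boolean added = modified.put(name, value) == null;
            if (ref.compareAndSet(current, new Holder(modified, current.size + (added ? 1 : 0))))
                return;
            // (b): charge each failed CAS a log-like cost; flip once past the cut-off
            cost += logLikeCost(current.size);
            if (cost > SWITCH_THRESHOLD)
                contended = true; // one-way: never flips back
        }
    }

    private synchronized void putLocked(String name, String value)
    {
        // still CAS, since threads that entered the lock-free path before the
        // flip may race with us; the lock merely serializes the flipped threads
        while (true)
        {
            Holder current = ref.get();
            SortedMap<String, String> modified = new TreeMap<String, String>(current.map);
            boolean added = modified.put(name, value) == null;
            if (ref.compareAndSet(current, new Holder(modified, current.size + (added ? 1 : 0))))
                return;
        }
    }
}
{code}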

We can re-circle the wagons for 2.1, which needs to be a separate patch anyway.




> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, 
> suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse it 
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common uncontended case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
