[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069896#comment-14069896 ]
Benedict commented on CASSANDRA-7546:
-------------------------------------

bq. Alternatively if you are saying, let each thread keep working while they still believe they can win

This was my original rationale for the patch I posted; however, I am now much more in favour of:

bq. a one way switch per Atomic*Columns instance that flips after a number of wasted "operations"?

That said, whether the switch is one-way or not is somewhat unimportant to me. The flip would only last the lifetime of a memtable, which is not especially long (under heavy load, probably only a few minutes), and it would not have dramatically negative consequences if it got things slightly wrong.

I'm still having a hard time believing rebalancing costs in snap tree can be that high, and further, if that really is the problem it should not be an issue in 2.1, as the b-tree rebalances with O(lg(N)) allocations. I'd be a little surprised if the snap tree didn't do the same: if there were more than O(lg(N)) allocations, the algorithmic complexity would be greater than O(lg(N)) as well. It's possible that it somehow inter-references with on-going copies, so that we get a highly complex graph that retains exponentially more garbage the more competing updates there are, but again I would be very surprised if this were the case. Outside of either of these scenarios, I would expect all of the garbage generated to be immediately collectible, so it would have to be the sheer volume alone that overwhelmed the GC. That is certainly possible, but it would entail a _lot_ of hinting, and I'd be surprised if a node could be receiving a large enough quantity. On the other hand, the arena allocations in 2.0 are definitely incapable of being collected and could be allocated almost as rapidly.

bq.
I'm not sure which changes you are talking about back-porting and whether the "at most twice" refers to looping once then locking

In this instance I'm referring to copying the source ColumnFamily into a local variable once, after failing the CAS, so that we do not keep allocating arena space. Alternatively, we could just do the copy up front in the method, as the only extra cost is an array allocation proportional in size to the input data, which is fairly cheap.

All of that said, I still think locking after wasting an excessive number of cycles is good behaviour, so I'm comfortable introducing it either way, and it would certainly help with all of the above causes.

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
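The one-way switch discussed in the comment thread can be sketched roughly as below. This is a hypothetical illustration, not Cassandra's actual AtomicSortedColumns code: the class name, the threshold constant, and the UnaryOperator-based API are all assumptions made for the sketch. It shows the optimistic read/clone/CAS fast path, the counting of wasted (allocating) attempts, and the permanent flip to a locked slow path once too much work has been thrown away.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "flip to locking after wasted operations" idea.
// Names and the threshold are illustrative only.
class ContentionAwareHolder<T>
{
    private static final int MAX_WASTED_ATTEMPTS = 8; // assumed tuning knob

    private final AtomicReference<T> state;
    // One-way switch: once set, every writer takes the lock for the rest of
    // this instance's lifetime (a memtable's lifetime, in the ticket's terms).
    private final AtomicBoolean lockOnly = new AtomicBoolean(false);
    private final Object lock = new Object();

    ContentionAwareHolder(T initial)
    {
        state = new AtomicReference<>(initial);
    }

    T get()
    {
        return state.get();
    }

    void update(UnaryOperator<T> op)
    {
        if (!lockOnly.get())
        {
            for (int wasted = 0; wasted < MAX_WASTED_ATTEMPTS; wasted++)
            {
                T current = state.get();
                T next = op.apply(current); // the allocation the spin loop wastes
                if (state.compareAndSet(current, next))
                    return;                 // common, uncontended fast path
            }
            lockOnly.set(true);             // too much wasted work: flip for good
        }
        synchronized (lock)
        {
            // Still CAS under the lock: optimistic writers may be mid-flight
            // when the switch flips, and a plain set() could lose their updates.
            T current, next;
            do
            {
                current = state.get();
                next = op.apply(current);
            }
            while (!state.compareAndSet(current, next));
        }
    }
}
```

The local-copy suggestion in the comment is complementary: after the first failed CAS, the cloned input would be kept in a local variable (or copied up front) so that retries do not repeatedly allocate uncollectable arena space; only the CAS'd tree clones remain per attempt.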