[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139982#comment-14139982 ]
graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

Ok, cool, thanks - I've upgraded my 2.1.0 to 2.1.1... {{7cfd3ed}} for what it's worth. I merged {{7964+7926}} into that and updated my load machine with it. I switched to 40x40x40x40 clustering keys as suggested and changed the 10M entries in the command line args to 2560000 accordingly (it now runs successfully). The output is below.

Note I ended up with 1275 partitions (during the warmup I ended up with 1025, so there may also be an off-by-one bug somewhere, either in stress or in my config!)... I'm still not sure this is what we expect - each node has only seen about 3M mutations total (and I've run the stress test twice - once without the GC stuff working).

Anyway, let me know what you think - I won't be running more tests until tomorrow US time.

Another question - what do you usually do to get comparable results? Right now I have been blowing away the stresscql keyspace every time, to at least keep compaction out of the equation. Given the length of the cassandra-stress run, I'm not sure there is much to be gained by bouncing the cluster between runs, but you probably know better, having used it before.
{code}
Results:
op rate                   : 10595
partition rate            : 10595
row rate                  : 10595
latency mean              : 85.8
latency median            : 49.9
latency 95th percentile   : 360.0
latency 99th percentile   : 417.9
latency 99.9th percentile : 491.9
latency max               : 552.2
total gc count            : 3
total gc mb               : 19471
total gc time (s)         : 0
avg gc time(ms)           : 67
stdev gc time(ms)         : 5
Total operation time      : 00:00:40
Improvement over 609 threadCount: -1%

 id,            total ops, adj row/s,  op/s,  pk/s, row/s, mean,  med,   .95,   .99,  .999,   max, time,  stderr, gc: #, max ms, sum ms, sdv ms,    mb
  4 threadCount,     6939,        -0,   226,   226,   226, 17.6, 16.3,  40.3,  49.4,  51.1, 131.8, 30.6, 0.01464,     0,      0,      0,      0,     0
  8 threadCount,    11827,       385,   385,   385,   385, 20.7, 15.1,  47.5,  51.3,  82.1, 111.7, 30.7, 0.02511,     0,      0,      0,      0,     0
 16 threadCount,    19068,        -0,   612,   612,   612, 26.1, 28.8,  49.9,  60.6,  89.7, 172.1, 31.2, 0.01924,     0,      0,      0,      0,     0
 24 threadCount,    24441,        -0,   775,   775,   775, 30.9, 32.6,  52.1,  80.3,  88.3, 150.4, 31.5, 0.01508,     0,      0,      0,      0,     0
 36 threadCount,    36641,        -0,  1155,  1155,  1155, 31.1, 30.2,  59.0,  78.1,  89.7, 172.1, 31.7, 0.01127,     0,      0,      0,      0,     0
 54 threadCount,    55220,        -0,  1730,  1730,  1730, 31.1, 29.1,  54.5,  74.3,  84.3, 164.4, 31.9, 0.00883,     0,      0,      0,      0,     0
 81 threadCount,    83460,        -0,  2609,  2609,  2609, 31.0, 28.9,  51.2,  71.0,  79.2, 175.4, 32.0, 0.01678,     0,      0,      0,      0,     0
121 threadCount,   140705,        -0,  4402,  4402,  4402, 27.4, 25.8,  49.7,  53.2,  70.3, 302.8, 32.0, 0.01438,     2,    462,    462,     11, 12889
181 threadCount,   226213,        -0,  7116,  7116,  7116, 25.4, 24.2,  48.8,  51.8,  60.1, 279.0, 31.8, 0.01335,     1,    230,    230,      0,  6401
271 threadCount,   320658,        -0, 10089, 10089, 10089, 26.8, 25.0,  48.3,  50.1,  57.4, 297.0, 31.8, 0.01256,     2,    425,    425,     14, 12786
406 threadCount,   342451,        -0, 10609, 10609, 10609, 38.2, 40.3,  59.0,  77.5,  81.7, 142.4, 32.3, 0.00920,     0,      0,      0,      0,     0
609 threadCount,   381058,        -0, 10651, 10651, 10651, 57.0, 48.6, 171.5, 224.4, 248.4, 342.0, 35.8, 0.01234,     1,     66,     66,      0,  6520
913 threadCount,   432518,        -0, 10595, 10595, 10595, 85.8, 49.9, 360.0, 417.9, 491.9, 552.2, 40.8, 0.01471,     3,    200,    200,      5, 19471
END
{code}

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 7546.20_alt.txt, 7546.20_async.txt, 7546.21_v1.txt, hint_spikes.png, suggestion1.txt, suggestion1_21.txt, young_gen_gc.png
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
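For context, a minimal sketch of the read/clone/CAS pattern the description refers to. This is not the actual {{AtomicSortedColumns}} code; the class and method names below are illustrative. The point it demonstrates is that every *failed* CAS throws away a freshly allocated clone, so under heavy contention the allocation rate scales with the number of retries across all writer threads:

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative copy-on-write map updated via a CAS spin loop, in the
// style the issue describes (names are hypothetical, not Cassandra's).
class SpinningCopyOnWriteMap {
    private final AtomicReference<SortedMap<String, Long>> ref =
            new AtomicReference<>(Collections.emptySortedMap());

    /** Returns the size of the map after the update is published. */
    long put(String key, long value) {
        while (true) {
            SortedMap<String, Long> current = ref.get();
            // Clone-and-update: this whole allocation is wasted work
            // if the CAS below loses the race to another writer.
            TreeMap<String, Long> updated = new TreeMap<>(current);
            updated.put(key, value);
            if (ref.compareAndSet(current, updated)) {
                return updated.size();
            }
            // CAS failed: another thread published a new map first.
            // Spin and re-clone from the fresh state, discarding `updated`.
        }
    }

    SortedMap<String, Long> snapshot() {
        return ref.get();
    }
}
```

With one writer the loop runs once per update; with N contending writers, each round publishes one clone and discards up to N-1, which is why the garbage grows with core count as the description notes.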