[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137777#comment-14137777 ]
graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

OK, so I'm running the latest stress.jar on my load machine. Given the number of changes to stress in 2.1.1 (including, by the looks of things, remote GC logging via cassandra-stress, which would be useful in this case), I guess I'll upgrade the cluster as well. Here is my current config (minus the comments) and the launch command; note there were some typos in our conversation above.

{code}
keyspace: stresscql

keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

table: testtable

table_definition: |
  CREATE TABLE testtable (
    p text,
    c1 int,
    c2 int,
    c3 int,
    v blob,
    PRIMARY KEY(p, c1, c2, c3)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class': 'LeveledCompactionStrategy' }
    AND comment = 'TestTable'

columnspec:
  - name: p
    size: fixed(16)
  - name: c1
    cluster: fixed(100)
  - name: c2
    cluster: fixed(100)
  - name: c3
    cluster: fixed(1000) # note I made it slightly bigger since 10M is better than 1M for a max - 1M happens pretty quickly
  - name: v
    size: gaussian(50..250)

queries:
  simple1:
    cql: select * from testtable where p = ? and v = ? LIMIT 10
    fields: samerow
{code}

{code}
./cassandra-stress user profile=~/cqlstress-7546.yaml ops\(insert=1\) cl=LOCAL_QUORUM \
  -node $NODES -mode native prepared cql3 \
  -pop seq=1..10M \
  -insert visits=fixed\(10M\) revisit=uniform\(1..1024\) \
  | tee results/results-2.1.0-p1024-a.txt
{code}

As of right now, we're still (8 minutes later) at:

{code}
INFO 19:11:51 Using data-center name 'Austin' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
Connected to cluster: Austin Multi-Tenant Cassandra 1
INFO 19:11:51 New Cassandra host cassandra4.aus.vast.com/172.17.26.14:9042 added
Datatacenter: Austin; Host: cassandra4.aus.vast.com/172.17.26.14; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.15; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.13; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.12; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.11; Rack: 98.9
INFO 19:11:51 New Cassandra host /172.17.26.12:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.11:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.13:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.15:9042 added
Created schema. Sleeping 5s for propagation.
Warming up insert with 250000 iterations...
Failed to connect over JMX; not collecting these stats
Generating batches with [1..1] partitions and [1..1] rows (of [10000000..10000000] total rows in the partitions)
{code}

The number of distinct partitions is currently 2365 and growing. Is this what we expect? It doesn't seem like 250,000 operations should have exhausted any partitions?
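As a sanity check on the profile above: each partition has 100 × 100 × 1000 clustering combinations, which is where the `[10000000..10000000] total rows` figure in the stress log comes from. A minimal sketch of that arithmetic (plain Java; the class and method names are mine, not part of cassandra-stress):

```java
public class PartitionRowMath {
    // Clustering-column cardinalities from the columnspec: c1, c2, c3
    static final long C1 = 100, C2 = 100, C3 = 1000;

    // Total rows per partition = product of the clustering cardinalities
    static long rowsPerPartition() {
        return C1 * C2 * C3;
    }

    public static void main(String[] args) {
        // Matches the "[10000000..10000000] total rows" line in the stress output
        System.out.println(rowsPerPartition()); // 10000000
    }
}
```

With visits=fixed(10M), a partition is only "exhausted" after all 10M of its rows have been written, which is why 250,000 warm-up operations should not have finished any single partition.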
> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt,
> 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt,
> 7546.20_alt.txt, 7546.20_async.txt, 7546.21_v1.txt, hint_spikes.png,
> suggestion1.txt, suggestion1_21.txt, young_gen_gc.png
>
> In order to preserve atomicity, this code attempts to read, clone/update,
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some
> fairly staggering memory growth (the more cores on your machine, the worse it
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same
> partition, hinting today does, and in this case wild (order(s) of magnitude
> more than expected) memory allocation rates can be seen (especially when the
> updates being hinted are small updates to different partitions, which can
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation
> whilst not slowing down the very common un-contended case.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
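For context on the read/clone/CAS pattern the issue description refers to: the clone is allocated on every attempt, including attempts that lose the CAS race, so under contention allocation scales with retries rather than with successful updates. A minimal illustrative sketch of that shape (not Cassandra's actual AtomicSortedColumns code; class and method names are mine):

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative read/clone/CAS spin loop: each failed CAS wastes one full clone. */
public class CasSpinDemo {
    static final AtomicReference<SortedMap<Integer, Integer>> state =
            new AtomicReference<>(new TreeMap<>());
    static final AtomicInteger attempts = new AtomicInteger(); // clones allocated

    static void put(int key, int value) {
        while (true) {
            SortedMap<Integer, Integer> current = state.get();
            // Clone-and-update: this allocation happens on EVERY attempt,
            // including the attempts that lose the CAS race below.
            SortedMap<Integer, Integer> updated = new TreeMap<>(current);
            updated.put(key, value);
            attempts.incrementAndGet();
            if (state.compareAndSet(current, updated)) return; // success
            // CAS failed: another writer won; the clone above is now garbage. Retry.
        }
    }

    /** Runs `threads` writers doing `perThread` puts each; returns {updates, clones}. */
    public static int[] runContended(int threads, int perThread) throws InterruptedException {
        state.set(new TreeMap<>());
        attempts.set(0);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // distinct keys per thread
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) put(base + i, i);
            });
            ts[t].start();
        }
        for (Thread th : ts) th.join();
        return new int[] { state.get().size(), attempts.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = runContended(4, 5000);
        System.out.println("updates=" + r[0] + " clones=" + r[1]);
    }
}
```

Every update lands exactly once, but the clone count can be much larger than the update count when writers collide, which is the allocation amplification the ticket aims to eliminate without penalizing the un-contended path.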