[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134635#comment-14134635 ]
graham sanderson edited comment on CASSANDRA-7546 at 9/15/14 10:50 PM:
-----------------------------------------------------------------------

Finally getting back to this; I've been doing other things (this is slightly lower priority since we already have it in production), as well as repeatedly breaking myself physically, requiring orthopedic visits! I just realized that the c6a2c65a75ade build being voted on for 2.1.0, which I deployed, is not the same as the released 2.1.0, so I am now upgrading, since cassandra-stress changes snuck in.

Note that I plan to stress using 1024, 256, 16, and 1 partitions, first with all 5 nodes up and then with 4 nodes up and one down to test the effect of hinting (replication factor of 3 and cl=LOCAL_QUORUM), and with at least memtable_allocation_type = heap_buffers & off_heap_buffers. I want to do one cell insert per batch...

I'm upgrading in part because of the new visit/revisit stuff - I'm not 100% sure how to use them correctly; I'll keep playing, but you may answer before I have finished upgrading and tried this. My first attempt, on the original 2.1.0 revision, ended up with only one clustering key value per partition, which is not what I wanted (because it will make the trees small).

Sample YAML for 1024 partitions:
{code}
#
# This is an example YAML profile for cassandra-stress
#
# insert data
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1)
#
# read, using query simple1:
# cassandra-stress profile=/home/jake/stress1.yaml ops(simple1=1)
#
# mixed workload (90/10)
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1,simple1=9)

#
# Keyspace info
#
keyspace: stresscql

#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

#
# Table info
#
table: testtable

#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE testtable (
        p text,
        c text,
        v blob,
        PRIMARY KEY(p, c)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'

#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows. Supported types are
#
#      EXP(min..max)                   An exponential distribution over the range [min..max]
#      EXTREME(min..max,shape)         An extreme value (Weibull) distribution over the range [min..max]
#      GAUSSIAN(min..max,stdvrng)      A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
#      GAUSSIAN(min..max,mean,stdev)   A gaussian/normal distribution, with explicitly defined mean and stdev
#      UNIFORM(min..max)               A uniform distribution over the range [min, max]
#      FIXED(val)                      A fixed distribution, always returning the same value
#      Aliases: extr, gauss, normal, norm, weibull
#
#      If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: p
    size: fixed(16)
    population: uniform(1..1024)    # the range of unique values to select for the field (default is 100Billion)
  - name: c
    size: fixed(26)
#    cluster: uniform(1..100B)
  - name: v
    size: gaussian(50..250)

insert:
  partitions: fixed(1)    # number of unique partitions to update in a single operation
                          # if batchcount > 1, multiple batches will be used, but all partitions will
                          # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED       # type of batch to use
  visits: fixed(10M)      # not sure about this

queries:
   simple1: select * from testtable where p = ? and c = ? LIMIT 10
{code}

Command-line:
{code}
./cassandra-stress user profile=~/cqlstress-1024.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 | tee results/results-2.1.0-p1024-a.txt
{code}
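For clarity on the "one cell insert per batch" goal: with partitions: fixed(1) and batchtype: LOGGED, each stress op is roughly equivalent to a logged batch wrapping a single prepared insert. A minimal sketch with the DataStax Java driver (contact point and bound values are hypothetical placeholders, not cassandra-stress internals):
{code}
import java.nio.ByteBuffer;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SingleCellBatchInsert {
    public static void main(String[] args) {
        // "127.0.0.1" is a placeholder; use one of $NODES in practice.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("stresscql")) {
            PreparedStatement insert =
                    session.prepare("INSERT INTO testtable (p, c, v) VALUES (?, ?, ?)");

            // One cell per batch: a LOGGED batch wrapping a single insert,
            // written at LOCAL_QUORUM as in the command line above.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            batch.add(insert.bind("p1",   // stress would generate a fixed(16)-char key
                                  "c1",   // and a fixed(26)-char clustering value
                                  ByteBuffer.wrap(new byte[150]))); // blob sized ~gaussian(50..250)
            session.execute(batch);
        }
    }
}
{code}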
> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 7546.20_alt.txt, 7546.20_async.txt, 7546.21_v1.txt, hint_spikes.png, suggestion1.txt, suggestion1_21.txt, young_gen_gc.png
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen, especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own - see CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.
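For context on the description above: the contended path is a read/clone/CAS retry loop, so every lost race discards a freshly allocated copy of the partition state. A minimal sketch of that pattern (illustrative names only, not the actual AtomicSortedColumns code):
{code}
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the read/clone/CAS pattern described above; class and
// method names are hypothetical, not Cassandra's real implementation.
public class CasSpinSketch {
    private final AtomicReference<NavigableMap<String, byte[]>> state =
            new AtomicReference<NavigableMap<String, byte[]>>(new TreeMap<String, byte[]>());

    public void add(String name, byte[] value) {
        while (true) {
            NavigableMap<String, byte[]> current = state.get();
            // Every attempt allocates a full copy of the current state...
            NavigableMap<String, byte[]> updated = new TreeMap<>(current);
            updated.put(name, value);
            if (state.compareAndSet(current, updated))
                return;
            // ...and a lost race throws that copy away and spins again, so
            // the allocation rate grows with the number of contending
            // writers (worse with more cores), as the description notes.
        }
    }
}
{code}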