[ https://issues.apache.org/jira/browse/CASSANDRA-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859641#comment-13859641 ]
Benedict commented on CASSANDRA-6534:
-------------------------------------

Are your test inserts being run in parallel, or serially? Are you overwriting the map values with each insert, or adding new ones? (If the latter, the comparison between collections and non-collections is not fair.)

There are a number of reasons why many rapid updates to a single partition key may perform badly (this is not really the behaviour Cassandra is optimised for), but I would try this out on the latest trunk to see if CASSANDRA-5417 helps you at all.

> Slow inserts with collections into a single partition (Pathological GC behavior)
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6534
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: dsc12-1.2.12-1.noarch.rpm
>                      cassandra12-1.2.12-1.noarch.rpm
>                      centos 6.4
>            Reporter: Michael Penick
>             Fix For: 1.2.12
>
>         Attachments: GC_behavior.png
>
>
> We noticed extremely slow insertion rates to a single partition key when using a composite column with a collection value. We were not able to replicate the issue using the same schema with a non-collection value, even with much larger values. During the collection insertion tests we see tons of these messages in system.log:
> "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"
> We are inserting tiny amounts of data (32-64 bytes) and seeing the issue after only a couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?
> Attached is a picture of the GC under one of the pathological tests. Keep in mind we are only inserting 128KB - 256KB of data, yet we are almost hitting the limit of the heap.
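> A rough back-of-the-envelope check of the figures quoted above (a sketch only: the payload sizes and GCInspector numbers are taken verbatim from this report, and heap usage of course includes memtables, caches, and other overhead, not just these inserts):
> {code}
# Back-of-the-envelope check of the numbers in the report above.
# Payload sizes and GCInspector figures are quoted from the issue text;
# nothing here talks to a real cluster.

INSERTS = 10_000          # "after only a couple of 10k inserts"
PAYLOAD_BYTES = 64        # upper end of the quoted 32-64 byte inserts

raw_payload = INSERTS * PAYLOAD_BYTES  # total application data pushed in
heap_used = 1_233_256_368              # "1233256368 used" from GCInspector
heap_max = 8_375_238_656               # "max is 8375238656"

print(f"raw payload:   {raw_payload / 1024:.0f} KB")
print(f"heap used:     {heap_used / 2**30:.2f} GiB "
      f"({heap_used / heap_max:.0%} of the configured heap)")
print(f"amplification: ~{heap_used / raw_payload:,.0f}x heap bytes per payload byte")
> {code}
> Even granting that not all of the reported heap is attributable to the inserted cells, the gap of three orders of magnitude between payload and heap usage is what motivates the "Why is C* consuming so much memory?" question.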
> GC flags:
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms8192M
> -Xmx8192M
> -Xmn2048M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss180k
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
>
> Example schemas:
> Note: Neither the type of collection nor the primitive type inside the collection seems to matter.
> {code}
> CREATE TABLE test.test (
>   row_key text,
>   column_key uuid,
>   column_value list<int>,
>   PRIMARY KEY(row_key, column_key));
>
> CREATE TABLE test.test (
>   row_key text,
>   column_key uuid,
>   column_value map<text, text>,
>   PRIMARY KEY(row_key, column_key));
> {code}
> Example inserts:
> Note: The issue can be replicated with extremely small inserts (as well as with larger ~1KB ones).
> {code}
> INSERT INTO test.test
>   (row_key, column_key, column_value)
> VALUES
>   ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
>
> INSERT INTO test.test
>   (row_key, column_key, column_value)
> VALUES
>   ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37,
>    { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
> {code}
> As a comparison, I was able to run the same tests with the following schema with no issue:
> Note: This test ran at a much faster insertion speed, for much longer, and with much bigger column values (1KB), without any GC issues.
> {code}
> CREATE TABLE test.test (
>   row_key text,
>   column_key uuid,
>   column_value text,
>   PRIMARY KEY(row_key, column_key));
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
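> For reference, a minimal sketch of how the single-partition reproduction might be driven, following the list<int> schema and example inserts in the report (the reporter's actual test harness is not shown in this issue; build_inserts is a hypothetical helper, and the generated statements would be fed serially to whatever driver is in use):
> {code}
# Hedged sketch of a reproduction driver for the list<int> schema above.
# It only builds the CQL statements; executing them against a cluster
# (e.g. via a driver session) is left out and would be needed to
# actually observe the reported GC behaviour.
import uuid

def build_inserts(partition_key, count):
    """Yield INSERT statements that all target one partition key,
    mirroring the shape of the example inserts in the report."""
    for _ in range(count):
        column_key = uuid.uuid1()  # timeuuid, as in the examples
        yield (
            "INSERT INTO test.test (row_key, column_key, column_value) "
            f"VALUES ('{partition_key}', {column_key}, [0, 1, 2, 3]);"
        )

if __name__ == "__main__":
    stmts = list(build_inserts("0000000001", 10_000))
    print(len(stmts), "statements targeting one partition")
> {code}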