[ 
https://issues.apache.org/jira/browse/CASSANDRA-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859690#comment-13859690
 ] 

Benedict commented on CASSANDRA-6534:
-------------------------------------

bq. if that's what you were getting at?
No, actually the opposite. If you're storing new values every time in the same 
wide row with composites/collections, you're going to be incurring large 
overheads without the linked patch. The columns/keys have to be sorted, and the 
comparison of two composite keys (which a collection key is backed by) requires 
deserializing the components. All of your inserts are also stored against the 
same wide row, which has to be maintained in sorted order (including all map 
elements), so you have a single very large (100k+ element) binary search tree 
in memory for that row.
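
To make the cost concrete, here is a minimal sketch of comparing two 
length-prefixed composite keys (an illustration only, not Cassandra's actual 
comparator code): every comparison performed while keeping the row sorted has 
to walk both keys component by component, allocating short-lived buffer views 
as it goes.

{code}
import java.nio.ByteBuffer;

// Minimal sketch (not Cassandra's actual comparator code): a composite key is a
// byte buffer of length-prefixed components, so each comparison has to walk and
// deserialize both keys component by component.
public final class CompositeCompareSketch {
    static int compare(ByteBuffer a, ByteBuffer b) {
        ByteBuffer x = a.duplicate(), y = b.duplicate();
        while (x.hasRemaining() && y.hasRemaining()) {
            int xlen = x.getShort() & 0xFFFF;   // read the 2-byte length prefix
            int ylen = y.getShort() & 0xFFFF;
            ByteBuffer xc = component(x, xlen); // slice out one component
            ByteBuffer yc = component(y, ylen);
            int cmp = xc.compareTo(yc);         // compare the component bytes
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.remaining(), y.remaining());
    }

    private static ByteBuffer component(ByteBuffer bb, int len) {
        ByteBuffer slice = bb.slice();
        slice.limit(len);
        bb.position(bb.position() + len);
        return slice;
    }
}
{code}

With 100k+ elements in the row, each insert does on the order of log2(100k) of 
these component-by-component comparisons just to find its slot.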

However, if you're performing these in parallel, it's quite possible the issue 
is actually that the parallel modifications are all racing to complete, 
producing lots of aborted attempts to modify the partition. Each partition is 
modified in a copy-on-write manner, and since the modifications are potentially 
expensive (the deserializing comparisons are performed over the large binary 
search tree), they are highly likely to overlap. Each insert therefore 
essentially performs a copy-on-write that is then aborted for every in-progress 
operation except the one that succeeds. So you could be seeing a hugely 
disproportionate amount of garbage, both from the tree modifications and from 
the byte buffer allocations that are discarded.
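
For intuition, here is a simplified sketch of that copy-on-write race, assuming 
a plain sorted snapshot behind an AtomicReference (the real in-memory structure 
is more sophisticated, but the failure mode under contention is the same): 
every loser's copy becomes garbage and the work is repeated.

{code}
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Simplified sketch (illustration only, not the real in-memory structure):
// every writer copies the current sorted snapshot, applies its single insert,
// then tries to CAS the copy in; every loser discards its copy and retries,
// so under heavy contention most of the work done becomes garbage.
public final class CopyOnWriteRaceSketch {
    private final AtomicReference<TreeMap<String, byte[]>> columns =
            new AtomicReference<>(new TreeMap<>());

    void insert(String name, byte[] value) {
        while (true) {
            TreeMap<String, byte[]> current = columns.get();
            TreeMap<String, byte[]> updated = new TreeMap<>(current); // copy
            updated.put(name, value);
            if (columns.compareAndSet(current, updated)) {
                return; // winner: the copy becomes the live version
            }
            // loser: the whole copy is garbage; retry against the new snapshot
        }
    }
}
{code}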

The linked patch might alleviate some of this by making the modifications 
faster, so there are fewer aborted races; some other garbage-reducing 
improvements in the works would also marginally mitigate the problem. But 
ultimately the underlying issue is that you should not be performing large 
volumes of highly concurrent modifications to a single partition key.
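
If the workload can't avoid a hot logical key, one common workaround (a 
hypothetical client-side sketch, not something Cassandra or the linked patch 
does for you) is to shard it across several partition keys so that concurrent 
writers mostly land on different partitions:

{code}
import java.util.UUID;

// Hypothetical client-side helper (not part of Cassandra): spread a hot logical
// key across N buckets so concurrent inserts land on different partitions.
public final class PartitionBuckets {
    static String bucketedRowKey(String logicalKey, UUID columnKey, int buckets) {
        int bucket = (columnKey.hashCode() & Integer.MAX_VALUE) % buckets;
        return logicalKey + "#" + bucket;   // e.g. "0000000001#7"
    }
}
{code}

Reads for the logical key then have to fan out across all of the buckets.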



> Slow inserts with collections into a single partition (Pathological GC behavior)
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6534
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: dsc12-1.2.12-1.noarch.rpm
> cassandra12-1.2.12-1.noarch.rpm
> centos 6.4
>            Reporter: Michael Penick
>             Fix For: 1.2.12
>
>         Attachments: GC_behavior.png
>
>
> We noticed extremely slow insertion rates to a single partition key when using 
> a composite column with a collection value. We were not able to replicate the 
> issue using the same schema but with a non-collection value, even with much 
> larger values. During the collection insertion tests we see tons of these 
> messages in the system.log:
> "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 
> collections, 1233256368 used; max is 8375238656"
> We are inserting tiny amounts of data (32-64 bytes) and seeing the issue 
> after only a couple of 10k inserts. The amount of memory being used by C*/JVM 
> is nowhere near proportional to the amount of data being inserted. Why is C* 
> consuming so much memory?
> Attached is a picture of the GC under one of the pathological tests. Keep in 
> mind we are only inserting 128KB - 256KB of data and we are almost hitting 
> the limit of the heap.
> GC flags:
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms8192M
> -Xmx8192M
> -Xmn2048M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss180k
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> Example schemas:
> Note: The type of collection or primitive type in the collection doesn't seem 
> to matter.
> {code}
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value list<int>,
>     PRIMARY KEY (row_key, column_key));
>
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value map<text, text>,
>     PRIMARY KEY (row_key, column_key));
> {code}
> Example inserts:
> Note: This issue can be replicated with extremely small inserts (as well as 
> larger ones, ~1KB).
> {code}
> INSERT INTO test.test (row_key, column_key, column_value)
> VALUES ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
>
> INSERT INTO test.test (row_key, column_key, column_value)
> VALUES ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37,
>         { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
> {code}
> As a comparison, I was able to run the same tests with the following schema 
> with no issue:
> Note: This test ran at a much faster insertion speed, for much longer, and 
> with much bigger column values (1KB), without any GC issues.
> {code}
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value text,
>     PRIMARY KEY (row_key, column_key));
> {code}



