[ https://issues.apache.org/jira/browse/CASSANDRA-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359995#comment-14359995 ]
Aleksey Yeschenko commented on CASSANDRA-8961:
----------------------------------------------

CASSANDRA-8099 will make that query not use range tombstones, but you'll have to wait for 3.0 to get that.

> Data rewrite case causes almost non-functional compaction
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8961
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 6.6, Cassandra 2.0.12 (also seen in Cassandra 2.1)
>            Reporter: Dan Kinder
>            Priority: Minor
>
> There seems to be a bug of some kind where compaction grinds to a halt in
> this use case: from time to time we have a set of rows we need to "migrate",
> changing their primary key by deleting the row and inserting a new row with
> the same partition key and a different clustering key. The Python script
> below demonstrates this; it takes a bit of time to run (it isn't optimized),
> but once it finishes, Cassandra will be trying to compact a few hundred
> megabytes of data for a long time, on the order of days, or it will never
> finish.
> Though not verified by this sandboxed experiment, compression settings do
> not seem to matter, and the same thing appears to happen with STCS as well,
> not just LCS. I am still testing whether other patterns, such as deleting
> all rows and then inserting (or vice versa), cause the same terrible
> compaction performance.
> Even if it isn't a "bug" per se, is there a way to fix or work around this
> behavior?
> {code}
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['localhost'])
> db = cluster.connect()
>
> db.execute("DROP KEYSPACE IF EXISTS trial")
> db.execute("""CREATE KEYSPACE trial
>               WITH REPLICATION = { 'class': 'SimpleStrategy',
>                                    'replication_factor': 1 }""")
>
> # LCS table with compression disabled; the payload lives in the
> # clustering column.
> db.execute("""CREATE TABLE trial.tbl (
>                   pk text,
>                   data text,
>                   PRIMARY KEY (pk, data)
>               ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }
>               AND compression = {'sstable_compression': ''}""")
>
> # Number of rows to insert and "move"
> n = 200000
>
> # Insert n rows with the same partition key and ~1KB of unique data in
> # the clustering key.
> for i in range(n):
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)",
>                [str(i).zfill(1024)])
>
> # "Move" those n rows, deleting each one and replacing it with a very
> # similar row.
> for i in range(n):
>     val = str(i).zfill(1024)
>     db.execute("DELETE FROM trial.tbl WHERE pk = 'thepk' AND data = %s",
>                [val])
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)",
>                ["1" + val])
> {code}
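For background on the comment above: before CASSANDRA-8099 (the 3.0 storage engine), deleting a whole CQL row in a table with clustering columns is recorded as a range tombstone covering that clustering prefix, so the script above writes n range tombstones into a single partition, interleaved with the replacement rows. As a possible stop-gap on 2.0/2.1, here is a minimal sketch, assuming the reproduction table above and a throwaway single-node cluster; the tombstone_threshold and unchecked_tombstone_compaction compaction sub-properties do exist in these versions, but this is not a verified fix for this ticket:

{code}
from cassandra.cluster import Cluster

cluster = Cluster(['localhost'])
db = cluster.connect()

# Sketch of a possible mitigation only; not verified against this
# reproduction. On a throwaway single-node cluster, gc_grace_seconds can
# safely be 0, making tombstones purgeable immediately, and the tombstone
# sub-properties let compaction elect tombstone-heavy SSTables more eagerly.
db.execute("""ALTER TABLE trial.tbl
              WITH gc_grace_seconds = 0
              AND compaction = { 'class': 'LeveledCompactionStrategy',
                                 'tombstone_threshold': '0.05',
                                 'unchecked_tombstone_compaction': 'true' }""")
{code}

Note that gc_grace_seconds = 0 is only safe when no replica can miss the delete (a single node with no hinted handoff), since purging tombstones early can otherwise resurrect deleted data.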