[ 
https://issues.apache.org/jira/browse/CASSANDRA-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-18118:
----------------------------------------
    Reviewers: Caleb Rackliffe, Caleb Rackliffe
               Caleb Rackliffe, Caleb Rackliffe  (was: Caleb Rackliffe)
       Status: Review In Progress  (was: Patch Available)

> Do not leak 2015 memtable synthetic Epoch
> -----------------------------------------
>
>                 Key: CASSANDRA-18118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18118
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Memtable
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
>
> This 
> [Epoch|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/EncodingStats.java#L48]
>  can 
> [leak|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Memtable.java#L392]
>  , affecting all timestamp logic. It has been observed in a production 
> environment that it can, e.g., prevent proper sstable and tombstone cleanup.
> To reproduce, create the following table:
> {noformat}
> drop keyspace if exists test;
> create keyspace test WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor' : 1};
> CREATE TABLE test.test (
>     key text PRIMARY KEY,
>     id text
> ) WITH bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '2', 'tombstone_compaction_interval': 
> '3000', 'tombstone_threshold': '0.1', 'unchecked_tombstone_compaction': 
> 'true'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 10
>     AND gc_grace_seconds = 10
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> CREATE INDEX id_idx ON test.test (id);
> {noformat}
> And stress load it with:
> {noformat}
> insert into test.test (key,id) values('$RANDOM_UUID $RANDOM_UUID', 
> 'eaca36a1-45f1-469c-a3f6-3ba54220363f') USING TTL 10
> {noformat}
> Note that all inserts use a 10s TTL, the table's default TTL is 10s, and 
> gc_grace is also 10s. This is to speed up the repro:
> - Run the load for a couple of minutes and track sstable disk usage. You will 
> see that it only increases: nothing gets cleaned up and it doesn't stop 
> growing (note that all this is well past the 10s gc_grace and TTL).
> - Running a flush and a compaction while under load against the keyspace, 
> table or index doesn't solve the issue.
> - Stopping the load and running a compaction doesn't solve the issue. 
> Flushing does though.
> - In the original observation, where TTL was around 600s and gc_grace around 
> 1800s, we could get GBs of sstables that weren't cleaned up or compacted away 
> after hours of work.
> - The issue can also be reproduced on plain sstables by repeatedly 
> inserting/deleting/overwriting the same values, without secondary (2i) 
> indexes or TTL being involved.
> The problem seems to be 
> [EncodingStats|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/EncodingStats.java#L48]
>  using a synthetic epoch in 2015, which plays nicely with vint serialization. 
> Unfortunately, {{Memtable}} uses that value to keep track of the 
> {{minTimestamp}}, so the 2015 epoch can leak. This confuses any logic 
> consuming that timestamp. In this particular case, purge candidates and fully 
> expired sstables weren't properly detected.
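
The leak described above can be illustrated with a small standalone sketch. This is not Cassandra's code: the class names, the sentinel value (an arbitrary 2015 instant) and the simplified purge check are all assumptions made for illustration. It only shows why a minimum-timestamp tracker seeded with a 2015 sentinel reports 2015 for data written years later, and how a consumer comparing against that minimum could then refuse to purge.

```java
import java.time.Instant;

public class SentinelLeakSketch {
    // Hypothetical sentinel: an arbitrary 2015 instant in microseconds.
    // Not necessarily the value Cassandra's EncodingStats uses.
    public static final long SYNTHETIC_EPOCH_MICROS = micros("2015-01-01T00:00:00Z");

    // Hypothetical "nothing recorded yet" marker for the safe variant.
    public static final long NO_MIN_TIMESTAMP = Long.MAX_VALUE;

    public static long micros(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli() * 1000L;
    }

    // Seeds the minimum with the sentinel, so any write newer than 2015
    // can never lower it: the sentinel "leaks" out as the minimum.
    public static final class LeakyMinTracker {
        private long min = SYNTHETIC_EPOCH_MICROS;
        public void update(long timestampMicros) { min = Math.min(min, timestampMicros); }
        public long minTimestamp() { return min; }
    }

    // Seeds the minimum with Long.MAX_VALUE, so the first real write wins.
    public static final class SafeMinTracker {
        private long min = NO_MIN_TIMESTAMP;
        public void update(long timestampMicros) { min = Math.min(min, timestampMicros); }
        public long minTimestamp() { return min; }
    }

    // Simplified assumption: a tombstone may be dropped only if it is older
    // than everything still held in the memtable.
    public static boolean canPurge(long tombstoneMicros, long memtableMinMicros) {
        return tombstoneMicros < memtableMinMicros;
    }

    public static void main(String[] args) {
        long tombstone = micros("2022-12-01T00:00:00Z");
        long laterWrite = micros("2022-12-01T00:01:00Z");

        LeakyMinTracker leaky = new LeakyMinTracker();
        SafeMinTracker safe = new SafeMinTracker();
        leaky.update(laterWrite);
        safe.update(laterWrite);

        // The leaky tracker still reports the 2015 sentinel, so the check fails.
        System.out.println("leaky: canPurge=" + canPurge(tombstone, leaky.minTimestamp()));
        // The safe tracker reports the real write time, so the check passes.
        System.out.println("safe:  canPurge=" + canPurge(tombstone, safe.minTimestamp()));
    }
}
```

Under these assumptions the leaky tracker prints canPurge=false and the safe one prints canPurge=true, which mirrors the symptom in the report: data that is long past its TTL and gc_grace is never considered purgeable while the memtable advertises a 2015 minimum timestamp.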



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
