[ https://issues.apache.org/jira/browse/CASSANDRA-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-9661:
-----------------------------------------
    Fix Version/s: 3.0.x
                   2.2.x
                   2.1.x

> Endless compaction to a tiny, tombstoned SStable
> ------------------------------------------------
>
>                 Key: CASSANDRA-9661
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9661
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: WeiFan
>            Assignee: Yuki Morishita
>              Labels: compaction, dtcs
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> We deployed a 3-node cluster (running 2.1.5) under a stable write load (about 2k writes/s) to a CF with DTCS, a default TTL of 43200s and gc_grace of 21600s. The CF contained insert-only, complete time-series data. We found that Cassandra would occasionally keep writing logs like this:
> INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,195 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were {1:4, }
> INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,241 CompactionTask.java:140 - Compacting [SSTableReader(path='/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516270-Data.db')]
> INFO [CompactionExecutor:30551] 2015-06-26 18:10:06,253 CompactionTask.java:270 - Compacted 1 sstables to [/home/cassandra/workdata/data/sen_vaas_test/nodestatus-f96c7c50155811e589f69752ac9b06c7/sen_vaas_test-nodestatus-ka-2516271,]. 449 bytes to 449 (~100% of original) in 12ms = 0.035683MB/s. 4 total partitions merged to 4. Partition merge counts were {1:4, }
> It seems that Cassandra kept compacting a single SSTable, several times per second, for many hours. Tons of log lines were written and one CPU core was exhausted during this time. The endless compaction finally ended when another compaction started on a group of SSTables (including the previous one). All three of our nodes have been hit by this problem, but at different times.
> We could not figure out how the problematic SSTable came about, because the log had already wrapped around.
> We dumped the records in the SSTable and found that it held the oldest data in our CF (again, our data was time series), and all of the records in this SSTable had been expired for more than 18 hours (12 hrs TTL + 6 hrs gc_grace), so they should have been dropped. However, Cassandra did nothing with this SSTable except compact it again and again, until more SSTables were old enough to be considered by DTCS for compaction together with this one.
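For reference, the setup described above roughly corresponds to a table definition like the one below. The actual schema is not attached to this report, so the keyspace and table names are only inferred from the data paths in the logs and the column layout is an assumption:

    CREATE TABLE sen_vaas_test.nodestatus (
        node_id text,        -- hypothetical partition key
        ts      timestamp,   -- hypothetical clustering column for the time series
        status  text,        -- hypothetical payload column
        PRIMARY KEY (node_id, ts)
    ) WITH compaction = {'class': 'DateTieredCompactionStrategy'}
      AND default_time_to_live = 43200   -- 12 h TTL, as described in the report
      AND gc_grace_seconds = 21600;      -- 6 h gc_grace, as described in the report

The repeated single-SSTable compactions in the logs are presumably the strategy's tombstone compactions; their triggering is controlled by the tombstone_threshold, tombstone_compaction_interval and unchecked_tombstone_compaction compaction sub-options, which may be worth checking on the affected table, although the behaviour reported here suggests the same SSTable keeps being re-selected regardless.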