[ https://issues.apache.org/jira/browse/CASSANDRA-16769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370368#comment-17370368 ]
Scott Carey commented on CASSANDRA-16769:
-----------------------------------------

Pull request: [https://github.com/apache/cassandra/pull/1087]

The PR above is based on the changes for https://issues.apache.org/jira/browse/CASSANDRA-16767 and https://issues.apache.org/jira/browse/CASSANDRA-16768 to avoid merge conflicts. I can change that if need be.

Like the other two, I would have preferred to avoid locking all of the sstables, but such a change would be a much larger undertaking, so this behaves the same as a normal garbagecollect in that regard: even if only half the SSTables are garbage-collected, all of them will be locked for the duration.

I tested with the included unit test, and by manually running nodetool garbagecollect on a local Cassandra instance (3.11.9) where I replaced the cassandra jar with one that I built from the changes.

> Add an option to nodetool garbagecollect that collects only a fraction of the data
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16769
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16769
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>          Priority: Normal
>
> nodetool garbagecollect can currently only run across an entire table.
> For a very large table, across many use cases, the SSTables most likely to be full of 'garbage' are the oldest ones. With both LCS and STCS, the SSTables with the lowest generation numbers are, under normal operation, going to hold the majority of the data that is masked by a tombstone or overwritten.
> In order to make 'nodetool garbagecollect' more useful for such large tables, I propose that we add an option `--oldest-fraction` that takes a floating point value between 0.00 and 1.00 and runs 'garbagecollect' over only the oldest SSTables that together cover at least that fraction of the data.
> This would mean, for instance, that if you ran this with `--oldest-fraction 0.1` every week, no SSTable would be older than 10 weeks, and no data that had been overwritten, TTL'd, or deleted would survive more than 10 weeks past its original write.
> In my use case, the oldest LCS SSTable is about 20 months old when the table operates in steady state on Cassandra 3.11.x, but only 5% of the data in SSTables of that age has not been overwritten. This breaks part of the performance promise of LCS: if your last level is 50% filled with overwritten data, then the chance of finding data only in that level is significantly lower than advertised.
> 'nodetool compact' is extremely expensive and currently not conducive to any sort of incremental operation. But nodetool garbagecollect run on a fraction of the oldest data would be.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
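The selection rule the issue proposes (take the oldest SSTables until they together cover at least the requested fraction of the table's data) could be sketched roughly as below. This is a hypothetical stand-alone illustration, not the actual patch: the `SSTable` record, `selectOldest`, and its size/timestamp fields are invented stand-ins for Cassandra's real SSTableReader machinery, and "oldest" is approximated here by each SSTable's newest contained write.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: choose the oldest SSTables whose combined size
// covers at least `fraction` of the table's total on-disk data.
// `SSTable` is a stand-in record, not Cassandra's SSTableReader.
public class OldestFractionSelector
{
    record SSTable(String name, long maxTimestampMicros, long sizeBytes) {}

    static List<SSTable> selectOldest(List<SSTable> sstables, double fraction)
    {
        long total = sstables.stream().mapToLong(SSTable::sizeBytes).sum();
        long target = (long) Math.ceil(total * fraction);

        // Oldest first: order by the newest write each SSTable contains,
        // so an SSTable counts as "old" only if all its data is old.
        List<SSTable> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparingLong(SSTable::maxTimestampMicros));

        List<SSTable> selected = new ArrayList<>();
        long covered = 0;
        for (SSTable t : sorted)
        {
            if (covered >= target)
                break;
            selected.add(t);
            covered += t.sizeBytes();
        }
        return selected;
    }

    public static void main(String[] args)
    {
        List<SSTable> tables = List.of(new SSTable("a", 100, 40),
                                       new SSTable("b", 200, 30),
                                       new SSTable("c", 300, 30));
        // With fraction 0.10 the single oldest SSTable already covers the target.
        System.out.println(selectOldest(tables, 0.10));
    }
}
```

A real implementation would additionally have to respect compaction-strategy grouping and, as noted in the comment above, still takes the same locks as a full garbagecollect even when only a subset of SSTables is rewritten.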