[ https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Constance Eustace updated CASSANDRA-14279:
------------------------------------------
    Component/s:     (was: Lifecycle)
                 Repair

> Row Tombstones in separate sstables / separate compaction path
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-14279
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14279
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Local Write-Read Paths, Repair
>            Reporter: Constance Eustace
>            Priority: Major
>
> In my experience, if data is not well organized into time-windowed sstables,
> Cassandra has enormous difficulty actually deleting data that has a
> "medium term" lifetime. For example, you might have an active working set
> and be archiving "unused" data to other tables or clusters. Or you may be
> purging data. Or you may be migrating/sharding data. Whatever the case, you
> want that disk space back.
>
> In STCS and LCS, row tombstones are intermingled with column data and column
> tombstones. But a row tombstone represents a big event: large amounts of
> "droppable" data in an sstable, or even a shortcut that avoids reading data
> from other sstables.
>
> I am wondering whether, if row tombstones were isolated in their own
> sstables and separately compacted and merged, compaction might work more
> efficiently:
>
> * Reads can prioritize bloom filter lookups that indicate a row tombstone,
>   getting the timestamp of the deletion first, and can then use it in the
>   data sstables to filter data, or short-circuit the read entirely if the
>   row data carries an overall "most recent data timestamp".
> * Compaction could be forced to reference all the row tombstone sstables,
>   so that every time two or more "data" sstables are compacted, they must
>   consult the row tombstones to purge data.
>
> In LCS, this would be particularly useful for getting data out of the upper
> levels without having to wait for data to trickle up the tree. The row
> tombstones, being read-only inputs into the data sstable compactions, can be
> referenced in each of the LCS levels' parallel compactors.
>
> Based on discussions on the dev list, this would appear to require some sort
> of customization of the memtable->sstable flushing process, and perhaps a
> different set of bloom filters.
>
> Since the row tombstone sstables contain only <rowkey>,<tombstone timestamp>
> pairs, they should be comparatively small and take less time to compact.
> They could be aggressively compacted on a different schedule than "data"
> sstables.
>
> In addition, it may be easier to repair/synchronize row tombstones across the
> cluster if they have already been separated into their own sstables.
>
> Column/range tombstones may also benefit from a similar separation, but my
> guess is that they are so much more numerous, larger, and finer-grained that
> they might as well coexist with the data.
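
To make the proposal concrete, here is a minimal Python sketch of the two ideas in the description: data-sstable compaction that consults a separately merged set of row tombstones, and a read that checks the row tombstone first. It is an illustration only, not Cassandra code. Every name in it (Cell, RowTombstones, merge_row_tombstones, compact, read_row, newest_data_ts) is hypothetical, sstables are modelled as plain in-memory lists, and a deletion is assumed to win over data written at the same timestamp. The sketch only drops shadowed data; it never drops the tombstones themselves, so gc_grace handling is out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One column value in a hypothetical 'data' sstable."""
    row_key: str
    column: str
    value: str
    timestamp: int  # write timestamp


# Hypothetical stand-in for a "row tombstone sstable":
# row key -> timestamp of the row deletion.
RowTombstones = Dict[str, int]


def merge_row_tombstones(tables: Iterable[RowTombstones]) -> RowTombstones:
    """Compact several row tombstone tables, keeping the newest deletion per row."""
    merged: RowTombstones = {}
    for table in tables:
        for key, deleted_at in table.items():
            if key not in merged or deleted_at > merged[key]:
                merged[key] = deleted_at
    return merged


def compact(data_sstables: List[List[Cell]], tombstones: RowTombstones) -> List[Cell]:
    """Merge data sstables, using the tombstones as a read-only input.

    Cells written at or before their row's deletion timestamp are purged;
    among the survivors, the newest cell per (row, column) wins.
    """
    newest: Dict[Tuple[str, str], Cell] = {}
    for sstable in data_sstables:
        for cell in sstable:
            deleted_at = tombstones.get(cell.row_key)
            if deleted_at is not None and cell.timestamp <= deleted_at:
                continue  # shadowed by the row tombstone: reclaim the space
            key = (cell.row_key, cell.column)
            if key not in newest or cell.timestamp > newest[key].timestamp:
                newest[key] = cell
    return sorted(newest.values(), key=lambda c: (c.row_key, c.column))


def read_row(
    row_key: str,
    data_sstables: List[List[Cell]],
    tombstones: RowTombstones,
    newest_data_ts: Optional[Dict[str, int]] = None,
) -> List[Cell]:
    """Read one row, checking the row tombstone before touching any data.

    newest_data_ts is a hypothetical per-row "most recent data timestamp".
    When it is known and not newer than the deletion, the read returns
    without scanning the data sstables at all.
    """
    deleted_at = tombstones.get(row_key)
    if deleted_at is not None and newest_data_ts is not None:
        latest = newest_data_ts.get(row_key)
        if latest is not None and latest <= deleted_at:
            return []  # short-circuit: everything in the row is deleted
    row_only = [[c for c in sstable if c.row_key == row_key] for sstable in data_sstables]
    return compact(row_only, tombstones)
```

Because the merged tombstone map is only ever read by compact, the same map could be shared by several compactions at once, which is the property the description relies on for the per-level LCS compactors.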