[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

Constance Eustace (JIRA) Tue, 27 Feb 2018 11:45:26 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Constance Eustace updated CASSANDRA-14279:
------------------------------------------
    Component/s:     (was: Lifecycle)
                 Repair

> Row Tombstones in separate sstables / separate compaction path
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-14279
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14279
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Local Write-Read Paths, Repair
>            Reporter: Constance Eustace
>            Priority: Major
>
> In my experience, if data is not well organized into time-windowed sstables, 
> Cassandra has enormous difficulty actually deleting data that has a 
> "medium term" lifetime. For example, you might have an active working set 
> and be archiving "unused" data to other tables or clusters. Or you may be 
> purging data. Or you may be migrating/sharding data. Whatever the case, you 
> want that disk space back. 
> In STCS and LCS, row tombstones are intermingled with column data and column 
> tombstones. But a row tombstone represents a big event: large amounts of 
> "droppable" data in an sstable, or even a shortcut that avoids reading data 
> from other sstables.
> I am wondering whether isolating row tombstones in their own sstables, 
> separately compacted and merged, might enable compaction to work more 
> efficiently: 
> * Reads can prioritize bloom filter lookups that indicate a row tombstone, 
>   getting the timestamp of the deletion first, and can then use it against 
>   the data sstables to filter data, or to short-circuit the read if the row 
>   data carried an overall "most recent data timestamp". 
> * Compaction could be forced to reference all the row tombstone sstables, 
>   so that every time two or more "data" sstables are compacted, they must 
>   reference the row tombstones to purge data. 
> In LCS, this would be particularly useful in getting data out of the upper 
> levels without having to wait for data to trickle up the tree. The row 
> tombstones, being read-only inputs into the data sstable compactions, can be 
> referenced in each of the LCS levels' parallel compactors. 
> Based on discussions in the dev list, this would appear to require some sort 
> of customization to the memtable->sstable flushing process, and perhaps a 
> different set of bloom filters. 
> Since the row tombstone sstables are all <rowkey>,<tombstone timestamp>, they 
> should be comparatively small and take less time to compact. They could be 
> aggressively compacted on a different schedule than "data" sstables. 
> In addition, it may be easier to repair/synchronize row tombstones across the 
> cluster if they have already been separated into their own sstables.
> Column/range tombstones may also benefit from a similar separation, but my 
> guess is that those are so much more numerous, large, and fine-grained that 
> they might as well coexist with the data.
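
The proposal above can be illustrated with a small sketch. This is only a toy
model in Python, not Cassandra code: the names Cell, merge_tombstones,
read_row and compact are hypothetical, sstables are modelled as plain dicts,
and bloom filters and flushing are left out. It assumes a row tombstone
shadows every cell whose timestamp is less than or equal to the deletion
timestamp, and it purges only the shadowed data; the tombstones themselves are
kept, since in Cassandra they would still be subject to gc_grace_seconds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Cell:
    column: str
    value: str
    timestamp: int  # write timestamp


# Hypothetical in-memory stand-ins for the two kinds of sstable.
DataSSTable = Dict[str, List[Cell]]   # row key -> cells
TombstoneSSTable = Dict[str, int]     # row key -> deletion timestamp


def merge_tombstones(tables: Iterable[TombstoneSSTable]) -> TombstoneSSTable:
    """Compact tombstone-only tables: keep the newest deletion per row key."""
    merged: TombstoneSSTable = {}
    for table in tables:
        for row_key, deleted_at in table.items():
            if row_key not in merged or deleted_at > merged[row_key]:
                merged[row_key] = deleted_at
    return merged


def live_cells(cells: List[Cell], deleted_at: Optional[int]) -> List[Cell]:
    """Drop cells shadowed by the row tombstone (timestamp <= deletion)."""
    if deleted_at is None:
        return list(cells)
    return [c for c in cells if c.timestamp > deleted_at]


def read_row(row_key: str, data_sstables: List[DataSSTable],
             tombstones: TombstoneSSTable) -> Dict[str, Cell]:
    """Look up the deletion time first, then filter the data sstables."""
    deleted_at = tombstones.get(row_key)
    newest: Dict[str, Cell] = {}
    for sstable in data_sstables:
        cells = sstable.get(row_key, [])
        # Short-circuit: if the row's most recent data timestamp in this
        # sstable is not newer than the deletion, skip the row entirely.
        # (Computed on the fly here; a real design would store it.)
        if (deleted_at is not None and cells
                and max(c.timestamp for c in cells) <= deleted_at):
            continue
        for cell in live_cells(cells, deleted_at):
            current = newest.get(cell.column)
            if current is None or cell.timestamp > current.timestamp:
                newest[cell.column] = cell
    return newest


def compact(data_sstables: List[DataSSTable],
            tombstones: TombstoneSSTable) -> DataSSTable:
    """Merge data sstables, purging cells shadowed by row tombstones.

    The tombstone table is a read-only input, so several compactions
    (for example one per LCS level) could consult it at the same time.
    """
    out: DataSSTable = {}
    for sstable in data_sstables:
        for row_key, cells in sstable.items():
            kept = live_cells(cells, tombstones.get(row_key))
            if kept:
                out.setdefault(row_key, []).extend(kept)
    return out
```

In this model the tombstone tables hold one integer per row key, which is why
merging them is cheap compared with merging the data tables.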



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

