[ https://issues.apache.org/jira/browse/CASSANDRA-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Lindaaker updated CASSANDRA-18507:
-----------------------------------------
    Description: 
If there isn't enough disk space available to compact all existing sstables, 
Cassandra will attempt to perform a partial compaction by removing sstables 
from the set of candidate sstables to be compacted, starting with the largest 
one. It is possible that the sstable removed from the set of sstables to 
compact contains data for which there are tombstones in another (more recent) 
sstable. Since the overlaps between sstables are computed when the 
{{CompactionController}} is created, and the {{CompactionController}} is 
created before any sstables are removed from the set of sstables to be 
compacted, this computed overlap will be outdated when checking which sstables 
are covered by certain tombstones. This leads to the faulty conclusion that the 
tombstones can be pruned during the compaction, causing the data they shadow in 
the removed sstable to be resurrected.
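The ordering problem can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not Cassandra's code. The names `PartialCompactionSketch`, `Controller`, `canPurgeTombstone` and `withoutLargest` are invented for illustration. An sstable is reduced to the set of partition keys it holds data or tombstones for, and key count stands in for on-disk size. Timestamps are ignored, although the real purge check compares them. The model requires Java 16+ because it uses records. It shows that when the controller snapshots the full candidate set before the largest sstable is dropped, the dropped sstable is treated as neither compacting nor overlapping. A tombstone that still shadows its data is therefore judged purgeable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of the ordering problem; none of these
// names are Cassandra's real API.
final class PartialCompactionSketch {

    // An sstable reduced to a name and the partition keys it holds data or tombstones for.
    record SSTable(String name, Set<String> keys) {}

    // Stand-in for CompactionController: snapshots the live sstables that are NOT
    // being compacted ("overlapping" sstables) once, at construction time.
    static final class Controller {
        private final List<SSTable> overlapping = new ArrayList<>();

        Controller(List<SSTable> live, List<SSTable> compacting) {
            for (SSTable s : live) {
                if (!compacting.contains(s)) {
                    overlapping.add(s);
                }
            }
        }

        // A tombstone for `key` may only be dropped if no sstable outside the
        // compaction could still hold data it shadows (timestamps ignored here).
        boolean canPurgeTombstone(String key) {
            for (SSTable s : overlapping) {
                if (s.keys().contains(key)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Partial compaction: drop the largest sstable (key count stands in for size).
    static List<SSTable> withoutLargest(List<SSTable> compacting) {
        List<SSTable> reduced = new ArrayList<>(compacting);
        reduced.remove(Collections.max(reduced, Comparator.comparingInt((SSTable s) -> s.keys().size())));
        return reduced;
    }

    // Ordering described for 4.0/4.1: the controller is built from the full
    // candidate set, and only afterwards is the set reduced.
    static boolean controllerBeforeReduction(List<SSTable> live, String key) {
        Controller controller = new Controller(live, live);
        // The reduced set is what actually gets compacted, but the controller's
        // overlap snapshot is not recomputed.
        withoutLargest(live);
        return controller.canPurgeTombstone(key);
    }

    // Ordering described for 3.11: reduce the set first, then build the controller.
    static boolean controllerAfterReduction(List<SSTable> live, String key) {
        List<SSTable> reduced = withoutLargest(live);
        Controller controller = new Controller(live, reduced);
        return controller.canPurgeTombstone(key);
    }

    public static void main(String[] args) {
        SSTable oldData = new SSTable("old-data", Set.of("k", "a", "b")); // largest; older data for k
        SSTable newTombstone = new SSTable("new-tombstone", Set.of("k"));  // newer tombstone for k
        List<SSTable> live = List.of(oldData, newTombstone);
        System.out.println(controllerBeforeReduction(live, "k")); // true: tombstone purged, k resurrected
        System.out.println(controllerAfterReduction(live, "k"));  // false: tombstone kept
    }
}
```

In this model, building the controller after the reduction (the 3.11 ordering) keeps the dropped `old-data` sstable in the overlap set, so the tombstone for `k` is retained.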

The issue is present in Cassandra 4.0 and 4.1. Cassandra 3.11 creates the 
{{CompactionController}} after the set of sstables to compact has been reduced, 
and is thus not affected. {{trunk}} does not appear to support partial 
compactions at all, but instead refuses to compact when the disk is full.

This regression appears to have been introduced by CASSANDRA-13068.

  was:
If there isn't enough disk space available to compact all existing sstables, 
Cassandra will attempt to perform a partial compaction by removing sstables 
from the set of candidate sstables to be compacted, starting with the largest 
one. It is possible that the sstable removed from the set of sstables to 
compact contains data for which there are tombstones in another (more recent) 
sstable. Since the overlaps between sstables are computed when the 
{{CompactionController}} is created, and the {{CompactionController}} is 
created before any sstables are removed from the set of sstables to be 
compacted, this computed overlap will be outdated when checking which sstables 
are covered by certain tombstones. This leads to the faulty conclusion that the 
tombstones can be pruned during the compaction, causing the data they shadow in 
the removed sstable to be resurrected.

The issue is present in Cassandra 4.0 and 4.1. Cassandra 3.11 creates the 
{{CompactionController}} after the set of sstables to compact has been reduced, 
and is thus not affected. {{trunk}} does not appear to support partial 
compactions at all, but instead refuses to compact when the disk is full.

This regression appears to have been introduced by CASSANDRA-18507.


> Partial compaction can resurrect deleted data
> ---------------------------------------------
>
>                 Key: CASSANDRA-18507
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18507
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Tobias Lindaaker
>            Assignee: Tobias Lindaaker
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
