[ 
https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11684:
-------------------------------------------
    Description: 
Currently cleanup is considered an optional, manual operation that users are 
told to run to free disk space after a node was affected by topology changes. 
However, unmanaged key ranges could also end up on a node in other ways, 
e.g. through SSTable files manually added by an admin. 

I'm also not sure unmanaged data is really that harmless, or that cleanup 
should be optional just because you don't need to reclaim the disk space. When 
it comes to repairs, users are expected to purge a node after downtime if it 
was not fully covered by a repair within gc_grace, in order to avoid 
re-introducing deleted data. The same could happen with unmanaged data, e.g. 
when topology changes activate unmanaged ranges again or when backups are 
restored.

I'd therefore suggest avoiding rewriting key ranges that no longer belong to a 
node and are older than gc_grace during compactions. 
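The idea could be sketched roughly as follows. This is a minimal illustration, 
not Cassandra's actual compaction API: the class, the Range model, and all 
method names here are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: during compaction, skip (purge) partitions whose token
// falls outside the locally owned ranges, but only once they are older than
// gc_grace, mirroring the rule used for tombstone purging.
public class CompactionRangeFilter {
    // A token range, start-exclusive / end-inclusive as in Cassandra's ring model.
    record Range(long left, long right) {
        boolean contains(long token) { return token > left && token <= right; }
    }

    private final List<Range> ownedRanges;
    private final long gcGraceSeconds;

    CompactionRangeFilter(List<Range> ownedRanges, long gcGraceSeconds) {
        this.ownedRanges = ownedRanges;
        this.gcGraceSeconds = gcGraceSeconds;
    }

    /** True if compaction should drop the partition instead of rewriting it. */
    boolean shouldPurge(long token, long maxTimestampSeconds, long nowSeconds) {
        boolean owned = ownedRanges.stream().anyMatch(r -> r.contains(token));
        // Unmanaged data is only purged after gc_grace has elapsed, so data that
        // becomes owned again within gc_grace (topology change) is still intact.
        return !owned && nowSeconds - maxTimestampSeconds > gcGraceSeconds;
    }

    public static void main(String[] args) {
        CompactionRangeFilter f = new CompactionRangeFilter(
                List.of(new Range(0, 100)), 864000 /* 10 days */);
        System.out.println(f.shouldPurge(150, 0, 864001)); // unmanaged + old -> true
        System.out.println(f.shouldPurge(50, 0, 864001));  // owned token -> false
    }
}
```

Treating unmanaged ranges like tombstones would keep the gc_grace guarantee 
symmetric between deleted and unmanaged data.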

Maybe we could also introduce a separate CLEANUP_COMPACTION operation that 
finds candidates based on SSTable.first/last, in case we don't have pending 
regular or tombstone compactions.
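The candidate check could look roughly like this: an sstable whose entire 
first/last token span misses every locally owned range can be cleaned up 
without inspecting its contents. Again a hypothetical sketch, not Cassandra's 
real SSTableReader API; the record types and names are illustrative only.

```java
import java.util.List;

// Hypothetical sketch of CLEANUP_COMPACTION candidate selection: pick sstables
// whose [first, last] token span does not intersect any owned token range.
public class CleanupCandidates {
    record Range(long left, long right) {
        // Overlap test for a start-exclusive / end-inclusive ring range
        // against an inclusive [first, last] sstable token span.
        boolean intersects(long first, long last) {
            return first <= right && last > left;
        }
    }
    record SSTable(String name, long firstToken, long lastToken) {}

    /** SSTables sharing no token with any owned range are cleanup candidates. */
    static List<SSTable> candidates(List<SSTable> sstables, List<Range> owned) {
        return sstables.stream()
                .filter(t -> owned.stream()
                        .noneMatch(r -> r.intersects(t.firstToken, t.lastToken)))
                .toList();
    }
}
```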

  was:
Currently cleanup is considered an optional, manual operation that users are 
told to run to free disk space after a node was affected by topology changes. 
However, unmanaged key ranges could also end up on a node in other ways, 
e.g. through SSTable files manually added by an admin or streamed in during 
repairs. 

I'm also not sure unmanaged data is really that harmless, or that cleanup 
should be optional just because you don't need to reclaim the disk space. When 
it comes to repairs, users are expected to purge a node after downtime if it 
was not fully covered by a repair within gc_grace, in order to avoid 
re-introducing deleted data. The same could happen with unmanaged data, e.g. 
when topology changes activate unmanaged ranges again or when backups are 
restored.

I'd therefore suggest avoiding rewriting key ranges that no longer belong to a 
node and are older than gc_grace during compactions. 

Maybe we could also introduce a separate CLEANUP_COMPACTION operation that 
finds candidates based on SSTable.first/last, in case we don't have pending 
regular or tombstone compactions.


> Cleanup key ranges during compaction
> ------------------------------------
>
>                 Key: CASSANDRA-11684
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
