[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649193#comment-14649193
 ] 

Benedict commented on CASSANDRA-7066:
-------------------------------------

Thanks. Looks like great work as always. 

A few comments (I haven't done a full review):

* The last-modified sort order was necessary for the same reason a full 
checksum of all of the old files is too heavy-handed: restarting in the middle 
of deleting the files. We want to finish deleting them, and if we take the max 
timestamp we can be sure it will safely stay the same. We could keep the full 
checksum and include a count: if the count on disk is lower and our max 
timestamp matches, we can proceed; if the count is the same, we require a 
checksum match. This may or may not be overkill, but there's no harm in 
paranoia here.
* It looks like serialization of the sstables now assumes the ancestors won't 
be present, but this will break upgrade. We need to check the sstable version, 
and at least skip the ancestors if they're present.
* The compaction metadata equality is probably not sensibly employed anywhere; 
we should just confirm this and remove it, IMO, since it makes very little 
sense, especially now that the ancestors have been removed.
* I think you took my comments about panic warnings a little too literally. We 
should just use strong language to make it clear to the operator we've detected 
disk corruption of some kind, and have taken the most pessimistic course of 
action
** This warning should be less severe if only the last line is _incomplete_ 
(and all other lines are parsed correctly), as this would be expected if we 
crashed part way through serialization.
** We should perhaps not worry at all, and continue, if this last situation 
occurs and _all_ files logged are still present on the file system with all 
of their other metadata matching. In this case we can be very confident we 
just shut down in the middle of writing a record, and can simply clean up all 
of the new files. This should make us robust to the scary situation while 
also minimising the pessimism, hopefully giving us the best of both worlds.
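To make the count/checksum idea from the first bullet concrete, here's a minimal sketch of the decision logic. All names and signatures here are illustrative, not actual Cassandra code; the recorded values are assumed to come from the on-disk record, the others recomputed at startup:

```java
// Hypothetical sketch of the proposed restart-safety check (names illustrative).
final class LeftoverCheck
{
    // Decide whether it is safe to resume deleting leftover sstables after a restart.
    static boolean safeToProceed(int recordedCount, long recordedChecksum, long recordedMaxTimestamp,
                                 int onDiskCount, long onDiskChecksum, long onDiskMaxTimestamp)
    {
        if (onDiskCount < recordedCount)
            // we were interrupted mid-deletion: accept if the max timestamp still matches
            return onDiskMaxTimestamp == recordedMaxTimestamp;
        if (onDiskCount == recordedCount)
            // nothing deleted yet: require a full checksum match
            return onDiskChecksum == recordedChecksum;
        // more files on disk than recorded: something is wrong, be pessimistic
        return false;
    }
}
```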
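On the upgrade point in the second bullet, the version-gated skip might look something like this. This is only a sketch under assumed identifiers, not the real metadata serializer; the point is just that old-format readers consume and discard the ancestor list rather than assuming it is absent:

```java
// Hypothetical sketch of version-aware deserialization (identifiers illustrative).
import java.io.DataInputStream;
import java.io.IOException;

final class CompactionMetadataReader
{
    static int[] read(DataInputStream in, boolean versionHasAncestors) throws IOException
    {
        if (versionHasAncestors)
        {
            // old sstable format: ancestors are still on disk; read and discard them
            int nAncestors = in.readInt();
            for (int i = 0; i < nAncestors; i++)
                in.readInt(); // skip each ancestor generation
        }
        // read the remaining (still-used) metadata, e.g. a histogram
        int histogramSize = in.readInt();
        int[] histogram = new int[histogramSize];
        for (int i = 0; i < histogramSize; i++)
            histogram[i] = in.readInt();
        return histogram;
    }
}
```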
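And the warning scheme in the last two sub-bullets could be classified roughly as below. Again a hand-wavy sketch with made-up names, just to pin down the three cases: real corruption warns loudly, an incomplete tail with all logged files intact proceeds quietly, anything in between gets a milder warning:

```java
// Hypothetical sketch classifying a parsed transaction log (names illustrative).
import java.util.Set;

enum Action { PROCEED_QUIETLY, WARN_MILD, WARN_CORRUPTION }

final class LogCheck
{
    static Action classify(boolean anyMiddleLineCorrupt, boolean lastLineIncomplete,
                           Set<String> loggedFiles, Set<String> filesOnDisk)
    {
        if (anyMiddleLineCorrupt)
            return Action.WARN_CORRUPTION;  // genuine corruption: be pessimistic and say so
        if (!lastLineIncomplete)
            return Action.PROCEED_QUIETLY;  // log fully intact, nothing to worry about
        if (filesOnDisk.containsAll(loggedFiles))
            return Action.PROCEED_QUIETLY;  // crashed mid-record; safe to clean up new files
        return Action.WARN_MILD;            // incomplete tail and files missing: milder warning
    }
}
```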

WDYT?

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to cleanup incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily cleanup completed files, or conversely not cleanup files 
> that have been superseded); and 2) it's only used for a regular compaction - 
> no other compaction types are guarded in the same way, so can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way as soon as we finish writing we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
