[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645489#comment-14645489 ]

Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict] the first basic version, using a single log file, is available on
[this branch|https://github.com/stef1927/cassandra/tree/7066-b]. I'll wait to
hear from you regarding adding CRCs and update times.

Here is the write-up that I've added to NEWS.txt:
 
{quote}
     New transaction log files have been introduced to replace the
     compactions_in_progress system table. They control the sstable files
     involved in compactions and other operations, such as flushing and
     streaming. Use the sstablelister tool to list any sstable files currently
     involved in operations that have not yet completed; we define these as
     temporary files.
     A transaction log file contains one sstable per line, prefixed with
     "add:" or "remove:". It also contains a final special line, "commit",
     which is only inserted when the transaction is committed.
     On startup we use these files to clean up any partial transactions that
     were in progress when the process exited. If the commit line is found,
     we keep the new sstables (prefixed "add:") and delete the old sstables
     (prefixed "remove:"); if the commit line is missing, we do the reverse.
     Should you lose or delete these log files, both old and new sstable files
     will be kept as live files, which will result in duplicated sstables.
     Should you manually edit these files, for example to remove or add the
     commit line, this would change which sstable files are retained on
     startup. See CASSANDRA-7066 for full details.
{quote}
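The startup rule in the write-up can be illustrated with a small, hypothetical Java sketch. It assumes each log line is exactly "add:<sstable>", "remove:<sstable>" or "commit", and it only decides which sstables to delete; the class name TxnLogReplay, the method filesToDelete and the sstable names are made up for illustration and are not Cassandra's actual code or on-disk format.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of replaying one transaction log on startup.
 * Assumes lines of the form "add:<sstable>", "remove:<sstable>" and an
 * optional final "commit" line. Not Cassandra's actual implementation.
 */
public final class TxnLogReplay {

    /** Returns the sstables that should be deleted on startup. */
    public static Set<String> filesToDelete(List<String> lines) {
        Set<String> added = new HashSet<>();
        Set<String> removed = new HashSet<>();
        boolean committed = false;

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            if (committed) {
                // "commit" is assumed to be the final line of a valid log.
                throw new IllegalStateException("\"commit\" must be the final line, found: " + raw);
            }
            if (line.equals("commit")) {
                committed = true;
            } else if (line.startsWith("add:")) {
                added.add(line.substring("add:".length()));
            } else if (line.startsWith("remove:")) {
                removed.add(line.substring("remove:".length()));
            } else {
                throw new IllegalArgumentException("Unrecognised log line: " + raw);
            }
        }
        // Committed: the new ("add") sstables are live, so the old ("remove")
        // ones are obsolete. Not committed: roll back by deleting the new ones.
        return committed ? removed : added;
    }

    public static void main(String[] args) {
        List<String> committedLog = List.of("add:new-1", "remove:old-1", "remove:old-2", "commit");
        List<String> partialLog = List.of("add:new-1", "remove:old-1", "remove:old-2");
        System.out.println(filesToDelete(committedLog)); // the two "remove" sstables
        System.out.println(filesToDelete(partialLog));   // [new-1]
    }
}
```

The sketch also shows why hand-editing a log matters: adding or removing the "commit" line flips which set of sstables is returned for deletion.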

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely not clean up files
> that have been superseded); and 2) it's only used for a regular compaction;
> no other compaction types are guarded in the same way, so they can result in
> duplication if we fail before deleting the replacements.
> I'd like each sstable to store its direct ancestors in its metadata, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way, as soon as we finish writing, we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.
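The ancestor-union rule proposed in the issue description can be sketched in a few lines of Java. This is a hypothetical illustration only: it assumes the recorded direct ancestors of every sstable on disk are available as a map, and the class name AncestorCleanup, the method leftovers and the sstable names are invented for the example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed ancestor-based cleanup: each sstable
 * records its direct ancestors, and on startup any sstable on disk that
 * appears in the union of all ancestor sets is deleted.
 */
public final class AncestorCleanup {

    /**
     * @param ancestorsByLive maps each sstable found on disk to the set of
     *                        direct ancestors recorded in its metadata
     * @return the sstables on disk that another sstable supersedes
     */
    public static Set<String> leftovers(Map<String, Set<String>> ancestorsByLive) {
        Set<String> union = new HashSet<>();
        for (Set<String> ancestors : ancestorsByLive.values()) {
            union.addAll(ancestors);
        }
        // Only files still on disk can be deleted; ancestors that are
        // already gone are ignored.
        union.retainAll(ancestorsByLive.keySet());
        return union;
    }

    public static void main(String[] args) {
        // sstable "3" was compacted from "1" and "2", which were not yet deleted.
        Map<String, Set<String>> onDisk = Map.of(
                "1", Set.<String>of(),
                "2", Set.<String>of(),
                "3", Set.of("1", "2"));
        System.out.println(leftovers(onDisk)); // contains "1" and "2"
    }
}
```

Because the rule depends only on metadata written with each new sstable, cleanup is possible as soon as that sstable is complete, which is the property the description is after.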



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
