[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645489#comment-14645489 ]
Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict] a first basic single-log-file version is available on [this branch|https://github.com/stef1927/cassandra/tree/7066-b]. I'll wait to hear from you regarding adding CRCs and update times.

Here is the write-up that I've added to NEWS.txt:

{quote}
New transaction log files have been introduced to replace the compactions_in_progress system table. They control the sstable files involved in compactions and other operations such as flushing and streaming. Use the sstablelister tool to list any sstable files currently involved in operations not yet completed, which we define as temporary files.

A transaction log file contains one sstable per line, with the prefix "add:" or "remove:". It also contains a final special line, "commit", which is inserted only when the transaction is committed.

On startup we use these files to clean up any partial transactions that were in progress when the process exited. If the commit line is found, we keep the new "add" prefix sstables and delete the old "remove" prefix sstables; if the commit line is missing, we do the opposite.

Should you lose or delete these log files, both old and new sstable files will be kept as live files, which will result in duplicated sstables. Should you manually edit these files, for example by removing or adding the commit line, this would change which sstable files are retained on startup.

See CASSANDRA-7066 for full details.
{quote}

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely not clean up files
> that have been superseded); and 2) it's only used for regular compaction:
> no other compaction types are guarded in the same way, so they can result in
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store its direct ancestors in its metadata, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way, as soon as we finish writing we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
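
The startup rule described in the NEWS.txt write-up above can be sketched roughly as follows. This is only an illustration in Python, not the code on the branch: the function name `resolve_transaction_log`, the exact line syntax (`add:<name>`, `remove:<name>`, `commit`) and the sstable names used in the example are assumptions drawn from the prose description.

```python
from typing import Iterable, Set, Tuple


def resolve_transaction_log(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (keep, delete) sets of sstable names for one transaction log.

    Hypothetical helper: the line format is assumed from the NEWS.txt text,
    not taken from Cassandra's actual implementation.
    """
    added: Set[str] = set()
    removed: Set[str] = set()
    committed = False

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line == "commit":
            committed = True
        elif line.startswith("add:"):
            added.add(line[len("add:"):])
        elif line.startswith("remove:"):
            removed.add(line[len("remove:"):])
        else:
            raise ValueError(f"unrecognised log line: {line!r}")

    if committed:
        # Transaction completed: new sstables are live, old ones are obsolete.
        return added, removed
    # Transaction incomplete: roll back, keeping the old sstables.
    return removed, added


# Example with made-up sstable names.
committed_log = ["remove:old-1", "remove:old-2", "add:new-3", "commit"]
partial_log = ["remove:old-1", "remove:old-2", "add:new-3"]

keep_c, delete_c = resolve_transaction_log(committed_log)
keep_p, delete_p = resolve_transaction_log(partial_log)
```

If the log file itself is missing there is nothing to resolve, so neither set is deleted. That matches the warning in the write-up that losing the log leaves both old and new sstables live.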