[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510511#comment-14510511
 ] 

Stefania commented on CASSANDRA-7066:
-------------------------------------

Hi [~benedict], should this be implemented over the new transactional framework 
(CASSANDRA-8984) or is it OK to start on trunk? I don't mind working from your 
8984-alt branch, so long as it is stable enough in terms of code review.

Here is how I plan to approach this; please feel free to suggest alternatives 
(e.g. different names or different operations to start from):

- Ultimately we need to touch all operations involving TMP files, but I will 
start with a regular compaction.

- We will create a new sub-directory in the CF data folder called "operations"
- In this folder for each operation we create two files:
-- <operation_name>_<uuid>_NEW.log, containing all the new file names and the 
OLD log file
-- <operation_name>_<uuid>_OLD.log, containing all the existing file names and 
the NEW log file
-- File format is text and paths are relative to the CF data folder.
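
The log layout above could be sketched roughly as follows. This is only an 
illustration using plain java.nio; the class OperationLogs and its create() 
method are hypothetical names, not existing Cassandra code, and one path per 
line is an assumed text format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helper (not Cassandra code): writes the NEW/OLD log pair for one operation.
final class OperationLogs
{
    final Path newLog;
    final Path oldLog;

    private OperationLogs(Path newLog, Path oldLog)
    {
        this.newLog = newLog;
        this.oldLog = oldLog;
    }

    static OperationLogs create(Path cfDir, String operation, UUID id,
                                List<String> newFiles, List<String> oldFiles) throws IOException
    {
        // "operations" sub-directory inside the CF data folder
        Path opsDir = Files.createDirectories(cfDir.resolve("operations"));
        String prefix = operation + "_" + id + "_";
        Path newLog = opsDir.resolve(prefix + "NEW.log");
        Path oldLog = opsDir.resolve(prefix + "OLD.log");

        // Each log lists its own files plus the sibling log, one path per line
        // (assumed format), all relative to the CF data folder.
        List<String> newLines = new ArrayList<>(newFiles);
        newLines.add(cfDir.relativize(oldLog).toString());
        List<String> oldLines = new ArrayList<>(oldFiles);
        oldLines.add(cfDir.relativize(newLog).toString());

        Files.write(newLog, newLines, StandardCharsets.UTF_8);
        Files.write(oldLog, oldLines, StandardCharsets.UTF_8);
        return new OperationLogs(newLog, oldLog);
    }
}
```

A real implementation would also need to sync the files to disk and append 
file names as the operation progresses; the sketch leaves both out.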

- For regular compaction:
-- During regular compaction we make sure to create the transaction log files 
and add new and old file names to them
-- SSTableWriters will be created with FINAL descriptor types, so no tmp files 
are created.
-- For OPEN EARLY, descriptor type TMPLINK can be removed and replaced by a 
FINAL descriptor type (right?)
-- If compaction finishes successfully we delete the NEW log file and we need 
to somehow change SSTableDeletingTask to delete the OLD log file when all the 
old files have been deleted
-- If compaction is aborted we delete the contents of the NEW log file and the 
log file itself (after replacing any OPEN EARLY tables)
-- On start-up CassandraDaemon.setup() will read the operations folders for 
each cf and if we find the NEW log we delete all of its contents, including the 
OLD log, whereas if we find the OLD log we delete its contents.
- The table SYSTEM.COMPACTION_IN_PROGRESS can be removed entirely (any issues 
on upgrading from old Cassandra versions?)
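
The start-up step described above might look something like the sketch below. 
It assumes one relative path per line in each log, and OperationLogCleanup is 
a hypothetical name; the actual hook in CassandraDaemon.setup() is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Cassandra code): replays leftover operation logs at start-up.
final class OperationLogCleanup
{
    static void cleanup(Path cfDir) throws IOException
    {
        Path opsDir = cfDir.resolve("operations");
        if (!Files.isDirectory(opsDir))
            return;

        // Snapshot the listing first: replaying a NEW log also deletes its OLD sibling.
        List<Path> newLogs = new ArrayList<>();
        List<Path> oldLogs = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(opsDir, "*.log"))
        {
            for (Path log : logs)
            {
                String name = log.getFileName().toString();
                if (name.endsWith("_NEW.log"))
                    newLogs.add(log);
                else if (name.endsWith("_OLD.log"))
                    oldLogs.add(log);
            }
        }

        // NEW log still present: the operation did not finish, so delete its
        // output files and the OLD log it lists; the old sstables are kept.
        for (Path log : newLogs)
            replay(cfDir, log);

        // OLD log left without a NEW log: the operation finished, so complete
        // the deletion of the files it replaced.
        for (Path log : oldLogs)
            if (Files.exists(log))
                replay(cfDir, log);
    }

    // Deletes every path listed in the log (relative to cfDir), then the log itself.
    private static void replay(Path cfDir, Path log) throws IOException
    {
        for (String line : Files.readAllLines(log, StandardCharsets.UTF_8))
            if (!line.trim().isEmpty())
                Files.deleteIfExists(cfDir.resolve(line.trim()));
        Files.deleteIfExists(log);
    }
}
```

Processing all NEW logs before any OLD log is what keeps the old files of an 
unfinished operation from being deleted.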

- Next should be flushing, followed by scrubbing and other smaller operations
- When all operations using TMP links have been changed to use transaction 
logs, Descriptor.Type.TMP can be removed (the scope is quite big however).

Does this sound reasonable or should I perhaps start from a simpler operation 
(e.g. scrubbing with which I am more familiar) to test drive the transaction 
logs first?

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: compaction
>             Fix For: 3.0
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to clean up incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily clean up completed files, or conversely not clean up files 
> that have been superseded); and 2) it's only used for a regular compaction - 
> no other compaction types are guarded in the same way, so can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way as soon as we finish writing we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
