[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634572#comment-14634572 ]
Stefania commented on CASSANDRA-7066: ------------------------------------- [~benedict] this is ready for another round of review, see [latest commit|https://github.com/stef1927/cassandra/commit/8be68fb0707215cc2818acfd02e5f6a6452148bf]. I've split the unit tests into two classes, the tests depending on SchemaLoader have been moved to RealTransactionsTest as they mimic real transactions such as compactions and flushing. These tests did not change, they were just moved. TransactionLogsTest now extends AbstractTransactionalTest. I'm not sure what you meanr by _perhaps multiple times, internally - or refactor to permit lists of states to be returned_. Which states exactly? I've left four comments in the code, just search for your name. They are mostly trivial except perhaps for the untrack comment in SSTableRewriter. We need to delete aborted writers files somewhere and I opted to keep untrack doing that except it no longer changes the log file contents, which concerned you. If you have a different preference, like letting the writer abort delete files as it did before just let me know. You will notice a couple of differences in SSTableReader related to the readMeter, these were most likely merge errors during a rebase and the code should now be the same as on trunk. I noticed them because of a failing unit test, IndexSummaryManagerTest. The TransactionLogs class is now pretty well encapsulated by LifecycleTransaction, I hope I didn't abuse this latter API too much. Only the methods dealing with deletions are still public (e.g. waitForDeletions), they are mostly used by test code and standalone tools. If we moved or encapsulated these methods too, the class would be pretty much package private except for SSTableReader.InstanceTidier. Let me know if you want to go down this route. It is just not very clear where to put InstanceTidier and the static deletion management methods, we would be back to an SSTableDeletingTask approach. > Simplify (and unify) cleanup of compaction leftovers > ---------------------------------------------------- > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Stefania > Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.x > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)