[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634572#comment-14634572
 ] 

Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict] this is ready for another round of review, see [latest 
commit|https://github.com/stef1927/cassandra/commit/8be68fb0707215cc2818acfd02e5f6a6452148bf].

I've split the unit tests into two classes, the tests depending on SchemaLoader 
have been moved to RealTransactionsTest as they mimic real transactions such as 
compactions and flushing. These tests did not change, they were just moved. 
TransactionLogsTest now extends AbstractTransactionalTest. I'm not sure what 
you meanr by _perhaps multiple times, internally - or refactor to permit lists 
of states to be returned_. Which states exactly?

I've left four comments in the code, just search for your name. They are mostly 
trivial except perhaps for the untrack comment in SSTableRewriter. We need to 
delete aborted writers files somewhere and I opted to keep untrack doing that 
except it no longer changes the log file contents, which concerned you. If you 
have a different preference, like letting the writer abort delete files as it 
did before just let me know. 

You will notice a couple of differences in SSTableReader related to the 
readMeter, these were most likely merge errors during a rebase and the code 
should now be the same as on trunk. I noticed them because of a failing unit 
test, IndexSummaryManagerTest.

The TransactionLogs class is now pretty well encapsulated by 
LifecycleTransaction, I hope I didn't abuse this latter API too much. Only the 
methods dealing with deletions are still public (e.g. waitForDeletions), they 
are mostly used by test code and standalone tools. If we moved or encapsulated 
these methods too, the class would be pretty much package private except for 
SSTableReader.InstanceTidier. Let me know if you want to go down this route. It 
is just not very clear where to put InstanceTidier and the static deletion 
management methods, we would be back to an SSTableDeletingTask approach.

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.x
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to cleanup incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily cleanup completed files, or conversely not cleanup files 
> that have been superceded); and 2) it's only used for a regular compaction - 
> no other compaction types are guarded in the same way, so can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way as soon as we finish writing we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to