[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680071#comment-14680071 ]
Benedict commented on CASSANDRA-7066: ------------------------------------- Sorry for the review delay. It's looking good. I've pushed a few minor stylistic suggestions [here|github.com/belliottsmith/cassandra/7066-b-suggestions] There are a couple of functional things that have occurred to me, though, on reading the code. # Should we perhaps have an ABORT record type as well, to ensure we don't accidentally introduce a bug where we think we've aborted but are still using the transaction? This is not important, but also not very hard to add, and it just looks a little unsafe to me, the abort() method, in contrast to commit (I realise it is not necessary given correct usage). # If we fail to read a log file, we may erroneously categorise files, and since we sometimes read them while the system is live we should probably retry a few times, to make sure we aren't reading a log file we're in the middle of mutating # I think we should also reassess just a little how we list data files and filter them with the log files. I have a lengthy comment explaining some (admittedly unlikely) race conditions, which could also affect the prior system, but I would prefer our new system were not at risk. I think we can make ourselves robust to any races by: ## Putting our log files in the same directory as our data files (this also makes them more visible to users, especially if we ensure they sort last) ## Listing the directory contents once, and reading all the log files that occur in the listing, and using these to filter the regular files in that same listing ## Retrying if for any reason we fail (especially, say, if a log file has been deleted by the time we read it) Sound reasonable? > Simplify (and unify) cleanup of compaction leftovers > ---------------------------------------------------- > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Stefania > Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.0 alpha 1 > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)