[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578747#comment-14578747 ]
Benedict commented on CASSANDRA-7066: ------------------------------------- I've pushed a few suggestions [here|https://github.com/belliottsmith/cassandra/tree/7066-suggestions]. Mostly just simplifying the creation and running of SSTableDeletingTask following on from some of your cleanup. However, I do wonder if we shouldn't try and eliminate it altogether. We could, for instance, move the deletion work all into the TransactionLogs class, wherein a Set of sstables we are obsoleting is maintained; on releasing a reader we ask the TxnLogs to delete us from the disk and from the set (instead of releasing a ref). If any of the deletions fail, then once all files have been deleted the whole TxnLog can be enqueued to a "retry" queue (instead of SSTableDeletingTask). This would just repeatedly try the whole log cleanup, just as we do on startup. This seems like it would lead to an easier to understand lifecycle, and would also manage more clearly the problem you mention above of competing deletes (since it would be managed by the same TxnLog, ultimately). We can also rid ourselves of SSTableDeletingTask entirely, which I think would be nice. Either way, the current logic looks like it could lead to files not being deleted that should be, since we release our reference to the TxnLog as soon as we first try to delete, but the delete can fail, and be retried later (so if we crash before successful delete, we may startup thinking this file is to keep). A few other queries: * Why does SSTableWriter.finish() take a TxnLog? For convenience/clarity? Perhaps we should have a wrapped SSTableWriter that encapsulates this, so we don't confuse matters when we have another encapsulation (Rewriter, etc)? * I wonder if it might be more robust to only log the descriptors (except the opposing log file, which would be a slight special case, but acceptable since it would be logged first, so easily disambiguated). Just seems like we may be less likely to have some mistake that leaves components lying around that we should have cleaned up. * On upgrade I guess we should drop the compactions_in_progress table. [~iamaleksey]: i'm not at all familiar with cross version changes like this. Could you fill us in on protocol? Presumably this will be affected by downgrade support, which I don't think it planned anytime soon. * Are we going to upset users by deleting some of their sstables when their CQL sstable writer operation fails? Previously any written before close() completed would continue to exist, whereas now if there is a failure prior to close they will be deleted. So each CQLSSTableWriter creation effectively functions as a transaction, whereas before it would permit partial results. Any lurkers got any opinions? [~jshook] [~jeromatron] [~tupshin]? I'm really looking forward to this change going in. It cleans up quite a lot in one fell swoop (tmp, tmplink, compaction leftovers, sstable deletion...) > Simplify (and unify) cleanup of compaction leftovers > ---------------------------------------------------- > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Stefania > Priority: Minor > Labels: compaction > Fix For: 3.x > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)