[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578747#comment-14578747
 ] 

Benedict commented on CASSANDRA-7066:
-------------------------------------

I've pushed a few suggestions 
[here|https://github.com/belliottsmith/cassandra/tree/7066-suggestions]. Mostly 
just simplifying the creation and running of SSTableDeletingTask following on 
from some of your cleanup. 

However, I do wonder if we shouldn't try and eliminate it altogether. We could, 
for instance, move the deletion work all into the TransactionLogs class, 
wherein a Set of sstables we are obsoleting is maintained; on releasing a 
reader we ask the TxnLogs to delete us from the disk and from the set (instead 
of releasing a ref). If any of the deletions fail, then once all files have 
been deleted the whole TxnLog can be enqueued to a "retry" queue (instead of 
SSTableDeletingTask). This would just repeatedly try the whole log cleanup, 
just as we do on startup. This seems like it would lead to an easier to 
understand lifecycle, and would also manage more clearly the problem you 
mention above of competing deletes (since it would be managed by the same 
TxnLog, ultimately). We can also rid ourselves of SSTableDeletingTask entirely, 
which I think would be nice.

Either way, the current logic looks like it could lead to files not being 
deleted that should be, since we release our reference to the TxnLog as soon as 
we first try to delete, but the delete can fail, and be retried later (so if we 
crash before successful delete, we may startup thinking this file is to keep). 

A few other queries:

* Why does SSTableWriter.finish() take a TxnLog? For convenience/clarity? 
Perhaps we should have a wrapped SSTableWriter that encapsulates this, so we 
don't confuse matters when we have another encapsulation (Rewriter, etc)?
* I wonder if it might be more robust to only log the descriptors (except the 
opposing log file, which would be a slight special case, but acceptable since 
it would be logged first, so easily disambiguated). Just seems like we may be 
less likely to have some mistake that leaves components lying around that we 
should have cleaned up.
* On upgrade I guess we should drop the compactions_in_progress table. 
[~iamaleksey]: i'm not at all familiar with cross version changes like this. 
Could you fill us in on protocol? Presumably this will be affected by downgrade 
support, which I don't think it planned anytime soon.
* Are we going to upset users by deleting some of their sstables when their CQL 
sstable writer operation fails? Previously any written before close() completed 
would continue to exist, whereas now if there is a failure prior to close they 
will be deleted. So each CQLSSTableWriter creation effectively functions as a 
transaction, whereas before it would permit partial results. Any lurkers got 
any opinions? [~jshook] [~jeromatron] [~tupshin]?

I'm really looking forward to this change going in. It cleans up quite a lot in 
one fell swoop (tmp, tmplink, compaction leftovers, sstable deletion...)

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: compaction
>             Fix For: 3.x
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to cleanup incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily cleanup completed files, or conversely not cleanup files 
> that have been superceded); and 2) it's only used for a regular compaction - 
> no other compaction types are guarded in the same way, so can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way as soon as we finish writing we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to