[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648448#comment-14648448 ]
Robert Coli edited comment on CASSANDRA-7066 at 7/30/15 11:08 PM:
------------------------------------------------------------------

(Apologies if this is sufficiently covered above in the giant list of comments. I see that [~tjake] mentions the "refresh" case and that there is related discussion, but the specifics don't seem to be addressed.)

[~krummas], [~yukim], and I discussed, in IRC, the following edge case regarding the storing of ancestors:

1) NODE A compacts sstables 1 and 2 into sstable 3. Sstable 3 gets ancestor value "1,2".
2) Sstable 3 is copied into NODE B's data directory and NODE B is restarted, OR sstable 3 is copied and NODE B runs "nodetool refresh" (which doesn't, afaik, reset ancestor information).
3) Sstable 3 on NODE B incorrectly believes its ancestors are NODE B's sstables 1 and 2.

Marcus's response was that we likely need a facility to remove ancestor information from sstables. I agree with the up-thread statement that both Refresh and LoadNewSSTables are likely to be used by experts, but AFAICT those experts still need to clear ancestor information from sstables that are moving between nodes.

Also, [~nickmbailey]'s question regarding Cassandra's behavior on restart when it finds unexpected files in the data directory is a revisit of the resolved-WONTFIX CASSANDRA-6756.

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely not clean up files
> that have been superseded); and 2) it's only used for a regular compaction -
> no other compaction types are guarded in the same way, so they can result in
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way as soon as we finish writing we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
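The startup rule proposed in the description, and the cross-node edge case raised in the comment, can be illustrated with a minimal sketch. It is written in Python rather than Cassandra's Java, and it models an sstable by its generation number only. The names SSTable, leftovers, and clear_ancestors are hypothetical and are not Cassandra APIs; clear_ancestors stands in for the "facility to remove ancestor information" that the comment asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical model: an sstable is just a node-local generation number
    plus the generations of its direct ancestors."""
    generation: int
    ancestors: frozenset = frozenset()


def leftovers(sstables):
    """Return the generations that the proposed startup rule would delete:
    every sstable present that appears in the union of all ancestor sets."""
    superseded = set()
    for table in sstables:
        superseded |= table.ancestors
    return {t.generation for t in sstables if t.generation in superseded}


def clear_ancestors(table):
    """Hypothetical facility: strip ancestor metadata before an sstable
    is moved to another node."""
    return SSTable(table.generation, frozenset())


# NODE A: sstables 1 and 2 were compacted into 3, so 1 and 2 are leftovers.
compacted = SSTable(3, frozenset({1, 2}))
node_a = [SSTable(1), SSTable(2), compacted]
deleted_on_a = leftovers(node_a)

# NODE B: has its own, unrelated sstables 1 and 2. Copying sstable 3 over
# with its ancestor metadata intact makes the rule delete NODE B's live data,
# because generation numbers are only meaningful on the node that wrote them.
node_b_unsafe = [SSTable(1), SSTable(2), compacted]
deleted_on_b_unsafe = leftovers(node_b_unsafe)

# Clearing the ancestor set before the copy avoids the false match.
node_b_safe = [SSTable(1), SSTable(2), clear_ancestors(compacted)]
deleted_on_b_safe = leftovers(node_b_safe)
```

In this model NODE A and the unsafe NODE B look identical to the cleanup rule, which is the point of the edge case: the rule alone cannot tell a genuine ancestor from an unrelated sstable that happens to share a generation number.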