[ https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090244#comment-14090244 ]

Jonathan Ellis commented on CASSANDRA-7720:
-------------------------------------------

Remember, we have NO guarantees on ordering.  Hint replay, read repair, and
failures-fixed-by-full-repair can all cause "newer" updates to be applied
before "older" ones.  So changing snapshot wouldn't really change the
scenarios you have to tolerate.

I'm not against making a "best effort" in principle, but doing seq scans of
snapshots to build a copy with most but not all of the data is a pretty big
deal for both performance and complexity.  (What if you die partway through?)

So my inclination is that there isn't a whole lot of benefit from doing this,
and RAMP (which actually does give you guarantees in the face of HH/RR/etc)
is a better solution.  (And if they're not part of the same batch then ipso
facto it's not really a problem.)

> Add a more consistent snapshot mechanism
> ----------------------------------------
>
>                 Key: CASSANDRA-7720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Mike Schrag
>
> We’ve hit an interesting issue with snapshotting, which makes sense in
> hindsight, but presents an interesting challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that we end up having a B, but NOT having an A, even
> though B was chronologically inserted after A.
> It makes sense when I think about what snapshot is doing, but I wonder if
> snapshots actually should get a little fancier to behave a little more like
> what I think most people would expect.
> What I think should happen is something along the lines of the following:
> For each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the linked snapshot sstables in that snapshot
> * if the earliest update in that sstable is after the client's timestamp,
> delete the sstable in the snapshot
> * if the earliest update in the sstable is before the client's timestamp,
> then look at the last update. Walk backwards through that sstable.
> * if any updates fall after the timestamp, make a copy of that sstable in
> the snapshot folder only up to the point of the timestamp and then delete the
> original sstable in the snapshot (we need to copy because we're likely
> holding a shared hard linked sstable)
> I think this would guarantee that you have a chronologically consistent view
> of your snapshot across all machines and columnfamilies within a given
> snapshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
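
For illustration only, the per-node trimming steps quoted above can be sketched as a small in-memory model. This is not Cassandra code: `Update`, `SSTable`, `trim_sstable`, and `trim_snapshot` are hypothetical names, an sstable is modeled as a plain tuple of timestamped updates, and an update whose timestamp equals the cutoff is assumed to be kept (the proposal does not say). As the comment above points out, hint replay and read repair can still deliver "older" writes later, so a filter like this is best effort at most.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    timestamp: int  # client-supplied write timestamp (hypothetical unit)


@dataclass(frozen=True)
class SSTable:
    name: str
    updates: Tuple[Update, ...]


def trim_sstable(sstable: SSTable, cutoff: int) -> Optional[SSTable]:
    """Apply the proposed rules to one snapshotted sstable.

    Returns None if the sstable should be deleted from the snapshot,
    the same object if it can stay as-is, or a trimmed copy otherwise.
    """
    if not sstable.updates:
        return sstable
    earliest = min(u.timestamp for u in sstable.updates)
    latest = max(u.timestamp for u in sstable.updates)
    if earliest > cutoff:
        # Everything was written after the snapshot request: drop it.
        return None
    if latest <= cutoff:
        # Nothing is newer than the cutoff: keep the hard link untouched.
        return sstable
    # Mixed contents: write a copy holding only updates up to the cutoff.
    # This full scan is the cost the comment above is worried about.
    kept = tuple(u for u in sstable.updates if u.timestamp <= cutoff)
    return SSTable(name=sstable.name + "-trimmed", updates=kept)


def trim_snapshot(sstables: List[SSTable], cutoff: int) -> List[SSTable]:
    """Trim every sstable in a snapshot against the client timestamp."""
    trimmed = (trim_sstable(s, cutoff) for s in sstables)
    return [s for s in trimmed if s is not None]
```

In the scenario from the issue, with the snapshot requested at timestamp 100, an sstable for table B that contains only the insert made at timestamp 120 would be dropped, so the snapshot would no longer contain a B without the earlier A.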