[ 
https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090244#comment-14090244
 ] 

Jonathan Ellis commented on CASSANDRA-7720:
-------------------------------------------

Remember, we have NO guarantees on ordering.  Hint replay, read repair, and 
failures-fixed-by-full-repair can all cause "newer" updates to be applied 
before "older" ones.  So changing snapshot wouldn't really change the scenarios 
you have to tolerate.
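
To make the ordering point concrete, here is a minimal toy sketch (not Cassandra code; the cell tuples and the snapshot_upto helper are hypothetical names invented for illustration). It assumes a snapshot that keeps only cells whose write timestamp is at or before a cutoff, and shows that a write with an older timestamp, delivered later by something like hint replay, is still missing from that snapshot:

```python
# Toy model: a node's data is a list of (key, value, write_timestamp) cells
# in arrival order. All names here are hypothetical, not Cassandra APIs.

def snapshot_upto(cells, cutoff_ts):
    """Return the cells a timestamp-trimmed snapshot would keep."""
    return [c for c in cells if c[2] <= cutoff_ts]

node = []
node.append(("B", "b1", 5))      # normal write with timestamp 5
snap = snapshot_upto(node, 10)   # snapshot taken now, cutoff timestamp 10
node.append(("A", "a1", 3))      # an OLDER write arrives after the snapshot

# Keys captured by the snapshot vs. keys the node eventually holds
# for the same cutoff.
snap_keys = {key for key, _, _ in snap}
final_keys = {key for key, _, _ in snapshot_upto(node, 10)}
```

In this toy run the snapshot holds B but not A, even though A carries the earlier timestamp, which is the same shape of gap the timestamp cutoff was meant to close.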

I'm not against making a "best effort" in principle, but doing seq scans of 
snapshots to build a copy with most but not all of the data is a pretty big 
deal for both performance and complexity. (What if you die partway through?)

So my inclination is that there isn't a whole lot of benefit from doing this, 
and RAMP (which actually does give you guarantees in the face of HH/RR/etc) is 
a better solution.  (And if they're not part of the same batch then ipso facto 
it's not really a problem.)


> Add a more consistent snapshot mechanism
> ----------------------------------------
>
>                 Key: CASSANDRA-7720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Mike Schrag
>
> We’ve hit an interesting issue with snapshotting, which makes sense in 
> hindsight, but presents an interesting challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that the snapshot ends up containing the insert into 
> B but NOT the insert into A, even though B was chronologically inserted after A.
> It makes sense when I think about what snapshot is doing, but I wonder if 
> snapshots actually should get a little fancier to behave a little more like 
> what I think most people would expect. What I think should happen is 
> something along the lines of the following:
> For each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the linked snapshot sstables in that snapshot
>   * if the earliest update in that sstable is after the client's timestamp, 
> delete the sstable in the snapshot
>   * if the earliest update in the sstable is before the client's timestamp, 
> then look at the last update. Walk backwards through that sstable.
>     * if any updates fall after the timestamp, make a copy of that sstable in 
> the snapshot folder only up to the point of the timestamp and then delete the 
> original sstable in the snapshot (we need to copy because we're likely 
> holding a shared hard linked sstable)
> I think this would guarantee that you have a chronologically consistent view 
> of your snapshot across all machines and columnfamilies within a given 
> snapshot.
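
The per-node procedure proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not an actual Cassandra implementation: SSTable here is a hypothetical in-memory stand-in, trim_snapshot is an invented name, every table is assumed to be non-empty, and an update whose timestamp equals the cutoff is assumed to be kept (the description does not say which side it falls on). Since sstables are ordered by key rather than by write time, the sketch filters each straddling table by timestamp instead of truncating it at a position.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical stand-in for a snapshotted sstable (assumed non-empty)."""
    name: str
    cells: list  # list of (key, value, write_timestamp) tuples

    @property
    def min_ts(self):
        return min(ts for _, _, ts in self.cells)

    @property
    def max_ts(self):
        return max(ts for _, _, ts in self.cells)


def trim_snapshot(sstables, cutoff_ts):
    """Return the tables a snapshot would hold after trimming to cutoff_ts."""
    kept = []
    for table in sstables:
        if table.min_ts > cutoff_ts:
            # Earliest update is after the cutoff: drop it from the snapshot.
            continue
        if table.max_ts <= cutoff_ts:
            # Nothing is newer than the cutoff: keep the hard link as-is.
            kept.append(table)
        else:
            # Straddles the cutoff: write a filtered copy, since the original
            # file is shared with the live data through a hard link.
            older = [c for c in table.cells if c[2] <= cutoff_ts]
            kept.append(SSTable(table.name + "-trimmed", older))
    return kept


tables = [
    SSTable("a-1", [("k1", "v", 1), ("k2", "v", 4)]),
    SSTable("a-2", [("k1", "v", 8), ("k3", "v", 12)]),
    SSTable("a-3", [("k4", "v", 15)]),
]
result = trim_snapshot(tables, cutoff_ts=10)
```

Note that the straddling case has to read every cell of the table, which is the sequential-scan cost raised in the comment above.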



--
This message was sent by Atlassian JIRA
(v6.2#6252)
