[ https://issues.apache.org/jira/browse/CASSANDRA-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-14145: ---------------------------------------- Resolution: Fixed Fix Version/s: (was: 4.x) 4.0 Status: Resolved (was: Ready to Commit) Thanks all, committed to trunk in {{5fbb938adaafd91e7bea1672f09a03c7ac5b9b9d}} > Detecting data resurrection during read > ---------------------------------------- > > Key: CASSANDRA-14145 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14145 > Project: Cassandra > Issue Type: Improvement > Reporter: sankalp kohli > Assignee: Sam Tunnicliffe > Priority: Minor > Fix For: 4.0 > > > We have seen several bugs in which deleted data gets resurrected. We should > try to see if we can detect this on the read path and possibly fix it. Here > are a few examples which brought back data > A replica lost an sstable on startup which caused one replica to lose the > tombstone and not the data. This tombstone was past gc grace which means this > could resurrect data. We can detect such invalid states by looking at other > replicas. > If we are running incremental repair, Cassandra will keep repaired and > non-repaired data separate. Every-time incremental repair will run, it will > move the data from non-repaired to repaired. Repaired data across all > replicas should be 100% consistent. > Here is an example of how we can detect and mitigate the issue in most cases. > Say we have 3 machines, A,B and C. All these machines will have data split > b/w repaired and non-repaired. > 1. Machine A due to some bug bring backs data D. This data D is in repaired > dataset. All other replicas will have data D and tombstone T > 2. Read for data D comes from application which involve replicas A and B. The > data being read involves data which is in repaired state. A will respond > back to co-ordinator with data D and B will send nothing as tombstone is past > gc grace. This will cause digest mismatch. > 3. This patch will only kick in when there is a digest mismatch. Co-ordinator > will ask both replicas to send back all data like we do today but with this > patch, replicas will respond back what data it is returning is coming from > repaired vs non-repaired. If data coming from repaired does not match, we > know there is a something wrong!! At this time, co-ordinator cannot determine > if replica A has resurrected some data or replica B has lost some data. We > can still log error in the logs saying we hit an invalid state. > 4. Besides the log, we can take this further and even correct the response to > the query. After logging an invalid state, we can ask replica A and B (and > also C if alive) to send back all data for this including gcable tombstones. > If any machine returns a tombstone which is after this data, we know we > cannot return this data. This way we can avoid returning data which has been > deleted. > Some Challenges with this > 1. When data will be moved from non-repaired to repaired, there could be a > race here. We can look at which incremental repairs have promoted things on > which replica to avoid false positives. > 2. If the third replica is down and live replica does not have any tombstone, > we wont be able to break the tie in deciding whether data was actually > deleted or resurrected. > 3. If the read is for latest data only, we wont be able to detect it as the > read will be served from non-repaired data. > 4. If the replica where we lose a tombstone is the last replica to compact > the tombstone, we wont be able to decide if data is coming back or rest of > the replicas has lost that data. But we will still detect something is wrong. > 5. We wont affect 99.9% of the read queries as we only do extra work during > digest mismatch. > 6. CL.ONE reads will not be able to detect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org