[ https://issues.apache.org/jira/browse/CASSANDRA-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598154#comment-16598154 ]

Jordan West commented on CASSANDRA-14145:
-----------------------------------------

[~beobal] the changes are a definite improvement. I'm still reviewing but 
wanted to get you some feedback because of the tz difference. 

* I'm not a big fan of having to duplicate {{considerRepairedForTracking}} 
across {{ReadCommand}} and {{TimestampOrderPartitionReader}} because it's 
race-condition-prone code. While it's not prone to the same sorts of bugs, 
duplicating {{shouldInclude}} also seems sub-optimal, and there is some level 
of duplication between {{queryMemtableAndDiskInternal}} and 
{{TimestamOrderedPartitionReader.WithRepairedDataTracking}}. Since we lose a 
good chunk of the benefits of the optimization, would it make more sense to 
disable it entirely (at least for the initial commit) when 
{{isTrackingRepairedStatus()}} is true? We can revisit it as a performance 
optimization later. This would remove a bunch of the duplication and reduce 
the footprint/risk of the patch (rough sketch after this list). 
* I would turn inconclusive tracking off by default, especially because the 
window for potential false positives is widened by 
SinglePartitionReadCommand#L615-621.
* The new cassandra.yaml entries are missing.
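
To make the first point concrete, here is a rough, standalone sketch of the 
shape I have in mind (placeholder names only, not the patch's actual classes 
or methods):

{code:java}
// Standalone sketch, placeholder names only: when repaired-data tracking is
// enabled, always take the generic merge path so the tracking logic lives in
// exactly one read path, and skip the timestamp-ordered optimization.
final class ReadPathSketch
{
    enum Path { TIMESTAMP_ORDERED, GENERIC_MERGE }

    static Path choosePath(boolean timestampOrderApplies, boolean trackingRepairedStatus)
    {
        // Tracking repaired status forces the non-optimized path for now.
        if (trackingRepairedStatus)
            return Path.GENERIC_MERGE;
        return timestampOrderApplies ? Path.TIMESTAMP_ORDERED : Path.GENERIC_MERGE;
    }

    public static void main(String[] args)
    {
        System.out.println(choosePath(true, true));   // GENERIC_MERGE
        System.out.println(choosePath(true, false));  // TIMESTAMP_ORDERED
    }
}
{code}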

Minor nits:
* ReadCommand#newDataRepairInfo is only used once; consider inlining its 
definition at the call site.
* I find it a little cleaner to get rid of the default/empty implementation 
in RepairedDataInfo and just make DefaultRepairedDataInfo a top-level class. 
That would get rid of the little-used {{NO_OP_REPAIRED_DATA_INFO}}, and 
{{LocalDataResponse}} would be the only place that has to handle 
getRepairedDataInfo being null (see the sketch after these nits). 
* In {{ReadResponse.Serializer::serializedSize}}, the two additions to 
{{size}} that account for the new fields could be collapsed into one line. It 
would also be useful to update the comment to explain what the extra byte is 
for.
* The inner class {{WithDigest}} inside 
{{UnfilteredRowIterators#withRepairedDataTracking}} could be more aptly named, 
since it doesn't perform the digesting itself.
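
A rough sketch of the RepairedDataInfo nit above (standalone placeholder code, 
not the patch's actual classes; the digest here is just a stand-in):

{code:java}
// Placeholder sketch: DefaultRepairedDataInfo as a top-level class, with the
// single null check living in LocalDataResponse instead of threading a
// NO_OP_REPAIRED_DATA_INFO singleton through the read path.
final class DefaultRepairedDataInfo
{
    private final java.util.zip.CRC32 digest = new java.util.zip.CRC32();

    // Stand-in for accumulating the digest over repaired data as it is read.
    void update(byte[] repairedBytes)
    {
        digest.update(repairedBytes);
    }

    long digestValue()
    {
        return digest.getValue();
    }
}

final class LocalDataResponse
{
    private final DefaultRepairedDataInfo repairedDataInfo; // null when not tracking

    LocalDataResponse(DefaultRepairedDataInfo repairedDataInfo)
    {
        this.repairedDataInfo = repairedDataInfo;
    }

    long repairedDataDigest()
    {
        // The single place that handles "not tracking".
        return repairedDataInfo == null ? 0L : repairedDataInfo.digestValue();
    }
}
{code}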

>  Detecting data resurrection during read
> ----------------------------------------
>
>                 Key: CASSANDRA-14145
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14145
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Sam Tunnicliffe
>            Priority: Minor
>             Fix For: 4.x
>
>
> We have seen several bugs in which deleted data gets resurrected. We should 
> try to see if we can detect this on the read path and possibly fix it. Here 
> are a few examples which brought back data:
> A replica lost an sstable on startup, causing it to lose the tombstone but 
> not the data. The tombstone was past gc grace, which means data could be 
> resurrected. We can detect such invalid states by looking at other 
> replicas. 
> If we are running incremental repair, Cassandra will keep repaired and 
> non-repaired data separate. Every time incremental repair runs, it moves 
> data from the non-repaired set to the repaired set. Repaired data across all 
> replicas should be 100% consistent. 
> Here is an example of how we can detect and mitigate the issue in most cases. 
> Say we have 3 machines, A, B and C. All of these machines will have data 
> split between repaired and non-repaired. 
> 1. Machine A, due to some bug, brings back data D. This data D is in the 
> repaired dataset. All other replicas will have data D and tombstone T. 
> 2. A read for data D comes from the application and involves replicas A and 
> B. The data being read is in the repaired state. A will respond to the 
> co-ordinator with data D, and B will send nothing as the tombstone is past 
> gc grace. This will cause a digest mismatch. 
> 3. This patch will only kick in when there is a digest mismatch. The 
> co-ordinator will ask both replicas to send back all data, like we do today, 
> but with this patch each replica will also report which of the data it 
> returns comes from the repaired set vs the non-repaired set. If the repaired 
> data does not match, we know there is something wrong. At this point the 
> co-ordinator cannot determine whether replica A has resurrected some data or 
> replica B has lost some data, but we can still log an error saying we hit an 
> invalid state.
> 4. Besides logging, we can take this further and even correct the response 
> to the query. After logging the invalid state, we can ask replicas A and B 
> (and also C, if alive) to send back all data for this read, including gcable 
> tombstones. If any machine returns a tombstone which is newer than this data, 
> we know we cannot return the data. This way we can avoid returning data which 
> has been deleted. 
> Some challenges with this: 
> 1. When data is moved from non-repaired to repaired, there could be a race. 
> We can look at which incremental repairs have promoted data on which replica 
> to avoid false positives.  
> 2. If the third replica is down and the live replica does not have any 
> tombstone, we won't be able to break the tie when deciding whether data was 
> actually deleted or resurrected. 
> 3. If the read is for the latest data only, we won't be able to detect it, 
> as the read will be served from non-repaired data. 
> 4. If the replica that lost a tombstone is the last replica to compact it, 
> we won't be able to decide whether data is coming back or the rest of the 
> replicas have lost that data, but we will still detect that something is 
> wrong. 
> 5. We won't affect 99.9% of read queries, as we only do extra work during a 
> digest mismatch.
> 6. CL.ONE reads will not be able to detect this. 
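
To make steps 2-3 of the quoted description concrete, here is a rough 
standalone sketch of the coordinator-side comparison (placeholder names, not 
the actual patch):

{code:java}
import java.util.Arrays;
import java.util.Map;

// Standalone illustration of steps 2-3 above (placeholder names): each replica
// ships a digest computed over just its repaired data, and the coordinator
// flags any disagreement between those digests as a possible resurrection or
// data-loss event.
final class RepairedDataMismatchSketch
{
    static boolean repairedDataMismatch(Map<String, byte[]> repairedDigestByReplica)
    {
        byte[] reference = null;
        for (byte[] digest : repairedDigestByReplica.values())
        {
            if (reference == null)
                reference = digest;                  // first replica's digest
            else if (!Arrays.equals(reference, digest))
                return true;                         // repaired sets disagree
        }
        return false;
    }

    public static void main(String[] args)
    {
        // Replica A resurrected D into its repaired set, so its digest differs from B's.
        boolean mismatch = repairedDataMismatch(Map.of(
                "A", new byte[]{ 1, 2, 3 },
                "B", new byte[]{ 1, 2, 4 }));
        System.out.println("repaired data mismatch: " + mismatch); // true
    }
}
{code}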


