[ https://issues.apache.org/jira/browse/CASSANDRA-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523042#comment-16523042 ]
Kurt Greaves commented on CASSANDRA-14543:
------------------------------------------

I'm not sure about this because it only "works" if you assume the delete happened within the HH window, and then the node was down for an additional (*GCGS - HH*). It does reduce the number of cases where users could be bitten by this, but not significantly, and I suspect this will become another layer of confusion when someone does hit the problem.

I also think [~slebresne]'s comment [here|https://issues.apache.org/jira/browse/CASSANDRA-14532?focusedCommentId=16519023&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16519023] does a good job of explaining why sending tombstones after GCGS is a bad idea.

While I don't think the constant unnecessary read repair is good, there aren't really many good solutions here. Ideally the read repair wouldn't happen in the first place and we would exclude post-GCGS tombstones from the digest calculation, but that's likely very risky.

So far it seems the best of a bunch of terrible solutions here is to either repair within GCGS, or never delete things. Going forward, only_purge_repaired_tombstones with incremental repair should fix the underlying design problem here.

> Hinted handoff to replay purgeable tombstones
> ----------------------------------------------
>
>                 Key: CASSANDRA-14543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14543
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jay Zhuang
>            Priority: Minor
>
> Hinted-handoff currently only dispatches and applies the mutations that are
> within GCGS:
> [{{Hint.java:97}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hints/Hint.java#L97].
> This is to make sure it won't resurrect any deleted data.
> But replaying tombstones should be safe, and it could reduce the chance of having
> [un-repairable inconsistent data|https://lists.apache.org/thread.html/2d3d39d960143d4d2146ed2530821504ff855e832713dec7d0afd8ac@%3Cdev.cassandra.apache.org%3E].
> Here is the user scenario it tries to fix:
> {noformat}
> 1. Create a 3-node cluster
> 2. Create a table with a small gc_grace_seconds (for reproduction purposes):
> CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 3};
> CREATE TABLE foo.bar (
>     id int PRIMARY KEY,
>     name text
> ) WITH gc_grace_seconds=30;
> 3. Insert data with consistency ALL:
> INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');
> 4. Stop one node:
> $ ccm node2 stop
> 5. Delete the data with consistency QUORUM:
> DELETE FROM foo.bar WHERE id=1;
> 6. Wait 30 seconds and then start node2:
> $ ccm node2 start
> {noformat}
> Now node2 has the data, and node1/node3 have the purgeable tombstone. Every read
> triggers RR, which sends data from node2 to node1/node3 but repairs
> nothing.
> With hinted handoff of purgeable tombstones, it will at least dispatch the
> tombstone and delete the data on node2. It won't fix the root cause, but it
> reduces the chance of hitting this issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
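
To make the timing argument in the comment above concrete, here is a rough sketch of the windows involved. It is a simplified model and not Cassandra code: the `Scenario` fields and the `classify` function are hypothetical names, and it assumes that a hint is only recorded if the delete arrives within the hint (HH) window after the node goes down, and that a recorded hint is only applied today if it is younger than gc_grace_seconds when the node returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical model of one delete issued while a replica is down."""
    gc_grace_seconds: int     # table's gc_grace_seconds (GCGS)
    hint_window_seconds: int  # how long hints are collected for a down node (HH)
    delete_offset: int        # seconds between the node going down and the delete
    downtime: int             # total seconds the node stays down


def classify(s: Scenario) -> str:
    """Return which mechanism, if any, delivers the tombstone via hints."""
    if s.delete_offset > s.downtime:
        raise ValueError("the delete must happen while the node is down")
    # Assumption: no hint is written once the node has been down longer
    # than the hint window, so neither behaviour can help.
    if s.delete_offset > s.hint_window_seconds:
        return "no_hint"
    # Age of the hint at the moment the node comes back.
    hint_age = s.downtime - s.delete_offset
    # Assumption: current behaviour only applies hints younger than GCGS.
    if hint_age < s.gc_grace_seconds:
        return "replayed_today"
    # The hint exists but has outlived GCGS: only the proposed change
    # (replaying purgeable tombstones) would deliver it.
    return "only_with_proposal"


if __name__ == "__main__":
    # The ticket's reproduction: GCGS of 30s, node back after the wait.
    # The 3-hour hint window is an assumed value for illustration.
    print(classify(Scenario(30, 10800, 5, 40)))
```

Under this model the proposal only changes the outcome in the `only_with_proposal` case, which needs both a delete inside the hint window and a downtime long enough for the hint to outlive GCGS. That is the narrow window the comment describes.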