> It's actually correct to do it how it is today.
> Insertion date does not matter; what matters is the time after which tombstones 
> are supposed to be deleted.
> If the delete got to all nodes, sure, no problem, but if any of the nodes 
> didn't get the delete, and you would get rid of the tombstones before running 
> a repair, you might have nodes that still have that data.
> Then following a repair, that data will be copied to other replicas, and the 
> data you thought you deleted will be brought back to life.

Sure, for regular data that does not have a TTL, this makes sense. But I claim 
that data with a TTL is deleted when it is inserted. It’s just that this delete 
only becomes effective at some future date.
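To make the "delayed delete" view concrete, here is a minimal Python sketch of a cell written with a TTL. ExpiringCell and visible_value are hypothetical names, and the model is an assumption rather than Cassandra's actual code: from the moment of the write, the cell already carries the time at which it stops being visible and starts behaving like a tombstone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpiringCell:
    """Hypothetical model of a cell written with a TTL (all times in seconds)."""
    value: str
    write_time: int
    ttl: Optional[int] = None  # None means the cell never expires

    @property
    def expires_at(self) -> Optional[int]:
        # The "delete" is fixed at insertion time: write_time + ttl.
        return None if self.ttl is None else self.write_time + self.ttl

    def visible_value(self, now: int) -> Optional[str]:
        """Value a read at `now` sees; None once the cell acts as a tombstone."""
        if self.expires_at is not None and now >= self.expires_at:
            return None
        return self.value


cell = ExpiringCell("v", write_time=100, ttl=60)
print(cell.expires_at)          # 160
print(cell.visible_value(159))  # v
print(cell.visible_value(160))  # None
```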

In order to understand whether data might reappear, we have to consider six 
cases. Let us first consider the three cases where the INSERT / UPDATE did not 
overwrite any existing data that would have lived longer than the new data:

1. Let us assume that the data is successfully written to all nodes and no 
repair is run. After the TTL expires, the data turns into a tombstone, but 
because the data was present on all nodes, the tombstone is present on all 
nodes, so there is no risk of data reappearing.

2. Let us assume that this data is not written to all nodes but a repair is run 
within the TTL. After that, we effectively have the first situation, so there 
is no risk of data reappearing.

3. Let us assume that this data is not written to all nodes and no repair is 
run within the TTL. After the TTL has passed, the data expires on the nodes 
where it has been written. Now, we have tombstones on these nodes. If we get 
rid of the tombstones, there is no risk of the data reappearing, because there 
are no nodes that have the data, so even if we run a repair in the future, 
there is no risk that the data magically reappears.
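Cases 1 to 3 can be checked with a toy model of three replicas. This is a sketch under simplifying assumptions, not Cassandra's implementation: each replica holds at most one cell per key, repair (a hypothetical repair function) copies the newest cell by write time to every replica, and purge stands in for reclaiming expired cells once their grace period is over. In all three cases nothing can come back, because no replica holds anything older.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    write_time: int                   # last-write-wins timestamp
    value: Optional[str]              # None marks a tombstone
    expires_at: Optional[int] = None  # None means no TTL


def newest(replicas, key):
    versions = [r[key] for r in replicas if key in r]
    return max(versions, key=lambda c: c.write_time) if versions else None


def repair(replicas, key):
    """Hypothetical repair: copy the newest version of `key` to every replica."""
    cell = newest(replicas, key)
    if cell is not None:
        for r in replicas:
            r[key] = cell


def purge(replicas, key, now):
    """Drop tombstones and expired cells, as if their grace period were over."""
    for r in replicas:
        c = r.get(key)
        expired = c is not None and c.expires_at is not None and now >= c.expires_at
        if c is not None and (c.value is None or expired):
            del r[key]


def read(replicas, key, now):
    cell = newest(replicas, key)
    if cell is None or cell.value is None:
        return None
    if cell.expires_at is not None and now >= cell.expires_at:
        return None
    return cell.value


def scenario(n_written, repair_within_ttl, n_replicas=3):
    """Cases 1-3: a TTL write (t=10, expires t=70) that overwrites nothing."""
    replicas = [{} for _ in range(n_replicas)]
    for r in replicas[:n_written]:
        r["k"] = Cell(write_time=10, value="v", expires_at=70)
    if repair_within_ttl:
        repair(replicas, "k")
    purge(replicas, "k", now=100)  # TTL has passed, tombstones reclaimed
    repair(replicas, "k")          # any later repair
    return read(replicas, "k", now=100)


print(scenario(3, False))  # case 1: None
print(scenario(1, True))   # case 2: None
print(scenario(1, False))  # case 3: None
```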

Now, let us consider the cases where the INSERT / UPDATE overwrote existing data 
that either had no TTL or had a TTL expiring later than that of the newly 
inserted data. Again, there are three possible scenarios:

4. Let us assume that the data is successfully written to all nodes and no 
repair is run. After the TTL expires, the data turns into a tombstone, but 
because the data was present on all nodes, the tombstone is present on all 
nodes, so there is no risk of data reappearing.

5. Let us assume that this data is not written to all nodes but a repair is run 
within the TTL. After that, we effectively have the fourth situation, so there 
is no risk of data reappearing.

6. Let us assume that this data is not written to all nodes and no repair is 
run within the TTL. After the TTL has passed, the data expires on the nodes 
where it has been written. Now, we have tombstones on these nodes. If we get 
rid of the tombstones, there is a risk of the data reappearing, because the 
older data that was overwritten by the INSERT / UPDATE might still exist on 
some nodes: as the data with the TTL never made it to these nodes, there is no 
tombstone on them, and thus the older data can reappear.
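The same kind of toy model reproduces cases 4 to 6. Again this is a hypothetical sketch, not Cassandra's code: every replica starts with an older row without TTL, the TTL write reaches n_written replicas, and the optional repair may happen at any point between the TTL write and the purge. Only case 6 brings the old value back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    write_time: int                   # last-write-wins timestamp
    value: Optional[str]
    expires_at: Optional[int] = None  # None means no TTL


def newest(replicas, key):
    versions = [r[key] for r in replicas if key in r]
    return max(versions, key=lambda c: c.write_time) if versions else None


def repair(replicas, key):
    """Hypothetical repair: copy the newest version of `key` to every replica."""
    cell = newest(replicas, key)
    if cell is not None:
        for r in replicas:
            r[key] = cell


def purge_expired(replicas, key, now):
    """Reclaim cells whose TTL has passed (i.e. drop their tombstones)."""
    for r in replicas:
        c = r.get(key)
        if c is not None and c.expires_at is not None and now >= c.expires_at:
            del r[key]


def scenario(n_written, repaired_before_purge, n_replicas=3):
    """Cases 4-6: a TTL write (t=10, expires t=70) overwrites a row without TTL."""
    replicas = [{"k": Cell(write_time=1, value="old")} for _ in range(n_replicas)]
    for r in replicas[:n_written]:
        r["k"] = Cell(write_time=10, value="new", expires_at=70)
    if repaired_before_purge:
        # In this model it makes no difference whether this repair runs before
        # the TTL expires or repairs the tombstone after expiry.
        repair(replicas, "k")
    purge_expired(replicas, "k", now=100)
    repair(replicas, "k")  # any later repair
    cell = newest(replicas, "k")
    return None if cell is None else cell.value


print(scenario(3, False))  # case 4: None
print(scenario(1, True))   # case 5: None
print(scenario(1, False))  # case 6: old
```

In case 6 the purge removes the only copies of the newer write, so the later repair treats the old row on the other replicas as the newest version.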

So, we only have to worry about the last scenario. In this scenario, we have to 
ensure that either the inserted data with the TTL is repaired (which brings us 
back to scenario 5), or that the tombstones are repaired before they are 
discarded.

This is why I claim that for data with a TTL, gc_grace_seconds should 
effectively start when the data is inserted, not when it is converted into a 
tombstone: it does not matter whether the data with the TTL is repaired or the 
tombstone is repaired. As long as either of these things happens between the 
data with the TTL being inserted and the tombstone being reclaimed, there is no 
risk of deleted or overwritten data reappearing.
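The two rules can be compared as simple arithmetic. The function names below are hypothetical, and the second rule only holds under the assumption stated above: a repair completes within gc_grace_seconds of every write. Because the expired data cannot be reclaimed before it expires, the proposed rule takes the later of the expiry time and insertion plus gc_grace_seconds.

```python
def purge_time_current(write_time: int, ttl: int, gc_grace_seconds: int) -> int:
    """Grace period measured from expiry, as described in the quoted reply."""
    return write_time + ttl + gc_grace_seconds


def purge_time_proposed(write_time: int, ttl: int, gc_grace_seconds: int) -> int:
    """Grace period measured from insertion, as argued above.

    Assumes a repair completes within gc_grace_seconds of every write.
    The cell cannot be reclaimed before it expires, hence the max().
    """
    return max(write_time + ttl, write_time + gc_grace_seconds)


DAY = 86_400
# A row written at t=0 with a 30-day TTL and a 10-day gc_grace_seconds.
print(purge_time_current(0, 30 * DAY, 10 * DAY) // DAY)   # 40
print(purge_time_proposed(0, 30 * DAY, 10 * DAY) // DAY)  # 30
# With a 1-day TTL the grace period dominates.
print(purge_time_current(0, 1 * DAY, 10 * DAY) // DAY)    # 11
print(purge_time_proposed(0, 1 * DAY, 10 * DAY) // DAY)   # 10
```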
