> It's actually correct to do it how it is today.
> Insertion date does not matter, what matters is the time after tombstones are supposed to be deleted.
> If the delete got to all nodes, sure, no problem, but if any of the nodes didn't get the delete, and you would get rid of the tombstones before running a repair, you might have nodes that still has that data.
> Then following a repair, that data will be copied to other replicas, and that data you thought you deleted, will be brought back to life.
Sure, for regular data that does not have a TTL, this makes sense. But I claim that data with a TTL is deleted at the moment it is inserted; the delete just only becomes effective at some future date. To understand whether data might reappear, we have to consider six cases, in two groups of three.

First, the three cases where the INSERT / UPDATE did not overwrite any existing data that would have outlived the new data:

1. The data is successfully written to all nodes and no repair is run. After the TTL expires, the data turns into a tombstone, and because the data was present on all nodes, the tombstone is present on all nodes, so there is no risk of the data reappearing.

2. The data is not written to all nodes, but a repair is run within the TTL. After the repair, we are effectively in the first situation, so there is no risk of the data reappearing.

3. The data is not written to all nodes and no repair is run within the TTL. After the TTL has passed, the data expires on the nodes where it was written, leaving tombstones there. If we get rid of these tombstones, there is still no risk of the data reappearing: no node holds the data anymore, so even a future repair cannot bring it back.

Now consider the cases where the INSERT / UPDATE overwrote data that either had no TTL or had a TTL that would have expired after the TTL of the newly inserted data. Again, there are three scenarios:

4. The data is successfully written to all nodes and no repair is run. As in case 1, the tombstone ends up on all nodes, so there is no risk of data reappearing.

5. The data is not written to all nodes, but a repair is run within the TTL. As in case 2, we are effectively back in the first situation, so there is no risk of data reappearing.

6.
The data is not written to all nodes and no repair is run within the TTL. After the TTL has passed, the data expires on the nodes where it was written, leaving tombstones there. If we get rid of these tombstones, there is a real risk of the data reappearing: the older data that was overwritten by the INSERT / UPDATE might still exist on some nodes, and since the write with the TTL never reached those nodes, they hold no tombstone, so the older data can resurface.

So we only have to worry about this last scenario. Here we have to ensure that either the inserted data with the TTL is repaired (which brings us back to scenario 5), or that the tombstones are repaired before they are discarded. This is why I claim that for data with a TTL, gc_grace_seconds should effectively start when the data is inserted, not when it is converted into a tombstone: it does not matter whether the data with the TTL or the resulting tombstone is repaired. As long as either of these things happens between the data with the TTL being inserted and the tombstone being reclaimed, there is no risk of deleted or overwritten data reappearing.
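To make scenario 6 concrete, here is a toy Python model of the resurrection. This is not Cassandra code; replicas are modelled as single cells of the form (value, write_timestamp), and "repair" is just last-write-wins convergence. Node B misses the TTL'd overwrite, node A purges the resulting tombstone before any repair runs, and the repair then brings the overwritten value back:

```python
# Toy model of scenario 6 (illustrative only, not Cassandra internals).
# A cell is (value, write_timestamp), or None if the node holds nothing.

old = ("old-value", 100)   # pre-existing row with no TTL
new = ("new-value", 200)   # overwrite with a TTL; this write misses node B

node_a, node_b = new, old  # the TTL'd write only reached node A

# On node A: the TTL expires, the cell becomes a tombstone, and the
# tombstone is purged after gc_grace -- before any repair was run.
node_a = None

def repair(a, b):
    """Naive last-write-wins repair: both replicas adopt the newest
    surviving cell found on either of them."""
    cells = [c for c in (a, b) if c is not None]
    newest = max(cells, key=lambda c: c[1]) if cells else None
    return newest, newest

node_a, node_b = repair(node_a, node_b)
print(node_a)  # → ('old-value', 100): the overwritten data is back
```

Nothing in the model distinguishes a purged TTL tombstone from "never had the data", which is exactly why the repair happily copies the stale value back.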
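The claim about when gc_grace_seconds should start can also be phrased as a formula for the earliest safe purge time of the tombstone. The function names below are my own illustration, not anything in Cassandra; they just contrast the current behavior (grace period starts at expiry) with the proposed one (grace period starts at insertion, though the tombstone can never be purged before the data has expired):

```python
# Illustrative sketch, not Cassandra code. All times in seconds.

def purge_time_current(insert_time, ttl, gc_grace):
    """Current behavior: gc_grace starts when the cell expires,
    i.e. when it turns into a tombstone."""
    return insert_time + ttl + gc_grace

def purge_time_proposed(insert_time, ttl, gc_grace):
    """Proposed behavior: gc_grace starts at insertion time, but the
    tombstone cannot be purged before the data itself has expired."""
    return max(insert_time + ttl, insert_time + gc_grace)

# With a 1-day TTL and the 10-day default gc_grace, the current rule
# keeps the tombstone for 11 days after the insert; the proposed rule
# keeps it for 10 days, which is still enough for one repair cycle.
print(purge_time_current(0, 86_400, 864_000))   # → 950400
print(purge_time_proposed(0, 86_400, 864_000))  # → 864000
```

Under the proposed rule, any repair scheduled more often than every gc_grace_seconds still touches either the live TTL'd cell or its tombstone before the purge, which is the invariant the argument above relies on.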