[ https://issues.apache.org/jira/browse/CASSANDRA-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-10727:
-----------------------------------------
    Component/s: Core

> Solution for getting rid of GC grace seconds
> --------------------------------------------
>
>                 Key: CASSANDRA-10727
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10727
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sharvanath Pathak
>            Priority: Major
>
> There have been proposals for getting rid of the GC grace seconds, and
> automating the GC of tombstones by waiting for acks from all the nodes about
> the receipt of the tombstone.
> 1. CASSANDRA-3620
> 2. CASSANDRA-6192
> This mechanism has two major benefits in my opinion:
> * Since the GC of tombstones can be much more aggressive, it minimizes the
> number of tombstones in the system, thereby improving the performance of
> read operations.
> * It eliminates the possibility of resurrection of keys in case a node comes
> up after being down for more than GC grace seconds.
> As per CASSANDRA-3620, the main issue with the proposal seems to be its
> potential race with the hinted handoff. It seems we can have a good
> solution to that race.
> The solution is essentially to record the hint locations. Before
> writing any hint, we write a record on the alive replicas saying that a hint
> was written at such-and-such node. The GC will then wait for an ack from all
> the replicas, and also for all the related hints to be replayed and purged,
> before it clears the corresponding tombstone.
> One potential problem with this scheme is that if the hints are written on
> the coordinator node the same way they are being done right now, this process
> will have to wait for a large number of nodes to be up before the GC can be
> performed. However, this can easily be solved by writing the hints to a node
> which is determined based on the key token. For instance, write the hint to
> the node that comes next after the replicas in the token ring.
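For illustration only, the scheme described above could be sketched roughly as
follows. This is not Cassandra code: the names `place`, `TombstoneState`,
`record_hint`, `hint_replayed` and `purgeable` are hypothetical, the ring is a
plain sorted list of (token, node) pairs, and replica placement is assumed to
be "the first rf nodes clockwise from the key token". The hint holder is the
next node on the ring after the replicas, and a tombstone only becomes
purgeable once every replica has acked and no recorded hint is outstanding.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


def place(ring, key_token, rf):
    """Return (replicas, hint_holder) for a key.

    `ring` is a list of (token, node) pairs sorted by token. Replicas are
    assumed to be the first `rf` nodes clockwise from the key token; the
    hint holder is the node that comes right after them on the ring.
    """
    if rf >= len(ring):
        raise ValueError("need at least one non-replica node to hold hints")
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    walk = [ring[(start + i) % len(ring)][1] for i in range(rf + 1)]
    return walk[:rf], walk[rf]


@dataclass
class TombstoneState:
    """Bookkeeping for one tombstone (hypothetical structure)."""
    replicas: frozenset
    acked: set = field(default_factory=set)
    # (hint_holder, target_replica) pairs recorded before the hint is written
    pending_hints: set = field(default_factory=set)

    def ack(self, replica):
        self.acked.add(replica)

    def record_hint(self, holder, target):
        self.pending_hints.add((holder, target))

    def hint_replayed(self, holder, target):
        self.pending_hints.discard((holder, target))

    def purgeable(self):
        # GC may proceed only when all replicas have acked AND every
        # recorded hint has been replayed and purged.
        return self.replicas <= self.acked and not self.pending_hints
```

With a five-node ring and rf=3, a key with token 25 would get replicas D, E, A
and hint holder B under these assumptions. The tombstone stays unpurgeable
while B still holds a hint, even if A has already acked through another path.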
> Writing the hints in the way described above actually seems
> like a good idea anyway, because it minimizes the number of nodes that have
> to replay hints when a node comes up. The Dynamo paper actually describes
> this pattern for hinted handoffs as well.
> Lastly, it might also have a race with any concurrent read repairs. However,
> it can be solved the same way, by recording the read repairs in progress for
> a key and then aborting them before the GC is performed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)