[ https://issues.apache.org/jira/browse/CASSANDRA-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-10727:
-----------------------------------------
    Component/s: Core

> Solution for getting rid of GC grace seconds
> --------------------------------------------
>
>                 Key: CASSANDRA-10727
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10727
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sharvanath Pathak
>            Priority: Major
>
> There have been proposals for getting rid of the GC grace seconds, and 
> automating the GC of tombstones by waiting for acks from all the nodes about 
> the receipt of the tombstone. 
> 1. CASSANDRA-3620
> 2. CASSANDRA-6192
> This mechanism has two major benefits in my opinion:
> * Since the GC of tombstones can be much more aggressive, it minimizes the 
> number of tombstones in the system, thereby improving the performance of 
> read operations.
> * It eliminates the possibility of resurrection of keys when a node comes 
> up after being down for more than GC grace seconds.
> As per CASSANDRA-3620, the main issue with the proposal seems to be its 
> potential race with hinted handoff. It seems we can have a good 
> solution to that race. 
> The solution is essentially to record the hint locations. Before 
> writing any hint, we write a record on the live replicas saying that a hint 
> was written at a given node. The GC then waits for an ack from all the 
> replicas, and also for all the related hints to be replayed and purged, before 
> it clears the corresponding tombstone. 
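
The purge condition described above can be sketched as follows. This is only a minimal single-process illustration, not Cassandra code: the class `TombstoneState` and its methods are hypothetical names, and a real implementation would have to persist and replicate these records.

```python
from dataclasses import dataclass, field


@dataclass
class TombstoneState:
    """Hypothetical bookkeeping for one tombstone (not a Cassandra API)."""
    replicas: frozenset                              # nodes that must hold the tombstone
    acked: set = field(default_factory=set)          # replicas that confirmed receipt
    pending_hints: set = field(default_factory=set)  # (hint_holder, target) records

    def ack(self, replica):
        # Only acks from actual replicas count towards the purge condition.
        if replica in self.replicas:
            self.acked.add(replica)

    def record_hint(self, holder, target):
        # Written on the live replicas *before* the hint itself is stored.
        self.pending_hints.add((holder, target))

    def hint_purged(self, holder, target):
        # Called once the hint has been replayed and removed from its holder.
        self.pending_hints.discard((holder, target))

    def can_purge(self):
        # GC is allowed only when every replica has acked and no hint is outstanding.
        return self.replicas <= self.acked and not self.pending_hints
```

With replicas `a`, `b`, `c` and a hint for `c` stored on `d`, `can_purge()` stays false until `c` has acked and the hint on `d` has been purged.
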
> One potential problem with this scheme is that if the hints are written on 
> the coordinator node, as they are right now, this process 
> will have to wait for a large number of nodes to be up before the GC can be 
> performed. However, this can be easily solved by writing the hints to a node 
> which is determined based on the key token. For instance, write the hint to 
> the node that comes next after the replicas in the token ring. 
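
One way to picture the token-based hint placement is the sketch below. It assumes a simplified ring with one token per node (no vnodes) and SimpleStrategy-like placement, where the replicas are the first `rf` nodes clockwise from the key token; the function name `replicas_and_hint_holder` is hypothetical.

```python
from bisect import bisect_left


def replicas_and_hint_holder(ring, key_token, rf):
    """Return (replicas, hint_holder) for a key.

    ring: list of (token, node) pairs sorted by token, one token per node
          (a simplifying assumption; real clusters typically use vnodes).
    """
    if rf >= len(ring):
        raise ValueError("need at least rf + 1 nodes to pick a separate hint holder")
    tokens = [token for token, _ in ring]
    # The first node whose token is >= key_token owns the key; wrap around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    nodes = [ring[(start + i) % len(ring)][1] for i in range(rf + 1)]
    # The first rf nodes are the replicas; the next node clockwise holds the hints.
    return nodes[:rf], nodes[rf]
```

Because the hint holder is a function of the key token, the GC only needs that one node (plus the replicas) to be reachable, rather than every possible coordinator.
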
> Writing the hints in the way described in the last paragraph actually seems 
> like a good idea anyway, because it minimizes the number of nodes that have 
> to replay hints when a node comes up. The Dynamo paper actually describes 
> this pattern for hinted handoffs as well. 
> Lastly, the scheme might also race with concurrent read repairs. However, 
> that can be solved the same way, by recording the repairs in progress for a 
> key and then aborting them before the GC is performed.
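
The read-repair idea could look roughly like the sketch below: repairs register themselves per key, and the GC aborts whatever is still in flight before purging. This is an in-memory, single-node illustration under assumed names (`RepairRegistry`, `begin`, `finish`, `abort_all`); the ticket does not specify how the records would be stored or coordinated across nodes.

```python
import threading
from collections import defaultdict


class RepairRegistry:
    """Hypothetical per-key registry of in-progress read repairs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = defaultdict(set)  # key -> ids of running repairs
        self._aborted = set()                 # ids aborted by a GC
        self._next_id = 0

    def begin(self, key):
        # Record the repair before it starts reading or writing anything.
        with self._lock:
            self._next_id += 1
            self._in_progress[key].add(self._next_id)
            return self._next_id

    def finish(self, key, repair_id):
        # Return True if the repair may apply its writes, False if it was aborted.
        with self._lock:
            self._in_progress.get(key, set()).discard(repair_id)
            if repair_id in self._aborted:
                self._aborted.discard(repair_id)
                return False
            return True

    def abort_all(self, key):
        # Called by the GC before purging the tombstone for this key.
        with self._lock:
            ids = self._in_progress.pop(key, set())
            self._aborted |= ids
            return len(ids)
```

A repair that was in flight when `abort_all` ran gets `False` from `finish` and must discard its writes, so it cannot reintroduce data that the purged tombstone was shadowing.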



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
