[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished

Nick Bailey (JIRA) Tue, 21 Sep 2010 12:46:57 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Nick Bailey updated CASSANDRA-1216:
-----------------------------------

    Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch
                0002-Additional-tests-for-removeToken.patch

Patches:
 * 0001
 ** Modifies the removeToken operation to follow the pattern 
NORMAL->REMOVING->LEFT, rather than the current pattern in which a coordinator 
node sets its own status to a special-cased version of NORMAL.
 ** Fixes a small bug in StreamHeader serialization
 ** Adds the ability to either query the status of a remove operation in 
progress or force it to finish immediately
 * 0002
 ** Tests for removing tokens
 ** Moves shared code for creating a ring to the Util class
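
To illustrate the NORMAL->REMOVING->LEFT pattern described for patch 0001, here is a minimal sketch in Python. It is not the patch's code: the `Status` enum, the `ALLOWED` table and the `advance` helper are hypothetical names used only to show that a removed node's status moves forward one step at a time.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "NORMAL"
    REMOVING = "REMOVING"
    LEFT = "LEFT"


# Hypothetical transition table for a node being removed (illustration only,
# not taken from the Cassandra source).
ALLOWED = {
    Status.NORMAL: {Status.REMOVING},
    Status.REMOVING: {Status.LEFT},
    Status.LEFT: set(),
}


def advance(current, target):
    """Return the new status, or raise if the step skips or reverses a state."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Under this model, forcing a remove to finish is still just the REMOVING->LEFT step, taken without waiting for confirmations.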


Removal Process:
 * Normal Case
 *# Coordinator sets status of failed node to REMOVING
 *# Coordinator blocks on confirmation from other nodes
 *# Newly responsible nodes stream the data they have become responsible for
 *# Newly responsible nodes send confirmation once all data has streamed
 *# Coordinator updates status of failed node to LEFT
 *# Done
 * Failure Cases
 ** Coordinator failure
 *** If the coordinator fails, the remove operation will need to be retried
 *** The retry can be issued from any node in the cluster.
 ** Newly responsible node failure
 *** If a newly responsible node fails but comes back up, it should see the 
REMOVING status in gossip and restart the operation
 *** If a newly responsible node fails permanently, or a streaming operation 
fails while the node stays up, the coordinator will block forever waiting 
for confirmation.  The best solution is to force the remove operation to 
complete and then run repair on the newly responsible node that failed.
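
The coordinator's side of the removal process can be modelled roughly as follows. This is a toy Python sketch under stated assumptions, not the patch's implementation: `RemovalCoordinator` and its methods are hypothetical names, gossip and streaming are reduced to plain method calls, and nothing actually blocks. It shows why a missing confirmation leaves the operation in REMOVING, and how forcing completion reports which nodes may still need repair.

```python
class RemovalCoordinator:
    """Toy bookkeeping for one remove operation (hypothetical, illustration only)."""

    def __init__(self, removed_token, newly_responsible):
        self.removed_token = removed_token
        self.status = "REMOVING"                # step 1: announce REMOVING
        self.pending = set(newly_responsible)   # step 2: confirmations awaited
        self._finish_if_done()

    def _finish_if_done(self):
        # step 5: once every newly responsible node has confirmed, mark LEFT
        if self.status == "REMOVING" and not self.pending:
            self.status = "LEFT"

    def confirm(self, node):
        # steps 3-4: a newly responsible node reports that streaming finished
        self.pending.discard(node)
        self._finish_if_done()

    def removal_status(self):
        # status query: what the operation is still waiting on, if anything
        if self.status == "LEFT":
            return "LEFT"
        return "REMOVING, waiting on: " + ", ".join(sorted(self.pending))

    def force(self):
        # force completion: stop waiting and return the nodes that never
        # confirmed, so the operator knows where repair may be needed
        unconfirmed = sorted(self.pending)
        self.pending.clear()
        self.status = "LEFT"
        return unconfirmed


if __name__ == "__main__":
    coord = RemovalCoordinator("token-42", ["node-b", "node-c"])
    coord.confirm("node-b")
    print(coord.removal_status())   # REMOVING, waiting on: node-c
    print(coord.force())            # ['node-c']
```

If node-c never confirms, the model stays in REMOVING indefinitely, which mirrors the blocked coordinator described above; `force()` is the only way out in that case.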

> removetoken drops node from ring before re-replicating its data is finished
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1216
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Nick Bailey
>             Fix For: 0.7.0
>
>         Attachments: 
> 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 
> 0002-Additional-tests-for-removeToken.patch
>
>
> this means that if something goes wrong during the re-replication (e.g. a 
> source node is restarted) there is (a) no indication that anything has gone 
> wrong and (b) no way to restart the process (other than the Big Hammer of 
> running repair)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
