[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Bailey updated CASSANDRA-1216:
-----------------------------------

    Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch
                0002-Additional-tests-for-removeToken.patch

Patches:
* 0001
** Modifies the removeToken operation to follow a pattern of NORMAL->REMOVING->LEFT, rather than the current pattern of a coordinator node setting its own status to a special-cased version of NORMAL.
** Fixes a small bug in StreamHeader serialization.
** Adds the ability to either get the status of a remove operation in progress or force a remove operation to finish immediately.
* 0002
** Tests for removing tokens.
** Moves shared code for creating a ring to the Util class.

Removal Process:
* Normal Case
*# Coordinator sets status of failed node to REMOVING
*# Coordinator blocks on confirmation from other nodes
*# Any newly responsible nodes stream data
*# Newly responsible nodes send confirmation once all data has streamed
*# Coordinator updates status of failed node to LEFT
*# Done
* Failure Cases
** Coordinator failure
*** If the coordinator fails, the remove operation will need to be retried.
*** The retry can be issued from any node in the cluster.
** Newly responsible node failure
*** If a newly responsible node fails but comes back up, it should see the REMOVING status in gossip and restart the operation.
*** If a newly responsible node fails permanently, or a streaming operation fails while the node stays up, the coordinator will block forever waiting for confirmation. The best solution is to force the remove operation to complete and then run repair on the failed node.
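
The removal process above can be read as a small state machine kept by the coordinator. The following is only a rough sketch of that idea, not the code in the attached patches: the class `RemovalSketch`, its methods, and the node names are all hypothetical, and gossip and streaming are left out entirely. It models the failed node's status moving NORMAL->REMOVING->LEFT, the set of confirmations the coordinator is still waiting on, and the forced completion used when a confirmation will never arrive.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the coordinator's view of one removal; not Cassandra's actual classes. */
class RemovalSketch {
    enum Status { NORMAL, REMOVING, LEFT }

    private Status status = Status.NORMAL;
    // Newly responsible nodes that have not yet confirmed their streaming finished.
    private final Set<String> pending = new HashSet<>();

    /** Mark the failed node REMOVING and record which nodes must confirm. */
    void startRemoval(Set<String> newlyResponsibleNodes) {
        if (status != Status.NORMAL) {
            throw new IllegalStateException("removal already started or finished");
        }
        status = Status.REMOVING;
        pending.addAll(newlyResponsibleNodes);
        finishIfDone(); // nothing to wait for if no node gained ranges
    }

    /** A newly responsible node reports that all of its data has streamed. */
    void confirm(String node) {
        if (status != Status.REMOVING) {
            return; // late or duplicate confirmation, ignore it
        }
        pending.remove(node);
        finishIfDone();
    }

    /** Stop waiting for confirmations and finish now; repair is then the operator's job. */
    void forceComplete() {
        if (status == Status.REMOVING) {
            pending.clear();
            status = Status.LEFT;
        }
    }

    /** Status of the failed node as the coordinator currently sees it. */
    Status status() {
        return status;
    }

    /** Copy of the nodes the coordinator is still waiting on. */
    Set<String> pending() {
        return new HashSet<>(pending);
    }

    private void finishIfDone() {
        if (status == Status.REMOVING && pending.isEmpty()) {
            status = Status.LEFT;
        }
    }
}
```

In this sketch, a removal that waits on two nodes stays in REMOVING until both call `confirm`, and a removal stuck on a node that never answers only reaches LEFT through `forceComplete`, which corresponds to the last failure case listed above.
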

> removetoken drops node from ring before re-replicating its data is finished
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1216
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Nick Bailey
>             Fix For: 0.7.0
>
>         Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Additional-tests-for-removeToken.patch
>
>
> this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.