[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished

Nick Bailey (JIRA) Wed, 25 Aug 2010 10:06:51 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902523#action_12902523 ]


Nick Bailey commented on CASSANDRA-1216:
----------------------------------------

I believe the only consequence of calling removeToken on another node after the 
coordinator goes down is that the entire operation would be repeated, so any 
data that was already transferred would be transferred again. I think this is 
the right behavior, since there is no way of knowing what was transferred 
before the coordinator went down.

It might be useful to add a 'force' option, though. If the coordinator goes 
down and the token gets stuck in a REMOVING state, you may want to force 
removal rather than redo the entire operation.

It should be possible to remove the timeout so that removeToken blocks until 
the transfer is completely finished. The code for streaming in the remote data 
blocks until all streams are complete, and the code for sending a confirmation 
to the coordinator keeps retrying until the confirmation is received or the 
coordinator dies.
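
To make the retry behavior concrete, here is a minimal sketch of a confirmation
loop with no timeout. It is Python rather than Cassandra's Java, and every name
in it (confirm_until_acked, send, coordinator_alive, pause) is hypothetical and
not taken from the patches; it only illustrates "keep retrying until the
confirmation is received or the coordinator dies".

```python
from typing import Callable


def confirm_until_acked(
    send: Callable[[], bool],
    coordinator_alive: Callable[[], bool],
    pause: Callable[[], None] = lambda: None,
) -> bool:
    """Resend a confirmation until it is acknowledged or the coordinator dies.

    send() is assumed to return True once the coordinator acknowledges.
    coordinator_alive() is assumed to reflect the failure detector's view.
    There is deliberately no timeout: the loop ends only on one of those
    two conditions.
    """
    while coordinator_alive():
        if send():
            return True   # confirmation received
        pause()           # hypothetical back-off between attempts
    return False          # coordinator died before acknowledging
```

A caller that gets False back knows the coordinator is gone, which is the case
where removeToken would have to be issued again from another node.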

I think this would work if a check were added so that removeToken can only be 
called a second time if the coordinator is down. It wouldn't handle two calls 
that occurred before the state made its way through gossip, though.
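
The check described above could look roughly like the sketch below. It is
Python, not Cassandra's Java, and the names (TokenState, can_start_removal,
live_nodes) are assumptions made for illustration; only the REMOVING state
comes from this discussion. The function decides from a node's local view
whether a new removeToken call should be accepted.

```python
from enum import Enum
from typing import Optional, Set


class TokenState(Enum):
    NORMAL = "NORMAL"      # assumed name for "no removal in progress"
    REMOVING = "REMOVING"  # state named in the comment above


def can_start_removal(
    state: TokenState,
    coordinator: Optional[str],
    live_nodes: Set[str],
) -> bool:
    """Return True if a removeToken call should be accepted.

    A call is rejected only when the token is already in REMOVING and the
    coordinator that started the removal is still considered alive.
    """
    if state is not TokenState.REMOVING:
        return True
    return coordinator not in live_nodes
```

Because the decision uses only the local view of state, two nodes that have
not yet seen the REMOVING state through gossip would both accept a call, which
is the race mentioned above.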



> removetoken drops node from ring before re-replicating its data is finished
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1216
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Nick Bailey
>             Fix For: 0.7 beta 2
>
>         Attachments: 0001-Add-callbacks-to-streaming.patch, 
> 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 
> 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch
>
>
> this means that if something goes wrong during the re-replication (e.g. a 
> source node is restarted) there is (a) no indication that anything has gone 
> wrong and (b) no way to restart the process (other than the Big Hammer of 
> running repair)
