[ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638597#comment-14638597 ]
Stefania commented on CASSANDRA-9871:
-------------------------------------

I've reproduced the problem with this test:

{code}
def can_replace_down_node_test(self):
    """
    @jira_ticket CASSANDRA-9871
    Test that we can replace a node that is down and in status normal (DN)
    by using -Dcassandra.replace_address
    """
    cluster = self.cluster
    cluster.populate(3)
    cluster.start(wait_for_binary_proto=True)

    version = cluster.version()
    stress_table = 'keyspace1.standard1' if version >= '2.1' else '"Keyspace1"."Standard1"'

    # write some data
    node1, node2, node3 = cluster.nodelist()
    if version < "2.1":
        node1.stress(['-n', '10000'])
    else:
        node1.stress(['write', 'n=10000', '-rate', 'threads=8'])

    # Stop node 3 cleanly (gently=True); a kill -9 does not reproduce the problem
    node3.stop(gently=True)

    # Sleep a bit to let gossip settle
    time.sleep(2)

    out, err = node1.nodetool('status')
    self.assertEquals('', err)
    debug(out)

    # Create a new node to replace node3
    node4 = new_node(cluster, bootstrap=True)
    node4.start(jvm_args=["-Dcassandra.replace_address=127.0.0.3"], wait_for_binary_proto=True)
{code}

Interestingly, if the old node is shut down with kill -9 (gently=False in the stop method), then it can be replaced without problems.

Here is the code determining whether an endpoint is a fat client:

{code}
public boolean isFatClient(InetAddress endpoint)
{
    EndpointState epState = endpointStateMap.get(endpoint);
    if (epState == null)
    {
        return false;
    }
    return !isDeadState(epState) && !StorageService.instance.getTokenMetadata().isMember(endpoint);
}
{code}

The dead states are REMOVING, REMOVED, LEFT and HIBERNATE. The state for a clean shutdown should be SHUTDOWN, so {{!isDeadState(epState)}} should be true. That leaves the second condition: the endpoint must not be a member of the token metadata. I still need to work out why it is not a member, but it is probably related to the "is now DOWN" log message, which is not present when the old node is killed with -9.
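To make the condition concrete, here is a minimal Python model of the {{isFatClient}} check quoted above. It is only a sketch and not Cassandra code: {{EndpointState}}, {{is_fat_client}} and the {{token_members}} set are hypothetical stand-ins for the gossip state map and the token metadata, and the status strings simply reuse the names from this comment.

```python
from dataclasses import dataclass

# States listed in the comment as "dead" (assumption: names as written there).
DEAD_STATES = {"REMOVING", "REMOVED", "LEFT", "HIBERNATE"}


@dataclass
class EndpointState:
    """Hypothetical, simplified stand-in for a gossip endpoint state."""
    status: str


def is_fat_client(endpoint, endpoint_state_map, token_members):
    """Mirror the quoted predicate: known to gossip, not in a dead state,
    and not a member of the token metadata."""
    ep_state = endpoint_state_map.get(endpoint)
    if ep_state is None:
        return False
    return ep_state.status not in DEAD_STATES and endpoint not in token_members


# A cleanly stopped node: its status is SHUTDOWN, which is not a dead state.
states = {"127.0.0.3": EndpointState("SHUTDOWN")}

# If the endpoint is missing from the token metadata, it is treated as a fat client.
print(is_fat_client("127.0.0.3", states, token_members=set()))

# If it is still a member, it is not treated as a fat client.
print(is_fat_client("127.0.0.3", states, token_members={"127.0.0.3"}))
```

In this model, SHUTDOWN passes the first half of the condition, so the outcome depends entirely on whether the endpoint is still present in the token metadata.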
> Cannot replace token does not exist - DN node removed as Fat Client
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-9871
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9871
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sebastian Estevez
>            Assignee: Stefania
>             Fix For: 2.1.x
>
>
> We lost a node due to disk failure, we tried to replace it via
> -Dcassandra.replace_address per --
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> The node would not come up with these errors in the system.log:
> {code}
> INFO  [main] 2015-07-22 03:20:06,722 StorageService.java:500 - Gathering
> node replacement information for /10.171.115.233
> ...
> INFO  [SharedPool-Worker-1] 2015-07-22 03:22:34,281 Gossiper.java:954 -
> InetAddress /10.111.183.101 is now UP
> INFO  [GossipTasks:1] 2015-07-22 03:22:59,300 Gossiper.java:735 - FatClient
> /10.171.115.233 has been silent for 30000ms, removing from gossip
> ERROR [main] 2015-07-22 03:23:28,485 CassandraDaemon.java:541 - Exception
> encountered during startup
> java.lang.UnsupportedOperationException: Cannot replace token
> -1013652079972151677 which does not exist!
> {code}
> It is not clear why Gossiper removed the node as a FatClient, given that it
> was a full node before it died and it had tokens assigned to it (including
> -1013652079972151677) in system.peers and nodetool ring.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)