[ 
https://issues.apache.org/jira/browse/CASSANDRA-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Anastasyev updated CASSANDRA-8138:
---------------------------------------
    Description: 
If a node has failed and the cluster is then restarted (a common case in 
massive outages), replace_address fails with:
{code}
Caused by: java.lang.RuntimeException: Cannot replace_address /172.19.56.97 because it doesn't exist in gossip
jvm 1    |      at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:472)
jvm 1    |      at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:724)
jvm 1    |      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:686)
jvm 1    |      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:562)
{code}

Although the necessary information is saved in system tables on seed nodes, it 
is not loaded into gossip on the seed node, so a replacement node cannot obtain 
it.

The attached patch loads all of this information from the system tables into 
gossip with generation 0 and fixes some bugs in how this information is handled 
during the shadow gossip round.
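
As a rough illustration of the idea (not the attached patch itself), the sketch 
below models a seed that repopulates its gossip map from saved peer rows at 
generation 0, so that state gossiped later by a live node with a higher 
generation supersedes it. All class and method names here are hypothetical 
stand-ins and are not Cassandra APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; none of these types are Cassandra classes.
final class SeedGossipSketch {

    /** Hypothetical stand-in for a peer row persisted in the system tables. */
    static final class SavedPeer {
        final String address;
        final UUID hostId;
        final List<String> tokens;

        SavedPeer(String address, UUID hostId, List<String> tokens) {
            this.address = address;
            this.hostId = hostId;
            this.tokens = tokens;
        }
    }

    /** Hypothetical stand-in for the gossip state kept per endpoint. */
    static final class EndpointState {
        final int generation;
        final UUID hostId;
        final List<String> tokens;

        EndpointState(int generation, UUID hostId, List<String> tokens) {
            this.generation = generation;
            this.hostId = hostId;
            this.tokens = tokens;
        }
    }

    private final Map<String, EndpointState> gossipState = new ConcurrentHashMap<>();

    /** On startup, fill the gossip map from saved rows, using generation 0. */
    void loadSavedPeers(List<SavedPeer> saved) {
        for (SavedPeer peer : saved) {
            // Do not overwrite anything already learned through gossip.
            gossipState.putIfAbsent(peer.address,
                    new EndpointState(0, peer.hostId, peer.tokens));
        }
    }

    /** Incoming state wins only when its generation is strictly higher. */
    void onGossip(String address, EndpointState remote) {
        gossipState.merge(address, remote,
                (local, incoming) -> incoming.generation > local.generation ? incoming : local);
    }

    /** What a replacing node would ask the seed for during its lookup. */
    EndpointState lookupForReplace(String address) {
        EndpointState state = gossipState.get(address);
        if (state == null) {
            throw new IllegalStateException("Cannot replace " + address + ": not in gossip");
        }
        return state;
    }

    public static void main(String[] args) {
        SeedGossipSketch seed = new SeedGossipSketch();
        UUID deadHostId = UUID.randomUUID();
        seed.loadSavedPeers(Arrays.asList(
                new SavedPeer("172.19.56.97", deadHostId, Arrays.asList("token-a"))));
        EndpointState found = seed.lookupForReplace("172.19.56.97");
        System.out.println("generation=" + found.generation
                + ", hostId matches=" + found.hostId.equals(deadHostId));
    }
}
```

In this model the dead node never gossips again, so its generation-0 entry 
stays available for the replacement lookup, while entries for live nodes are 
overwritten as soon as those nodes announce a real generation.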

  was:
If a node has failed and the cluster (or one of the seeds) is then restarted (a 
common case in massive outages), replace_address fails with:
{code}
Caused by: java.lang.RuntimeException: Cannot replace_address /172.19.56.97 because it doesn't exist in gossip
jvm 1    |      at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:472)
jvm 1    |      at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:724)
jvm 1    |      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:686)
jvm 1    |      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:562)
{code}

Although the necessary information is saved in system tables on seed nodes, it 
is not loaded into gossip on the seed node, so a replacement node cannot obtain 
it.

The attached patch loads all of this information from the system tables into 
gossip with generation 0 and fixes some bugs in how this information is handled 
during the shadow gossip round.


> replace_address cannot find node to be replaced after seed node restart
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-8138
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8138
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oleg Anastasyev
>         Attachments: ReplaceAfterSeedRestart.txt
>
>
> If a node has failed and the cluster is then restarted (a common case in 
> massive outages), replace_address fails with:
> {code}
> Caused by: java.lang.RuntimeException: Cannot replace_address /172.19.56.97 because it doesn't exist in gossip
> jvm 1    |    at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:472)
> jvm 1    |    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:724)
> jvm 1    |    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:686)
> jvm 1    |    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:562)
> {code}
> Although the necessary information is saved in system tables on seed nodes, 
> it is not loaded into gossip on the seed node, so a replacement node cannot 
> obtain it.
> The attached patch loads all of this information from the system tables into 
> gossip with generation 0 and fixes some bugs in how this information is 
> handled during the shadow gossip round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
