Re: gossipinfo contains two nodes dead for more than two years

2019-08-28 Thread John Sumsion
I've seen something similar if there is a node still referring to that IP as a 
seed node in cassandra.yaml.  You might want to check that.
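
One way to check this is to scan each node's cassandra.yaml for the dead addresses. The following is a minimal sketch in Python, assuming the stock `seed_provider` layout with a single `seeds:` line. The sample YAML, the address 10.15.53.40, and the helper name `seeds_from_yaml_text` are made up for illustration and are not part of Cassandra.

```python
import re

def seeds_from_yaml_text(text):
    """Return the addresses listed on the `seeds:` line of cassandra.yaml text."""
    # Matches an uncommented line such as:  - seeds: "10.0.0.1,10.0.0.2"
    match = re.search(r'^\s*-?\s*seeds:\s*["\']?([^"\'\n#]*)', text, re.MULTILINE)
    if not match:
        return []
    return [s.strip() for s in match.group(1).split(",") if s.strip()]

# Hypothetical excerpt of a cassandra.yaml that still names a dead node as a seed.
SAMPLE_YAML = """
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.15.53.27,10.15.53.40"
"""

# The two dead addresses reported in this thread.
DEAD_NODES = {"10.15.53.27", "10.5.1.16"}

stale_seeds = sorted(DEAD_NODES.intersection(seeds_from_yaml_text(SAMPLE_YAML)))
print(stale_seeds)
```

In practice the text would come from reading the real cassandra.yaml on every node, since each node carries its own seed list.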


Re: gossipinfo contains two nodes dead for more than two years

2019-08-28 Thread Vincent Rischmann
Yep, they're not visible in either ring or status.


Re: gossipinfo contains two nodes dead for more than two years

2019-08-28 Thread Jeff Jirsa
Based on what you've posted, I assume the instances are not visible in
`nodetool ring` or `nodetool status`, and the only reason you know they're
still in gossipinfo is you see them in the logs? If that's the case, then
yes, I would do `nodetool assassinate`.
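
The condition described above (present in gossipinfo, absent from status) can be checked mechanically. The following is a rough sketch in Python that works on captured command output. The sample `nodetool status` text, the live address 10.15.53.30, and its host ID are hypothetical; the function names are illustrative only.

```python
import re

def status_addresses(status_text):
    """Addresses on `nodetool status` node lines, which start with a state such as UN or DN."""
    return set(re.findall(r"^[UD][NLJM]\s+(\S+)", status_text, re.MULTILINE))

def gossip_endpoints(gossip_text):
    """Endpoints in `nodetool gossipinfo` output, where each entry starts with `/address`."""
    return {
        line.strip().lstrip("/")
        for line in gossip_text.splitlines()
        if line.strip().startswith("/")
    }

# Hypothetical status output with a single live node.
SAMPLE_STATUS = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns  Host ID                               Rack
UN  10.15.53.30  574.1 GB  256     ?     11111111-2222-3333-4444-555555555555  RAC1
"""

# Abbreviated gossipinfo output: the two dead entries from the thread plus the live node.
SAMPLE_GOSSIP = """\
/10.15.53.27
  generation:1503480618
  STATUS:2:hibernate,true
/10.5.1.16
  generation:1503636779
  STATUS:2:hibernate,true
/10.15.53.30
  generation:1566900000
  STATUS:14:NORMAL,-1
"""

# Endpoints known to gossip but missing from the ring view.
gossip_only = sorted(gossip_endpoints(SAMPLE_GOSSIP) - status_addresses(SAMPLE_STATUS))
print(gossip_only)
```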





gossipinfo contains two nodes dead for more than two years

2019-08-28 Thread Vincent Rischmann
Hi,

while replacing a node in a cluster I saw this log:

 2019-08-27 16:35:31,439 Gossiper.java:995 - InetAddress /10.15.53.27 is now DOWN

it caught my attention because that IP address doesn't exist anymore in the 
cluster and it hasn't for a long time.

After some reading I ran `nodetool gossipinfo` and saw these entries for 
nodes that don't exist anymore:

 /10.15.53.27
   generation:1503480618
   heartbeat:26970
   STATUS:2:hibernate,true
   LOAD:26810:6.17363354147E11
   SCHEMA:101:d21b1e47-f226-3417-8de7-5802518ae824
   DC:10:DC1
   RACK:12:RAC1
   RELEASE_VERSION:6:2.1.18
   INTERNAL_IP:8:10.15.53.27
   RPC_ADDRESS:5:10.15.53.27
   SEVERITY:26972:0.0
   NET_VERSION:3:8
   HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
   TOKENS:1:
 /10.5.1.16
   generation:1503636779
   heartbeat:324
   STATUS:2:hibernate,true
   LOAD:204:2.601990697532E12
   SCHEMA:14:d21b1e47-f226-3417-8de7-5802518ae824
   DC:10:DC1
   RACK:12:RAC1
   RELEASE_VERSION:6:2.1.18
   INTERNAL_IP:8:10.5.1.16
   RPC_ADDRESS:5:10.5.1.16
   SEVERITY:326:0.0
   NET_VERSION:3:8
   HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
   TOKENS:1:
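
For comparing entries like these side by side, the output can be parsed into a dictionary per endpoint. This is a minimal sketch in Python, assuming the layout shown above: an endpoint line starting with `/`, then `generation:N` and `heartbeat:N`, then application states as `NAME:version:value`. The function name `parse_gossipinfo` is hypothetical, and the sample is abbreviated from the entries in this message.

```python
def parse_gossipinfo(text):
    """Map each endpoint address to a dict of its gossip fields (version numbers dropped)."""
    endpoints = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new endpoint entry begins, e.g. "/10.15.53.27".
            current = endpoints.setdefault(line[1:], {})
        elif current is not None:
            # Two fields for generation/heartbeat, three for application states.
            parts = line.split(":", 2)
            current[parts[0]] = parts[-1]
    return endpoints

SAMPLE = """\
/10.15.53.27
  generation:1503480618
  heartbeat:26970
  STATUS:2:hibernate,true
  HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
  TOKENS:1:
/10.5.1.16
  generation:1503636779
  heartbeat:324
  STATUS:2:hibernate,true
  HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
  TOKENS:1:
"""

info = parse_gossipinfo(SAMPLE)
for ip, state in info.items():
    print(ip, state["STATUS"], state["HOST_ID"], repr(state["TOKENS"]))
```

Printed this way, it is easy to see that both entries report the same STATUS and HOST_ID and an empty TOKENS value.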

the generations are:

- Wed, 23 Aug 2017 09:30:18 GMT
- Fri, 25 Aug 2017 04:52:59 GMT
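
These dates come from reading each generation value as a Unix timestamp in seconds. A small sketch in Python using only the standard library reproduces the conversion; the dictionary name `GENERATIONS` is just for illustration.

```python
from email.utils import formatdate

# Generation values copied from the gossipinfo entries above.
GENERATIONS = {
    "10.15.53.27": 1503480618,
    "10.5.1.16": 1503636779,
}

# formatdate() renders epoch seconds as an RFC 2822 style date; usegmt=True appends "GMT".
started = {ip: formatdate(gen, usegmt=True) for ip, gen in GENERATIONS.items()}

for ip, when in started.items():
    print(ip, when)
```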

I don't remember what we did at that time, but it looks like we botched 
something while joining a node.

After reading https://thelastpickle.com/blog/2018/09/18/assassinate.html I'm 
thinking of doing the following:

* nodetool removenode 10.15.53.27
* if it doesn't work for some reason: nodetool assassinate 10.15.53.27
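
The two steps above can be sketched as a dry-run script in Python that only prints the commands. One assumption worth verifying with `nodetool help removenode`: as far as I know, `removenode` is addressed by host ID rather than by IP, while `assassinate` takes the IP. It also assumes `nodetool assassinate` exists in the running version. The function name `removal_plan` is hypothetical, and the host ID is the one shown in the gossipinfo entry.

```python
import shlex

def removal_plan(ip, host_id):
    """Return the commands to try in order, as argv lists (nothing is executed)."""
    return [
        # Assumption: removenode is addressed by host ID rather than by IP.
        ["nodetool", "removenode", host_id],
        # Fallback from the plan above: assassinate is addressed by IP.
        ["nodetool", "assassinate", ip],
    ]

plan = removal_plan("10.15.53.27", "2488fccc-108a-4a9d-ad43-5e8b8b6ee17b")
for argv in plan:
    print(shlex.join(argv))
```

Keeping this as a dry run makes it possible to review the exact commands before running anything against a production cluster.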

Since those nodes have been long dead and don't appear in system.peers, I 
don't anticipate any problems, but I'd like some confirmation that this can't 
break my cluster.

Thanks!