Re: gossipinfo contains two nodes dead for more than two years
I've seen something similar if there is a node still referring to that IP as a seed node in cassandra.yaml. You might want to check that.

From: Vincent Rischmann
Sent: Wednesday, August 28, 2019 10:10 AM
To: user@cassandra.apache.org
Subject: Re: gossipinfo contains two nodes dead for more than two years

Yep, they're not visible in either ring or status.
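The seed-node check suggested in this reply can be scripted. The following is a minimal Python sketch, assuming the usual `- seeds: "ip1,ip2"` line inside `seed_provider` in cassandra.yaml and plain IPv4 entries without ports. The helper names `seeds_from_yaml_text` and `stale_seeds` are hypothetical, not part of any Cassandra tooling.

```python
import re


def seeds_from_yaml_text(text):
    """Return the seed addresses found in the text of a cassandra.yaml file.

    Assumption: seeds appear on a line such as
        - seeds: "10.0.0.1,10.0.0.2"
    Commented-out lines are ignored.
    """
    seeds = []
    for line in text.splitlines():
        uncommented = line.split("#", 1)[0]  # drop trailing or full-line comments
        match = re.search(r"\bseeds:\s*[\"']?([^\"']*)", uncommented)
        if match:
            seeds.extend(s.strip() for s in match.group(1).split(",") if s.strip())
    return seeds


def stale_seeds(text, dead_ips):
    """Return the dead IPs that are still listed as seeds."""
    seeds = set(seeds_from_yaml_text(text))
    return [ip for ip in dead_ips if ip in seeds]


if __name__ == "__main__":
    # The path is an assumption; adjust it to the local installation.
    with open("/etc/cassandra/cassandra.yaml") as handle:
        print(stale_seeds(handle.read(), ["10.15.53.27", "10.5.1.16"]))
```

The check would need to run on every node, since each node reads its own cassandra.yaml.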
Re: gossipinfo contains two nodes dead for more than two years
Yep, they're not visible in either ring or status.

On Wed, Aug 28, 2019, at 17:08, Jeff Jirsa wrote:
> Based on what you've posted, I assume the instances are not visible in
> `nodetool ring` or `nodetool status`, and the only reason you know they're
> still in gossipinfo is you see them in the logs? If that's the case, then
> yes, I would do `nodetool assassinate`.
Re: gossipinfo contains two nodes dead for more than two years
Based on what you've posted, I assume the instances are not visible in `nodetool ring` or `nodetool status`, and the only reason you know they're still in gossipinfo is you see them in the logs? If that's the case, then yes, I would do `nodetool assassinate`.

On Wed, Aug 28, 2019 at 7:33 AM Vincent Rischmann wrote:
> [...]
> After reading https://thelastpickle.com/blog/2018/09/18/assassinate.html
> I'm thinking of doing the following:
>
> * nodetool removenode 10.15.53.27
> * if it doesn't work for some reason: nodetool assassinate 10.15.53.27
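The precondition in this reply (the endpoints are absent from `nodetool ring` and `nodetool status`) can be checked against saved command output before running anything destructive. This is a minimal Python sketch, assuming the output has been captured as a string. `endpoint_listed` is a hypothetical helper, and it only looks for the address as a whole token, so it does not depend on the exact column layout.

```python
import re


def endpoint_listed(nodetool_output, ip):
    """Return True if `ip` appears as a complete IPv4 address in the text.

    The lookarounds stop 10.5.1.16 from matching inside 10.5.1.160
    or 110.5.1.16.
    """
    pattern = r"(?<![\d.])" + re.escape(ip) + r"(?!\.?\d)"
    return re.search(pattern, nodetool_output) is not None


if __name__ == "__main__":
    import subprocess

    # Assumes nodetool is on PATH and can reach the local node.
    status = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    ).stdout
    for dead_ip in ("10.15.53.27", "10.5.1.16"):
        print(dead_ip, "listed in status:", endpoint_listed(status, dead_ip))
```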
gossipinfo contains two nodes dead for more than two years
Hi,

While replacing a node in a cluster I saw this log:

    2019-08-27 16:35:31,439 Gossiper.java:995 - InetAddress /10.15.53.27 is now DOWN

It caught my attention because that IP address doesn't exist anymore in the cluster, and it hasn't for a long time.

After some reading I ran `nodetool gossipinfo` and saw these entries, which are nodes that don't exist anymore:

    /10.15.53.27
      generation:1503480618
      heartbeat:26970
      STATUS:2:hibernate,true
      LOAD:26810:6.17363354147E11
      SCHEMA:101:d21b1e47-f226-3417-8de7-5802518ae824
      DC:10:DC1
      RACK:12:RAC1
      RELEASE_VERSION:6:2.1.18
      INTERNAL_IP:8:10.15.53.27
      RPC_ADDRESS:5:10.15.53.27
      SEVERITY:26972:0.0
      NET_VERSION:3:8
      HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
      TOKENS:1:
    /10.5.1.16
      generation:1503636779
      heartbeat:324
      STATUS:2:hibernate,true
      LOAD:204:2.601990697532E12
      SCHEMA:14:d21b1e47-f226-3417-8de7-5802518ae824
      DC:10:DC1
      RACK:12:RAC1
      RELEASE_VERSION:6:2.1.18
      INTERNAL_IP:8:10.5.1.16
      RPC_ADDRESS:5:10.5.1.16
      SEVERITY:326:0.0
      NET_VERSION:3:8
      HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
      TOKENS:1:

The generations are:

- Wed, 23 Aug 2017 09:30:18 GMT
- Fri, 25 Aug 2017 04:52:59 GMT

I don't remember what we did at that time, but it looks like we botched something while joining a node.

After reading https://thelastpickle.com/blog/2018/09/18/assassinate.html I'm thinking of doing the following:

* nodetool removenode 10.15.53.27
* if it doesn't work for some reason: nodetool assassinate 10.15.53.27

Since those nodes have been long dead and don't appear in system.peers, I don't anticipate any problems, but I'd like some confirmation that this can't break my cluster.

Thanks!
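The generation values in the gossipinfo output are Unix timestamps in seconds, which is how the two dates above were obtained. The following Python sketch shows the conversion and a small parser that flags endpoints whose STATUS is `hibernate`. It assumes the 2.1-style layout shown in this post, where an endpoint line starts with `/` and is followed by `KEY:value` lines. `parse_gossipinfo` and `generation_to_utc` are hypothetical helper names.

```python
from datetime import datetime, timezone


def generation_to_utc(generation):
    """Convert a gossip generation (Unix seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(generation, tz=timezone.utc)


def parse_gossipinfo(text):
    """Parse gossipinfo text into {endpoint: {field: raw value}}.

    Assumption: an endpoint line starts with "/" and each following
    line is "KEY:value" until the next endpoint line.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = nodes.setdefault(line.lstrip("/"), {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key] = value
    return nodes


if __name__ == "__main__":
    # Excerpt of the output quoted in this post.
    sample = """/10.15.53.27
  generation:1503480618
  STATUS:2:hibernate,true
/10.5.1.16
  generation:1503636779
  STATUS:2:hibernate,true
"""
    for endpoint, fields in parse_gossipinfo(sample).items():
        started = generation_to_utc(int(fields["generation"]))
        hibernating = "hibernate" in fields.get("STATUS", "")
        print(endpoint, started.isoformat(), "hibernate" if hibernating else "")
    # 10.15.53.27 2017-08-23T09:30:18+00:00 hibernate
    # 10.5.1.16 2017-08-25T04:52:59+00:00 hibernate
```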