Re: Cassandra is not showing a node up hours after restart
Hello Paul,

The behavior looks similar to what we experienced and reported:
https://issues.apache.org/jira/browse/CASSANDRA-15138

In our testing, "service cassandra stop" sometimes leaves the cluster in a wrong state. How about doing kill -9?

Thanks,
Hiro

On Sun, Dec 8, 2019 at 7:47 PM Hossein Ghiyasi Mehr wrote:
Re: Cassandra is not showing a node up hours after restart
Which version of Cassandra did you install? deb or tar?
If it's deb, its script should be used for start/stop.
If it's tar, kill the pid of Cassandra to stop, and use bin/cassandra to start.

Stopping doesn't need any other actions: drain, disable gossip, etc.

Where do you use Cassandra?

---
VafaTech <http://www.vafatech.com> : A Total Solution for Data Gathering & Analysis
---

On Fri, Dec 6, 2019 at 11:20 PM Paul Mena wrote:
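Hossein's deb-vs-tar distinction above can be sketched as a small script. The init-script path and pidfile location below are assumptions about a conventional layout, not details from this thread:

```shell
#!/bin/sh
# Sketch: stop Cassandra the way that matches the install type.
# Assumed paths; adjust to your layout.
PIDFILE=/var/run/cassandra/cassandra.pid

if [ -x /etc/init.d/cassandra ]; then
    # Package (deb) install: use the service script.
    sudo service cassandra stop
elif [ -r "$PIDFILE" ]; then
    # Tarball install: signal the JVM directly.
    kill "$(cat "$PIDFILE")"
else
    echo "cassandra does not appear to be running here"
fi
```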
RE: Cassandra is not showing a node up hours after restart
As we are still without a functional Cassandra cluster in our development environment, I thought I’d try restarting the same node (one of 4 in the cluster) with the following command:

ip=$(cat /etc/hostname); nodetool disablethrift && nodetool disablebinary && \
  sleep 5 && nodetool disablegossip && nodetool drain && sleep 10 && \
  sudo service cassandra restart && \
  until echo "SELECT * FROM system.peers LIMIT 1;" | cqlsh $ip > /dev/null 2>&1; do
    echo "Node $ip is still DOWN"; sleep 10
  done && echo "Node $ip is now UP"

The above command returned “Node is now UP” after about 40 seconds, confirmed on “node001” via “nodetool status”:

user@node001=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
UN  192.168.187.121  539.43 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  633.92 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  576.31 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  628.5 GB   256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

As was the case before, running “nodetool status” on any of the other nodes shows that “node001” is still down:

user@node002=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  634.04 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  576.42 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  628.56 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is it inadvisable to continue with the rolling restart?
Paul Mena
Senior Application Administrator
WHOI - Information Services
508-289-3539

From: Shalom Sagges
Sent: Tuesday, November 26, 2019 12:59 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra is not showing a node up hours after restart
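Unpacked for readability, the restart sequence from the one-liner above looks like the following sketch. The run() wrapper is an editorial addition: with DRY_RUN=1 (the default here) it prints each step instead of executing it, so the order can be reviewed safely.

```shell
#!/bin/sh
# Dry-run sketch of the rolling-restart steps from the thread.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run nodetool disablethrift    # stop Thrift clients
run nodetool disablebinary    # stop native-protocol (CQL) clients
run sleep 5
run nodetool disablegossip    # announce shutdown to the ring
run nodetool drain            # flush memtables, stop accepting writes
run sleep 10
run sudo service cassandra restart
# After this, poll with cqlsh (as in the original one-liner) until the
# node answers a simple query such as: SELECT * FROM system.peers LIMIT 1;
```

As noted elsewhere in the thread, nodetool drain itself flushes and announces that the node is no longer accepting traffic, so the earlier disable* steps are partly belt-and-braces.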
Re: Cassandra is not showing a node up hours after restart
Sorry, disregard the schema ID. It's too early in the morning here ;)

On Tue, Nov 26, 2019 at 7:58 AM Shalom Sagges wrote:
Re: Cassandra is not showing a node up hours after restart
Hi Paul,

From the gossipinfo output, it looks like the node's IP address and rpc_address are different:

/192.168.*187*.121 vs RPC_ADDRESS:192.168.*185*.121

You can also see that there's a schema disagreement between nodes, e.g. schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002 it is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801. You can run nodetool describecluster to see it as well.

So I suggest changing the rpc_address to the node's own IP address, or setting it to 0.0.0.0, and that should resolve the issue.

Hope this helps!

On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen wrote:
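In cassandra.yaml terms, Shalom's suggestion would look like one of the following (a sketch using node001's addresses from the thread; note that binding rpc_address to 0.0.0.0 also requires broadcast_rpc_address to be set):

```yaml
# cassandra.yaml on node001 (sketch)

# Option 1: bind RPC to the same interface gossip uses
rpc_address: 192.168.187.121

# Option 2: listen on all interfaces
# rpc_address: 0.0.0.0
# broadcast_rpc_address: 192.168.187.121
```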
Re: Cassandra is not showing a node up hours after restart
Hello,

Check and compare the following parameters:

1. The Java version should ideally match across all nodes in the cluster.
2. Check that port 7000 is open between the nodes. Use the telnet or nc commands.
3. You should see some clues in the system logs as to why gossip is failing.

Do confirm the above.

Thanks

On Tue, 26 Nov, 2019, 2:50 AM Paul Mena wrote:
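Points 1 and 2 above can be checked with something like the sketch below. HOST is a placeholder for each peer's address, and /dev/tcp is a bashism used here in place of telnet/nc:

```shell
#!/bin/bash
# Compare the JVM and probe the gossip port on a peer.
HOST=${HOST:-127.0.0.1}   # placeholder: substitute each node's address

java -version 2>&1 | head -n 1   # point 1: should match on every node

# point 2: can we reach the inter-node (gossip/storage) port?
if timeout 3 bash -c "cat < /dev/null > /dev/tcp/$HOST/7000" 2>/dev/null; then
    echo "port 7000 open on $HOST"
else
    echo "port 7000 unreachable on $HOST"
fi
```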
RE: Cassandra is not showing a node up hours after restart
NTP was restarted on the Cassandra nodes, but unfortunately I’m still getting the same result: the restarted node does not appear to be rejoining the cluster.

Here’s another data point: “nodetool gossipinfo”, when run from the restarted node (“node001”), shows a status of “normal”:

user@node001=> nodetool -u gossipinfo
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  NET_VERSION:8
  RACK:rack1
  STATUS:NORMAL,-104847506331695918
  RELEASE_VERSION:2.1.9
  SEVERITY:0.0
  LOAD:5.78684155614E11
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  DC:datacenter1
  RPC_ADDRESS:192.168.185.121

When run from one of the other nodes, however, node001’s status is shown as “shutdown”:

user@node002=> nodetool gossipinfo
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
  RACK:rack1
  NET_VERSION:8
  LOAD:5.78679987693E11
  RELEASE_VERSION:2.1.9
  DC:datacenter1
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  RPC_ADDRESS:192.168.185.121
  SEVERITY:0.0

Paul Mena
Senior Application Administrator
WHOI - Information Services
508-289-3539

From: Paul Mena
Sent: Monday, November 25, 2019 9:29 AM
To: user@cassandra.apache.org
Subject: RE: Cassandra is not showing a node up hours after restart
RE: Cassandra is not showing a node up hours after restart
I’ve just discovered that NTP is not running on any of these Cassandra nodes, and that the timestamps are all over the map. Could this be causing my issue?

user@remote=> ansible pre-prod-cassandra -a date
node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:58:17 UTC 2019

node004.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:20 UTC 2019

node003.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:57:06 UTC 2019

node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:22 UTC 2019

Paul Mena
Senior Application Administrator
WHOI - Information Services
508-289-3539

From: Inquistive allen
Sent: Monday, November 25, 2019 2:46 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra is not showing a node up hours after restart
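To put a number on the skew: the timestamps in Paul's ansible output span about ten minutes. A quick sketch with GNU date, using two timestamps copied verbatim from that output:

```shell
#!/bin/sh
# Difference between the earliest and latest clocks in the ansible output.
# `date -d` as used here is GNU date; BSD date parses differently.
earliest=$(date -u -d "Mon Nov 25 13:57:06 UTC 2019" +%s)
latest=$(date -u -d "Mon Nov 25 14:07:22 UTC 2019" +%s)
echo "max observed skew: $((latest - earliest)) seconds"
# prints: max observed skew: 616 seconds
```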
Re: Cassandra is not showing a node up hours after restart
Hello team,

Just to add to the discussion: one may run nodetool disablebinary, followed by nodetool disablethrift, followed by nodetool drain. Nodetool drain also does the work of nodetool flush, plus declaring to the cluster "I'm down and not accepting traffic."

Thanks

On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta wrote:

> Before Cassandra shutdown, nodetool drain should be executed first. As
> soon as you do nodetool drain, the other nodes will see this node down and
> no new traffic will come to this node.
> I generally leave a 10-second gap between nodetool drain and the Cassandra
> stop.
>
> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena wrote:
>
>> Thank you for the replies. I had made no changes to the config before the
>> rolling restart.
>>
>> I can try another restart but was wondering if I should do it
>> differently. I had simply done "service cassandra stop" followed by
>> "service cassandra start". Since then I've seen some suggestions to
>> precede the shutdown with "nodetool disablegossip" and/or "nodetool drain".
>> Are these commands advisable? Are any other commands recommended either
>> before the shutdown or after the startup?
>>
>> Thanks again!
>>
>> Paul
>> --------------
>> *From:* Naman Gupta
>> *Sent:* Sunday, November 24, 2019 11:18:14 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra is not showing a node up hours after restart
>>
>> Did you change the name of the datacenter or make any other config
>> changes before the rolling restart?
>>
>> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena wrote:
>>
>>> I am in the process of doing a rolling restart on a 4-node cluster
>>> running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via
>>> "service cassandra stop/start", and noted nothing unusual in either
>>> system.log or cassandra.log.
>>> Doing a "nodetool status" from node 1 shows all four nodes up:
>>>
>>> user@node001=> nodetool status
>>> Datacenter: datacenter1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>>> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>>> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>>
>>> But doing the same command from any of the other 3 nodes shows node 1
>>> still down.
>>>
>>> user@node002=> nodetool status
>>> Datacenter: datacenter1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>>> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>>> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>>
>>> Is there something I can do to remedy the current situation, so that I
>>> can continue with the rolling restart?
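[Editor's sketch] The stop sequence recommended in this thread (disable client traffic, drain, pause, then stop the service) can be wrapped in a small shell function. The function name and its runner/gap parameters are illustrative additions, not part of any Cassandra tooling; pass "echo" as the runner to dry-run it:

```shell
#!/bin/sh
# graceful_stop: the shutdown order suggested above --
# disablebinary -> disablethrift -> drain -> short pause -> service stop.
#   $1: command prefix ("echo" for a dry run, "sudo" for real use)
#   $2: seconds to wait between drain and stop (default 10, per the advice above)
graceful_stop() {
  runner=${1:-}
  gap=${2:-10}
  $runner nodetool disablebinary &&
  $runner nodetool disablethrift &&
  $runner nodetool drain &&
  sleep "$gap" &&
  $runner service cassandra stop
}

# Dry run, printing the commands instead of executing them:
#   graceful_stop echo 0
```

The && chaining stops the sequence as soon as any step fails, so the service is never stopped with an un-drained commitlog by accident.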
Re: Cassandra is not showing a node up hours after restart
Before Cassandra shutdown, nodetool drain should be executed first. As soon as you do nodetool drain, the other nodes will see this node down and no new traffic will come to it. I generally leave a 10-second gap between nodetool drain and the Cassandra stop.

On Sun, Nov 24, 2019 at 9:52 AM Paul Mena wrote:

> Thank you for the replies. I had made no changes to the config before the
> rolling restart.
>
> I can try another restart but was wondering if I should do it differently.
> I had simply done "service cassandra stop" followed by "service cassandra
> start". Since then I've seen some suggestions to precede the shutdown with
> "nodetool disablegossip" and/or "nodetool drain". Are these commands
> advisable? Are any other commands recommended either before the shutdown or
> after the startup?
>
> Thanks again!
>
> Paul
> --------------
> *From:* Naman Gupta
> *Sent:* Sunday, November 24, 2019 11:18:14 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart
>
> Did you change the name of the datacenter or make any other config changes
> before the rolling restart?
>
> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena wrote:
>
>> I am in the process of doing a rolling restart on a 4-node cluster
>> running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via
>> "service cassandra stop/start", and noted nothing unusual in either
>> system.log or cassandra.log. Doing a "nodetool status" from node 1 shows
>> all four nodes up:
>>
>> user@node001=> nodetool status
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> But doing the same command from any of the other 3 nodes shows node 1
>> still down.
>>
>> user@node002=> nodetool status
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> Is there something I can do to remedy the current situation, so that I
>> can continue with the rolling restart?
Re: Cassandra is not showing a node up hours after restart
Thank you for the replies. I had made no changes to the config before the rolling restart.

I can try another restart but was wondering if I should do it differently. I had simply done "service cassandra stop" followed by "service cassandra start". Since then I've seen some suggestions to precede the shutdown with "nodetool disablegossip" and/or "nodetool drain". Are these commands advisable? Are any other commands recommended either before the shutdown or after the startup?

Thanks again!

Paul

From: Naman Gupta
Sent: Sunday, November 24, 2019 11:18:14 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra is not showing a node up hours after restart

Did you change the name of the datacenter or make any other config changes before the rolling restart?

On Sun, Nov 24, 2019 at 8:49 PM Paul Mena wrote:

I am in the process of doing a rolling restart on a 4-node cluster running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service cassandra stop/start", and noted nothing unusual in either system.log or cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes up:

user@node001=> nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

But doing the same command from any of the other 3 nodes shows node 1 still down.

user@node002=> nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is there something I can do to remedy the current situation, so that I can continue with the rolling restart?
Re: Cassandra is not showing a node up hours after restart
Did you change the name of the datacenter or make any other config changes before the rolling restart?

On Sun, Nov 24, 2019 at 8:49 PM Paul Mena wrote:

> I am in the process of doing a rolling restart on a 4-node cluster running
> Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service
> cassandra stop/start", and noted nothing unusual in either system.log or
> cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes
> up:
>
> user@node001=> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> But doing the same command from any of the other 3 nodes shows node 1
> still down.
>
> user@node002=> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> Is there something I can do to remedy the current situation, so that I can
> continue with the rolling restart?
Re: Cassandra is not showing a node up hours after restart
It sounds silly, but sometimes restarting the node that the other nodes show as down fixes the issue again. This looks like a gossip issue.

On Sun, Nov 24, 2019 at 7:19 AM Paul Mena wrote:

> I am in the process of doing a rolling restart on a 4-node cluster running
> Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service
> cassandra stop/start", and noted nothing unusual in either system.log or
> cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes
> up:
>
> user@node001=> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> But doing the same command from any of the other 3 nodes shows node 1
> still down.
>
> user@node002=> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> Is there something I can do to remedy the current situation, so that I can
> continue with the rolling restart?
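[Editor's sketch] When peers disagree like this, it helps to see the problem the way the other nodes see it by pulling the "DN" rows out of each peer's nodetool status output. A small awk-based helper (the function name is illustrative):

```shell
#!/bin/sh
# down_nodes: read "nodetool status" output on stdin and print the
# addresses of any nodes reported as down (status "DN").
down_nodes() {
  awk '$1 == "DN" { print $2 }'
}

# e.g. run against every peer to see who disagrees:
#   for h in node001 node002 node003 node004; do
#     echo "== $h"; ssh "$h" nodetool status | down_nodes
#   done
```

If one node's view differs from the rest (as here, where only node001 sees itself as up), nodetool gossipinfo and nodetool describecluster on each node are the next places to look.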
Cassandra is not showing a node up hours after restart
I am in the process of doing a rolling restart on a 4-node cluster running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service cassandra stop/start", and noted nothing unusual in either system.log or cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes up:

user@node001=> nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

But doing the same command from any of the other 3 nodes shows node 1 still down.

user@node002=> nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is there something I can do to remedy the current situation, so that I can continue with the rolling restart?
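[Editor's sketch] Before moving on to the next node in a rolling restart, it's worth confirming that every peer reports the restarted node as "UN", not just the node itself (as this thread shows, the two views can differ). A sketch of that check; the function name is illustrative, and it reads nodetool status output on stdin:

```shell
#!/bin/sh
# node_is_up: exit 0 if the given address appears with status "UN" in
# "nodetool status" output read from stdin, non-zero otherwise.
node_is_up() {
  awk -v ip="$1" '$1 == "UN" && $2 == ip { found = 1 } END { exit !found }'
}

# e.g. check the restarted node from every other node before continuing:
#   for h in node002 node003 node004; do
#     ssh "$h" nodetool status | node_is_up 192.168.187.121 \
#       || echo "$h still sees 192.168.187.121 down"
#   done
```

Only when all peers agree the node is up is it safe to take the next node down, otherwise two nodes may be effectively unavailable at once.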