Re: Reads not returning data after adding node
Ok. I have to psych myself up for the add-node task a bit. It didn't go well the first time round! Tasks:

- Make sure the new node is not in its own seeds list!
- Check cluster name, listen address, rpc address
- Give it its own rack in cassandra-rackdc.properties
- Delete cassandra-topology.properties if it exists
- Make sure no compactions are on the go
- rm -rf /var/lib/cassandra/*
- rm /data/cassandra/commitlog/* (this is on a different disk)
- systemctl start cassandra

It should then start streaming data from the other nodes and join the cluster. Anything else I have to watch out for? Tx.
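[Editor's note] A rough shell sketch of that checklist, run on the new node before its first start. It assumes the paths mentioned above (/var/lib/cassandra, /data/cassandra/commitlog) and the usual /etc/cassandra config location; treat it as a reminder list rather than a script to paste in:

# Sanity-check config before starting (the new node must NOT list itself under seeds):
grep -nE 'cluster_name|listen_address|rpc_address|seeds' /etc/cassandra/cassandra.yaml
cat /etc/cassandra/cassandra-rackdc.properties
rm -f /etc/cassandra/cassandra-topology.properties   # only needed by the old PropertyFileSnitch

# Check the existing nodes are quiet, then wipe any stale local state:
nodetool compactionstats        # run this on the existing nodes
rm -rf /var/lib/cassandra/*
rm -rf /data/cassandra/commitlog/*

# Start and watch it join (UJ while bootstrapping, UN when done):
systemctl start cassandra
nodetool netstats | grep -i receiving
nodetool status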
Re: Reads not returning data after adding node
Because executing "removenode" streamed extra data from live nodes to the "gaining" replica.

Oversimplified (if you had one token per node): if you start with A B C and then add D, D should bootstrap a range from each of A, B and C, but at the end some of the data that was on A B C becomes B C D.

When you removenode, you tell B and C to send data back to A.

A, B and C will eventually compact that data away. Eventually.

If you get around to adding D again, running "cleanup" when you're done (successfully) will remove a lot of it.
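[Editor's note] If it helps, this is roughly what that cleanup step looks like once the re-added node has finished joining; nothing here is specific to this cluster, and it is worth running on one node at a time since cleanup rewrites SSTables:

# On each pre-existing node, after the new node shows UN in nodetool status:
nodetool cleanup                 # drops data for token ranges the node no longer owns
nodetool compactionstats         # the cleanup shows up here while it runs
df -h /var/lib/cassandra         # space is reclaimed as cleanup finishes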
Re: Reads not returning data after adding node
Looks like the remove has sorted things out. Thanks.

One thing I am wondering about: why are the nodes carrying a lot more data? The loads were about 2.7T before, now 3.4T.

# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
UN  xxx.xxx.xxx.105  3.4 TiB   256     100.0%            afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
UN  xxx.xxx.xxx.253  3.34 TiB  256     100.0%            e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
UN  xxx.xxx.xxx.107  3.44 TiB  256     100.0%            ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
Re: Understanding rack in cassandra-rackdc.properties
Usually it's good practice to mirror the real datacenter in the Cassandra topology, so nodes mounted in distinct physical racks are known to Cassandra under different rack names. The reason is that typical datacenter infrastructure has a single point of failure per rack, e.g. a network switch, so a rack can be considered a failure domain within the datacenter. Cassandra makes an effort to distribute its token ranges among the available nodes so that replicas of the same data do not pile up in a single rack. In the best case you can lose a whole rack without losing more than a single replica of the affected partitions. (Note: this is only best effort.)

In some cases you can experience issues, e.g. if the number of nodes is very small, or if nodes share some other resource that behaves as a single point of failure, as VMs on one host do. In such a case it might be better to configure every Cassandra node with the same rack.
Re: Reads not returning data after adding node
That's correct. nodetool removenode is strongly preferred when your node is already down. If the node is still functional, use nodetool decommission on the node instead.
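[Editor's note] For reference, a minimal sketch of the two paths; the host ID placeholder is just illustrative and would come from your own nodetool status output:

# Node still up and healthy: run ON the node you want to remove
nodetool decommission            # streams its data to the remaining replicas, then leaves

# Node already down: run on any live node
nodetool status                  # note the Host ID of the dead node
nodetool removenode <host-id>    # surviving replicas re-stream the missing ranges
nodetool removenode status       # check progress of an in-flight removal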
Re: Understanding rack in cassandra-rackdc.properties
I just want to mention that a "rack" in Cassandra doesn't need to match the physical rack. As long as each "rack" in Cassandra fails independently of the others, it is fine. That means if you have 6 physical servers, each in a unique physical rack, and Cassandra RF=3, you can have any of the following configurations; each of them makes sense and all of them will work correctly:

1. 6 racks in Cassandra, each containing only 1 server
2. 3 racks in Cassandra, each containing 2 servers
3. 1 rack in Cassandra, with all 6 servers in it
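[Editor's note] As an illustration of option 2 above (3 Cassandra racks with 2 servers each), the per-node cassandra-rackdc.properties could look like the sketch below; the node groupings are made up for the example:

# nodes 1 and 4
dc=dc1
rack=rack1

# nodes 2 and 5
dc=dc1
rack=rack2

# nodes 3 and 6
dc=dc1
rack=rack3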
Re: Reads not returning data after adding node
FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guaranteed to maintain consistency / correctness, and removenode probably streams a lot more data around than decommission.

On Sun, Apr 2, 2023 at 8:16 PM David Tinker wrote:
> Hi All
>
> I recently added a node to my 3 node Cassandra 4.0.5 cluster and now many reads are not returning rows! What do I need to do to fix this? There weren't any errors in the logs or other problems that I could see. I expected the cluster to balance itself but this hasn't happened (yet?). The nodes are similar so I have num_tokens=256 for each. I am using the Murmur3Partitioner.
>
> # nodetool status
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
> UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
> UN  xxx.xxx.xxx.24   93.82 KiB
Re: Reads not returning data after adding node
> I just asked that question on this list and the answer was that adding the new nodes as rack4, rack5 and rack6 is fine. They are all on separate physical racks. Is that ok?

Yes, Jeff is right, all 6 nodes each on their own rack will work just fine.

> Should I do a full repair first or is the remove node operation basically doing that?

I don't think you'll need a full repair. Removenode should be taking care of streaming that node's data to where it needs to go.
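[Editor's note] A small sketch of how the removal can be watched while it runs; these are plain nodetool commands, nothing specific to this cluster:

nodetool removenode status        # progress of the removenode itself
nodetool netstats | grep Already  # per-node streaming progress, as used elsewhere in this thread
nodetool ring | head -20          # the removed node shows as "Down Leaving" until it is gone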
Re: Reads not returning data after adding node
Thanks. Yes, my big screwup here was to make the new node a seed node, so it didn't get any data. I am going to add 3 more nodes, one at a time, when the cluster has finished with the remove and everything seems stable. Should I do a full repair first or is the remove node operation basically doing that?

Re the racks: I just asked that question on this list and the answer was that adding the new nodes as rack4, rack5 and rack6 is fine. They are all on separate physical racks. Is that ok?
Re: Reads not returning data after adding node
The time it takes to stream data off of a node varies by network, cloud region, and other factors, so it's not unheard of for it to take a while to finish.

Just thought I'd mention that auto_bootstrap is true by default. So if you're not setting it, the node should bootstrap as long as it's not a seed node.

As for the rack issue, yes, it's a good idea to keep your racks in multiples of your RF. When performing token ownership calculations, Cassandra takes rack designation into consideration and tries to ensure that multiple replicas of a row are not placed in the same rack. TBH, I'd build out two more nodes to have 6 nodes across 3 racks (2 in each), just to ensure even distribution. Otherwise, you might notice that the nodes sharing a rack consume disk at a different rate than the nodes which have their own rack.
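[Editor's note] If you prefer to be explicit rather than rely on the default, the relevant cassandra.yaml fragment looks roughly like this (auto_bootstrap is normally absent and defaults to true; the seed IPs below are placeholders):

auto_bootstrap: true    # default; a new non-seed node streams data before serving reads
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # list only the established nodes here, never the node that is joining
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"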
Re: Understanding rack in cassandra-rackdc.properties
As long as the number of racks is already at/above the replication factor, it's gonna be fine.

Where it tends to surprise people is if you have RF=3 and either 1 or 2 racks, and then you add a third: that third rack gets one copy of "all" of the data, so you often run out of disk space.

If you're already at 3 nodes / 3 racks / RF=3, you're already evenly distributed; the next (4th, 5th, 6th) racks will just be randomly assigned based on the random token allocation.
Understanding rack in cassandra-rackdc.properties
I have a 3 node cluster using the GossipingPropertyFileSnitch and a replication factor of 3. All nodes are leased hardware and more or less the same. The cassandra-rackdc.properties files look like this:

dc=dc1
rack=rack1
(rack2 and rack3 for the other nodes)

Now I need to expand the cluster. I was going to use rack4 for the next node, then rack5 and rack6, because the nodes are physically all on different racks. Elsewhere on this list someone mentioned that I should use rack1, rack2 and rack3 again.

Why is that?

Thanks
David
Re: Reads not returning data after adding node
Thanks. Hmm, the remove has been busy for hours but seems to be progressing.

I have been running this on the nodes to monitor progress:

# nodetool netstats | grep Already
    Receiving 92 files, 843934103369 bytes total. Already received 82 files (89.13%), 590204687299 bytes total (69.93%)
    Sending 84 files, 860198753783 bytes total. Already sent 56 files (66.67%), 307038785732 bytes total (35.69%)
    Sending 78 files, 815573435637 bytes total. Already sent 56 files (71.79%), 313079823738 bytes total (38.39%)

The percentages are ticking up.

# nodetool ring | head -20
Datacenter: dc1
==========
Address          Rack   Status  State    Load       Owns    Token
                                                            9189523899826545641
xxx.xxx.xxx.24   rack4  Down    Leaving  26.62 GiB  79.95%  -9194674091837769168
xxx.xxx.xxx.107  rack1  Up      Normal   2.68 TiB   73.25%  -9168781258594813088
xxx.xxx.xxx.253  rack2  Up      Normal   2.63 TiB   73.92%  -9163037340977721917
xxx.xxx.xxx.105  rack3  Up      Normal   2.68 TiB   72.88%  -9148860739730046229
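[Editor's note] If you get tired of re-running that by hand, something along these lines works on most hosts (assuming the standard watch utility is installed):

watch -n 60 'nodetool netstats | grep Already'
# or append snapshots to a file for later comparison:
while true; do date; nodetool netstats | grep Already; sleep 300; done >> netstats.log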
Re: Reads not returning data after adding node
Using nodetool removenode is strongly preferred in most circumstances. Only resort to assassinate if you do not care about data consistency, or you know there won't be any consistency issue (e.g. no new writes and you did not run nodetool cleanup).

Since the size of data on the new node is small, nodetool removenode should finish fairly quickly and bring your cluster back.

Next time when you are doing something like this again, please test it out in a non-production environment and make sure everything works as expected before moving on to production.

On 03/04/2023 06:28, David Tinker wrote:
> Should I use assassinate or removenode? Given that there is some data on the node. Or will that be found on the other nodes? Sorry for all the questions but I really don't want to mess up.

On Mon, Apr 3, 2023 at 7:21 AM Carlos Diaz wrote:
> That's what nodetool assassinate will do.

On Sun, Apr 2, 2023 at 10:19 PM David Tinker wrote:
> Is it possible for me to remove the node from the cluster, i.e. to undo this mess and get the cluster operating again?

On Mon, Apr 3, 2023 at 7:13 AM Carlos Diaz wrote:
> You can leave it in the seed list of the other nodes, just make sure it's not included in this node's seed list. However, if you do decide to fix the issue with the racks, first assassinate this node (nodetool assassinate ), and update the rack name before you restart.

On Sun, Apr 2, 2023 at 10:06 PM David Tinker wrote:
> It is also in the seeds list for the other nodes. Should I remove it from those, restart them one at a time, then restart it?
>
> /etc/cassandra # grep -i bootstrap *
> doesn't show anything so I don't think I have auto_bootstrap false.
>
> Thanks very much for the help.

On Mon, Apr 3, 2023 at 7:01 AM Carlos Diaz wrote:
> Just remove it from the seed list in the cassandra.yaml file and restart the node. Make sure that auto_bootstrap is set to true first though.

On Sun, Apr 2, 2023 at 9:59 PM David Tinker wrote:
> So likely because I made it a seed node when I added it to the cluster it didn't do the bootstrap process. How can I recover this?

On Mon, Apr 3, 2023 at 6:41 AM David Tinker wrote:
> Yes, replication factor is 3.
>
> I ran nodetool repair -pr on all the nodes (one at a time) and am still having issues getting data back from queries.
>
> I did make the new node a seed node.
>
> Re "rack4": I assumed that was just an indication as to the physical location of the server for redundancy. This one is separate from the others so I used rack4.

On Mon, Apr 3, 2023 at 6:30 AM Carlos Diaz wrote:
> I'm assuming that your replication factor is 3. If that's the case, did you intentionally put this node in rack 4? Typically, you want to add nodes in multiples of your replication factor in order to keep the "racks" balanced. In other words, this node should have been added to rack 1, 2 or 3.
>
> Having said that, you should be able to easily fix your problem by running a nodetool repair -pr on the new node.