Re: [ClusterLabs] node1 and node2 communication time question
On Wed, 2022-08-10 at 10:35 +0900, 권오성 wrote:
> Thank you for your reply.
> Then, could you explain how to activate and set the stonith?

Hi,

In normal circumstances, the cluster can tell each node what to do and receive confirmation. This allows the cluster to coordinate so that resources don't run in conflict with each other. For example, mounting an ext2/3/4 filesystem on two nodes at once would cause data corruption, and bringing an IP address up on two nodes would render it unusable because packets would randomly go to one or the other.

If cluster communication is disrupted, for example by a network card failure or extremely high load on a node, then the cluster can't safely start its resources elsewhere. In that situation, the node itself can't be relied on to stop resources. The cluster must have some way of forcing the node to be shut down or cut off from the rest of the cluster, and that's what fencing is.

The classic example of fencing is an intelligent power switch. When a node becomes unreachable, the rest of the cluster can tell the power switch to cut power to the node. Then the cluster can assume the node is not running any resources, and recover them on other nodes.

How fencing is configured depends on what you're using as nodes (virtual machines, physical machines, cloud instances, etc.) and what hardware you have available (IPMI, intelligent power switches, shared storage, etc.).

> Or let me know the blog or site you know.
> I looked up the site I found and proceeded with the setting, and
> almost all the sites explained it with the setting I set.

Unfortunately, many sites disable fencing because that's easier than trying to explain all the ways it can be configured. But the result is a cluster that's vulnerable to unrecoverable situations and data loss.

If your cluster nodes are virtual machines, and you have access to the host, this should work:

https://wiki.clusterlabs.org/wiki/Guest_Fencing

If you're using something else as cluster nodes, let us know.
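To make the ordering described above concrete, here is a toy model in Python. It is not Pacemaker code: `PowerSwitch` and `handle_node_loss` are hypothetical names standing in for an intelligent power switch and the cluster's recovery decision. It assumes a single switch that either confirms the outlet is off or fails; a lost node's resources are moved only after that confirmation, and otherwise they stay blocked where they were.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PowerSwitch:
    """Hypothetical intelligent power switch (illustration only, not a fence agent API)."""
    outlets_on: dict[str, bool]
    reachable: bool = True

    def power_off(self, node: str) -> bool:
        """Cut power to `node`; return True only if the switch confirms it."""
        if not self.reachable or node not in self.outlets_on:
            return False
        self.outlets_on[node] = False
        return True


def handle_node_loss(lost: str, survivors: list[str],
                     placement: dict[str, str],
                     switch: PowerSwitch) -> dict[str, str]:
    """Return the resource -> node placement after `lost` drops out of membership.

    Resources are recovered only after fencing is confirmed. If fencing fails
    (or no survivor is left), the old placement is returned unchanged: the
    resources are blocked rather than risk running on two nodes at once.
    """
    if not survivors or not switch.power_off(lost):
        return dict(placement)
    new_placement: dict[str, str] = {}
    moved = 0
    for resource, node in sorted(placement.items()):
        if node == lost:
            # The fenced node is known to be off, so starting the resource
            # on a survivor cannot create a second running copy.
            new_placement[resource] = survivors[moved % len(survivors)]
            moved += 1
        else:
            new_placement[resource] = node
    return new_placement
```

For example, with `{"VirtualIP": "node1"}` and a working switch, `handle_node_loss("node1", ["node2"], ...)` moves `VirtualIP` to `node2`; with an unreachable switch it returns the placement unchanged, which is the "wait rather than risk corruption" behaviour that disabling stonith gives up.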
-- Ken Gaillot

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] node1 and node2 communication time question
On Wed, Aug 10, 2022 at 3:49 AM 권오성 wrote:
>
> Thank you for your reply.
> Then, can I think of it as being able to adjust the time by changing the
> token in /etc/corosync/corosync.conf?

That would basically be the time after which a non-responsive node is declared dead and drops out of the cluster. But be careful not to set it too low: your node might then drop out of the cluster because of a hiccup in the network, or because high load prevents corosync from being scheduled in time.

Then think about what it really means when a node isn't reachable over the network. It doesn't necessarily mean the node is completely dead and can no longer interfere with anything; it might even come back a second later. With this uncertainty, recovering a service on another node is risky. This is where fencing comes in: it makes sure the potentially dead node really is dead before the cluster proceeds to recover its services elsewhere.

> And the site I searched and found was explaining to disable fencing.
> If so, could you introduce me to a site or blog that explains by activating
> fencing?
> I am a college student studying about ha.
> I first learned about the concept of ha, and I don't know how to set it up or
> what options to change.
> And I am using a translator because I am not good at English, but I do not
> understand how to apply it by looking at the document in the cluster lab.

https://www.clusterlabs.org/pacemaker/doc/2.1/Clusters_from_Scratch/html/ should give you an introduction to all the important concepts and walk you through an example. I don't know how well a translator does with that, though.

Klaus

> Please check it out.
> Thank you.
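As a rough illustration of the `token` setting discussed above, the Python sketch below computes the token timeout Corosync ends up using. It assumes the `token_coefficient` rule described in corosync.conf(5) for recent Corosync versions: for a `nodelist` with three or more nodes, the effective timeout is `token + (nodes - 2) * token_coefficient`, with `token_coefficient` defaulting to 650 ms; with two nodes the configured value is used as-is. The `stall_causes_membership_loss` helper is a hypothetical rule of thumb for the "too low" warning, not a Corosync function.

```python
# Sketch of the token_coefficient rule from corosync.conf(5); verify against
# the man page for your Corosync version.
# Default totem.token: 1000 ms in Corosync 2.x, 3000 ms in Corosync 3.x.
DEFAULT_TOKEN_COEFFICIENT_MS = 650


def effective_token_ms(token_ms: int, num_nodes: int,
                       token_coefficient_ms: int = DEFAULT_TOKEN_COEFFICIENT_MS) -> int:
    """Token timeout after applying token_coefficient (nodelist with >= 3 nodes)."""
    if num_nodes >= 3:
        return token_ms + (num_nodes - 2) * token_coefficient_ms
    return token_ms


def stall_causes_membership_loss(stall_ms: int, token_ms: int, num_nodes: int) -> bool:
    """Hypothetical rule of thumb: if corosync on a node cannot run or reach its
    peers for longer than the token timeout, that node is declared dead."""
    return stall_ms > effective_token_ms(token_ms, num_nodes)
```

In a two-node cluster with `token: 300`, a 500 ms stall under load would already exceed the timeout, which is exactly the kind of false positive described above; the 3000 ms default would ride it out.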
Re: [ClusterLabs] node1 and node2 communication time question
On Tue, 2022-08-09 at 15:23 +0900, 권오성 wrote:
> Hello.
> I installed linux ha on raspberry pi as below.
> 1) sudo apt-get install pacemaker pcs fence-agents resource-agents
> 2) Host Settings
> 3) sudo reboot
> 4) sudo passwd hacluster
> 5) sudo systemctl enable pcsd, sudo systemctl start pcsd, sudo
> systemctl enable pacemaker
> 6) sudo pcs cluster destroy
> 7) sudo pcs cluster auth -u hacluster -p <password for hacluster>
> 8) sudo pcs cluster setup --name
> 9) sudo pcs cluster start --all, sudo pcs cluster enable --all
> 10) sudo pcs property set stonith-enabled=false
> 11) sudo pcs status
> 12) sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2
> ip= cidr_netmask=24 op monitor interval=30s
>
> So, I've set it up like this way.
> By the way, is it correct that node1 and node2 communicate every 30
> seconds and node2 will notice after 30 seconds when node1 dies?
> Or do we communicate every few seconds?
> And can node1 and node2 reduce communication time?
> What I want is node1 and node2 to communicate every 10 ms and switch
> as fast as possible.
> Please answer.
> Thank you.

Unfortunately, 10 ms is not a realistic goal with the current software.

Node loss is detected by Corosync, which passes a token around all nodes continuously. The token timeout is defined in /etc/corosync/corosync.conf and defaults to 1 second (Corosync 2) or 3 seconds (Corosync 3). With 2 nodes and a dedicated network for corosync traffic, you can probably get below one second, but I'm not sure what the practical limit is.

Once node loss is detected, most of the switchover time is spent in fencing (which should always be configured, otherwise you risk data loss or service malfunctions) and in stopping and starting your individual resources.

Resource loss is detected by recurring monitors. That's where the interval=30s comes in; the cluster will check the resource's status that often. You can reduce that; I would say 5 or 10 s would be fine, and even lower could be OK.
The cluster has to run the scheduler, invoke the resource agent, and record the result if changed. When resource loss is detected, the stop/start time of the resource is the main factor.

-- Ken Gaillot
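The breakdown above can be turned into a back-of-the-envelope estimate. The Python sketch below is only a model with hypothetical function names: every duration is an input you would have to measure on your own cluster, and Corosync's consensus timeout and other delays are ignored. For node loss it adds detection (token timeout), fencing, and starting the resources on a survivor (nothing needs stopping on a node that has been fenced); for a resource failure it takes the worst case, where the failure happens just after a monitor finished.

```python
# Hypothetical back-of-the-envelope model; all durations are user-supplied estimates.

def node_loss_switchover_s(token_timeout_s: float, fencing_s: float,
                           start_s: float) -> float:
    """Detection by corosync + fencing + starting the resources elsewhere."""
    return token_timeout_s + fencing_s + start_s


def resource_failure_switchover_s(monitor_interval_s: float, monitor_runtime_s: float,
                                  stop_s: float, start_s: float,
                                  overhead_s: float = 0.0) -> float:
    """Worst case: the failure happens right after a monitor completed, so it is
    seen only by the next monitor; then the resource is stopped and started.
    `overhead_s` stands in for running the scheduler and recording results."""
    detection_s = monitor_interval_s + monitor_runtime_s
    return detection_s + overhead_s + stop_s + start_s
```

With made-up numbers (1 s monitor run, 2 s stop, 3 s start), shortening the monitor interval from 30 s to 10 s cuts the worst case from 36 s to 16 s, while a node-loss switchover with a 3 s token, 5 s of fencing and a 2 s start comes to 10 s and is dominated by fencing and start time rather than by the monitor interval.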
Re: [ClusterLabs] node1 and node2 communication time question
Hi,

It seems that you are using pcs 0.9.x. That is an old and unmaintained version. I really recommend updating it.

I can see that you disabled stonith. This is really bad practice. A cluster cannot and will not function properly without working stonith.

What makes you think the nodes communicate only every 30 seconds and not more often? Setting 'monitor interval=30s' certainly doesn't do any such thing.

Regards,
Tomas
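To illustrate the point that `monitor interval=30s` does not set how often the nodes talk to each other, the Python sketch below (a simplified model with hypothetical names, not Pacemaker or Corosync code) computes when each kind of failure is noticed. A dead node is detected by Corosync roughly one token timeout after it goes silent, whatever the monitor interval; a failed resource on a healthy node is noticed by the next recurring monitor. It assumes monitors start at `first_monitor_s` and repeat exactly every `interval_s`.

```python
import math

# Simplified model (hypothetical names), not Pacemaker or Corosync code.


def node_failure_detected_at(fail_at_s: float, token_timeout_s: float) -> float:
    """Corosync token loss: the resource monitor interval plays no part."""
    return fail_at_s + token_timeout_s


def resource_failure_detected_at(fail_at_s: float, first_monitor_s: float,
                                 interval_s: float, monitor_runtime_s: float) -> float:
    """The first recurring monitor starting at or after the failure reports it."""
    if fail_at_s <= first_monitor_s:
        next_start = first_monitor_s
    else:
        runs_elapsed = math.ceil((fail_at_s - first_monitor_s) / interval_s)
        next_start = first_monitor_s + runs_elapsed * interval_s
    return next_start + monitor_runtime_s
```

For a failure at t = 12 s with a 3 s token timeout, a dead node is noticed at about 15 s regardless of the monitor interval, while a stopped resource monitored with `interval=30s` (runs at 0, 30, 60, ... s, each taking 1 s) is noticed only at 31 s.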