Re: [ClusterLabs] node1 and node2 communication time question

2022-08-10 Thread Ken Gaillot
On Wed, 2022-08-10 at 10:35 +0900, 권오성 wrote:
> Thank you for your reply.
> Then, could you explain how to activate and set the stonith?

Hi,

In normal circumstances, the cluster can tell each node what to do, and
receive confirmation. This allows the cluster to coordinate so that
resources don't run in conflict with each other.

For example, mounting an e2fs filesystem on two nodes would cause data
corruption. Bringing an IP address up on two nodes would render it
unusable because packets would randomly go to one or the other.

If cluster communication is disrupted, for example by a network card
failure or extremely high load on a node, then the cluster can't safely
start its resources elsewhere. In that situation, the node itself can't
be relied on to stop resources. The cluster must have some way of
forcing the node to be shut down or cut off from the rest of the
cluster, and that's what fencing is.

The classic example of fencing is an intelligent power switch. When a
node becomes unreachable, the rest of the cluster can tell the power
switch to cut power to the node. Then the cluster can assume the node
is not running any resources, and recover them on other nodes.
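
The rule in the paragraph above (recover only after fencing is confirmed) can be sketched in a few lines of Python. This is only a toy model, not how Pacemaker is implemented: the function names and the `fence` callback are hypothetical stand-ins for a real fence device such as a power switch.

```python
from typing import Callable


def handle_unreachable_node(node: str, fence: Callable[[str], bool]) -> str:
    """Decide what to do about a node that stopped responding.

    `fence` is a hypothetical callback standing in for a real fence device;
    it returns True only when the device confirms the node was cut off.
    """
    if fence(node):
        # Fencing confirmed: the node cannot be running anything anymore.
        return f"recover resources from {node} on the surviving nodes"
    # Without confirmation the node may still hold the filesystem or the IP.
    return f"wait: state of {node} is unknown, recovery is not safe"


def power_switch_ok(node: str) -> bool:
    """Hypothetical fence device that always succeeds."""
    return True


def power_switch_broken(node: str) -> bool:
    """Hypothetical fence device that cannot confirm the power-off."""
    return False


print(handle_unreachable_node("node1", power_switch_ok))
print(handle_unreachable_node("node1", power_switch_broken))
```

The second call shows why a cluster without working fencing gets stuck or risks corruption: it never learns whether node1 is really gone.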

How fencing is configured depends on what you're using as nodes
(virtual machines, physical machines, cloud instances, etc.) and what
hardware you have available (IPMI, intelligent power switches, shared
storage, etc.).

> Or let me know the blog or site you know.
> I looked up the site I found and proceeded with the setting, and
> almost all the sites explained it with the setting I set.

Unfortunately, many sites do that because it's easier than trying to
explain all the ways fencing can be configured. But the result is a
cluster that's vulnerable to unrecoverable situations and data loss.

If your cluster nodes are virtual machines, and you have access to the
host, this should work:

 https://wiki.clusterlabs.org/wiki/Guest_Fencing

If you're using something else as cluster nodes, let us know.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] node1 and node2 communication time question

2022-08-09 Thread Klaus Wenninger
On Wed, Aug 10, 2022 at 3:49 AM 권오성 wrote:
>
> Thank you for your reply.
> Then, can I think of it as being able to adjust the time by changing the 
> token in /etc/corosync/corosync.conf?

That would basically be the time after which an unresponsive
node would be declared dead and dropped from the cluster.
But be careful about setting that time too low: your node
might drop out of the cluster because of a hiccup in the
network, or because load prevents corosync from being
scheduled.
Then think about what it really means that a node isn't reachable
over the network. It doesn't necessarily mean the node is totally
dead and no longer interferes with anything; it might come back
a second later.
With this uncertainty, recovering a service on another node
is risky.
This is where fencing kicks in: it makes sure the potentially
dead node really is dead before you proceed with recovering
its services.

> And the site I searched and found was explaining to disable fencing.
> If so, could you introduce me to a site or blog that explains by activating 
> fencing?
> I am a college student studying about ha.
> I first learned about the concept of ha, and I don't know how to set it up or 
> what options to change.
> And I am using a translator because I am not good at English, but I do not 
> understand how to apply it by looking at the document in the cluster lab.

https://www.clusterlabs.org/pacemaker/doc/2.1/Clusters_from_Scratch/html/
should give you an introduction to all the important concepts and
run you through an example.
I don't know how well a translator does with that though.

Klaus

> Please check it out.
> Thank you.


Re: [ClusterLabs] node1 and node2 communication time question

2022-08-09 Thread Ken Gaillot
On Tue, 2022-08-09 at 15:23 +0900, 권오성 wrote:
> Hello.
> I installed linux ha on raspberry pi as below.
> 1) sudo apt-get install pacemaker pcs fence-agents resource-agents
> 2) Host Settings
> 3) sudo reboot
> 4) sudo passwd hacluster
> 5) sudo systemctl enable pcsd, sudo systemctl start pcsd, sudo
> systemctl enable pacemaker
> 6) sudo pcs cluster destroy
> 7) sudo pcs cluster auth   -u hacluster -p  for hacluster>
> 8) sudo pcs cluster setup --name   
> 9) sudo pcs cluster start --all, sudo pcs cluster enable --all
> 10) sudo pcs property set stonith-enabled=false
> 11) sudo pcs status
> 12) sudo pcs resource create Virtual IP ocf:heartbeat:IPaddr2
> ip= cidr_netmask=24 op monitor interval=30s
> 
> So, I've set it up like this way.
> By the way, is it correct that node1 and node2 communicate every 30
> seconds and node2 will notice after 30 seconds when node1 dies?
> Or do we communicate every few seconds?
> And can node1 and node2 reduce communication time?
> What I want is node1 and node2 to communicate every 10 ms and switch
> as fast as possible.
> Please answer.
> Thank you.

Unfortunately 10ms is not a realistic goal with the current software.

Node loss is detected by Corosync, which passes a token around all
nodes continuously. The token timeout is defined in
/etc/corosync/corosync.conf and defaults to either 1 or 3 seconds. With
2 nodes and a dedicated network for corosync traffic you can probably
get subsecond detection, but I'm not sure what the practical limit is.
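
To see what is currently configured, the token value can be read from the totem section of corosync.conf, where it is given in milliseconds. Here is a minimal Python sketch of that. The sample configuration text and the function name are made up for illustration, and the parser only handles the simple "key: value" and "section { }" layout shown.

```python
SAMPLE_CONF = """\
# hypothetical corosync.conf excerpt
totem {
    version: 2
    cluster_name: demo
    token: 1000
}
"""


def totem_token_ms(conf_text, default_ms=None):
    """Return the totem `token` value in milliseconds, or default_ms if unset."""
    sections = []  # stack of currently open section names
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            sections.append(line[:-1].strip())
        elif line == "}":
            if sections:
                sections.pop()
        elif sections == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() == "token":
                return int(value.strip())
    return default_ms


print(totem_token_ms(SAMPLE_CONF))
```

With a real cluster you would pass the contents of /etc/corosync/corosync.conf instead of SAMPLE_CONF. If the key is absent, corosync falls back to its built-in default, which the sketch leaves to the caller via default_ms.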

Once node loss is detected, most of the switch-over time is spent in
fencing (which should always be configured, otherwise you risk data
loss or service malfunctions) and the stop/start time of your individual
resources.

Resource loss is detected by recurring monitors. That's where the
interval=30s comes in; the cluster will check the resource's status
that often. You can reduce that. I would say 5 or 10s would be fine,
and even below that could be OK. For each check, the cluster has to run
the scheduler, invoke the resource agent, and record the result if changed.

When resource loss is detected, the stop/start time of the resource is
the main factor.
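
Putting the two cases together, a rough worst-case budget can be added up as below. This is only back-of-the-envelope arithmetic in Python; all the numbers are invented placeholders, and the formulas ignore scheduler and messaging overhead, so measure your own fence device and resources before relying on any figure.

```python
def node_failover_estimate(token_timeout_s, fencing_s, resource_start_s):
    """Rough upper bound on switch-over time after a whole node is lost.

    The node has been fenced, so there is no stop phase, only the start
    on the surviving node.
    """
    return token_timeout_s + fencing_s + resource_start_s


def resource_failover_estimate(monitor_interval_s, monitor_run_s, stop_s, start_s):
    """Rough worst case after a resource fails while its node stays healthy.

    Worst case: the failure happens right after a monitor finished, so it
    is only seen by the next monitor one interval later.
    """
    return monitor_interval_s + monitor_run_s + stop_s + start_s


# Placeholder values in seconds, chosen only to make the sums easy to follow.
print(node_failover_estimate(token_timeout_s=1.0, fencing_s=5.0, resource_start_s=2.0))
print(resource_failover_estimate(monitor_interval_s=10.0, monitor_run_s=0.5,
                                 stop_s=1.0, start_s=2.0))
```

Even with generous assumptions both sums are in the range of seconds, which is why 10 ms is out of reach.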
-- 
Ken Gaillot 



Re: [ClusterLabs] node1 and node2 communication time question

2022-08-09 Thread Tomas Jelinek

Hi,

It seems that you are using pcs 0.9.x. That is an old and unmaintained 
version. I really recommend updating it.


I can see that you disabled stonith. This is really bad practice.
A cluster cannot and will not function properly without working stonith.


What makes you think nodes are communicating only every 30 seconds and
not more often? Setting 'monitor interval=30s' certainly doesn't do any
such thing.


Regards,
Tomas


On 09. 08. 22 at 8:23, 권오성 wrote:

> Hello.
> I installed linux ha on raspberry pi as below.
> 1) sudo apt-get install pacemaker pcs fence-agents resource-agents
> 2) Host Settings
> 3) sudo reboot
> 4) sudo passwd hacluster
> 5) sudo systemctl enable pcsd, sudo systemctl start pcsd, sudo
> systemctl enable pacemaker
> 6) sudo pcs cluster destroy
> 7) sudo pcs cluster auth   -u hacluster -p for hacluster>
> 8) sudo pcs cluster setup --name   
> 9) sudo pcs cluster start --all, sudo pcs cluster enable --all
> 10) sudo pcs property set stonith-enabled=false
> 11) sudo pcs status
> 12) sudo pcs resource create Virtual IP ocf:heartbeat:IPaddr2
> ip= cidr_netmask=24 op monitor interval=30s
>
> So, I've set it up like this way.
> By the way, is it correct that node1 and node2 communicate every 30
> seconds and node2 will notice after 30 seconds when node1 dies?
> Or do we communicate every few seconds?
> And can node1 and node2 reduce communication time?
> What I want is node1 and node2 to communicate every 10 ms and switch as
> fast as possible.
> Please answer.
> Thank you.
