On 15.06.2023 13:58, Kadlecsik József wrote:
Hello,

We had a strange issue here: 7 node cluster, one node was put into standby
mode to test a new iscsi setting on it. During configuring the machine it
was rebooted and after the reboot the iscsi didn't come up. That caused a
malformed communication (atlas5 is the node in standby) with the cluster:

Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  warning: Unexpected
result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023

It sounds like a resource agent problem. You need to investigate why the probe returned an error.

Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  notice: If it is not
possible for ocsi to run on atlas5, see the resource-discovery option for
location constraints
Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  error: Resource ocsi
is active on 2 nodes (attempting recovery)

The resource was definitely not active on 2 nodes. And that caused a storm
of killing all virtual machines as resources.

How could one prevent such cases to come up?


Standby does not stop the cluster from running on the node; it simply tells pacemaker to exclude this node from the candidates for running resources. To avoid any unwanted interaction (including one caused by resource agent or other software bugs), you could simply stop pacemaker on that node and disable its auto-startup.
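
As a rough sketch of that last suggestion, assuming the node runs systemd and the service unit is named "pacemaker" (the unit name, the DRY_RUN variable and the helper function names are my assumptions, not something taken from your setup), it could look like this. By default it only prints what it would do:

```shell
#!/bin/sh
# Sketch only: take a node out of the cluster before maintenance.
# Assumption: systemd is in use and the unit is called "pacemaker".
# DRY_RUN=1 (the default here) prints the commands instead of running them.

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop pacemaker now and keep it from starting at boot.
isolate_node() {
    run systemctl stop pacemaker
    run systemctl disable pacemaker
}

isolate_node
```

Run it with DRY_RUN=0 as root on the node under maintenance (atlas5 in your case) before rebooting it, and enable and start the service again once the iscsi setup is verified.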