On 15.06.2023 13:58, Kadlecsik József wrote:
Hello,
We had a strange issue here: in a 7-node cluster, one node was put into
standby mode to test a new iSCSI setting on it. While the machine was being
configured it was rebooted, and after the reboot iSCSI didn't come up. That
caused a malformed communication (atlas5 is the node in standby) with the
cluster:
Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: warning: Unexpected
result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023
It sounds like a resource agent problem. You need to investigate why the
probe returned an error.
Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: notice: If it is not
possible for ocsi to run on atlas5, see the resource-discovery option for
location constraints
Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: error: Resource ocsi
is active on 2 nodes (attempting recovery)
The resource was definitely not active on 2 nodes, and that caused a storm
in which all virtual machines managed as resources were killed.
How could one prevent such cases from coming up?
Standby does not stop the cluster from running on the node; it simply tells
pacemaker to exclude this node from the possible candidates to run
resources. To avoid any unwanted interaction (also due to possible resource
agent or other software bugs) you could simply stop pacemaker on the node
and disable its auto-startup.
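
A minimal sketch of that suggestion, assuming a systemd-based node where
the service unit is named "pacemaker" (neither is stated in this thread).
The DRY_RUN variable and the run/isolate_node helpers are hypothetical
names used only for illustration; by default the script just prints what
it would do.

```shell
#!/bin/sh
# Sketch: take a node fully out of the cluster before maintenance.
# Assumption: systemd is in use and the unit is called "pacemaker".
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

isolate_node() {
    run systemctl stop pacemaker      # stop the cluster manager on this node
    run systemctl disable pacemaker   # do not start it again at boot
}

isolate_node
```

Run it with DRY_RUN=0 as root on the node under maintenance to actually
execute the commands. If corosync is also enabled at boot on your setup,
you may want to treat it the same way. Once the node is verified to be
healthy again, enable and start the services to rejoin the cluster.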