Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

Andrei Borzenkov Tue, 19 Dec 2023 01:00:23 -0800

On Tue, Dec 19, 2023 at 10:41 AM Artem <tyom...@gmail.com> wrote:
...
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (update_resource_action_runnable)    warning: OST4_stop_0 on lustre4 is 
> unrunnable (node is offline)
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (recurring_op_for_active)    info: Start 20s-interval monitor for OST4 on 
> lustre3
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (log_list_item)      notice: Actions: Stop       OST4        (     lustre4 )  
> blocked


This is the default for a failed stop operation. The only way
pacemaker can resolve a failure to stop a resource is to fence the node
where this resource was active. If that is not possible (and IIRC you
refuse to use stonith), pacemaker has no other choice but to block it.
If you insist, you can of course set on-fail=ignore for the stop
operation, but this means the unreachable node will continue to run
resources. Whether that can lead to some corruption in your case I
cannot guess.
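
To make the decision above concrete, here is a rough Python sketch of how
the effective policy for a failed stop falls out of the configuration. It
is only a simplified model, not pacemaker code: the function name and the
outcome descriptions are made up for illustration, and only the three
policies discussed here are covered.

```python
from typing import Optional

# Hypothetical summary of the outcomes discussed in this thread; the
# wording is illustrative and not taken from pacemaker itself.
OUTCOMES = {
    "fence": "node is fenced, then the resource can be recovered elsewhere",
    "block": "resource stays blocked on the node where the stop failed",
    "ignore": "failure is ignored; an unreachable node may still run the resource",
}


def effective_stop_on_fail(stonith_enabled: bool,
                           on_fail: Optional[str] = None) -> str:
    """Return the policy applied to a failed stop (simplified model).

    An explicit on-fail setting wins. Otherwise the default is "fence"
    when fencing is enabled and "block" when it is not.
    """
    if on_fail is not None:
        return on_fail
    return "fence" if stonith_enabled else "block"


if __name__ == "__main__":
    # No stonith and no explicit on-fail: the situation in the log above.
    policy = effective_stop_on_fail(stonith_enabled=False)
    print(policy, "->", OUTCOMES[policy])
```

With stonith disabled and nothing configured on the stop operation, the
model ends up at "block", which matches the "blocked" action in the log.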

> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (pcmk__create_graph)         crit: Cannot fence lustre4 because of OST4: 
> blocked (OST4_stop_0)

That message is worded rather strangely. The resource is blocked because
pacemaker could not fence the node, not the other way round.