Hello Ondrej,
thanks for your reply, I really appreciate it.
I picked fence_mpath as I'm preparing for my EX436 and I can't know
which agent will be needed on the exam.
Also, according to https://access.redhat.com/solutions/3201072, there could be
a race condition with fence_scsi.
So I checked the cluster during fencing, and the node immediately goes offline.
The last messages from Pacemaker are:
Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: Client stonith_admin.controld.23888.b57ceee7 wants to fence (reboot) 'node1.localdomain' with device '(any)'
Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: Requesting peer fencing (reboot) of node1.localdomain
Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: FENCING can fence (reboot) node1.localdomain (aka. '1'): static-list
Feb 17 08:21:58 node1.localdomain stonith-ng[23808]: notice: Operation reboot of node1.localdomain by node2.localdomain for stonith_admin.controld.23888@node1.localdomain.ede38ffb: OK
Feb 17 08:21:58 node1.localdomain crmd[23812]: crit: We were allegedly just fenced by node2.localdomain for node1.localdomain
To me this means that node1 just got fenced again. Fencing itself actually
works: I/O is immediately blocked and the reservation is removed.
I used https://access.redhat.com/solutions/2766611 to set up fence_mpath,
but I could have messed something up.
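In case anyone wants to reproduce the check: the key registrations and the
reservation on the device can be inspected with mpathpersist (from
device-mapper-multipath), e.g.:

mpathpersist --in --read-keys -d /dev/mapper/36001405cb123d000
mpathpersist --in --read-reservation -d /dev/mapper/36001405cb123d000

After the fence, node1's key (1, per the pcmk_host_map below) should no longer
be listed.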
Cluster config is:
[root@node3 ~]# pcs config show
Cluster Name: HACLUSTER2
Corosync Nodes:
 node1.localdomain node2.localdomain node3.localdomain
Pacemaker Nodes:
 node1.localdomain node2.localdomain node3.localdomain

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90s (clvmd-start-interval-0s)
               stop interval=0s timeout=90s (clvmd-stop-interval-0s)
 Clone: TESTGFS2-clone
  Meta Attrs: interleave=true
  Resource: TESTGFS2 (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/TEST/gfs2 directory=/GFS2 fstype=gfs2 options=noatime run_fsck=no
   Operations: monitor interval=15s on-fail=fence OCF_CHECK_LEVEL=20 (TESTGFS2-monitor-interval-15s)
               notify interval=0s timeout=60s (TESTGFS2-notify-interval-0s)
               start interval=0s timeout=60s (TESTGFS2-start-interval-0s)
               stop interval=0s timeout=60s (TESTGFS2-stop-interval-0s)

Stonith Devices:
 Resource: FENCING (class=stonith type=fence_mpath)
  Attributes: devices=/dev/mapper/36001405cb123d000 pcmk_host_argument=key pcmk_host_map=node1.localdomain:1;node2.localdomain:2;node3.localdomain:3 pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (FENCING-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start TESTGFS2-clone (kind:Mandatory) (id:order-clvmd-clone-TESTGFS2-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
  TESTGFS2-clone with clvmd-clone (score:INFINITY) (id:colocation-TESTGFS2-clone-clvmd-clone-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
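For reference, the FENCING device above was created along these lines
(reconstructed from the config, so treat it as a sketch rather than the exact
command history):

pcs stonith create FENCING fence_mpath \
    devices=/dev/mapper/36001405cb123d000 \
    pcmk_host_argument=key \
    pcmk_host_map="node1.localdomain:1;node2.localdomain:2;node3.localdomain:3" \
    pcmk_monitor_action=metadata \
    pcmk_reboot_action=off \
    op monitor interval=60s \
    meta provides=unfencing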
[root@node3 ~]# crm_mon -r1
Stack: corosync
Current DC: node3.localdomain (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Mon Feb 17 08:39:30 2020
Last change: Sun Feb 16 18:44:06 2020 by root via cibadmin on node1.localdomain

3 nodes configured
10 resources configured

Online: [ node2.localdomain node3.localdomain ]
OFFLINE: [ node1.localdomain ]

Full list of resources:

 FENCING        (stonith:fence_mpath):  Started node2.localdomain
 Clone Set: dlm-clone [dlm]
     Started: [ node2.localdomain node3.localdomain ]
     Stopped: [ node1.localdomain ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node2.localdomain node3.localdomain ]
     Stopped: [ node1.localdomain ]
 Clone Set: TESTGFS2-clone [TESTGFS2]
     Started: [ node2.localdomain node3.localdomain ]
     Stopped: [ node1.localdomain ]
In the logs, I've noticed that the node is first unfenced and later fenced
again... For the unfencing, I believe "meta provides=unfencing" is
responsible, yet I'm not sure about the action from node2.
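To trace the exact sequence, the fencing history can be dumped from any node;
assuming stonith_admin in pacemaker 1.1.20 behaves as documented, something
like:

stonith_admin --history node1.localdomain --verbose

should show the fencing operations executed against node1, including who
requested them and where they originated.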
So far I have used SCSI reservations only with ServiceGuard, and SBD on SUSE,
and I was wondering if the