** Description changed:
OBS: This bug was originally part of LP: #1865523 but was split out.
SRU: pacemaker
[Impact]
- * fence_scsi is not currently working in a shared disk environment
+ * fence_scsi is not currently working in a shared disk environment
- * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
+ * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
be able to start the fencing agents OR, in the worst case, the
fence_scsi agent might start but won't make scsi reservations on the
shared scsi disk.
- * this bug covers the pacemaker 1.1.18 issues with fence_scsi,
+ * this bug covers the pacemaker 1.1.18 issues with fence_scsi,
since the latter was fixed in LP: #1865523.
[Test Case]
- * having a 3-node setup, nodes called "clubionic01, clubionic02,
+ * having a 3-node setup, nodes called "clubionic01, clubionic02,
clubionic03", with a shared scsi disk (fully supporting persistent
reservations) /dev/sda, with corosync and pacemaker operational and
running, one might try:
rafaeldtinoco@clubionic01:~$ crm configure
crm(live)configure# property stonith-enabled=on
crm(live)configure# property stonith-action=off
crm(live)configure# property no-quorum-policy=stop
crm(live)configure# property have-watchdog=true
crm(live)configure# commit
crm(live)configure# end
crm(live)# end
rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
- stonith:fence_scsi params \
- pcmk_host_list="clubionic01 clubionic02 clubionic03" \
- devices="/dev/sda" \
- meta provides=unfencing
+ stonith:fence_scsi params \
+ pcmk_host_list="clubionic01 clubionic02 clubionic03" \
+ devices="/dev/sda" \
+ meta provides=unfencing
And see the following errors:
Failed Actions:
* fence_clubionic_start_0 on clubionic02 'unknown error' (1): call=6,
status=Error, exitreason='',
- last-rc-change='Wed Mar 4 19:53:12 2020', queued=0ms, exec=1105ms
+ last-rc-change='Wed Mar 4 19:53:12 2020', queued=0ms, exec=1105ms
* fence_clubionic_start_0 on clubionic03 'unknown error' (1): call=6,
status=Error, exitreason='',
- last-rc-change='Wed Mar 4 19:53:13 2020', queued=0ms, exec=1109ms
+ last-rc-change='Wed Mar 4 19:53:13 2020', queued=0ms, exec=1109ms
* fence_clubionic_start_0 on clubionic01 'unknown error' (1): call=6,
status=Error, exitreason='',
- last-rc-change='Wed Mar 4 19:53:11 2020', queued=0ms, exec=1108ms
+ last-rc-change='Wed Mar 4 19:53:11 2020', queued=0ms, exec=1108ms
and corosync.log will show:
warning: unpack_rsc_op_failure: Processing failed op start for
fence_clubionic on clubionic01: unknown error (1)
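
When repeating the test case, the "Failed Actions" list above can be reduced to just the affected node names. The following is a minimal sketch, not part of pacemaker or crmsh: the helper name failed_start_nodes is hypothetical, and it assumes the status output keeps the "* <resource>_start_0 on <node> ..." layout shown above.

```shell
# Hypothetical helper: reads cluster status text on stdin and prints the
# nodes where the start operation of the given resource failed.
# $1: resource name, e.g. fence_clubionic
failed_start_nodes() {
    # Match lines of the form "* <resource>_start_0 on <node> ..."
    awk -v op="${1}_start_0" '$1 == "*" && $2 == op && $3 == "on" { print $4 }'
}
```

Assuming crm_mon is available on the node, one possible use is `crm_mon -1 | failed_start_nodes fence_clubionic`; an empty result would suggest the fencing agent started on every node.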
[Regression Potential]
- * LP: #1865523 shows fence_scsi fully operational once the SRU for that
+ * LP: #1865523 shows fence_scsi fully operational once the SRU for that
bug is done.
- * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
+ * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
fence_scsi.
- * TODO
+ * There are changes to: cluster resource manager daemon, local resource
+ manager daemon and policy engine. Of all the changes, the policy
+ engine fix is the biggest, but still not big for an SRU. This could cause
+ the policy engine, and thus cluster decisions, to malfunction.
+
+ * All patches are based on upstream fixes made right after
+ Pacemaker-1.1.18, used by Ubuntu Bionic, and were tested with fence_scsi
+ to make sure they fixed the issues.
[Other Info]
- * Original Description:
+ * Original Description:
Trying to set up a cluster with an iscsi shared disk, using fence_scsi as
the fencing mechanism, I realized that fence_scsi is not working in
Ubuntu Bionic. I first thought it was related to the Azure environment (LP:
#1864419), where I was trying this setup, but then, trying
locally, I figured out that somehow pacemaker 1.1.18 is not fencing the
shared scsi disk properly.
Note: I was able to "backport" vanilla 1.1.19 from upstream and
fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19
might tell us which commit fixed the behaviour needed by the
fence_scsi agent.
(k)rafaeldtinoco@clubionic01:~$ crm conf show
node 1: clubionic01.private
node 2: clubionic02.private
node 3: clubionic03.private
primitive fence_clubionic stonith:fence_scsi \
params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12"
devices="/dev/sda" \
meta provides=unfencing
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.18-2b07d5c5a9 \
cluster-infrastructure=corosync \
cluster-name=clubionic \
stonith-enabled=on \
stonith-action=off \
no-quorum-policy=stop \
symmetric-cluster=true