[Bug 1866412] [NEW] resource-agents should depend on psmisc (and probably others) for the resources to work
Public bug reported:

When configuring a Filesystem OCF resource with the following setup:

(k)rafaeldtinoco@clubionic01:~$ crm conf show
node 1: clubionic01
node 2: clubionic02
node 3: clubionic03
primitive ext4 Filesystem \
    params device="/dev/clustervg/clustervol" directory="/clusterdata" fstype=ext4
primitive fence_clubionic stonith:fence_scsi \
    params pcmk_host_list="clubionic01 clubionic02 clubionic03" plug="" devices="/dev/sda" \
    meta provides=unfencing target-role=Started
primitive lvm2 LVM-activate \
    params vgname=clustervg vg_access_mode=system_id
primitive virtual_ip IPaddr2 \
    params ip=10.250.98.13 nic=eth3 \
    op monitor interval=10s
primitive webserver systemd:lighttpd \
    op monitor interval=10 timeout=30
group webservergroup lvm2 ext4 virtual_ip webserver \
    meta target-role=Started
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.18-2b07d5c5a9 \
    cluster-infrastructure=corosync \
    cluster-name=clubionic \
    stonith-enabled=on \
    stonith-action=off \
    no-quorum-policy=stop \
    last-lrm-refresh=1583529396

with clubionic01, clubionic02 and clubionic03 all set up from Ubuntu Cloud Images, I got the following error when trying to enable the "ext4" resource:

* ext4_monitor_0 on clubionic02 'not installed' (5): call=161, status=complete, exitreason='Setup problem: couldn't find command: fuser',
    last-rc-change='Fri Mar 6 21:14:36 2020', queued=0ms, exec=44ms

This happened because the nodes were missing the "psmisc" package, which provides fuser.

** Affects: resource-agents (Ubuntu)
     Importance: Medium
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Changed in: resource-agents (Ubuntu)
       Status: New => Confirmed

** Changed in: resource-agents (Ubuntu)
   Importance: Undecided => Medium

** Changed in: resource-agents (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

--
You received this bug notification because you are a member of Ubuntu Server, which is subscribed to resource-agents in Ubuntu.
https://bugs.launchpad.net/bugs/1866412

Title:
  resource-agents should depend on psmisc (and probably others) for the resources to work

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1866412/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
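The following is a minimal sketch, not part of the report, of a pre-flight check that would have caught this before the resource was enabled. It looks up each command an agent shells out to with Python's shutil.which() and reports which packages to install. The REQUIRED_COMMANDS table and the missing_packages() helper are hypothetical. Only the fuser -> psmisc pair comes from this bug; the "(and probably others)" in the title is left for you to fill in.

```python
import shutil
from typing import Dict, List, Optional

# Command -> Ubuntu package providing it. Only fuser -> psmisc comes from
# this bug report (hypothetical table: extend it for your own agents).
REQUIRED_COMMANDS: Dict[str, str] = {
    "fuser": "psmisc",
}


def missing_packages(required: Optional[Dict[str, str]] = None,
                     path: Optional[str] = None) -> List[str]:
    """Return the sorted, de-duplicated packages whose commands are not found.

    `path` is handed to shutil.which() as the search path; None means $PATH.
    """
    if required is None:
        required = REQUIRED_COMMANDS
    missing = {pkg for cmd, pkg in required.items()
               if shutil.which(cmd, path=path) is None}
    return sorted(missing)


if __name__ == "__main__":
    pkgs = missing_packages()
    if pkgs:
        print("missing commands, try: sudo apt install " + " ".join(pkgs))
    else:
        print("all required commands found")
```

Running this on every node before enabling the group would have flagged psmisc on the Cloud Image nodes. In a package, the equivalent fix is simply a dependency in debian/control, which is what the bug title asks for.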
[Bug 1866392] Re: [bionic] dlm_controld won't start due to missing device files
It only happens during package installation. Once the package is installed, you can run "modprobe -r dlm" and then "modprobe dlm" again. The device files are then created correctly by the udev rules shipped in the package:

(k)rafaeldtinoco@clubionic03:~$ systemctl status dlm
● dlm.service - dlm control daemon
   Loaded: loaded (/lib/systemd/system/dlm.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-03-06 20:25:24 UTC; 1min 24s ago
     Docs: man:dlm_controld
           man:dlm.conf
           man:dlm_stonith
 Main PID: 1365 (dlm_controld)
    Tasks: 2 (limit: 2338)
   CGroup: /system.slice/dlm.service
           └─1365 /usr/sbin/dlm_controld --foreground

Mar 06 20:25:23 clubionic03 systemd[1]: Starting dlm control daemon...
Mar 06 20:25:23 clubionic03 dlm_controld[1365]: 67 dlm_controld 4.0.7 started
Mar 06 20:25:24 clubionic03 systemd[1]: Started dlm control daemon.

(k)rafaeldtinoco@clubionic03:~$ ls -1lah /dev/misc/*
lrwxrwxrwx 1 root root 14 Mar 6 20:24 /dev/misc/dlm-control -> ../dlm-control
lrwxrwxrwx 1 root root 14 Mar 6 20:24 /dev/misc/dlm-monitor -> ../dlm-monitor
lrwxrwxrwx 1 root root 12 Mar 6 20:24 /dev/misc/dlm_plock -> ../dlm_plock

--
https://bugs.launchpad.net/bugs/1866392
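The hypothetical sketch below checks the state described above from a script. It reads the minor numbers the kernel registered in /proc/misc, where each line is a "<minor> <name>" pair. It then checks that each /dev/misc/* name resolves to a character device with major 10 and that minor. That is the condition behind dlm_controld's "cannot find device /dev/misc/dlm-control with minor 56" error. The sketch does not reproduce dlm_controld's own find_udev_device() logic, and the function names are illustrative only.

```python
import os
import stat
from typing import Dict, Optional

MISC_MAJOR = 10  # misc devices share major 10 (see "10, 56" in the ls output)


def parse_proc_misc(text: str) -> Dict[str, int]:
    """Map device name -> minor from /proc/misc-style text ("<minor> <name>")."""
    minors: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            minors[parts[1]] = int(parts[0])
    return minors


def check_misc_node(path: str, expected_minor: int) -> Optional[str]:
    """Return None if `path` resolves to misc char device 10:<expected_minor>,
    otherwise a short description of the problem."""
    try:
        st = os.stat(path)  # follows symlinks such as /dev/misc/dlm-control
    except FileNotFoundError:
        return f"{path}: missing"
    if not stat.S_ISCHR(st.st_mode):
        return f"{path}: not a character device"
    major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)
    if (major, minor) != (MISC_MAJOR, expected_minor):
        return f"{path}: device {major}:{minor}, expected {MISC_MAJOR}:{expected_minor}"
    return None


if __name__ == "__main__":
    try:
        with open("/proc/misc") as f:
            minors = parse_proc_misc(f.read())
    except OSError:
        minors = {}
    for name in ("dlm-control", "dlm-monitor", "dlm_plock"):
        if name not in minors:
            print(f"{name}: not in /proc/misc (is the dlm module loaded?)")
            continue
        problem = check_misc_node(f"/dev/misc/{name}", minors[name])
        print(problem or f"/dev/misc/{name}: ok")
```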
[Bug 1866392] Re: [bionic] dlm_controld won't start due to missing device files
I monitored udev while loading the dlm module with modprobe. These are the kernel uevents it triggers:

(k)rafaeldtinoco@clubionic01:~$ sudo udevadm monitor --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[10516.364250] add /module/dlm (module)
KERNEL[10516.364425] add /kernel/slab/:496 (slab)
KERNEL[10516.364579] add /kernel/slab/:312 (slab)
KERNEL[10516.364730] add /devices/virtual/misc/dlm-control (misc)
KERNEL[10516.364849] add /devices/virtual/misc/dlm-monitor (misc)
KERNEL[10516.364980] add /devices/virtual/misc/dlm_plock (misc)

and when the module is removed:

KERNEL[10713.488367] remove /devices/virtual/misc/dlm_plock (misc)
KERNEL[10713.488465] remove /devices/virtual/misc/dlm-control (misc)
KERNEL[10713.488539] remove /devices/virtual/misc/dlm-monitor (misc)
KERNEL[10713.488635] remove /kernel/slab/:496 (slab)
KERNEL[10713.488692] remove /kernel/slab/:312 (slab)
KERNEL[10713.488825] remove /module/dlm (module)

--
https://bugs.launchpad.net/bugs/1866392
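Uevent lines like these can be parsed mechanically, which helps when comparing a failing install against a working one. The sketch below is a hypothetical helper; the parse_uevents() and misc_devices() names are made up. It only understands the `KERNEL[timestamp] action devpath (subsystem)` shape shown in this output and ignores every other line.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
#   KERNEL[10516.364730] add /devices/virtual/misc/dlm-control (misc)
_EVENT_RE = re.compile(
    r"^KERNEL\[(?P<ts>\d+\.\d+)\]\s+(?P<action>\w+)\s+"
    r"(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)$"
)


class UEvent(NamedTuple):
    timestamp: float
    action: str
    devpath: str
    subsystem: str


def parse_uevents(text: str) -> List[UEvent]:
    """Parse `udevadm monitor --kernel` output, skipping non-event lines."""
    events: List[UEvent] = []
    for line in text.splitlines():
        m = _EVENT_RE.match(line.strip())
        if m:
            events.append(UEvent(float(m["ts"]), m["action"],
                                 m["devpath"], m["subsystem"]))
    return events


def misc_devices(events: List[UEvent], action: str = "add") -> List[str]:
    """Kernel names of misc-subsystem devices seen with the given action."""
    return [e.devpath.rsplit("/", 1)[-1] for e in events
            if e.subsystem == "misc" and e.action == action]


if __name__ == "__main__":
    sample = """\
KERNEL[10516.364250] add /module/dlm (module)
KERNEL[10516.364730] add /devices/virtual/misc/dlm-control (misc)
KERNEL[10516.364849] add /devices/virtual/misc/dlm-monitor (misc)
KERNEL[10516.364980] add /devices/virtual/misc/dlm_plock (misc)
"""
    print(misc_devices(parse_uevents(sample)))
    # ['dlm-control', 'dlm-monitor', 'dlm_plock']
```

Run on the sample, it prints the three misc device names that the 51-dlm.rules entries in this package are written for.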
[Bug 1866392] [NEW] [bionic] dlm_controld won't start due to missing device files
Public bug reported:

Right after installing dlm and dlm_controld, trying to start the service fails:

rafaeldtinoco@clubionic01:~$ systemctl status dlm.service
● dlm.service - dlm control daemon
   Loaded: loaded (/lib/systemd/system/dlm.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2020-03-06 19:13:32 UTC; 35s ago
     Docs: man:dlm_controld
           man:dlm.conf
           man:dlm_stonith
  Process: 31644 ExecStart=/usr/sbin/dlm_controld --foreground $DLM_CONTROLD_OPTS (code=exited, status=1/FAILURE)
  Process: 31643 ExecStartPre=/sbin/modprobe dlm (code=exited, status=0/SUCCESS)
 Main PID: 31644 (code=exited, status=1/FAILURE)

Mar 06 19:13:21 clubionic01 systemd[1]: Starting dlm control daemon...
Mar 06 19:13:21 clubionic01 dlm_controld[31644]: 8746 dlm_controld 4.0.7 started
Mar 06 19:13:32 clubionic01 dlm_controld[31644]: 8756 cannot find device /dev/misc/dlm-control with minor 56
Mar 06 19:13:32 clubionic01 systemd[1]: dlm.service: Main process exited, code=exited, status=1/FAILURE
Mar 06 19:13:32 clubionic01 systemd[1]: dlm.service: Failed with result 'exit-code'.
Mar 06 19:13:32 clubionic01 systemd[1]: Failed to start dlm control daemon.

This happens even after the "dlm" module has been loaded, as the ExecStartPre= line shows.
(k)rafaeldtinoco@clubionic01:~$ ls -lah /dev/dlm*
crw--- 1 root root 10, 56 Mar 6 19:08 /dev/dlm-control
crw--- 1 root root 10, 55 Mar 6 19:08 /dev/dlm-monitor
crw--- 1 root root 10, 54 Mar 6 19:08 /dev/dlm_plock

The device file paths are hardcoded in dlm_controld:

dlm_controld/action.c: rv = find_udev_device("/dev/misc/dlm-control", control_minor);
dlm_controld/action.c: rv = find_udev_device("/dev/misc/dlm-monitor", monitor_minor);
dlm_controld/action.c: rv = find_udev_device("/dev/misc/dlm_plock", plock_minor);

This is how the rules.d files get installed:

rafaeldtinoco@workstation:~/.../dlm$ grep -r UDEVDIR *
libdlm/Makefile:UDEVDIR=/usr/lib/udev/rules.d
libdlm/Makefile:$(INSTALL) -d $(DESTDIR)/$(UDEVDIR)
libdlm/Makefile:$(INSTALL) -m 644 $(UDEV_TARGET) $(DESTDIR)/$(UDEVDIR)

The package does ship those rules:

(k)rafaeldtinoco@clubionic01:~$ cat /lib/udev/rules.d/51-dlm.rules
KERNEL=="dlm-control", MODE="0666", SYMLINK+="misc/dlm-control"
KERNEL=="dlm-monitor", MODE="0666", SYMLINK+="misc/dlm-monitor"
KERNEL=="dlm_plock", MODE="0666", SYMLINK+="misc/dlm_plock"
KERNEL=="dlm_*", MODE="0660", SYMLINK+="misc/%k"

** Affects: dlm (Ubuntu)
     Importance: Undecided
         Status: Triaged

** Affects: dlm (Ubuntu Bionic)
     Importance: Medium
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Changed in: dlm (Ubuntu)
       Status: New => Confirmed

** Changed in: dlm (Ubuntu)
       Status: Confirmed => Triaged

** Also affects: dlm (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: dlm (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: dlm (Ubuntu Bionic)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: dlm (Ubuntu Bionic)
       Status: New => Confirmed

--
https://bugs.launchpad.net/bugs/1866392
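The 51-dlm.rules entries are what should turn the kernel's dlm-control, dlm-monitor and dlm_plock nodes into the /dev/misc/* names that dlm_controld hardcodes. The sketch below is a rough model only, not udev's implementation. It evaluates KERNEL==, MODE= and SYMLINK+= entries like these against a kernel device name, under four assumptions:

* every matching rule applies, in file order;
* a later MODE= overrides an earlier one;
* SYMLINK+= appends;
* %k expands to the kernel name.

The parse_rules() and apply_rules() names are hypothetical, and fnmatch globbing stands in for udev's own pattern matching.

```python
import fnmatch
import re
from typing import List, Optional, Tuple

# key, operator, "value" triples such as KERNEL=="dlm_*" or SYMLINK+="misc/%k"
_PAIR_RE = re.compile(r'(\w+)\s*(==|!=|\+=|:=|=)\s*"([^"]*)"')

Rule = List[Tuple[str, str, str]]


def parse_rules(text: str) -> List[Rule]:
    """Split a rules file into per-line lists of (key, op, value)."""
    return [_PAIR_RE.findall(line) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def apply_rules(rules: List[Rule], kernel: str) -> Tuple[Optional[str], List[str]]:
    """Return (final MODE, SYMLINK names) for a device with this kernel name.

    Simplified model: only KERNEL== matching, MODE= and SYMLINK+= are handled
    (other keys and operators are ignored), later MODE= assignments override
    earlier ones, and %k is replaced by the kernel name.
    """
    mode: Optional[str] = None
    symlinks: List[str] = []
    for rule in rules:
        patterns = [v for k, op, v in rule if k == "KERNEL" and op == "=="]
        if not patterns or not all(fnmatch.fnmatchcase(kernel, p) for p in patterns):
            continue
        for key, op, value in rule:
            value = value.replace("%k", kernel)
            if key == "MODE" and op == "=":
                mode = value
            elif key == "SYMLINK" and op == "+=" and value not in symlinks:
                symlinks.append(value)
    return mode, symlinks


if __name__ == "__main__":
    rules = parse_rules('''\
KERNEL=="dlm-control", MODE="0666", SYMLINK+="misc/dlm-control"
KERNEL=="dlm-monitor", MODE="0666", SYMLINK+="misc/dlm-monitor"
KERNEL=="dlm_plock", MODE="0666", SYMLINK+="misc/dlm_plock"
KERNEL=="dlm_*", MODE="0660", SYMLINK+="misc/%k"
''')
    for name in ("dlm-control", "dlm-monitor", "dlm_plock"):
        print(name, apply_rules(rules, name))
```

Under this model, dlm_plock matches both its own rule and the catch-all KERNEL=="dlm_*" rule, so the sketch reports MODE 0660 rather than 0666. Before relying on that, check the real result with `udevadm test` on the device path.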
[Bug 1866385] [NEW] [focal] kronosnet need fixes from just released upstream version
Public bug reported:

From the ClusterLabs mailing list:

All,

We are pleased to announce the general availability of kronosnet v1.15

kronosnet (or knet for short) is the new underlying network protocol for Linux HA components (corosync), that features the ability to use multiple links between nodes, active/active and active/passive link failover policies, automatic link recovery, FIPS compliant encryption (nss and/or openssl), automatic PMTUd and in general better performance compared to the old network protocol.

Highlights in this release:

* Fix major interaction issues between applications gathering statistics and PMTUd.
* Fix UDP socket options that could lead to knet not being properly functional
* Man pages updates
* Minor bug fixes

Known issues in this release:

* none

The source tarballs can be downloaded here:

https://www.kronosnet.org/releases/

Upstream resources and contacts:

https://kronosnet.org/
https://github.com/kronosnet/kronosnet/
https://ci.kronosnet.org/
https://trello.com/kronosnet (TODO list and activities tracking)
https://goo.gl/9ZvkLS (google shared drive with presentations and diagrams)
IRC: #kronosnet on Freenode
https://lists.kronosnet.org/mailman/listinfo/users
https://lists.kronosnet.org/mailman/listinfo/devel
https://lists.kronosnet.org/mailman/listinfo/commits

Cheers,
The knet developer team

To be honest, we are so close to v1.15 that I think we should just merge the latest commits:

cd916c4 [stats] allow knet_link_get_status to operate in readlock context
86e0560 [stats] allow knet_handle_get_stats to operate in a readlock context
41f5a2a [rx] kill unused variable
e61d086 [tests] rework test suite link port allocation
e90cf36 [transports] use SO_REUSEADDR only for sctp
fcbeda8 [man] Enhance prio description of POLICY_PASSIVE
3ba5ddf man: Change strcat to strncat
ed7573d man: Fix covscan reports in doxyxml.c

since this does not seem to need a freeze exception.
** Affects: kronosnet (Ubuntu)
     Importance: Wishlist
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Affects: kronosnet (Ubuntu Focal)
     Importance: Wishlist
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Also affects: kronosnet (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: kronosnet (Ubuntu Focal)
       Status: New => Confirmed

** Changed in: kronosnet (Ubuntu Focal)
   Importance: Undecided => Wishlist

** Changed in: kronosnet (Ubuntu Focal)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

--
https://bugs.launchpad.net/bugs/1866385
[Bug 1863970] Re: FTBFS on i386 due to autopkgtest failure due to dependence on gcc, make
I believe this was already fixed, so I'm closing it as Fix Released.

** Changed in: kronosnet (Ubuntu)
       Status: New => Fix Released

--
https://bugs.launchpad.net/bugs/1863970
[Bug 1866383] [NEW] [focal] resource-agents need fixes from just released upstream version
Public bug reported:

ClusterLabs released resource-agents v4.5.0 right after our freeze for Focal:

ClusterLabs is happy to announce resource-agents v4.5.0.

Source code is available at:
https://github.com/ClusterLabs/resource-agents/releases/tag/v4.5.0

The most significant enhancements in this release are:
- bugfixes and enhancements:
  - iSCSILogicalUnit: fix default value for OCF_RESKEY_liot_bstype
  - aws-vpc-move-ip: add parameter for role to use to query/update route table
  - Filesystem: add trigger_udev_rules_if_need() for -U, -L, or /dev/xxx device
  - Filesystem: refresh UUID in the start phase
  - IPaddr2: add noprefixroute parameter
  - IPaddr2: add info to metadata that ipt_CLUSTERIP "iptables" extension is not "nft" backend compatible, and iptables-legacy support for distros that still support it
  - IPsrcaddr: replace local rule if using local table, and set src back to primary for device on stop
  - IPsrcaddr: fix failure during probe when using destination/table parameters
  - LVM-activate: add OCF_CHECK_LEVEL 10 check that can be enabled to verify vg or lv validity with an additional "read 1 byte" test in special cases like iSCSI SAN
  - MailTo: fix variable expansion
  - SAPInstance: clear the $DIR_EXECUTABLE variable so we catch the situation when we lose the directory with binaries after first sapinstance_init invokation
  - aliyun-vpc-move-ip: add support for both 'go' and 'python' versions of Aliyun CLI, and auto-detect which to use by default
  - apache: use get_release_id() to detect OS/distro, and fix LOAD_STATUS_MODULE issue
  - azure-lb set socat to default on SUSE distributions.
  - exportfs: allow multiple exports of same directory
  - iSCSILogicalUnit: add liot_bstype to handle block/fileio for targetcli, and change behavior of lio-t with portals which do not use 0.0.0.0
  - ldirectord: support sched-flags
  - lvmlockd: fix for LVM2 v2.03+ removing lvmetad
  - mysql-common: return correct rc during start-action
  - oralsnr: allow using the same tns_admin directory for different listeners
  - pgsql: Support for PostgreSQL 12
  - podman: improve the code for checking if an image exists
  - rabbitmq-cluster: ensure we delete nodename if stop action fails
  - redis: validate_all: fix file status tests
  - spec: add missing requirement (lsb-release)

The full list of changes for resource-agents is available at:
https://github.com/ClusterLabs/resource-agents/blob/v4.5.0/ChangeLog

Among the whole delta, the fixes are:

6d0b9652 iSCSILogicalUnit: fix default value for OCF_RESKEY_liot_bstype
617adbf6 redis: validate_all: fixes file status tests
7afc581f IPsrcaddr: fixes to avoid failing during probe
d763318c [podman] Simplify the code for checking if an image exists
0e73d3f4 IPsrcaddr: fixes to replace local rule if using local table
caaeec0b iSCSI logical unit fix (#1435)
34b46b17 IPaddr2: add noprefixroute parameter
20ff678e Low: MailTo: fix variable expansion
d821ef33 iSCSILogicalUnit.in fixes (#1427)
c718050a Low: mysql-common: fix startup check

And I should revisit that.
** Affects: resource-agents (Ubuntu)
     Importance: Wishlist
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Affects: resource-agents (Ubuntu Focal)
     Importance: Wishlist
     Assignee: Rafael David Tinoco (rafaeldtinoco)
         Status: Confirmed

** Changed in: resource-agents (Ubuntu)
       Status: New => Confirmed

** Changed in: resource-agents (Ubuntu)
   Importance: Undecided => Wishlist

** Changed in: resource-agents (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Also affects: resource-agents (Ubuntu Focal)
   Importance: Wishlist
     Assignee: Rafael David Tinoco (rafaeldtinoco)
       Status: Confirmed

--
https://bugs.launchpad.net/bugs/1866383
[Bug 1866384] [NEW] delv fails with "ran out of space"
Public bug reported:

Upstream report: https://gitlab.isc.org/isc-projects/bind9/issues/1647

Confirmed in Ubuntu 20.04:

ubuntu@f1:~$ delv isc.org
;; /etc/bind/bind.keys:31: failed to add trusted key '.': ran out of space
;; setup_dnsseckeys: failure

ubuntu@f1:~$ dpkg -l bind9
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==-=--=
ii bind9 1:9.16.0-1ubuntu3 amd64 Internet Domain Name Server

** Affects: bind9 (Ubuntu)
     Importance: High
     Assignee: Andreas Hasenack (ahasenack)
         Status: In Progress

--
https://bugs.launchpad.net/bugs/1866384
[Bug 1866378] [NEW] Error in handling TCP client quota limits
Public bug reported:

See the advisory at https://kb.isc.org/docs/operational-notification-an-error-in-handling-tcp-client-quota-limits-can-exhaust-tcp-connections-in-bind-9160

Patch: https://downloads.isc.org/isc/bind9/9.16.0/patches/bind-v9.16.0-tcp_quota_fix.patch

** Affects: bind9 (Ubuntu)
     Importance: High
     Assignee: Andreas Hasenack (ahasenack)
         Status: In Progress

--
https://bugs.launchpad.net/bugs/1866378
[Bug 1866303] Re: slapd crash with pwdAccountLockedTime and stacked overlays
** Also affects: openldap (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Also affects: openldap (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: openldap (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: openldap (Ubuntu Bionic)
   Importance: Undecided
       Status: New

--
https://bugs.launchpad.net/bugs/1866303
[Bug 1866303] Re: slapd crash with pwdAccountLockedTime and stacked overlays
Thanks a lot for this, Ryan, and awesome testing script!

--
https://bugs.launchpad.net/bugs/1866303
[Bug 1866303] Re: slapd crash with pwdAccountLockedTime and stacked overlays
** Changed in: openldap (Ubuntu)
       Status: New => In Progress

** Changed in: openldap (Ubuntu)
     Assignee: (unassigned) => Andreas Hasenack (ahasenack)

--
https://bugs.launchpad.net/bugs/1866303
[Bug 1866119] Re: [bionic] fence_scsi not working properly with Pacemaker 1.1.18-2ubuntu1.1
** Merge proposal unlinked: https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/pacemaker/+git/pacemaker/+merge/380336

--
https://bugs.launchpad.net/bugs/1866119
[Bug 1866119] Re: [bionic] fence_scsi not working properly with Pacemaker 1.1.18-2ubuntu1.1
** Description changed:

  OBS: This bug was originally into LP: #1865523 but it was split.

  SRU: pacemaker

  [Impact]

-  * fence_scsi is not currently working in a share disk environment
+  * fence_scsi is not currently working in a share disk environment

-  * all clusters relying in fence_scsi and/or fence_scsi + watchdog won't
+  * all clusters relying in fence_scsi and/or fence_scsi + watchdog won't
  be able to start the fencing agents OR, in worst case scenarios, the fence_scsi agent might start but won't make scsi reservations in the shared scsi disk.

-  * this bug is taking care of pacemaker 1.1.18 issues with fence_scsi,
+  * this bug is taking care of pacemaker 1.1.18 issues with fence_scsi,
  since the later was fixed at LP: #1865523.

  [Test Case]

-  * having a 3-node setup, nodes called "clubionic01, clubionic02,
+  * having a 3-node setup, nodes called "clubionic01, clubionic02,
  clubionic03", with a shared scsi disk (fully supporting persistent reservations) /dev/sda, with corosync and pacemaker operational and running, one might try:

  rafaeldtinoco@clubionic01:~$ crm configure
  crm(live)configure# property stonith-enabled=on
  crm(live)configure# property stonith-action=off
  crm(live)configure# property no-quorum-policy=stop
  crm(live)configure# property have-watchdog=true
  crm(live)configure# commit
  crm(live)configure# end
  crm(live)# end

  rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
-  stonith:fence_scsi params \
-  pcmk_host_list="clubionic01 clubionic02 clubionic03" \
-  devices="/dev/sda" \
-  meta provides=unfencing
+  stonith:fence_scsi params \
+  pcmk_host_list="clubionic01 clubionic02 clubionic03" \
+  devices="/dev/sda" \
+  meta provides=unfencing

  And see the following errors:

  Failed Actions:
  * fence_clubionic_start_0 on clubionic02 'unknown error' (1): call=6, status=Error, exitreason='',
-  last-rc-change='Wed Mar 4 19:53:12 2020', queued=0ms, exec=1105ms
+  last-rc-change='Wed Mar 4 19:53:12 2020', queued=0ms, exec=1105ms
  * fence_clubionic_start_0 on clubionic03 'unknown error' (1): call=6, status=Error, exitreason='',
-  last-rc-change='Wed Mar 4 19:53:13 2020', queued=0ms, exec=1109ms
+  last-rc-change='Wed Mar 4 19:53:13 2020', queued=0ms, exec=1109ms
  * fence_clubionic_start_0 on clubionic01 'unknown error' (1): call=6, status=Error, exitreason='',
-  last-rc-change='Wed Mar 4 19:53:11 2020', queued=0ms, exec=1108ms
+  last-rc-change='Wed Mar 4 19:53:11 2020', queued=0ms, exec=1108ms

  and corosync.log will show:

  warning: unpack_rsc_op_failure: Processing failed op start for fence_clubionic on clubionic01: unknown error (1)

  [Regression Potential]

-  * LP: #1865523 shows fence_scsi fully operational after SRU for that
+  * LP: #1865523 shows fence_scsi fully operational after SRU for that
  bug is done.

-  * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
+  * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
  fence_scsi.

-  * TODO
+  * There are changes to: cluster resource manager daemon, local resource
+  manager daemon and police engine. From all the changes, the police
+  engine fix is the biggest, but still not big for a SRU. This could cause
+  police engine, thus cluster decisions, to mal function.
+
+  * All patches are based in upstream fixes made right after
+  Pacemaker-1.1.18, used by Ubuntu Bionic and were tested with fence_scsi
+  to make sure it fixed the issues.

  [Other Info]

-  * Original Description:
+  * Original Description:

  Trying to setup a cluster with an iscsi shared disk, using fence_scsi as the fencing mechanism, I realized that fence_scsi is not working in Ubuntu Bionic. I first thought it was related to Azure environment (LP: #1864419), where I was trying this environment, but then, trying locally, I figured out that somehow pacemaker 1.1.18 is not fencing the shared scsi disk properly.

  Note: I was able to "backport" vanilla 1.1.19 from upstream and fence_scsi worked. I have then tried 1.1.18 without all quilt patches and it didnt work as well. I think that bisecting 1.1.18 <-> 1.1.19 might tell us which commit has fixed the behaviour needed by the fence_scsi agent.

  (k)rafaeldtinoco@clubionic01:~$ crm conf show
  node 1: clubionic01.private
  node 2: clubionic02.private
  node 3: clubionic03.private
  primitive fence_clubionic stonith:fence_scsi \
      params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
      meta provides=unfencing
  property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.18-2b07d5c5a9 \
      cluster-infrastructure=corosync \
      cluster-name=clubionic \
      stonith-enabled=on \
      stonith-action=off \
      no-quorum-policy=stop \
      symmetric-cluster=true

  (k)rafaeldtinoco@clubion