Hello, I'm trying to configure a simple 2-node cluster with DRBD and HALVM (ocf:heartbeat:LVM), but I have run into a problem that I'm not able to solve, so I decided to write this long post. I really need to understand what I'm doing and where I'm going wrong. More precisely, I'm configuring a Pacemaker cluster with 2 nodes and only one DRBD resource. Here are all the operations I performed:
- System configuration
hostnamectl set-hostname pcmk[12]
yum update -y
yum install vim wget git -y
vim /etc/sysconfig/selinux   -> permissive mode
systemctl disable firewalld
reboot

- Network configuration
[pcmk1]
nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
[pcmk2]
nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.202/24 ipv6.method ignore connection.autoconnect yes
ssh-keygen -t rsa
ssh-copy-id root@pcmk[12]
scp /etc/hosts root@pcmk2:/etc/hosts

- DRBD repo configuration and DRBD installation
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum update -y
yum install drbd84-utils kmod-drbd84 -y

- DRBD configuration
Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type "Linux" (83)

[/etc/drbd.d/global_common.conf]
usage-count no;

[/etc/drbd.d/myres.res]
resource myres {
    on pcmk1 {
        device /dev/drbd0;
        disk /dev/vdb1;
        address 192.168.199.201:7789;
        meta-disk internal;
    }
    on pcmk2 {
        device /dev/drbd0;
        disk /dev/vdb1;
        address 192.168.199.202:7789;
        meta-disk internal;
    }
}

scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
systemctl start drbd   <-- only for test. The service is disabled at boot!
drbdadm create-md myres
drbdadm up myres
drbdadm primary --force myres

- LVM configuration
[root@pcmk1 ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0            11:0    1 1024M  0 rom
vda           252:0    0   20G  0 disk
├─vda1        252:1    0    1G  0 part /boot
└─vda2        252:2    0   19G  0 part
  ├─cl-root   253:0    0   17G  0 lvm  /
  └─cl-swap   253:1    0    2G  0 lvm  [SWAP]
vdb           252:16   0    8G  0 disk
└─vdb1        252:17   0    8G  0 part        <--- /dev/vdb1 is the partition I'd like to use as backing device for drbd
  └─drbd0     147:0    0    8G  0 disk

[/etc/lvm/lvm.conf]
write_cache_state = 0
use_lvmetad = 0
filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]

Disabling the lvmetad service:
systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
reboot

- Creating volume group and logical volume
systemctl start drbd   (on both nodes)
drbdadm primary myres
pvcreate /dev/drbd0
vgcreate havolumegroup /dev/drbd0
lvcreate -n c-vol1 -L1G havolumegroup

[root@pcmk1 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-a-----   1.00g

- Cluster configuration
yum install pcs fence-agents-all -y
systemctl enable pcsd
systemctl start pcsd
echo redhat | passwd --stdin hacluster
pcs cluster auth pcmk1 pcmk2
pcs cluster setup --name ha_cluster pcmk1 pcmk2
pcs cluster start --all
pcs cluster enable --all
pcs property set stonith-enabled=false   <--- Just for test!!!
pcs property set no-quorum-policy=ignore

- DRBD resource configuration
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd drbd_resource=myres op monitor interval=60s
pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

[root@pcmk1 ~]# pcs -f drbd_cfg resource show
 Master/Slave Set: DrbdResClone [DrbdRes]
     Stopped: [ pcmk1 pcmk2 ]
[root@pcmk1 ~]#

Testing the failover with a forced shutoff of pcmk1: when pcmk1 comes back up, DRBD is slave on it (and master on pcmk2), but the logical volume is not active on pcmk2.
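For clarity, the state after such a failover can be checked by hand on pcmk2 with something like the following (just a sketch of manual commands, not part of the cluster configuration):

drbdadm role myres                      # expected Primary/Secondary once pcmk2 has been promoted
lvs -o lv_name,lv_attr havolumegroup    # the 5th attr character shows whether c-vol1 is active ('a') or not ('-')
vgchange -ay havolumegroup              # activating the VG by hand, i.e. exactly the step I'd like the cluster to do for me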
So I need HALVM:

[root@pcmk2 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-------   1.00g
[root@pcmk2 ~]#

- LVM resource and constraints
pcs cluster cib lvm_cfg
pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM volgrpname=havolumegroup
pcs -f lvm_cfg constraint colocation add HALVM with master DrbdResClone INFINITY
pcs -f lvm_cfg constraint order promote DrbdResClone then start HALVM

[root@pcmk1 ~]# pcs -f lvm_cfg constraint
Location Constraints:
Ordering Constraints:
  promote DrbdResClone then start HALVM (kind:Mandatory)
Colocation Constraints:
  HALVM with DrbdResClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
[root@pcmk1 ~]#

[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:49 2018
Last change: Fri Apr 13 15:05:18 2018 by root via cibadmin on pcmk1

2 nodes configured
2 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

#########[PUSHING NEW CONFIGURATION]#########

[root@pcmk1 ~]# pcs cluster cib-push lvm_cfg
CIB updated
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:57 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):  Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=13, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:12:56 2018', queued=0ms, exec=52ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#
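As a side note, I think the failing operation can be reproduced outside the cluster by running the resource's monitor directly on pcmk1 (a sketch; I'm assuming crm_resource's --force-check option behaves as documented, i.e. it bypasses the cluster and checks the resource on the local node):

[root@pcmk1 ~]# crm_resource --resource HALVM --force-check

Since /dev/drbd0 is Secondary on pcmk1 and /dev/vdb1 is rejected by my lvm.conf filter, I would expect havolumegroup to be simply invisible there, which seems consistent with the 'LVM Volume havolumegroup is not available' message.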
##########[TRYING TO CLEANUP RESOURCE CONFIGURATION]##################

[root@pcmk1 ~]# pcs resource cleanup
Waiting for 1 replies from the CRMd. OK
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:13:18 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):  Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=26, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:13:17 2018', queued=0ms, exec=113ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#

#########################################################

Some details about packages and versions:

[root@pcmk1 ~]# rpm -qa | grep pacem
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64
[root@pcmk1 ~]# rpm -qa | grep coro
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
[root@pcmk1 ~]# rpm -qa | grep drbd
drbd84-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd84-8.4.10-1_2.el7_4.elrepo.x86_64
[root@pcmk1 ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[root@pcmk1 ~]# uname -r
3.10.0-693.21.1.el7.x86_64
[root@pcmk1 ~]#

##############################################################

So it seems to me that the problem is that the "monitor" action of the ocf:heartbeat:LVM resource is executed on both nodes, even though I configured specific colocation and ordering constraints. I don't know where the problem lies, but I need to understand how to solve this issue. If possible, I invite someone to reproduce the configuration and, possibly, the issue. It looks like a bug, but obviously I'm not sure. What worries me is that it should be Pacemaker that decides where and when a resource starts, so there is probably something wrong in my constraint configuration. I'm sorry for this long post.

Thank you,
Marco
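P.S. For anyone who wants to reproduce only the agent's behaviour without building the whole DRBD/cluster stack: as far as I understand, the probe simply calls the LVM agent's monitor action, which can be imitated by hand roughly like this (a sketch; the OCF_* variable names and the agent path are my assumption of how resource-agents scripts are normally invoked):

[root@pcmk1 ~]# OCF_ROOT=/usr/lib/ocf OCF_RESKEY_volgrpname=havolumegroup \
    /usr/lib/ocf/resource.d/heartbeat/LVM monitor; echo "rc=$?"

On a node where havolumegroup is not visible, this should report that the volume group is not available, which I believe is more or less what Pacemaker records as the failed HALVM_monitor_0.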
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org