Hello, I'm trying to configure a simple 2-node cluster with DRBD and HALVM (ocf:heartbeat:LVM), but I have run into a problem that I'm not able to solve, so I decided to write this long post. I really need to understand what I'm doing and where I'm going wrong. More precisely, I'm configuring a Pacemaker cluster with 2 nodes and only one DRBD resource. Here are all the operations I performed:
- System configuration
hostnamectl set-hostname pcmk[12]
yum update -y
yum install vim wget git -y
vim /etc/sysconfig/selinux   -> permissive mode
systemctl disable firewalld
reboot

- Network configuration
[pcmk1]
nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
[pcmk2]
nmcli connection modify corosync ipv4.method manual ipv4.addresses 192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses 192.168.199.202/24 ipv6.method ignore connection.autoconnect yes
ssh-keygen -t rsa
ssh-copy-id root@pcmk[12]
scp /etc/hosts root@pcmk2:/etc/hosts

- DRBD repo configuration and DRBD installation
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum update -y
yum install drbd84-utils kmod-drbd84 -y

- DRBD configuration
Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type "Linux" (83)

[/etc/drbd.d/global_common.conf]
usage-count no;

[/etc/drbd.d/myres.res]
resource myres {
    on pcmk1 {
        device /dev/drbd0;
        disk /dev/vdb1;
        address 192.168.199.201:7789;
        meta-disk internal;
    }
    on pcmk2 {
        device /dev/drbd0;
        disk /dev/vdb1;
        address 192.168.199.202:7789;
        meta-disk internal;
    }
}

scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
systemctl start drbd   <-- only for test. The service is disabled at boot!
drbdadm create-md myres
drbdadm up myres
drbdadm primary --force myres

- LVM configuration
[root@pcmk1 ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0            11:0    1 1024M  0 rom
vda           252:0    0   20G  0 disk
├─vda1        252:1    0    1G  0 part /boot
└─vda2        252:2    0   19G  0 part
  ├─cl-root   253:0    0   17G  0 lvm  /
  └─cl-swap   253:1    0    2G  0 lvm  [SWAP]
vdb           252:16   0    8G  0 disk
└─vdb1        252:17   0    8G  0 part        <--- /dev/vdb1 is the partition I'd like to use as backing device for drbd
  └─drbd0     147:0    0    8G  0 disk

[/etc/lvm/lvm.conf]
write_cache_state = 0
use_lvmetad = 0
filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]

Disabling the lvmetad service:
systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
reboot

- Creating volume group and logical volume
systemctl start drbd   (on both nodes)
drbdadm primary myres
pvcreate /dev/drbd0
vgcreate havolumegroup /dev/drbd0
lvcreate -n c-vol1 -L1G havolumegroup

[root@pcmk1 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-a-----   1.00g

- Cluster configuration
yum install pcs fence-agents-all -y
systemctl enable pcsd
systemctl start pcsd
echo redhat | passwd --stdin hacluster
pcs cluster auth pcmk1 pcmk2
pcs cluster setup --name ha_cluster pcmk1 pcmk2
pcs cluster start --all
pcs cluster enable --all
pcs property set stonith-enabled=false   <--- Just for test!!!
pcs property set no-quorum-policy=ignore

- DRBD resource configuration
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd drbd_resource=myres op monitor interval=60s
pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

[root@pcmk1 ~]# pcs -f drbd_cfg resource show
 Master/Slave Set: DrbdResClone [DrbdRes]
     Stopped: [ pcmk1 pcmk2 ]
[root@pcmk1 ~]#

Testing the failover with a forced shutoff of pcmk1: when pcmk1 comes back up, DRBD is slave on it (and master on pcmk2), but the logical volume is not active on pcmk2.
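For clarity, the state after such a failover can be checked by hand on pcmk2 with something like the following (just a sketch of manual commands, not part of the cluster configuration):

drbdadm role myres                      # expected Primary/Secondary once pcmk2 has been promoted
lvs -o lv_name,lv_attr havolumegroup    # the 5th attr character shows whether c-vol1 is active ('a') or not ('-')
vgchange -ay havolumegroup              # activating the VG by hand, i.e. exactly the step I'd like the cluster to do for me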
So I need HALVM:

[root@pcmk2 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-------   1.00g
[root@pcmk2 ~]#

- LVM resource and constraints
pcs cluster cib lvm_cfg
pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM volgrpname=havolumegroup
pcs -f lvm_cfg constraint colocation add HALVM with master DrbdResClone INFINITY
pcs -f lvm_cfg constraint order promote DrbdResClone then start HALVM

[root@pcmk1 ~]# pcs -f lvm_cfg constraint
Location Constraints:
Ordering Constraints:
  promote DrbdResClone then start HALVM (kind:Mandatory)
Colocation Constraints:
  HALVM with DrbdResClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
[root@pcmk1 ~]#

[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:49 2018
Last change: Fri Apr 13 15:05:18 2018 by root via cibadmin on pcmk1

2 nodes configured
2 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

#########[PUSHING NEW CONFIGURATION]#########

[root@pcmk1 ~]# pcs cluster cib-push lvm_cfg
CIB updated
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:57 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):  Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=13, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:12:56 2018', queued=0ms, exec=52ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#
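As a side note, I think the failing operation can be reproduced outside the cluster by running the resource's monitor directly on pcmk1 (a sketch; I'm assuming crm_resource's --force-check option behaves as documented, i.e. it bypasses the cluster and checks the resource on the local node):

[root@pcmk1 ~]# crm_resource --resource HALVM --force-check

Since /dev/drbd0 is Secondary on pcmk1 and /dev/vdb1 is rejected by my lvm.conf filter, I would expect havolumegroup to be simply invisible there, which seems consistent with the 'LVM Volume havolumegroup is not available' message.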
##########[TRYING TO CLEANUP RESOURCE CONFIGURATION]##################

[root@pcmk1 ~]# pcs resource cleanup
Waiting for 1 replies from the CRMd. OK
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:13:18 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):  Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=26, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:13:17 2018', queued=0ms, exec=113ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#

#########################################################

Some details about packages and versions:

[root@pcmk1 ~]# rpm -qa | grep pacem
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64
[root@pcmk1 ~]# rpm -qa | grep coro
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
[root@pcmk1 ~]# rpm -qa | grep drbd
drbd84-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd84-8.4.10-1_2.el7_4.elrepo.x86_64
[root@pcmk1 ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[root@pcmk1 ~]# uname -r
3.10.0-693.21.1.el7.x86_64
[root@pcmk1 ~]#

##############################################################

So it seems to me that the problem is that the "monitor" action of the ocf:heartbeat:LVM resource is executed on both nodes, even though I configured specific colocation and ordering constraints. I don't know where the problem lies, but I need to understand how to solve this issue. If possible, I invite someone to reproduce the configuration and, possibly, the issue. It looks like a bug, but obviously I'm not sure. What worries me is that it should be Pacemaker that decides where and when a resource starts, so there is probably something wrong in my constraint configuration. I'm sorry for this long post.

Thank you,
Marco
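P.S. For anyone who wants to reproduce only the agent's behaviour without building the whole DRBD/cluster stack: as far as I understand, the probe simply calls the LVM agent's monitor action, which can be imitated by hand roughly like this (a sketch; the OCF_* variable names and the agent path are my assumption of how resource-agents scripts are normally invoked):

[root@pcmk1 ~]# OCF_ROOT=/usr/lib/ocf OCF_RESKEY_volgrpname=havolumegroup \
    /usr/lib/ocf/resource.d/heartbeat/LVM monitor; echo "rc=$?"

On a node where havolumegroup is not visible, this should report that the volume group is not available, which I believe is more or less what Pacemaker records as the failed HALVM_monitor_0.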
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org