Re: [ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-16 Thread Marco Marino
Hi Emmanuel, thank you for your support. I did a lot of checks during the
weekend and there are some updates:
- The main problem is that ocf:heartbeat:LVM is old. The current version on
CentOS 7 is 3.9.5 (package resource-agents). More precisely, in 3.9.5 the
monitor function makes one important assumption: the underlying storage is
shared between all nodes in the cluster, so the monitor function checks for
the presence of the volume group on every node. From version 3.9.6 this is no
longer the behavior: the monitor function (LVM_status) simply returns
$OCF_NOT_RUNNING on slave nodes, without raising an error. You can check this
in /usr/lib/ocf/resource.d/heartbeat/LVM, lines 340-351, which disappear in
version 3.9.6.
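
To make the difference concrete, here is a minimal sketch (not the actual
agent code) of how the two versions treat a volume group that is not active
on the local node:

# resource-agents 3.9.5: shared storage is assumed, so the VG must be
# visible on every node; a missing VG makes monitor fail even on the slave
LVM_status_3_9_5() {
    vgs "$OCF_RESKEY_volgrpname" >/dev/null 2>&1 || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# resource-agents 3.9.6+: a VG that is not visible/active locally simply
# means the resource is not running on this node
LVM_status_3_9_6() {
    vgs "$OCF_RESKEY_volgrpname" >/dev/null 2>&1 || return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}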

Obviously this is not an error, but it is an important change for the cluster
architecture, because with version 3.9.5 I would need to run DRBD in
dual-primary mode. My personal opinion is that DRBD in dual-primary mode with
LVM is not a good idea here, since I don't need an active/active cluster.
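
For reference, keeping 3.9.5 would mean making the volume group visible on
both nodes at the same time, i.e. something like the following in the DRBD
resource (a dual-primary sketch for DRBD 8.4, not something I plan to use):

resource myres {
    net {
        protocol C;
        allow-two-primaries yes;   # both nodes Primary at the same time
    }
    # ... plus mandatory fencing; concurrent access would also need a
    # cluster-aware layer on top, which is exactly what I want to avoid
}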

Anyway, thank you for your time again
Marco


Re: [ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-13 Thread emmanuel segura
The first thing you need to configure is stonith, because you have this
constraint: "constraint order promote DrbdResClone then start HALVM".

To recover and promote DRBD to master when you crash a node, configure the
DRBD fencing handler.
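
A commonly used handler setup for DRBD 8.4 with Pacemaker (a sketch using the
scripts shipped by drbd84-utils, not taken from this thread) looks roughly
like this in the resource definition or in global_common.conf:

disk {
    fencing resource-only;    # resource-and-stonith once real stonith exists
}
handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}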

Pacemaker executes the monitor operation on both nodes, so this is normal. To
test why the monitor fails, use ocf-tester.
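
For example, to exercise the LVM agent by hand (the volume group name here is
the one used later in this thread; adjust as needed):

ocf-tester -n HALVM -o volgrpname=havolumegroup /usr/lib/ocf/resource.d/heartbeat/LVM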


[ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-13 Thread Marco Marino
Hello, I'm trying to configure a simple 2-node cluster with DRBD and HALVM
(ocf:heartbeat:LVM), but I have a problem that I'm not able to solve, so I
decided to write this long post. I really need to understand what I'm doing
and where I'm going wrong.
More precisely, I'm configuring a Pacemaker cluster with 2 nodes and only
one DRBD resource. Here are all the operations:

- System configuration
hostnamectl set-hostname pcmk[12]
yum update -y
yum install vim wget git -y
vim /etc/sysconfig/selinux  -> permissive mode
systemctl disable firewalld
reboot

- Network configuration
[pcmk1]
nmcli connection modify corosync ipv4.method manual ipv4.addresses
192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses
192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
[pcmk2]
nmcli connection modify corosync ipv4.method manual ipv4.addresses
192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses
192.168.199.202/24 ipv6.method ignore connection.autoconnect yes

ssh-keygen -t rsa
ssh-copy-id root@pcmk[12]
scp /etc/hosts root@pcmk2:/etc/hosts

- Drbd Repo configuration and drbd installation
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum update -y
yum install drbd84-utils kmod-drbd84 -y

- Drbd Configuration:
Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type
"Linux" (83)
[/etc/drbd.d/global_common.conf]
usage-count no;
[/etc/drbd.d/myres.res]
resource myres {
on pcmk1 {
device /dev/drbd0;
disk /dev/vdb1;
address 192.168.199.201:7789;
meta-disk internal;
}
on pcmk2 {
device /dev/drbd0;
disk /dev/vdb1;
address 192.168.199.202:7789;
meta-disk internal;
}
}

scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
systemctl start drbd <-- only for test. The service is disabled at boot!
drbdadm create-md myres
drbdadm up myres
drbdadm primary --force myres
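
Before forcing the primary role it helps to confirm that both nodes are
connected; a quick check (expected output abbreviated):

cat /proc/drbd
# look for cs:Connected and, once the initial sync has finished,
# ds:UpToDate/UpToDate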

- LVM Configuration
[root@pcmk1 ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0          11:0    1 1024M  0 rom
vda         252:0    0   20G  0 disk
├─vda1      252:1    0    1G  0 part /boot
└─vda2      252:2    0   19G  0 part
  ├─cl-root 253:0    0   17G  0 lvm  /
  └─cl-swap 253:1    0    2G  0 lvm  [SWAP]
vdb         252:16   0    8G  0 disk
└─vdb1      252:17   0    8G  0 part <--- the partition I'd like to use as backing device for drbd
  └─drbd0   147:0    0    8G  0 disk

[/etc/lvm/lvm.conf]
write_cache_state = 0
use_lvmetad = 0
filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]

Disabling lvmetad service
systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
reboot

- Creating volume group and logical volume
systemctl start drbd (both nodes)
drbdadm primary myres
pvcreate /dev/drbd0
vgcreate havolumegroup /dev/drbd0
lvcreate -n c-vol1 -L1G havolumegroup
[root@pcmk1 ~]# lvs
  LV     VG            Attr   LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao <17.00g
  swap   cl            -wi-ao   2.00g
  c-vol1 havolumegroup -wi-a-   1.00g
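
With the lvm.conf filter configured above, the new physical volume should be
visible only through /dev/drbd0 and never through /dev/vdb1 directly; a quick
check (expected output sketched):

[root@pcmk1 ~]# pvs
  PV         VG            Fmt  Attr PSize   PFree
  /dev/drbd0 havolumegroup lvm2 a--   <8.00g ...
  /dev/vda2  cl            lvm2 a--  <19.00g ...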


- Cluster Configuration
yum install pcs fence-agents-all -y
systemctl enable pcsd
systemctl start pcsd
echo redhat | passwd --stdin hacluster
pcs cluster auth pcmk1 pcmk2
pcs cluster setup --name ha_cluster pcmk1 pcmk2
pcs cluster start --all
pcs cluster enable --all
pcs property set stonith-enabled=false    <--- Just for test!!!
pcs property set no-quorum-policy=ignore

- Drbd resource configuration
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd
drbd_resource=myres op monitor interval=60s
pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1
master-node-max=1 clone-max=2 clone-node-max=1 notify=true
[root@pcmk1 ~]# pcs -f drbd_cfg resource show
 Master/Slave Set: DrbdResClone [DrbdRes]
 Stopped: [ pcmk1 pcmk2 ]
[root@pcmk1 ~]#

Testing failover with a forced shutoff of pcmk1: when pcmk1 comes back up,
DRBD is slave there, but the logical volume is not active on pcmk2. So I need
HALVM.
[root@pcmk2 ~]# lvs
  LV     VG            Attr   LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao <17.00g
  swap   cl            -wi-ao   2.00g
  c-vol1 havolumegroup -wi---   1.00g
[root@pcmk2 ~]#
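
Activating the volume group on the surviving node is exactly what
ocf:heartbeat:LVM automates on start; done by hand it would be roughly
(sketch):

[root@pcmk2 ~]# vgchange -ay havolumegroup   # the agent activates the VG (exclusively when exclusive=true)
[root@pcmk2 ~]# lvs havolumegroup
  LV     VG            Attr   LSize
  c-vol1 havolumegroup -wi-a-   1.00g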



- Lvm resource and constraints
pcs cluster cib lvm_cfg
pcs -f lvm_cfg resource create HALVM
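
A typical way to complete this step (a sketch only: every parameter except the
ordering constraint quoted in emmanuel's reply is an assumption) would be:

# parameters below are assumed for illustration, not from the original message
pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM volgrpname=havolumegroup exclusive=true op monitor interval=30s
pcs -f lvm_cfg constraint order promote DrbdResClone then start HALVM
pcs -f lvm_cfg constraint colocation add HALVM with master DrbdResClone INFINITY
pcs cluster cib-push lvm_cfg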