Re: [ClusterLabs] Resources always return to original node

2020-09-26 Thread Strahil Nikolov
Resource stickiness for a group is the sum of all member resources' stickiness
-> e.g. 5 resources x 100 (the default stickiness) = a score of 500.
If your location constraint has a bigger score -> it wins :)
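If a location constraint turns out to be the cause, the two usual fixes look roughly like this (untested sketch: 'VirtIP' is the group from the config quoted below, and the constraint id is a placeholder you would take from 'pcs constraint --full'):

pcs resource meta VirtIP resource-stickiness=600   # members inherit it, so the group's summed stickiness rises above the constraint score
pcs constraint delete <constraint-id>              # or simply drop the location preference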


Best Regards,
Strahil Nikolov

On Saturday, 26 September 2020 at 12:22:32 GMT+3, Michael Ivanov wrote:

Hello,

I have a strange problem: when I reset the node on which my resources are
running, they are correctly migrated to the other node. But when I bring the
failed node back, as soon as it is up all resources are moved back to it. I
have set the resource-stickiness default value to 100. When this did not help,
I also set the resource-stickiness meta attribute to 100 on every resource.
Still, when the failed node recovers, the resources are migrated back to it!
Where should I look to understand this situation?
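Roughly, that is the equivalent of (the exact commands may have differed):

pcs resource defaults resource-stickiness=100          # cluster-wide default
pcs resource meta <resource> resource-stickiness=100   # per-resource meta attribute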

Here's the configuration of my cluster:

root@node1# pcs status
Cluster name: gcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node1 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Sat Sep 26 11:12:34 2020
  * Last change:  Sat Sep 26 10:39:16 2020 by root via cibadmin on node1
  * 2 nodes configured
  * 14 resource instances configured (1 DISABLED)

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * ilo5_node1    (stonith:fence_ilo5_ssh): Started node2
  * ilo5_node2    (stonith:fence_ilo5_ssh): Started node1
  * Resource Group: VirtIP:
    * PrimaryIP    (ocf::heartbeat:IPaddr2): Started node2
    * PrimaryIP6    (ocf::heartbeat:IPv6addr): Started node2
    * AliasIP    (ocf::heartbeat:IPaddr2): Started node2
  * BackupFS    (ocf::redhat:netfs.sh): Started node2
  * Clone Set: MailVolume-clone [MailVolume] (promotable):
    * Masters: [ node2 ]
    * Slaves: [ node1 ]
  * MailFS    (ocf::heartbeat:Filesystem): Started node2
  * apache    (ocf::heartbeat:apache): Started node2
  * postfix    (ocf::heartbeat:postfix): Started node2
  * amavis    (service:amavis): Started node2
  * dovecot    (service:dovecot): Started node2
  * openvpn    (service:openvpn): Stopped (disabled)

And resources:

root@node1# pcs resource config
 Group: VirtIP
  Meta Attrs: resource-stickiness=100
  Resource: PrimaryIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=16 ip=xx.xx.xx.20 nic=br0
   Meta Attrs: resource-stickiness=100
   Operations: monitor interval=30s (PrimaryIP-monitor-interval-30s)
   start interval=0s timeout=20s (PrimaryIP-start-interval-0s)
   stop interval=0s timeout=20s (PrimaryIP-stop-interval-0s)
  Resource: PrimaryIP6 (class=ocf provider=heartbeat type=IPv6addr)
   Attributes: cidr_netmask=64 ipv6addr=::::0:0:0:20 nic=br0
   Meta Attrs: resource-stickiness=100
   Operations: monitor interval=30s (PrimaryIP6-monitor-interval-30s)
   start interval=0s timeout=15s (PrimaryIP6-start-interval-0s)
   stop interval=0s timeout=15s (PrimaryIP6-stop-interval-0s)
  Resource: AliasIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=16 ip=xx.xx.yy.20 nic=br0
   Meta Attrs: resource-stickiness=100
   Operations: monitor interval=30s (AliasIP-monitor-interval-30s)
   start interval=0s timeout=20s (AliasIP-start-interval-0s)
   stop interval=0s timeout=20s (AliasIP-stop-interval-0s)
 Resource: BackupFS (class=ocf provider=redhat type=netfs.sh)
  Attributes: export=/Backup/Gateway fstype=nfs host=atlas mountpoint=/Backup options=noatime,async
  Meta Attrs: resource-stickiness=100
  Operations: monitor interval=1m timeout=10 (BackupFS-monitor-interval-1m)
  monitor interval=5m timeout=30 OCF_CHECK_LEVEL=10 (BackupFS-monitor-interval-5m)
  monitor interval=10m timeout=30 OCF_CHECK_LEVEL=20 (BackupFS-monitor-interval-10m)
  start interval=0s timeout=900 (BackupFS-start-interval-0s)
  stop interval=0s timeout=30 (BackupFS-stop-interval-0s)
 Clone: MailVolume-clone
  Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1 resource-stickiness=100
  Resource: MailVolume (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=mail
   Meta Attrs: resource-stickiness=100
   Operations: demote interval=0s timeout=90 (MailVolume-demote-interval-0s)
   monitor interval=60s (MailVolume-monitor-interval-60s)
   notify interval=0s timeout=90 (MailVolume-notify-interval-0s)
   promote interval=0s timeout=90 (MailVolume-promote-interval-0s)
   reload interval=0s timeout=30 (MailVolume-reload-interval-0s)
   start interval=0s timeout=240 (MailVolume-start-interval-0s)
   stop interval=0s timeout=100 (MailVolume-stop-interval-0s)
 Resource: MailFS (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/var/mail fstype=btrfs
  Meta Attrs: resource-stickiness

Re: [ClusterLabs] Resources always return to original node

2020-09-26 Thread Andrei Borzenkov
On 26.09.2020 12:22, Michael Ivanov wrote:
> Hello,
>
> I have a strange problem: when I reset the node on which my resources are
> running, they are correctly migrated to the other node. But when I bring the
> failed node back, as soon as it is up all resources are moved back to it. I
> have set the resource-stickiness default value to 100. When this did not help,
> I also set the resource-stickiness meta attribute to 100 on every resource.
> Still, when the failed node recovers, the resources are migrated back to it!
> Where should I look to understand this situation?
>

The first things to check are the location and colocation constraints.
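For example (a sketch; syntax as in pcs 0.10.x / Pacemaker 2.0.x, adjust to your versions):

pcs constraint --full    # list location/colocation/order constraints with their ids and scores
crm_simulate -sL         # show the allocation scores the scheduler computed for each resource on each node

If a location constraint scores higher than the (summed) stickiness, the resources will move back as soon as the preferred node returns.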
