On 12.04.2023 15:44, Philip Schiller wrote:
Here is also some additional information for a failover triggered by setting the node to standby.
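
For reference, a minimal sketch of how such a standby failover is typically triggered and reverted (assuming the crmsh shell; the node name s1 is taken from the logs below):

# put s1 into standby so that all resources running on it are stopped or moved away
crm node standby s1
# after the transition has settled, bring the node back into service
crm node online s1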

Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: On loss of quorum: Ignore
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       sto-ipmi-s0                (                        s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-zfs-drbd_storage:0     (                        s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-pluto:0           (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-poserver:0        (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-webserver:0       (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-dhcp:0            (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-wawi:0            (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-wawius:0          (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-saturn:0          (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-openvpn:0         (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-asterisk:0        (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-alarmanlage:0     (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-jabber:0          (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop       pri-drbd-TESTOPTIXXX:0     (               Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move       pri-vm-jabber              (                  s1 -> s0 )  due to unrunnable mas-drbd-jabber demote
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move       pri-vm-alarmanlage         (                  s1 -> s0 )  due to unrunnable mas-drbd-alarmanlage demote

I had the same "unrunnable demote" yesterday when I tried to reproduce it, but now I cannot reproduce it anymore. After some CIB modifications, it works as expected.
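
For reference, a sketch of how output like the one below can be generated without touching the live cluster (the saved CIB file name here is hypothetical):

# save the current CIB, then let crm_simulate replay the scheduler's decisions against it
cibadmin --query > /tmp/cib-standby-test.xml
crm_simulate --simulate --xml-file /tmp/cib-standby-test.xml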

Using the original execution date of: 2023-04-13 18:35:11Z
Current cluster status:
  * Node List:
    * Node ha1: standby (with active resources)
    * Online: [ ha2 qnetd ]

  * Full List of Resources:
    * dummy_stonith     (stonith:external/_dummy):       Started ha1
    * Clone Set: cl-zfs_drbd_storage [zfs_drbd_storage]:
      * Started: [ ha1 ha2 ]
    * Clone Set: ms-drbd_fs [drbd_fs] (promotable):
      * Masters: [ ha1 ha2 ]
    * just_vm   (ocf::pacemaker:Dummy):  Started ha2
    * drbd_vm   (ocf::pacemaker:Dummy):  Started ha1

Transition Summary:
  * Move       dummy_stonith          ( ha1 -> ha2 )
  * Stop       zfs_drbd_storage:0     (        ha1 )  due to node availability
  * Stop       drbd_fs:0              ( Master ha1 )  due to node availability
  * Migrate    drbd_vm                ( ha1 -> ha2 )

Executing Cluster Transition:
  * Resource action: dummy_stonith   stop on ha1
  * Pseudo action:   ms-drbd_fs_demote_0
  * Resource action: drbd_vm         migrate_to on ha1
  * Resource action: dummy_stonith   start on ha2
  * Resource action: drbd_fs         demote on ha1
  * Pseudo action:   ms-drbd_fs_demoted_0
  * Pseudo action:   ms-drbd_fs_stop_0
  * Resource action: drbd_vm         migrate_from on ha2
  * Resource action: drbd_vm         stop on ha1
  * Resource action: dummy_stonith   monitor=3600000 on ha2
  * Pseudo action:   cl-zfs_drbd_storage_stop_0
  * Resource action: drbd_fs         stop on ha1
  * Pseudo action:   ms-drbd_fs_stopped_0
  * Pseudo action:   drbd_vm_start_0
  * Resource action: zfs_drbd_storage stop on ha1
  * Pseudo action:   cl-zfs_drbd_storage_stopped_0
  * Resource action: drbd_vm         monitor=10000 on ha2
Using the original execution date of: 2023-04-13 18:35:11Z

Revised Cluster Status:
  * Node List:
    * Node ha1: standby
    * Online: [ ha2 qnetd ]

  * Full List of Resources:
    * dummy_stonith     (stonith:external/_dummy):       Started ha2
    * Clone Set: cl-zfs_drbd_storage [zfs_drbd_storage]:
      * Started: [ ha2 ]
      * Stopped: [ ha1 qnetd ]
    * Clone Set: ms-drbd_fs [drbd_fs] (promotable):
      * Masters: [ ha2 ]
      * Stopped: [ ha1 qnetd ]
    * just_vm   (ocf::pacemaker:Dummy):  Started ha2
    * drbd_vm   (ocf::pacemaker:Dummy):  Started ha2

where the ordering constraints are:

order drbd_fs_after_zfs_drbd_storage Mandatory: cl-zfs_drbd_storage ms-drbd_fs:promote
order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm
order just_vm_after_zfs_drbd_storage Mandatory: cl-zfs_drbd_storage just_vm

The "just_vm" resource was added to test the behavior of ordering a resource against a normal, non-promotable clone.

OK, I compared the CIBs, and the difference is that the non-working case has an explicit "start" action in the order constraint, i.e.

order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm:start

After I added it back, I got the same unrunnable "demote" again:

Transition Summary:
  * Stop       zfs_drbd_storage:0     (        ha1 )  due to node availability
  * Stop       drbd_fs:0              ( Master ha1 )  due to node availability
  * Migrate    just_vm                ( ha1 -> ha2 )
  * Move       drbd_vm                ( ha1 -> ha2 )  due to unrunnable ms-drbd_fs demote

I was sure that "start" is the default anyway. Go figure ...
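
For illustration, this is roughly how the two variants should look in the CIB XML, which can be dumped with "cibadmin --query --scope constraints" (the elements below are my reconstruction from the crm shell syntax above, not a verbatim dump of either configuration):

<!-- working case: no explicit then-action -->
<rsc_order id="drbd_vm_after_drbd_fs" kind="Mandatory" first="ms-drbd_fs" first-action="promote" then="drbd_vm"/>

<!-- non-working case: then-action="start" spelled out, producing the unrunnable demote -->
<rsc_order id="drbd_vm_after_drbd_fs" kind="Mandatory" first="ms-drbd_fs" first-action="promote" then="drbd_vm" then-action="start"/>

If I read "Pacemaker Explained" correctly, then-action does not default to "start" but to the value of first-action ("promote" here), so the two constraints may not be equivalent after all.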