Thanks for the update, Ken!
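For the archives, here is roughly how we plan to apply the workaround in the meantime. The exact values are illustrative only (any non-zero stickiness or utilization should do), and the retry-after-update variant simply reproduces what the `pcs resource update` later in this thread happened to trigger:

```shell
# Ken's workaround: give sv-fencer some stickiness or utilization so its
# placement is decided within the same transition (values illustrative):
pcs resource meta sv-fencer resource-stickiness=100
# ...or:
pcs resource utilization sv-fencer cpu=1

# Failing that, any CIB update triggers the missed second transition, so
# a timed-out wait can be retried after a harmless update:
if ! crm_resource --wait --timeout 300; then
    pcs resource update std op monitor interval=20 timeout=20
    crm_resource --wait --timeout 300
fi
```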
From: Ken Gaillot
Sent: Saturday, 21 October 2017 7:06 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] crm_resource --wait

I've narrowed down the cause. When the "standby" transition completes,
vm2 has more remaining utilization capacity than vm1, so the cluster
wants to run sv-fencer there. That should be taken into account in the
same transition, but it isn't, so a second transition is needed to make
it happen. Still investigating a fix.

A workaround is to assign some stickiness or utilization to sv-fencer.

On Wed, 2017-10-11 at 14:01 +1000, Leon Steffens wrote:
> I've attached two files:
> 314 = after standby step
> 315 = after resource update
>
> On Wed, Oct 11, 2017 at 12:22 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> > On Tue, 2017-10-10 at 15:19 +1000, Leon Steffens wrote:
> > > Hi Ken,
> > >
> > > I managed to reproduce this on a simplified version of the cluster,
> > > and on Pacemaker 1.1.15, 1.1.16, as well as 1.1.18-rc1.
> > >
> > > The steps to create the cluster are:
> > >
> > > pcs property set stonith-enabled=false
> > > pcs property set placement-strategy=balanced
> > >
> > > pcs node utilization vm1 cpu=100
> > > pcs node utilization vm2 cpu=100
> > > pcs node utilization vm3 cpu=100
> > >
> > > pcs property set maintenance-mode=true
> > >
> > > pcs resource create sv-fencer ocf:pacemaker:Dummy
> > >
> > > pcs resource create sv ocf:pacemaker:Dummy clone notify=false
> > > pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource utilization partition1 cpu=5
> > > pcs resource utilization partition2 cpu=5
> > > pcs resource utilization partition3 cpu=5
> > > pcs constraint colocation add std with sv-clone INFINITY
> > > pcs constraint colocation add partition1 with sv-clone INFINITY
> > > pcs constraint colocation add partition2 with sv-clone INFINITY
> > > pcs constraint colocation add partition3 with sv-clone INFINITY
> > >
> > > pcs property set maintenance-mode=false
> > >
> > > I can then reproduce the issue in the following way:
> > >
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm3
> > >  partition2  (ocf::pakemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > $ pcs cluster standby vm3
> > >
> > > # Check that all resources have moved off vm3
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> >
> > Thanks for the detailed information, this should help me get to the
> > bottom of it. From this description, it sounds like a new transition
> > isn't being triggered when it should.
> >
> > Could you please attach the DC's pe-input file that is listed in the
> > logs after the standby step above? That would simplify analysis.
> >
> > > # Wait for any outstanding actions to complete.
> > > $ crm_resource --wait --timeout 300
> > > Pending actions:
> > >     Action 22: sv-fencer_monitor_10000 on vm2
> > >     Action 21: sv-fencer_start_0 on vm2
> > >     Action 20: sv-fencer_stop_0 on vm1
> > > Error performing operation: Timer expired
> > >
> > > # Check the resources again - sv-fencer is still on vm1
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > # Perform a random update to the CIB.
> > > $ pcs resource update std op monitor interval=20 timeout=20
> > >
> > > # Check resource status again - sv-fencer has now moved to vm2
> > > # (the action crm_resource was waiting for)
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm2    <<<============
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > I do not get the problem if I:
> > > 1) remove the "std" resource; or
> > > 2) remove the colocation constraints; or
> > > 3) remove the utilization attributes for the partition resources.
> > >
> > > In these cases the sv-fencer resource is happy to stay on vm1, and
> > > crm_resource --wait returns immediately.
> > >
> > > It looks like the pcs cluster standby call is creating/registering
> > > the actions to move the sv-fencer resource to vm2, but it doesn't
> > > include them in the cluster transition. When the CIB is later
> > > updated by something else, the actions are included in that
> > > transition.
> > >
> > > Regards,
> > > Leon
> >
> > _______________________________________________
> > Users mailing list: Users@clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org

--
Ken Gaillot <kgail...@redhat.com>