Re: [ClusterLabs] crm_resource --wait
I've narrowed down the cause. When the "standby" transition completes,
vm2 has more remaining utilization capacity than vm1, so the cluster
wants to run sv-fencer there. That should be taken into account in the
same transition, but it isn't, so a second transition is needed to make
it happen. Still investigating a fix.

A workaround is to assign some stickiness or utilization to sv-fencer.

On Wed, 2017-10-11 at 14:01 +1000, Leon Steffens wrote:
> I've attached two files:
> 314 = after standby step
> 315 = after resource update
>
> On Wed, Oct 11, 2017 at 12:22 AM, Ken Gaillot wrote:
> > On Tue, 2017-10-10 at 15:19 +1000, Leon Steffens wrote:
> > > Hi Ken,
> > >
> > > I managed to reproduce this on a simplified version of the
> > > cluster, and on Pacemaker 1.1.15, 1.1.16, as well as 1.1.18-rc1.
> > >
> > > The steps to create the cluster are:
> > >
> > > pcs property set stonith-enabled=false
> > > pcs property set placement-strategy=balanced
> > >
> > > pcs node utilization vm1 cpu=100
> > > pcs node utilization vm2 cpu=100
> > > pcs node utilization vm3 cpu=100
> > >
> > > pcs property set maintenance-mode=true
> > >
> > > pcs resource create sv-fencer ocf:pacemaker:Dummy
> > >
> > > pcs resource create sv ocf:pacemaker:Dummy clone notify=false
> > > pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource utilization partition1 cpu=5
> > > pcs resource utilization partition2 cpu=5
> > > pcs resource utilization partition3 cpu=5
> > >
> > > pcs constraint colocation add std with sv-clone INFINITY
> > > pcs constraint colocation add partition1 with sv-clone INFINITY
> > > pcs constraint colocation add partition2 with sv-clone INFINITY
> > > pcs constraint colocation add partition3 with sv-clone INFINITY
> > >
> > > pcs property set maintenance-mode=false
> > >
> > > I can then reproduce the issue in the following way:
> > >
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm3
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > $ pcs cluster standby vm3
> > >
> > > # Check that all resources have moved off vm3
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> >
> > Thanks for the detailed information, this should help me get to the
> > bottom of it. From this description, it sounds like a new transition
> > isn't being triggered when it should.
> >
> > Could you please attach the DC's pe-input file that is listed in the
> > logs after the standby step above? That would simplify analysis.
> >
> > > # Wait for any outstanding actions to complete.
> > > $ crm_resource --wait --timeout 300
> > > Pending actions:
> > >         Action 22: sv-fencer_monitor_1 on vm2
> > >         Action 21: sv-fencer_start_0 on vm2
> > >         Action 20: sv-fencer_stop_0 on vm1
> > > Error performing operation: Timer expired
> > >
> > > # Check the resources again - sv-fencer is still on vm1
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm1
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > # Perform a random update to the CIB.
> > > $ pcs resource update std op monitor interval=20 timeout=20
> > >
> > > # Check resource status again - sv-fencer has now moved to vm2
> > > # (the action crm_resource was waiting for)
> > > $ pcs resource
> > >  sv-fencer   (ocf::pacemaker:Dummy): Started vm2    <<<
> > >  Clone Set: sv-clone [sv]
> > >      Started: [ vm1 vm2 ]
> > >      Stopped: [ vm3 ]
> > >  std         (ocf::pacemaker:Dummy): Started vm2
> > >  partition1  (ocf::pacemaker:Dummy): Started vm1
> > >  partition2  (ocf::pacemaker:Dummy): Started vm1
> > >  partition3  (ocf::pacemaker:Dummy): Started vm2
> > >
> > > I do not get the problem if I:
> > > 1) remove the "std" resource; or
> > > 2) remove
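The workaround mentioned at the top could be applied with commands like the following (a sketch using the same pcs syntax as the reproduction steps above; the particular values are arbitrary, and this is untested against this cluster):

```shell
# Hypothetical sketch of the suggested workaround, not a verified fix.
# Either give sv-fencer some stickiness so it stays where it is...
pcs resource meta sv-fencer resource-stickiness=100

# ...or give it a small utilization value, so the balanced placement
# strategy accounts for it when computing remaining node capacity.
pcs resource utilization sv-fencer cpu=1
```

Either attribute should keep the scheduler from wanting to move sv-fencer in a follow-up transition after the standby completes.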
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
> Ken Gaillot writes:
>
> > On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
> > > Ken Gaillot writes:
> > >
> > > > Hmm, stop+reload is definitely a bug. Can you attach (or email
> > > > it to me privately, or file a bz with it attached) the above
> > > > pe-input file with any sensitive info removed?
> > >
> > > I sent you the pe-input file privately. It indeed shows the
> > > issue:
> > >
> > > $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
> > > [...]
> > > Executing cluster transition:
> > >  * Resource action: vm-alder stop on vhbl05
> > >  * Resource action: vm-alder reload on vhbl05
> > > [...]
> > >
> > > Hope you can easily get to the bottom of this.
> >
> > This turned out to have the same underlying cause as CLBZ#5309. I
> > have a fix pending review, which I expect to make it into the
> > soon-to-be-released 1.1.18.
>
> Great!
>
> > It is a regression introduced in 1.1.15 by commit 2558d76f. The
> > logic for reloads was consolidated in one place, but that happened
> > to be before restarts were scheduled, so it no longer had the right
> > information about whether a restart was needed. Now, it sets an
> > ordering flag that is used later to cancel the reload if the
> > restart becomes required. I've also added a regression test for it.
>
> Restarts shouldn't even enter the picture here, so I don't get your
> explanation. But I also don't know the code, so that doesn't mean a
> thing. I'll test the next RC to be sure. :-)

Reloads are done in place of restarts, when circumstances allow, so
reloads are always related to (potential) restarts. The problem arose
because not all of the relevant circumstances are known at the time the
reload action is created. We may figure out later that a resource the
reloading resource depends on must be restarted, and therefore the
reloading resource must be fully restarted instead of reloaded. E.g.
a database resource might otherwise be able to reload, but not if the
filesystem it's using is going away. Previously in those cases, we
would end up scheduling both the reload and the restart. Now, we
schedule only the restart.
--
Ken Gaillot

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
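Ken's explanation boils down to a simple decision rule. Here is a toy model of it (a hypothetical sketch, not Pacemaker source code): a parameter-only change allows a reload, but a required restart of a dependency forces a full restart instead.

```shell
# Toy model (NOT Pacemaker code) of the reload-vs-restart choice.
decide_action() {
    reloadable_change=$1     # "yes" if only reloadable params changed
    dependency_restarts=$2   # "yes" if a depended-on resource restarts
    if [ "$dependency_restarts" = yes ]; then
        echo restart   # e.g. the filesystem under a database goes away
    elif [ "$reloadable_change" = yes ]; then
        echo reload    # the change can be applied in place
    else
        echo none
    fi
}

decide_action yes no    # prints: reload
decide_action yes yes   # prints: restart (the pre-fix regression
                        # scheduled both the reload and the restart here)
```

The regression existed because the "dependency_restarts" input was effectively evaluated too early, before restarts were scheduled; the fix defers the decision via an ordering flag.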
Re: [ClusterLabs] Corosync 2.4.3 is available at corosync.org!
On 2017-10-20 10:26 AM, Jan Friesse wrote:
> I am pleased to announce the latest maintenance release of Corosync
> 2.4.3, available immediately from our website at
> http://build.clusterlabs.org/corosync/releases/.
>
> This release contains a lot of fixes. The new feature is support for
> heuristics in qdevice.
>
> Complete changelog for 2.4.3:
>
> Adrian Vondendriesch (1):
>       doc: document watchdog_device parameter
>
> Andrew Price (1):
>       Main: Call mlockall after fork
>
> Bin Liu (7):
>       Totempg: remove duplicate memcpy in mcast_msg func
>       Qdevice: fix spell errors in qdevice
>       logconfig: Do not overwrite logger_subsys priority
>       totemconfig: Prefer nodelist over bindnetaddr
>       cpghum: Fix printf of size_t variable
>       Qnetd lms: Use UTILS_PRI_RING_ID printf format str
>       wd: Report error when close of wd fails
>
> Christine Caulfield (6):
>       votequorum: Don't update expected_votes display if value is too high
>       votequorum: simplify reconfigure message handling
>       quorumtool: Add option to show all node addresses
>       main: Don't ask libqb to handle segv, it doesn't work
>       man: Document -a option to corosync-quorumtool
>       main: use syslog & printf directly for early log messages
>
> Edwin Torok (1):
>       votequorum: make atb consistent on nodelist reload
>
> Ferenc Wágner (7):
>       Fix typo: Destorying -> Destroying
>       init: Add doc URIs to the systemd service files
>       wd: fix typo
>       corosync.conf.5: Fix watchdog documentation
>       corosync.conf.5: add warning about slow watchdogs
>       wd: remove extra capitalization typo
>       corosync.conf.5: watchdog support is conditional
>
> Hideo Yamauchi (1):
>       notifyd: Add the community name to an SNMP trap
>
> Jan Friesse (11):
>       Logsys: Change logsys syslog_priority priority
>       totemrrp: Fix situation when all rings are faulty
>       main: Display reason why cluster cannot be formed
>       totem: Propagate totem initialization failure
>       totemcrypto: Refactor symmetric key importing
>       totemcrypto: Use different method to import key
>       main: Add option to set priority
>       main: Add support for libcgroup
>       totemcrypto: Fix compiler warning
>       cmap: Remove noop highest config version check
>       qdevice: Add support for heuristics
>
> Jan Pokorný (2):
>       Spec: drop unneeded dependency
>       Spec: make internal dependencies arch-qualified
>
> Jonathan Davies (1):
>       cmap: don't shutdown highest config_version node
>
> Kazunori INOUE (1):
>       totemudp: Remove memb_join discarding
>
> Keisuke MORI (1):
>       Spec: fix arch-qualified dependencies
>
> Khem Raj (1):
>       Include fcntl.h for F_* and O_* defines
>
> Masse Nicolas (1):
>       totemudp: Retry if bind fails
>
> Richard B Winters (1):
>       Remove deprecated doxygen flags
>
> Takeshi MIZUTA (3):
>       man: Fix typos in man page
>       man: Modify man-page according to command usage
>       Remove redundant header file inclusion
>
> yuusuke (1):
>       upstart: Add softdog module loading example
>
> Upgrade is (as usual) highly recommended.
>
> Thanks/congratulations to all people that contributed to achieve this
> great milestone.

Thanks to all who helped make this release happen!

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
[ClusterLabs] Corosync 2.4.3 is available at corosync.org!
I am pleased to announce the latest maintenance release of Corosync
2.4.3, available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

This release contains a lot of fixes. The new feature is support for
heuristics in qdevice.

Complete changelog for 2.4.3:

Adrian Vondendriesch (1):
      doc: document watchdog_device parameter

Andrew Price (1):
      Main: Call mlockall after fork

Bin Liu (7):
      Totempg: remove duplicate memcpy in mcast_msg func
      Qdevice: fix spell errors in qdevice
      logconfig: Do not overwrite logger_subsys priority
      totemconfig: Prefer nodelist over bindnetaddr
      cpghum: Fix printf of size_t variable
      Qnetd lms: Use UTILS_PRI_RING_ID printf format str
      wd: Report error when close of wd fails

Christine Caulfield (6):
      votequorum: Don't update expected_votes display if value is too high
      votequorum: simplify reconfigure message handling
      quorumtool: Add option to show all node addresses
      main: Don't ask libqb to handle segv, it doesn't work
      man: Document -a option to corosync-quorumtool
      main: use syslog & printf directly for early log messages

Edwin Torok (1):
      votequorum: make atb consistent on nodelist reload

Ferenc Wágner (7):
      Fix typo: Destorying -> Destroying
      init: Add doc URIs to the systemd service files
      wd: fix typo
      corosync.conf.5: Fix watchdog documentation
      corosync.conf.5: add warning about slow watchdogs
      wd: remove extra capitalization typo
      corosync.conf.5: watchdog support is conditional

Hideo Yamauchi (1):
      notifyd: Add the community name to an SNMP trap

Jan Friesse (11):
      Logsys: Change logsys syslog_priority priority
      totemrrp: Fix situation when all rings are faulty
      main: Display reason why cluster cannot be formed
      totem: Propagate totem initialization failure
      totemcrypto: Refactor symmetric key importing
      totemcrypto: Use different method to import key
      main: Add option to set priority
      main: Add support for libcgroup
      totemcrypto: Fix compiler warning
      cmap: Remove noop highest config version check
      qdevice: Add support for heuristics

Jan Pokorný (2):
      Spec: drop unneeded dependency
      Spec: make internal dependencies arch-qualified

Jonathan Davies (1):
      cmap: don't shutdown highest config_version node

Kazunori INOUE (1):
      totemudp: Remove memb_join discarding

Keisuke MORI (1):
      Spec: fix arch-qualified dependencies

Khem Raj (1):
      Include fcntl.h for F_* and O_* defines

Masse Nicolas (1):
      totemudp: Retry if bind fails

Richard B Winters (1):
      Remove deprecated doxygen flags

Takeshi MIZUTA (3):
      man: Fix typos in man page
      man: Modify man-page according to command usage
      Remove redundant header file inclusion

yuusuke (1):
      upstart: Add softdog module loading example

Upgrade is (as usual) highly recommended.

Thanks/congratulations to all people that contributed to achieve this
great milestone.
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot writes:

> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>> Ken Gaillot writes:
>>
>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to
>>> me privately, or file a bz with it attached) the above pe-input file
>>> with any sensitive info removed?
>>
>> I sent you the pe-input file privately. It indeed shows the issue:
>>
>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>> [...]
>> Executing cluster transition:
>>  * Resource action: vm-alder stop on vhbl05
>>  * Resource action: vm-alder reload on vhbl05
>> [...]
>>
>> Hope you can easily get to the bottom of this.
>
> This turned out to have the same underlying cause as CLBZ#5309. I have
> a fix pending review, which I expect to make it into the
> soon-to-be-released 1.1.18.

Great!

> It is a regression introduced in 1.1.15 by commit 2558d76f. The logic
> for reloads was consolidated in one place, but that happened to be
> before restarts were scheduled, so it no longer had the right
> information about whether a restart was needed. Now, it sets an
> ordering flag that is used later to cancel the reload if the restart
> becomes required. I've also added a regression test for it.

Restarts shouldn't even enter the picture here, so I don't get your
explanation. But I also don't know the code, so that doesn't mean a
thing. I'll test the next RC to be sure.
--
Thanks,
Feri
[ClusterLabs] corosync continues even if node is removed from the cluster
Hi ClusterLabs,

I have a query about safely removing a node from a corosync cluster.

When "corosync-cfgtool -R" is issued, it causes all nodes to reload
their config from corosync.conf. If I have removed a node from the
nodelist but corosync is still running on that node, it will receive
the reload signal but will try to continue as if nothing had happened.
This then causes various problems on all nodes.

A specific example: I have a running cluster containing two nodes:
10.71.217.70 (nodeid=1) and 10.71.217.71 (nodeid=2). When I remove
node 1 from the nodelist in corosync.conf on both nodes and then issue
"corosync-cfgtool -R" on 10.71.217.71, I see this on 10.71.217.70:

Quorum information
------------------
Date:             Fri Oct 20 13:23:02 2017
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          124
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate AutoTieBreaker

Membership information
----------------------
    Nodeid      Votes Name
         1          1 cluster1 (local)
         2          1 10.71.217.71

and this on 10.71.217.71:

Quorum information
------------------
Date:             Fri Oct 20 13:22:46 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          132
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         2          1 10.71.217.71 (local)

Instead, I would expect corosync on node 1 to exit, and node 2 to have
"expected votes: 1, total votes: 1, quorate: yes".

I notice that there is already some logic in votequorum.c that detects
this condition, and it produces the following log messages on node 1:

debug   [VOTEQ ] No nodelist defined or our node is not in the nodelist
crit    [VOTEQ ] configuration error: nodelist or quorum.expected_votes must be configured!
crit    [VOTEQ ] will continue with current runtime data

What is the rationale for continuing despite the obvious inconsistency?
Surely this is destined to cause problems...?

I find that I get my expected behaviour with the following patch:

diff --git a/exec/votequorum.c b/exec/votequorum.c
index 1a97c6d..4ff7ff2 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -1286,7 +1287,8 @@ static char *votequorum_readconfig(int runtime)
 			error = (char *)"configuration error: nodelist or quorum.expected_votes must be configured!";
 		} else {
 			log_printf(LOGSYS_LEVEL_CRIT, "configuration error: nodelist or quorum.expected_votes must be configured!");
-			log_printf(LOGSYS_LEVEL_CRIT, "will continue with current runtime data");
+			log_printf(LOGSYS_LEVEL_CRIT, "exiting...");
+			exit(1);
 		}
 		goto out;
 	}

Is there any reason why that would not be a good idea?

Thanks,
Jonathan
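The numbers in the quorumtool output above follow votequorum's basic majority rule. A simplified sketch (illustrative only; real corosync also handles two_node, auto_tie_breaker, last_man_standing, and so on): a partition is quorate when total_votes >= expected_votes / 2 + 1.

```shell
# Simplified votequorum arithmetic (NOT the corosync implementation).
quorate() {
    expected=$1
    total=$2
    quorum=$(( expected / 2 + 1 ))   # majority threshold
    if [ "$total" -ge "$quorum" ]; then
        echo "Quorate: Yes (quorum $quorum)"
    else
        echo "Quorate: No (quorum $quorum)"
    fi
}

quorate 2 1   # node 2 above: expected_votes stuck at 2 -> Quorate: No (quorum 2)
quorate 1 1   # if expected_votes had dropped to 1      -> Quorate: Yes (quorum 1)
```

This is why node 2 reports "Activity blocked": the stale node keeps expected_votes at 2, so the surviving single vote can never reach the quorum of 2.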