Hi, Quentin. Thanks for the logs! I see you highlighted that SERVICE1 was in the "Stopping" state on both node 1 and node 2 while node 1 was rejoining the cluster. I also noted the following later in the logs (and some similar messages earlier):
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE4 active on NODE2
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 1 : NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 2 : NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: error: native_create_actions: Resource SERVICE1 is active on 2 nodes (attempting recovery)
Aug 27 08:47:02 [1330] NODE2 pengine: notice: native_create_actions: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information

Can you make sure that all the cluster-managed systemd services are disabled
from starting at boot (i.e., `systemctl is-enabled service1`, and the same for
all the others) on both nodes? If they are enabled, disable them.
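For example, something like this on each node would show the state at a glance
(just a sketch; service1 through service4 are placeholders based on the
resource names in your logs, so substitute the actual systemd unit names your
cluster manages):

# systemctl is-enabled service1 service2 service3 service4
# systemctl disable service1    (repeat for any unit that reported "enabled")

The reason I ask: a systemd service that Pacemaker manages has to stay disabled
in systemd itself. If systemd also starts the unit when a node boots, the
resource probes will find it running on a node where Pacemaker never started
it, which is exactly the "Resource SERVICE1 is active on 2 nodes" condition in
the log above.

Separately, the configuration you attached still carries the deprecated
default-resource-stickiness=0 property. As mentioned earlier in the thread, an
explicit 0 can interfere with clone placement, so even if you decide not to use
stickiness it is worth removing; something along these lines should do it
(again, just a sketch):

# pcs property unset default-resource-stickiness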
On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_v...@hotmail.com> wrote:

> Hi,
>
> Sorry for using this email address; my name is Quentin. Thank you for your
> reply.
>
> I have already tried the stickiness solution (with the deprecated value).
> I tried the one you gave me, and it does not change anything.
>
> Resources don't seem to move from node to node (I don't see any changes
> with the crm_mon command).
>
> In the logs I found this line: *"error: native_create_actions:
> Resource SERVICE1 is active on 2 nodes"*
>
> That is what led me to contact you, to understand and learn a little more
> about this cluster, and why there are resources running on the passive node.
>
> You will find attached the logs from the reboot of the passive node, as
> well as my cluster configuration.
>
> I think I'm missing something in the configuration / logs that I don't
> understand.
>
> Thank you in advance for your help,
>
> Quentin
>
>
> On 26/08/2020 at 20:16, Reid Wahl wrote:
>
> Hi, Citron.
>
> Based on your description, it sounds like some resources **might** be
> moving from node 1 to node 2, failing on node 2, and then moving back to
> node 1. If that's what's happening (and even if it's not), then it's
> probably smart to set some resource stickiness as a resource default. The
> command below sets a resource stickiness score of 1.
>
> # pcs resource defaults resource-stickiness=1
>
> Also note that the "default-resource-stickiness" cluster property is
> deprecated and should not be used.
>
> Finally, an explicit default resource stickiness score of 0 can interfere
> with the placement of cloned resource instances. If you don't want any
> stickiness, then it's better to leave stickiness unset. That way,
> primitives will have a stickiness of 0, but clone instances will have a
> stickiness of 1.
>
> If adding stickiness does not resolve the issue, can you share your
> cluster configuration and some logs that show the issue happening? Off the
> top of my head I'm not sure why resources would start and stop on node 2
> without moving away from node 1, unless they're clone instances that are
> starting and then failing a monitor operation on node 2.
>
> On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_v...@hotmail.com>
> wrote:
>
>> Hello,
>> I am contacting you because I have a problem with my cluster and I cannot
>> find (nor understand) any information that can help me.
>>
>> I have a two-node cluster (pacemaker, corosync, pcs) installed on CentOS 7
>> with my configuration applied.
>> Everything seems to work fine, but here is what happens:
>>
>> - Node1 and Node2 are running well, with Node1 as primary
>> - I reboot Node2, which is passive (no changes on Node1)
>> - Node2 comes back into the cluster as passive
>> - The corosync logs show resources getting started and then stopped on Node2
>> - The "crm_mon" command shows some resources on Node1 getting restarted
>>
>> I don't understand how this is supposed to work.
>> If a node comes back and becomes passive (since Node1 is still running as
>> primary), there should be no reason for the resources to be started and
>> then stopped on the new passive node, right?
>>
>> One of my resources becomes unstable because it gets started and then
>> stopped too quickly on Node2, which seems to make it restart on Node1
>> without a failover.
>>
>> I have tried several things and solutions proposed by different sites and
>> forums, but without success.
>>
>> Is there a way to ensure that a node which joins the cluster as passive
>> does not start its own resources?
>>
>> Thanks in advance.
>>
>> Here is some information, just in case:
>> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
>> corosync-2.4.5-4.el7.x86_64
>> pacemaker-cli-1.1.21-4.el7.x86_64
>> pacemaker-1.1.21-4.el7.x86_64
>> pcs-0.9.168-4.el7.centos.x86_64
>> corosynclib-2.4.5-4.el7.x86_64
>> pacemaker-libs-1.1.21-4.el7.x86_64
>> pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>>
>> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>> <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>> <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>> <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>> <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>> <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>

--
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/