No problem! That's what we're here for. I'm glad it's sorted out :)

On Fri, Aug 28, 2020 at 12:27 AM Citron Vert <citron_v...@hotmail.com> wrote:
> Hi,
>
> You are right, the problem seems to come from some services that are
> started at boot.
>
> My installation script disables automatic startup for all the services we
> use, which is why I didn't focus on this possibility.
>
> But after a quick investigation, it turns out a colleague had the good idea
> of writing a "security" script that monitors and starts certain services.
>
> Sorry to have contacted you over this little mistake,
> and thank you for the help, it was effective.
>
> Quentin
>
> On 27/08/2020 at 09:56, Reid Wahl wrote:
>
>> Hi, Quentin. Thanks for the logs!
>>
>> I see you highlighted the fact that SERVICE1 was in "Stopping" state on
>> both node 1 and node 2 while node 1 was rejoining the cluster. I also noted
>> the following later in the logs, as well as some similar messages earlier:
>>
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE4 active on NODE2
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE2
>> ...
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 1 : NODE1
>> Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 2 : NODE2
>> ...
>> Aug 27 08:47:02 [1330] NODE2 pengine: error: native_create_actions: Resource SERVICE1 is active on 2 nodes (attempting recovery)
>> Aug 27 08:47:02 [1330] NODE2 pengine: notice: native_create_actions: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
>>
>> Can you make sure that all the cluster-managed systemd services are disabled
>> from starting at boot (i.e., `systemctl is-enabled service1`, and the same
>> for all the others) on both nodes? If they are enabled, disable them.
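For readers hitting the same symptom, a minimal sketch of the check Reid describes, to be run on each node. The unit names service1 through service4 are placeholders, since the real names are anonymized in this thread:

    # disable any cluster-managed unit still set to start at boot
    for svc in service1 service2 service3 service4; do
        if [ "$(systemctl is-enabled "$svc" 2>/dev/null)" = "enabled" ]; then
            systemctl disable "$svc"
        fi
    done

Only the units managed as cluster resources need to stay out of systemd's boot sequence; pacemaker.service and corosync.service can stay enabled or disabled according to your own startup policy.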
>> On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_v...@hotmail.com> wrote:
>>
>>> Hi,
>>>
>>> Sorry for using this email address, my name is Quentin. Thank you for
>>> your reply.
>>>
>>> I have already tried the stickiness solution (with the deprecated
>>> value). I tried the one you gave me, and it does not change anything.
>>>
>>> Resources don't seem to move from node to node (I don't see any changes
>>> with the crm_mon command).
>>>
>>> In the logs I found this line: "error: native_create_actions:
>>> Resource SERVICE1 is active on 2 nodes"
>>>
>>> That is what led me to contact you, to understand and learn a little more
>>> about this cluster, and why there are running resources on the passive node.
>>>
>>> You will find attached the logs from the reboot of the passive node, along
>>> with my cluster configuration.
>>>
>>> I think I'm missing something in the configuration / logs that I
>>> don't understand.
>>>
>>> Thank you in advance for your help,
>>>
>>> Quentin
>>>
>>> On 26/08/2020 at 20:16, Reid Wahl wrote:
>>>
>>>> Hi, Citron.
>>>>
>>>> Based on your description, it sounds like some resources *might* be
>>>> moving from node 1 to node 2, failing on node 2, and then moving back to
>>>> node 1. If that's what's happening (and even if it's not), then it's
>>>> probably smart to set some resource stickiness as a resource default. The
>>>> command below sets a resource stickiness score of 1.
>>>>
>>>> # pcs resource defaults resource-stickiness=1
>>>>
>>>> Also note that the "default-resource-stickiness" cluster property is
>>>> deprecated and should not be used.
>>>>
>>>> Finally, an explicit default resource stickiness score of 0 can interfere
>>>> with the placement of cloned resource instances. If you don't want any
>>>> stickiness, it's better to leave stickiness unset. That way, primitives
>>>> will have a stickiness of 0, but clone instances will have a stickiness
>>>> of 1.
>>>>
>>>> If adding stickiness does not resolve the issue, can you share your
>>>> cluster configuration and some logs that show the issue happening? Off the
>>>> top of my head I'm not sure why resources would start and stop on node 2
>>>> without moving away from node 1, unless they're clone instances that are
>>>> starting and then failing a monitor operation on node 2.
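As a side note for readers: with the pcs 0.9 series that CentOS 7 ships, migrating off the deprecated property would look roughly like this. A sketch, not commands from the thread (Quentin's configuration, quoted further down, sets the deprecated property to 0):

    # remove the deprecated cluster property
    pcs property unset default-resource-stickiness
    # then, if stickiness is wanted at all, set it as a resource default
    pcs resource defaults resource-stickiness=1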
>>>> On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_v...@hotmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am contacting you because I have a problem with my cluster and I
>>>>> cannot find (nor understand) any information that can help me.
>>>>>
>>>>> I have a 2-node cluster (pacemaker, corosync, pcs) installed on CentOS 7
>>>>> with a set of configurations.
>>>>> Everything seems to work fine, but here is what happens:
>>>>>
>>>>> - Node1 and Node2 are running well, with Node1 as primary
>>>>> - I reboot Node2, which is passive (no changes on Node1)
>>>>> - Node2 comes back into the cluster as passive
>>>>> - The corosync logs show resources getting started then stopped on Node2
>>>>> - The "crm_mon" command shows some resources on Node1 getting restarted
>>>>>
>>>>> I don't understand how this is supposed to work.
>>>>> If a node comes back and becomes passive (since Node1 is running as
>>>>> primary), is there any reason for the resources to be started and then
>>>>> stopped on the new passive node?
>>>>>
>>>>> One of my resources becomes unstable because it gets started and then
>>>>> stopped too quickly on Node2, which seems to make it restart on Node1
>>>>> without a failover.
>>>>>
>>>>> I have tried several things and solutions proposed by different sites
>>>>> and forums, but without success.
>>>>>
>>>>> Is there a way to keep a node that joins the cluster as passive from
>>>>> starting its own resources?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Here is some information, just in case:
>>>>>
>>>>> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
>>>>> corosync-2.4.5-4.el7.x86_64
>>>>> pacemaker-cli-1.1.21-4.el7.x86_64
>>>>> pacemaker-1.1.21-4.el7.x86_64
>>>>> pcs-0.9.168-4.el7.centos.x86_64
>>>>> corosynclib-2.4.5-4.el7.x86_64
>>>>> pacemaker-libs-1.1.21-4.el7.x86_64
>>>>> pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>>>>>
>>>>> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>>>> <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>>>>> <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>>>>> <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>>>>> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>>>>> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>>>> <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>>>>> <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>>>>> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Reid Wahl, RHCA
>>>> Software Maintenance Engineer, Red Hat
>>>> CEE - Platform Support Delivery - ClusterHA
>>
>> --
>> Regards,
>>
>> Reid Wahl, RHCA
>> Software Maintenance Engineer, Red Hat
>> CEE - Platform Support Delivery - ClusterHA

--
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
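A closing note on the "Resource SERVICE1 is active on 2 nodes (attempting recovery)" error quoted above: when Pacemaker finds a resource running on more than one node, the resource's multiple-active meta-attribute decides how it recovers. The default, stop_start, stops every copy and then starts exactly one, which matches the restarts Quentin saw on NODE1 whenever systemd had started a stray copy on NODE2. A sketch of setting it explicitly, assuming pcs 0.9 syntax and reusing the SERVICE1 name from the logs:

    # stop_start is already the default; block and stop_only are the alternatives
    pcs resource meta SERVICE1 multiple-active=stop_start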
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/