Classified as: {OPEN}
Well... why do you say "Well if corosync isn't there then this is to be expected and pacemaker won't recover corosync."?
In my mind, Corosync is managed by Pacemaker like any other cluster resource, and the "pacemakerd: recover properly from Corosync crash" fix implemented in version 2.1.2 seems to confirm that.

From: NOLIBOS Christophe
Sent: Thursday, 18 April 2024 17:56
To: 'Klaus Wenninger' <kwenn...@redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: Ken Gaillot <kgail...@redhat.com>
Subject: RE: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

[~]$ systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2024-04-18 14:58:42 UTC; 53min ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 2027251 ExecStop=/usr/sbin/corosync-cfgtool -H --force (code=exited, status=0/SUCCESS)
  Process: 1324906 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=killed, signal=KILL)
 Main PID: 1324906 (code=killed, signal=KILL)

Apr 18 13:16:04 - corosync[1324906]: [QUORUM] Sync joined[1]: 1
Apr 18 13:16:04 - corosync[1324906]: [TOTEM ] A new membership (1.1c8) was formed. Members joined: 1
Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 18 13:16:04 - corosync[1324906]: [QUORUM] Members[1]: 1
Apr 18 13:16:04 - corosync[1324906]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 18 13:16:04 - systemd[1]: Started Corosync Cluster Engine.
Apr 18 14:58:42 - systemd[1]: corosync.service: Main process exited, code=killed, status=9/KILL
Apr 18 14:58:42 - systemd[1]: corosync.service: Failed with result 'signal'.
[~]$

From: Klaus Wenninger <kwenn...@redhat.com>
Sent: Thursday, 18 April 2024 17:43
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: Ken Gaillot <kgail...@redhat.com>; NOLIBOS Christophe <christophe.noli...@thalesgroup.com>
Subject: Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

On Thu, Apr 18, 2024 at 5:07 PM NOLIBOS Christophe via Users <users@clusterlabs.org> wrote:

I'm using RedHat 8.8 (4.18.0-477.21.1.el8_8.x86_64).
When I kill Corosync, no new corosync process is created and Pacemaker ends up in a failed state.
The only solution is to restart the pacemaker service.

[~]$ pcs status
Error: unable to get cib
[~]$
[~]$ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-04-18 13:16:04 UTC; 1h 43min ago
     Docs: man:pacemakerd
           https://clusterlabs.org/pacemaker/doc/
 Main PID: 1324923 (pacemakerd)
    Tasks: 91
   Memory: 132.1M
   CGroup: /system.slice/pacemaker.service
           ...

Apr 18 14:59:02 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:03 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:04 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:05 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:06 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:07 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:08 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:09 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:10 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
Apr 18 14:59:11 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY
[~]$

Well if corosync isn't there then this is to be expected and pacemaker won't recover corosync.
Can you check what systemd thinks about corosync (status/journal)?

Klaus

-----Original Message-----
From: Ken Gaillot <kgail...@redhat.com>
Sent: Thursday, 18 April 2024 16:40
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: NOLIBOS Christophe <christophe.noli...@thalesgroup.com>
Subject: Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix

What OS are you using? Does it use systemd?

What happens when you kill Corosync?

On Thu, 2024-04-18 at 13:13 +0000, NOLIBOS Christophe via Users wrote:
> Dear All,
>
> I have a question about the "pacemakerd: recover properly from
> Corosync crash" fix implemented in version 2.1.2.
> I have observed the issue when testing pacemaker version 2.0.5, just
> by killing the ‘corosync’ process: Corosync was not recovered.
>
> I am now using pacemaker version 2.1.5-8.
> Doing the same test, I have the same result: Corosync is still not
> recovered.
>
> Please confirm the "pacemakerd: recover properly from Corosync crash"
> fix implemented in version 2.1.2 covers this scenario.
> If it does, did I miss something in the configuration of my cluster?
>
> Best regards,
>
> Christophe.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgail...@redhat.com>
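
Klaus's request in the thread to check what systemd thinks about corosync can also be scripted. The following is only a minimal sketch, not something posted by any of the participants: it reads the `ActiveState`, `Result` and `Restart` properties that `systemctl show` prints as KEY=VALUE lines and summarises them. The function names (`parse_show_output`, `diagnose`, `query_unit`) are hypothetical, and whether a given corosync unit has `Restart=no` is an assumption to verify on the actual node.

```python
import subprocess


def parse_show_output(text):
    """Parse `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def diagnose(props):
    """Summarise what systemd reports about a unit (hypothetical helper)."""
    state = props.get("ActiveState", "unknown")
    result = props.get("Result", "unknown")
    restart = props.get("Restart", "unknown")
    if state == "failed" and restart == "no":
        return (f"unit failed (result: {result}) and systemd "
                f"will not restart it (Restart=no)")
    if state == "failed":
        return f"unit failed (result: {result}); restart policy is {restart}"
    return f"unit state is {state}"


def query_unit(unit="corosync.service"):
    """Ask systemd for the three properties; needs systemctl on the host."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "ActiveState", "-p", "Result",
         "-p", "Restart", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show_output(out)
```

On a cluster node, `print(diagnose(query_unit()))` would print the summary. A unit that reports `failed` with `Restart=no` would be consistent with the behaviour described in the thread, where no new corosync process appears after the kill.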