Any logs from Pacemaker?

On Thu, Apr 25, 2024 at 3:46 AM Alexander Eastwood via Users
<users@clusterlabs.org> wrote:
>
> Hi all,
>
> I’m trying to get a better understanding of why our cluster - or
> specifically corosync.service - entered a failed state. Here are all of
> the relevant corosync logs from this event, with the last line showing
> when I manually started the service again:
>
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [CFG ] Node 1 was shut down by sysadmin
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Unloading all Corosync service engines.
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync info    [QB ] withdrawing server sockets
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync vote quorum service v1.0
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync info    [QB ] withdrawing server sockets
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync configuration map access
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync info    [QB ] withdrawing server sockets
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync configuration service
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync info    [QB ] withdrawing server sockets
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync info    [QB ] withdrawing server sockets
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync profile loading service
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync resource monitoring service
> Apr 23 11:06:10 [1295854] testcluster-c1 corosync notice  [SERV ] Service engine unloaded: corosync watchdog service
> Apr 23 11:06:11 [1295854] testcluster-c1 corosync info    [KNET ] host: host: 1 (passive) best link: 0 (pri: 0)
> Apr 23 11:06:11 [1295854] testcluster-c1 corosync warning [KNET ] host: host: 1 has no active links
> Apr 23 11:06:11 [1295854] testcluster-c1 corosync notice  [MAIN ] Corosync Cluster Engine exiting normally
> Apr 23 13:18:36 [796246] testcluster-c1 corosync notice  [MAIN ] Corosync Cluster Engine 3.1.6 starting up
>
> The first line suggests a manual shutdown of one of the cluster nodes;
> however, neither I nor any of my colleagues did this. The ‘sysadmin’
> surely must mean a person logging on to the server and running some
> command, as opposed to a system process?
>
> Then, in the third row from the bottom, there is the warning “host:
> host: 1 has no active links”, which is followed by “Corosync Cluster
> Engine exiting normally”. Does this mean that the Cluster Engine exited
> because there are no active links?
>
> Finally, I am considering adding a systemd override file for the
> corosync service with the following content:
>
> [Service]
> Restart=on-failure
>
> Is there any reason not to do this? And, given that the process exited
> normally, would I need to use Restart=always instead?
>
> Many thanks
>
> Alex
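[Editor's note: the override the quoted question describes would be a systemd drop-in. A minimal sketch, assuming the standard drop-in location that `systemctl edit corosync` creates (the `RestartSec` value is an added assumption, not part of the original question):]

```ini
# /etc/systemd/system/corosync.service.d/override.conf
# (the path `systemctl edit corosync` would create; run
# `systemctl daemon-reload` afterwards if editing the file by hand)
[Service]
Restart=on-failure
# Assumed delay before restarting; tune to taste.
RestartSec=5
```

[As the question anticipates, `Restart=on-failure` only fires on an unclean exit code, a fatal signal, or a watchdog/timeout abort; a process that exits with status 0, as "exiting normally" in the log above implies, would not be restarted. `Restart=always` would also cover clean exits that systemd did not itself initiate, though note that neither setting restarts a unit stopped via `systemctl stop`.]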
--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/