Which PVE version are you running?
Was this node upgraded from a previous PVE version?
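
For triage, it can help to see which process logged a failure first. Below is a minimal Python sketch, not a Proxmox tool, that parses syslog-style lines and reports the first failure message per process. The names `parse_line`, `first_failures`, and `FAILURE_MARKERS` are hypothetical, the year is an assumption (syslog timestamps omit it), and the sample lines are copied from the quoted log with the wrapped lines rejoined.

```python
import re
from datetime import datetime

# Syslog-style line: "Mar 25 12:54:54 node09 pmxcfs[5419]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

# Substrings treated as "failure" indicators (an assumption for this sketch).
FAILURE_MARKERS = ("crit:", "SEGV", "Failed", "failed")


def parse_line(line, year=2021):
    """Split one log line into timestamp, process name, and message."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {"ts": ts, "proc": m["proc"], "msg": m["msg"]}


def first_failures(lines):
    """Return {process: first failure message}, in order of appearance."""
    seen = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if any(marker in entry["msg"] for marker in FAILURE_MARKERS):
            seen.setdefault(entry["proc"], entry["msg"])
    return seen


SAMPLE = [
    "Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed",
    "Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2",
    "Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV",
    "Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum",
    "Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!",
]

if __name__ == "__main__":
    for proc, msg in first_failures(SAMPLE).items():
        print(f"{proc}: {msg}")
```

Because several of these entries share the same one-second timestamp, the sketch relies on the order of lines in the log rather than on the timestamps to decide what came first.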

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram





On Thu, 25 Mar 2021 at 15:19, jameslipski via pve-user
<[email protected]> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: jameslipski <[email protected]>
> To: Proxmox VE user list <[email protected]>
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: Not sure if this is a corosync issue.
> Greetings,
>
> Today, one of my nodes seems to have rebooted unexpectedly. The node in
> question has been in production for several months, with no issues since it
> was added to the cluster. Below is what I see in the log just before the
> crash; unfortunately, I'm having trouble deciphering it:
>
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: 
> assertion 'data[len-1] == 0' failed
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, 
> code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 
> 'signal'.
> Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => 
> wait_for_quorum
> Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - 
> cfs lock update failed - Permission denied
> Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > 
> /dev/null && debian-sa1 1 1)
> Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => 
> lost_agent_lock
> Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 
> 'file-replication_cfg' ...
> Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 
> 'file-replication_cfg': no quorum!
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, 
> code=exited, status=13/n/a
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 
> 'exit-code'.
> Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication 
> runner.
> Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 
> 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
>
> I see that corosync experienced the following:
>
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, 
> code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 
> 'signal'.
>
> and I'm not sure why. I'm also not sure whether that alone took down the
> system. Any help is much appreciated. If any additional information is
> needed, please let us know. Thank you.
>
>
> _______________________________________________
> pve-user mailing list
> [email protected]
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
