Which PVE version are you running? Is this an upgrade from a previous PVE version?
---
Gilberto Nunes Ferreira

(47) 99676-7530 - Whatsapp / Telegram

On Thu, Mar 25, 2021 at 15:19, jameslipski via pve-user <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: jameslipski <[email protected]>
> To: Proxmox VE user list <[email protected]>
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: Not sure if this is a corosync issue.
> Greetings,
>
> Today, one of my nodes seems to have rebooted randomly (node in question has
> been in a production environment for several months; no issues since it was
> added to the cluster). During my investigation, the following is what I see
> before the crash; unfortunately, I'm having a little bit of an issue
> deciphering this:
>
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
> Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - cfs lock update failed - Permission denied
> Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
> Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 'exit-code'.
> Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication runner.
> Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
>
> I see that corosync experienced the following:
>
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
>
> and I'm not too sure why. Also not sure if that alone took down the system.
> Any help is much appreciated. If any additional information is needed, please
> let us know. Thank you.
>
> ---------- Forwarded message ----------
> From: jameslipski via pve-user <[email protected]>
> To: Proxmox VE user list <[email protected]>
> Cc: jameslipski <[email protected]>
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: [PVE-User] Not sure if this is a corosync issue.
> _______________________________________________
> pve-user mailing list
> [email protected]
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
