Nice! Keep us posted.
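
One more thought on the segfault below: after corosync died, the node lost
quorum, and pve-ha-lrm went from active to lost_agent_lock. On a node with
active HA services, the watchdog fences (hard-resets) the machine if the
agent lock cannot be renewed for about a minute, so the corosync crash by
itself most likely explains the "random" reboot.

To dig into the crash, something along these lines should help (a sketch,
assuming systemd-coredump and gdb are installed; without them there may be
no core dump to inspect):

  # logs from the boot before the reset, around the time of the crash
  journalctl -b -1 -u corosync -u pve-cluster

  # list and inspect any captured corosync core dump
  coredumpctl list corosync
  coredumpctl info corosync

  # current membership/quorum state
  pvecm status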
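
When you do get a maintenance window: 6.0-4 was the initial PVE 6 release,
and later 6.x updates ship newer corosync/libknet packages with a number of
crash fixes, so even staying within the 6.x line should help. A rough sketch
of the usual in-place update, one node at a time, with guests migrated away
first (assuming a valid package repository such as pve-no-subscription is
configured):

  # on each node, one at a time
  apt update
  apt dist-upgrade

  # confirm the resulting versions afterwards
  pveversion -v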
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

On Thu, 25 Mar 2021 at 16:13, jameslipski via pve-user <[email protected]> wrote:
>
> Hi,
>
> Alright. I'll try. Since these nodes are in production it might be a while
> till I get a chance to.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, March 25, 2021 2:58 PM, Gilberto Ferreira <[email protected]> wrote:
>
> > Hi
> >
> > You should consider, carefully, updating to newer versions.
> >
> > ---
> > Gilberto Nunes Ferreira
> > (47) 99676-7530 - Whatsapp / Telegram
> >
> > On Thu, 25 Mar 2021 at 15:56, jameslipski via pve-user <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > All nodes are running the same version, 6.0-4. pveversion -v shows:
> > >
> > > proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
> > > pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
> > > pve-kernel-5.0: 6.0-5
> > > pve-kernel-helper: 6.0-5
> > > pve-kernel-5.0.15-1-pve: 5.0.15-1
> > > ceph: 14.2.2-pve1
> > > ceph-fuse: 14.2.2-pve1
> > > corosync: 3.0.2-pve2
> > > criu: 3.11-3
> > > glusterfs-client: 5.5-3
> > > ksm-control-daemon: 1.3-1
> > > libjs-extjs: 6.0.1-10
> > > libknet1: 1.10-pve1
> > > libpve-access-control: 6.0-2
> > > libpve-apiclient-perl: 3.0-2
> > > libpve-common-perl: 6.0-2
> > > libpve-guest-common-perl: 3.0-1
> > > libpve-http-server-perl: 3.0-2
> > > libpve-storage-perl: 6.0-5
> > > libqb0: 1.0.5-1
> > > lvm2: 2.03.02-pve3
> > > lxc-pve: 3.1.0-61
> > > lxcfs: 3.0.3-pve60
> > > novnc-pve: 1.0.0-60
> > > proxmox-mini-journalreader: 1.1-1
> > > proxmox-widget-toolkit: 2.0-5
> > > pve-cluster: 6.0-4
> > > pve-container: 3.0-3
> > > pve-docs: 6.0-4
> > > pve-edk2-firmware: 2.20190614-1
> > > pve-firewall: 4.0-5
> > > pve-firmware: 3.0-2
> > > pve-ha-manager: 3.0-2
> > > pve-i18n: 2.0-2
> > > pve-qemu-kvm: 4.0.0-3
> > > pve-xtermjs: 3.13.2-1
> > > qemu-server: 6.0-5
> > > smartmontools: 7.0-pve2
> > > spiceterm: 3.1-1
> > > vncterm: 1.6-1
> > > zfsutils-linux: 0.8.1-pve1
> > >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Thursday, March 25, 2021 2:26 PM, Gilberto Ferreira <[email protected]> wrote:
> > >
> > > > What PVE version? Is this an upgrade from a previous PVE version?
> > > >
> > > > ---
> > > > Gilberto Nunes Ferreira
> > > > (47) 99676-7530 - Whatsapp / Telegram
> > > >
> > > > On Thu, 25 Mar 2021 at 15:19, jameslipski via pve-user <[email protected]> wrote:
> > > >
> > > > > Greetings,
> > > > >
> > > > > Today, one of my nodes seems to have rebooted randomly (the node in
> > > > > question has been in a production environment for several months; no
> > > > > issues since it was added to the cluster).
> > > > > During my investigation, the following is what I see before the
> > > > > crash; unfortunately, I'm having a little bit of an issue
> > > > > deciphering this:
> > > > >
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> > > > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> > > > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> > > > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> > > > > Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
> > > > > Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> > > > > Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - cfs lock update failed - Permission denied
> > > > > Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> > > > > Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
> > > > > Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > > > Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!
> > > > > Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
> > > > > Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 'exit-code'.
> > > > > Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication runner.
> > > > > Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > > > Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > > > Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > > > Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > > >
> > > > > I see that corosync experienced the following:
> > > > >
> > > > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> > > > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> > > > >
> > > > > and I'm not too sure why. Also not sure if that alone took down the
> > > > > system. Any help is much appreciated. If any additional information
> > > > > is needed, please let us know. Thank you.
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
