Are you using LACP or Linux bonding on nodes 2 and 3 for the VM + cluster traffic?
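(For reference, an LACP bond on PVE usually looks something like the sketch below in /etc/network/interfaces; NIC names, bridge name and addresses are placeholders, and the switch ports must be configured as an LACP trunk:)

```
# /etc/network/interfaces - hypothetical LACP bond carrying VM + cluster traffic
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2          # placeholder NIC names
        bond-miimon 100
        bond-mode 802.3ad              # LACP
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.2/24         # placeholder address
        gateway 192.168.1.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
```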
Are you using VLANs to separate VM/cluster traffic? Have you checked the multicast notes in the PVE wiki? Have you tried UDPU instead of multicast as a last option? No idea about the missing RRD graphs...

On Thu, 16 May 2019 at 16:41, Eneko Lacunza <elacu...@binovo.es> wrote:
> Hi all,
>
> In a 3-node cluster, we're experiencing a strange clustering problem.
>
> Sometimes, the first node drops out of quorum, usually for some hours,
> only to return to quorum later.
>
> During the last 2 weeks, this has happened 7 times.
>
> Additionally, one time the second and third nodes dropped out of quorum,
> and soon after the first and third nodes reached quorum. The second node
> rejoined after a manual restart of pve-cluster.
>
> The strange thing (at least for me) is that the 2nd and 3rd nodes have
> lost RRD data around the times the 1st node was out (no graphs in the
> GUI for those hours). The 1st node has all RRD data; its graphs are
> complete.
>
> I understand that we could have a network problem (we're trying to catch
> the problem live again for additional tests...), but why is RRD data
> missing on cluster-joined nodes? Any idea?
>
>
> Servers:
> node1 - 1x E3-1240v6 4c/8t - 64GB RAM - 1x 10G for VM + cluster, 2x 1G for storage
> node2 - 2x E5507 4c - 96GB RAM - 2x 1G for VM + cluster, 2x 1G for storage
> node3 - 2x E5507 4c - 96GB RAM - 2x 1G for VM + cluster, 2x 1G for storage
>
> VM storage is an EMC VNXe3200.
> The switch is an HP 5406zl with 5 switch modules:
> - node1 is connected to module E (8x 10G),
> - node2 and node3 are connected to module A (24x 1G).
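For the UDPU option: with corosync 2.x this means adding a transport line to the totem section of /etc/pve/corosync.conf and incrementing config_version so the change propagates. A minimal sketch, with placeholder cluster name, version number and network address:

```
totem {
  version: 2
  cluster_name: yourcluster      # placeholder
  config_version: 5              # must be higher than the current value
  secauth: on
  transport: udpu                # unicast UDP instead of multicast
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.1.0     # placeholder cluster network
  }
}
```

All nodes need to pick up the new config (typically via a corosync restart), so plan a maintenance window.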
> Storage switches (2) are Cisco Catalyst 2960G.
>
> Nodes have plenty of free RAM (usage below 50%), network usage is at
> most 10-20% of capacity, and mean CPU usage is below 20%.
>
> (for all three nodes)
> # pveversion -v
> proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
> pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
> pve-kernel-4.15: 5.2-12
> pve-kernel-4.15.18-9-pve: 4.15.18-30
> corosync: 2.4.4-pve1
> criu: 2.11.1-1~bpo90
> glusterfs-client: 3.8.8-1
> ksm-control-daemon: 1.2-2
> libjs-extjs: 6.0.1-2
> libpve-access-control: 5.1-3
> libpve-apiclient-perl: 2.0-5
> libpve-common-perl: 5.0-43
> libpve-guest-common-perl: 2.0-18
> libpve-http-server-perl: 2.0-11
> libpve-storage-perl: 5.0-33
> libqb0: 1.0.3-1~bpo9
> lvm2: 2.02.168-pve6
> lxc-pve: 3.0.2+pve1-5
> lxcfs: 3.0.2-2
> novnc-pve: 1.0.0-2
> proxmox-widget-toolkit: 1.0-22
> pve-cluster: 5.0-31
> pve-container: 2.0-31
> pve-docs: 5.3-1
> pve-edk2-firmware: 1.20181023-1
> pve-firewall: 3.0-16
> pve-firmware: 2.0-6
> pve-ha-manager: 2.0-5
> pve-i18n: 1.0-9
> pve-libspice-server1: 0.14.1-1
> pve-qemu-kvm: 2.12.1-1
> pve-xtermjs: 1.0-5
> qemu-server: 5.0-43
> smartmontools: 6.5+svn4324-1
> spiceterm: 3.0-5
> vncterm: 1.5-3
> zfsutils-linux: 0.7.12-pve1~bpo1
>
>
> Thanks a lot
> Eneko
>
> --
> Zuzendari Teknikoa / Director Técnico (Technical Director)
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>
> _______________________________________________
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user