--- Begin Message ---
Hi Eneko
If nodes had only one 1G interface, would you also use RRP? (one ring on 1G and
the other on 10G bond)
That’s pretty unlikely. Usually they come in pairs ;)
But yes, in that hypothetical case I’d use the available physical interface for
ring1 and build ring2 from a tagged interface.
For corosync interfaces I prefer two separate physical interfaces (simple,
resilient).
Bonding and tagging add a layer of complexity you don’t want on a cluster
heartbeat.
Find below an actual configuration of a cluster with one node having just 2
interfaces while the other nodes all have 4.
The 2 interfaces are configured in an HA bond like yours and the corosync rings
are stacked on it as tagged interfaces in their specific VLANs.
VLAN684 exists on switch1 only and VLAN685 exists on switch2 only.
This is the most resilient solution under the given circumstances, and it has
been working like a charm for several years now.
Regards
Stefan
NODE1 - 4 interfaces
====================
iface eno1 inet manual
#Gb1 - Trunk

iface eno2 inet manual
#Gb2 - Trunk

auto eno3
iface eno3 inet static
        address 192.168.84.1
        netmask 255.255.255.0
#Gb3 - COROSYNC1 - VLAN684

auto eno4
iface eno4 inet static
        address 192.168.85.1
        netmask 255.255.255.0
#Gb4 - COROSYNC2 - VLAN685

auto bond0
iface bond0 inet manual
        slaves eno1 eno2
        bond_miimon 100
        bond_mode active-backup
#HA Bundle Gb1/Gb2 - Trunk

NODE3 - 2 interfaces
====================
iface eno1 inet manual
#Gb1 - Trunk

iface eno2 inet manual
#Gb2 - Trunk

auto bond0
iface bond0 inet manual
        slaves eno1 eno2
        bond_miimon 100
        bond_mode active-backup
#HA Bundle Gb1/Gb2 - Trunk

auto bond0.684
iface bond0.684 inet static
        address 192.168.84.3
        netmask 255.255.255.0
#COROSYNC1 - VLAN684

auto bond0.685
iface bond0.685 inet static
        address 192.168.85.3
        netmask 255.255.255.0
#COROSYNC2 - VLAN685

On Apr 14, 2021, at 16:07, Eneko Lacunza <[email protected]> wrote:
Hi Stefan,
Thanks for your advice. Seems a really good use for otherwise unused 1G ports
so I'll look into configuring that.
If nodes had only one 1G interface, would you also use RRP? (one ring on 1G and
the other on 10G bond)
Thanks
On 14/4/21 at 15:57, Stefan M. Radman wrote:
Hi Eneko
That’s a nice setup and I bet it works well but you should do some hand-tuning
to increase resilience.
Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces?
If that’s the case I’d strongly recommend turning them into dedicated untagged
interfaces for the cluster traffic, running on two separate “rings”.
https://pve.proxmox.com/wiki/Separate_Cluster_Network
https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol
Create two corosync rings, using isolated VLANs on your two switches e.g.
VLAN4001 on Switch1 and VLAN4002 on Switch2.
eno1 => Switch1 => VLAN4001
eno2 => Switch2 => VLAN4002
Restrict VLAN4001 to the access ports where the eno1 interfaces are connected.
Prune VLAN4001 from ALL trunks.
Restrict VLAN4002 to the access ports where the eno2 interfaces are connected.
Prune VLAN4002 from ALL trunks.
Assign the eno1 and eno2 interfaces to two separate subnets and you are done.
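As a minimal sketch of that last step, the two on-board ports on proxmox1 might be configured as below in /etc/network/interfaces. The subnets 192.168.84.0/24 and 192.168.85.0/24 and the host address .11 are placeholders chosen for illustration, not values given in this message. No gateway is set, because the ring VLANs are meant to stay isolated.

```
# Sketch only: subnets and host addresses are assumptions
auto eno1
iface eno1 inet static
        address 192.168.84.11
        netmask 255.255.255.0
#Corosync ring0 - Switch1 - VLAN4001 (untagged access port)

auto eno2
iface eno2 inet static
        address 192.168.85.11
        netmask 255.255.255.0
#Corosync ring1 - Switch2 - VLAN4002 (untagged access port)
```

The other nodes would get the same two stanzas with their own host addresses in each subnet.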
With separate rings you don’t even have to stop your cluster while migrating
corosync to the new subnets.
Just do them one-by-one.
With corosync running on two separate rings isolated from the rest of your
network you should not see any further node fencing.
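For illustration, a two-ring corosync.conf for the cluster posted in this thread could look roughly like the fragment below. It assumes the hypothetical subnets 192.168.84.0/24 and 192.168.85.0/24 for the two rings and keeps the node name, nodeid and cluster name from the posted configuration. On Proxmox VE the file to edit is /etc/pve/corosync.conf, and config_version has to be incremented with every change (3 was the posted value, so 4 is used here).

```
nodelist {
  node {
    name: proxmox1
    nodeid: 2
    quorum_votes: 1
    # assumed addresses on the two dedicated ring subnets
    ring0_addr: 192.168.84.11
    ring1_addr: 192.168.85.11
  }
  # proxmox2 and proxmox3 get matching ring0_addr/ring1_addr entries
}

totem {
  cluster_name: CLUSTERNAME
  config_version: 4
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
```

Because each node keeps one working link while the other is being changed, the rings can be moved one at a time as described above.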
Stefan
On Apr 14, 2021, at 15:18, Eneko Lacunza <[email protected]> wrote:
Hi Stefan,
On 14/4/21 at 13:22, Stefan M. Radman wrote:
Hi Eneko
Do you have separate physical interfaces for the cluster (corosync) traffic?
No.
Do you have them on separate VLANs on your switches?
Only Ceph traffic is on VLAN91, the rest is untagged.
Are you running 1 or 2 corosync rings?
This is standard... no hand tuning:
nodelist {
  node {
    name: proxmox1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.90.11
  }
  node {
    name: proxmox2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.90.12
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.90.13
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: CLUSTERNAME
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

Please post your /etc/network/interfaces and explain which interface connects
where.
auto lo
iface lo inet loopback

iface ens2f0np0 inet manual
# Switch2

iface ens2f1np1 inet manual
# Switch1

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves ens2f0np0 ens2f1np1
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens2f0np1

auto bond0.91
iface bond0.91 inet static
        address 192.168.91.11
#Ceph

auto vmbr0
iface vmbr0 inet static
        address 192.168.90.11
        gateway 192.168.90.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

Thanks
Thanks
Stefan
On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user <[email protected]> wrote:
From: Eneko Lacunza <[email protected]>
Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence
Date: April 14, 2021 at 12:12:09 GMT+2
To: [email protected]
Hi Michael,
On 14/4/21 at 11:21, Michael Rasmussen via pve-user wrote:
On Wed, 14 Apr 2021 11:04:10 +0200
Eneko Lacunza via pve-user <[email protected]> wrote:
Hi all,
Yesterday we had a strange fence happen in a PVE 6.2 cluster.
Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been
operating normally for a year. Last update was on January 21st 2021.
Storage is Ceph and nodes are connected to the same network switch
with active-passive bonds.
proxmox1 was fenced and automatically rebooted, then everything
recovered. HA restarted VMs in other nodes too.
proxmox1 syslog: (no network link issues reported at device level)
I have seen this occasionally, and every time the cause was high network
load/congestion that caused a token timeout. IMHO the default corosync token
timeout of 1000 ms is very optimistic, so I changed it to 5000 ms. Since then
I have never seen fencing caused by network load/congestion again. You could
try this and see if it helps you.
PS. my cluster communication is on a dedicated gb bonded vlan.
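A sketch of how the token timeout suggested above might be set: corosync reads it from the token option (in milliseconds) in the totem section. The fragment reuses the totem section posted elsewhere in this thread and only adds the token line. The bump of config_version from 3 to 4 assumes 3 is still the current value, and on Proxmox VE the change would go into /etc/pve/corosync.conf.

```
totem {
  cluster_name: CLUSTERNAME
  # must be incremented on every edit
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  # token timeout in milliseconds, raised as suggested above
  token: 5000
  version: 2
}
```

A longer token timeout makes the cluster more tolerant of short congestion spikes, at the cost of detecting a genuinely failed node more slowly.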
Thanks for the info. In this case network is 10Gbit (I see I didn't include
this info) but only for proxmox nodes:
- We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches
- Both switches are interconnected with a SFP+ DAC
- Active-passive bonds in each proxmox node go to one SFP+ interface on each
switch. Primary interfaces are configured to be on the same switch.
- Connectivity to the LAN is done with 1 Gbit link
- Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks.
I wouldn't expect high network load/congestion because it's on an internal LAN,
with 1Gbit clients. No Ceph issues/backfilling were occurring during the fence.
Network cards are Broadcom.
Thanks
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
CONFIDENTIALITY NOTICE: This communication may contain privileged and
confidential information, or may otherwise be protected from disclosure, and is
intended solely for use of the intended recipient(s). If you are not the
intended recipient of this communication, please notify the sender that you
have received this communication in error and delete and destroy all copies in
your possession.
--- End Message ---
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user