That is a big cluster, I like it, hope it works out. You should separate the corosync/heartbeat network onto its own physical Ethernet link; that is probably where your latency is coming from. Even with 25Gig NICs, pushing all your data, migration, and heartbeat traffic across one physical link, bonded or not, means a busy link can queue your corosync traffic, even if only for a few milliseconds, and that adds up across many nodes. Think about jumbo frames as well: slam a NIC with 9000-byte packets for storage, and the poor little heartbeat packets start queueing up in the waiting pool.
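For what it's worth, here is a minimal sketch of what a dedicated corosync link looks like in /etc/pve/corosync.conf (corosync 3 / kronosnet syntax; the node name and the 10.10.x.x subnets are placeholders for your own addressing):

    nodelist {
      node {
        name: node01
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1    # dedicated NIC, corosync traffic only
        ring1_addr: 10.10.20.1    # fallback link on another network
      }
      # ... one node block per cluster member ...
    }

    totem {
      cluster_name: bigcluster
      config_version: 2
      ip_version: ipv4-6
      secauth: on
      version: 2
      interface {
        linknumber: 0
      }
    }

And on the jumbo frame point, keep the corosync interface at the default 1500 MTU and only raise it on the storage-facing NIC, e.g. in /etc/network/interfaces (again, interface names and addresses are placeholders):

    auto eno1
    iface eno1 inet static
        address 10.10.10.1/24
        # corosync ring0, default MTU 1500

    auto eno2
    iface eno2 inet static
        address 10.10.30.1/24
        mtu 9000
        # storage/migration traffic, jumbo frames here only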
In the Proxmox design notes it's highly recommended to separate all needed networks onto their own physical NICs and switches as well. Good luck.

JR Richardson
Engineering for the Masses
Chasing the Azeotrope
JRx DistillCo
1st Place Brisket
1st Place Chili

This is anecdotal, but I have never seen a cluster that big. You might want to inquire about professional support, which would give you a better perspective at that kind of scale.

On Thu, Jun 24, 2021 at 10:30 AM Eneko Lacunza via pve-user <[email protected]> wrote:
> Hi all,
>
> We're currently helping a customer to configure a virtualization
> cluster with 88 servers for VDI.
