It's been quite a long time since I've done it, but for what it's worth, I never had problems live-migrating KVM machines to hosts with other processors, **as long as the VM wasn't launched using a processor-specific extension**.

Get the exact options KVM is running with on both hosts, and compare.
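To make that comparison concrete, here is a minimal sketch (not a Proxmox or libvirt tool, just an illustration), assuming you've already captured each host's QEMU/KVM command line for the VM into a text file, e.g. with "qm showcmd <vmid>" or "ps -o args= -p <qemu pid>"; the file names and the focus on "-cpu" are only placeholders:

#!/usr/bin/env python3
# Rough sketch: diff the "-cpu" options of the same VM as captured on two
# hosts. Expects two text files, each holding one host's full QEMU/KVM
# command line, passed as the two command-line arguments.
import shlex
import sys

def cpu_flags(path):
    # Collect the comma-separated values passed to every "-cpu" option.
    args = shlex.split(open(path).read())
    flags = set()
    for opt, val in zip(args, args[1:]):
        if opt == "-cpu":
            flags.update(val.split(","))
    return flags

a_file, b_file = sys.argv[1], sys.argv[2]
a, b = cpu_flags(a_file), cpu_flags(b_file)
print("only in", a_file + ":", ", ".join(sorted(a - b)) or "nothing")
print("only in", b_file + ":", ", ".join(sorted(b - a)) or "nothing")

Anything that shows up on only one side (a different CPU model, or feature flags appended by auto-detection) is a prime suspect.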
In OpenStack there's a tendency to auto-detect processor features and launch with everything available, so when I had a cluster of mixed EPYC generations I had to declare the features explicitly instead of letting it autodetect (previous job, over a year ago, so details are sketchy). My guess is some auto-detection gone wrong.
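In the same spirit, a similarly rough sketch for seeing which feature flags two hosts of different generations actually disagree on (the kind of thing auto-detection or a "host" CPU type could pick up on one box but not the other), assuming you've copied /proc/cpuinfo from each node into a local file; again, the file names are just placeholders:

#!/usr/bin/env python3
# Rough sketch: compare host CPU feature flags between two nodes, e.g. an
# EPYC gen1 box and an EPYC gen3 box. Expects two saved copies of
# /proc/cpuinfo, passed as the two command-line arguments.
import sys

def flags(path):
    # Take the feature list from the first "flags" line of /proc/cpuinfo.
    for line in open(path):
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

a_file, b_file = sys.argv[1], sys.argv[2]
a, b = flags(a_file), flags(b_file)
print("only in", a_file + ":", " ".join(sorted(a - b)) or "nothing")
print("only in", b_file + ":", " ".join(sorted(b - a)) or "nothing")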
My home cluster is homogeneous cast-off R610s, otherwise I'd test this myself. Sorry.

--Kyle

On Tue, Nov 8, 2022 at 2:57 PM Jan Vlach <[email protected]> wrote:
>
> Hi Eneko,
>
> thank you a million for taking the time to re-test this! It really helps me understand what I can expect to work and what I can’t. I had a glimpse of an idea to create a cluster with mixed CPUs of EPYC gen1 and EPYC gen3, but this really seems like a road to hell(tm). So I’ll keep the clusters homogeneous, with the same gen of CPU. I have two sites, but fortunately I can keep the clusters homogeneous (with one having “more power”).
>
> Honestly, up until now, I thought I could abstract from the version of the Linux kernel I’m running. Because, hey, it’s all KVM. I’m setting my VMs with CPU type host to have the benefit of accelerated AES and other instructions, but I have yet to see if EPYCv1 is compatible with EPYCv3 (v being gen). Thanks for teaching me a new trick, or at least a thing to be aware of! (I remember this being an issue with VMware heterogeneous clusters (with CPUs of different generations), but I really thought KVM64 would help you abstract from all this, KVM64 being a Pentium4-era CPU.)
>
> Do you use virtio drivers for storage and the network card at all? Can you see a pattern there where the 3 Debian/Windows machines were not affected? Did they use virtio or not?
>
> I really don’t see a reason why the migration back from 5.13 -> 5.19 should bring that 50/100% CPU load and hanging. I’ve had some phantom load with “Use tablet for pointer: Yes” before, but that was in the 5% ballpark per VM.
>
> I’m just a fellow Proxmox admin/user. Hope this rings a bell or sparks interest in the core Proxmox team. I’ve had struggles with 5.15 before: GPU passthrough (wasn’t able to get it working) and OpenBSD VMs taking minutes instead of tens of seconds to boot.
>
> All in all, thanks for all the hints I can test before production, so it won’t hurt “down the road” …
>
> JV
> P.S. I’m trying to push my boss towards a commercial subscription for our clusters, but at this point I’m really not sure it would help ...
>
> > On 8. 11. 2022, at 18:18, Eneko Lacunza via pve-user <[email protected]> wrote:
> >
> > From: Eneko Lacunza <[email protected]>
> > Subject: Re: [PVE-User] VMs hung after live migration - Intel CPU
> > Date: 8 November 2022 18:18:44 CET
> > To: [email protected]
> >
> > Hi Jan,
> >
> > I had some time to re-test this.
> >
> > I tried live migration with KVM64 CPU between 2 nodes:
> >
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.19.7-1-pve
> >
> > I bulk-migrated 9 VMs (8 Debian 9/10/11 and 1 Windows 2008r2). This worked OK in both directions.
> >
> > Then I downgraded a node to 5.13:
> >
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs worked well from node-ryzen1700 -> node-ryzen5900x.
> >
> > But migration of those 9 VMs back, node-ryzen5900x -> node-ryzen1700, was a disaster: all 8 Debian VMs hung with 50/100% CPU use. Windows 2008r2 seems not to be affected by the issue at all.
> >
> > 3 other Debian/Windows VMs on node-ryzen1700 were not affected.
> >
> > After migrating both nodes to kernel 5.13:
> >
> > node-ryzen1700 - kernel 5.13.19-6-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs node-ryzen5900x -> node-ryzen1700 works as intended :)
> >
> > Cheers
> >
> > On 8/11/22 at 9:40, Eneko Lacunza via pve-user wrote:
> >> Hi Jan,
> >>
> >> Yes, there's no issue if CPUs are the same.
> >>
> >> VMs hang when CPUs are of sufficiently different generations, even being of the same brand and using the KVM64 vCPU.
> >>
> >> On 7/11/22 at 22:59, Jan Vlach wrote:
> >>> Hi,
> >>>
> >>> For what it’s worth, live migration of Linux VMs with various Debian versions works here just fine. I’m using virtio for networking and virtio scsi for disks. (The only version where I had problems was Debian 6, where the kernel does not support virtio scsi and a MegaRAID SAS 8708EM2 needs to be used. I get a kernel panic in mpt_sas on thaw after migration.)
> >>>
> >>> We’re running 5.15.60-1-pve on a three-node cluster with AMD EPYC 7551P 32-core processors. These are Supermicros with the latest BIOS (latest microcode?) and BMC.
> >>>
> >>> Storage is a local ZFS pool, backed by SSDs in striped mirrors (4 devices on each node). Migration has a dedicated 2x 10GigE LACP bond and a dedicated VLAN on the switch stack.
> >>>
> >>> I have more nodes with EPYC3/Milan on the way, so I’ll test those later as well.
> >>>
> >>> What does your cluster look like hardware-wise? What are the problems you experienced with VM migration on 5.13 -> 5.19?
> >>>
> >>> Thanks,
> >>> JV
> >
> > Eneko Lacunza
> > Zuzendari teknikoa | Director técnico
> > Binovo IT Human Project
> >
> > Tel. +34 943 569 206 | https://www.binovo.es
> > Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >
> > https://www.youtube.com/user/CANALBINOVO
> > https://www.linkedin.com/company/37269706/

_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
