It's been quite a long time since I've done it, but for what it's
worth, I never had problems live migrating KVM machines to hosts with
other processors, **as long as the guest wasn't launched using a
processor-specific extension**.

Get the exact options kvm was launched with on both hosts, and compare them.
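
For example (a rough sketch; VM ID 100 and the node names are
placeholders), on a PVE node you can dump the exact kvm invocation
and diff it between nodes:

  # what PVE launches the VM with:
  qm showcmd 100 > /tmp/kvm-cmd-$(hostname).txt

  # or the args of the running process, one per line:
  tr '\0' '\n' < /proc/$(cat /var/run/qemu-server/100.pid)/cmdline \
    > /tmp/kvm-args-$(hostname).txt

  # then, with both files copied to one node:
  diff /tmp/kvm-args-node1.txt /tmp/kvm-args-node2.txt

The -cpu argument and its feature flags are the first thing I'd look at.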

In OpenStack there's a tendency to auto-detect processor features and
launch with everything available, so when I had a cluster of mixed EPYC
generations, I had to declare the features explicitly instead of letting
it autodetect (previous job, over a year ago, so details are sketchy). My
guess is some auto-detection gone wrong.

My home cluster is homogeneous cast-off R610s, otherwise I'd test this
myself. Sorry.

--Kyle

On Tue, Nov 8, 2022 at 2:57 PM Jan Vlach <[email protected]> wrote:
>
> Hi Eneko,
>
> thank you a million for taking the time to re-test this! It really helps me
> understand what works and what doesn’t. I had a glimpse of an idea to create
> a cluster with mixed CPUs of EPYC gen1 and EPYC gen3, but this really seems
> like a road to hell(tm). So I’ll keep the clusters homogeneous, with the same
> generation of CPU. I have two sites, but fortunately I can keep each cluster
> homogeneous (with one having “more power”).
>
> Honestly, up until now, I thought I could abstract away from the version of
> the Linux kernel I’m running. Because, hey, it’s all KVM. I’m setting my VMs
> with CPU type “host” to get the benefit of accelerated AES and other
> instructions, but I have yet to see whether EPYC gen1 is compatible with
> EPYC gen3. Thanks for teaching me a new trick, or at least a thing to be
> aware of! (I remember this being an issue with heterogeneous VMware clusters
> with CPUs of different generations, but I really thought KVM64 would let you
> abstract from all this, KVM64 being a Pentium 4-era CPU.)
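>
> (A possible middle ground instead of “host” or kvm64 might be pinning a
> named model that all nodes support, e.g.
>
>   qm set <vmid> --cpu EPYC
>
> with <vmid> being the VM’s ID. Untested here, just an idea.)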
>
> Do you use virtio drivers for storage and network at all? Can you see a
> pattern there with the 3 Debian/Windows machines that were not affected? Did
> they use virtio or not?
>
> I really don’t see a reason why the migration back from 5.13 -> 5.19 should
> bring that 50/100% CPU load and hanging. I’ve seen some phantom load with
> “Use tablet for pointer: Yes” before, but that was in the 5% ballpark per VM.
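>
> (That one can at least be turned off per VM if anyone else runs into it,
> e.g. qm set <vmid> --tablet 0, with <vmid> being a placeholder.)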
>
> I’m just a fellow Proxmox admin/user. I hope this rings a bell or sparks
> interest in the core Proxmox team. I’ve had struggles with 5.15 before: GPU
> passthrough (wasn’t able to get it working) and OpenBSD VMs taking minutes
> to boot on 5.15, compared to tens of seconds.
>
> All in all, thanks for all the hints I could test before production, so it
> won’t hurt “down the road” …
>
> JV
> P.S. I’m trying to push my boss towards a commercial subscription for our
> clusters, but at this point I’m really not sure it would help ...
>
>
> > On 8. 11. 2022, at 18:18, Eneko Lacunza via pve-user 
> > <[email protected]> wrote:
> >
> >
> > From: Eneko Lacunza <[email protected]>
> > Subject: Re: [PVE-User] VMs hung after live migration - Intel CPU
> > Date: 8 November 2022 18:18:44 CET
> > To: [email protected]
> >
> >
> > Hi Jan,
> >
> > I had some time to re-test this.
> >
> > I tried live migration with KVM64 CPU between 2 nodes:
> >
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.19.7-1-pve
> >
> > I bulk-migrated 9 VMs (8 Debian 9/10/11 and 1 Windows 2008r2).
> > This works OK in both directions.
> >
> > Then I downgraded a node to 5.13:
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs worked well from node-ryzen1700 -> node-ryzen5900x.
> >
> > But migration of those 9 VMs back, node-ryzen5900x -> node-ryzen1700, was a
> > disaster: all 8 Debian VMs hung with 50/100% CPU use. The Windows 2008r2 VM
> > seems not affected by the issue at all.
> >
> > 3 other Debian/Windows VMs on node-ryzen1700 were not affected.
> >
> > After migrating both nodes to kernel 5.13:
> >
> > node-ryzen1700 - kernel 5.13.19-6-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs node-ryzen5900x -> node-ryzen1700 works as
> > intended :)
> >
> > Cheers
> >
> >
> >
> > El 8/11/22 a las 9:40, Eneko Lacunza via pve-user escribió:
> >> Hi Jan,
> >>
> >> Yes, there's no issue if CPUs are the same.
> >>
> >> VMs hang when CPUs are of different enough generations, even when they are
> >> of the same brand and using the KVM64 vCPU.
> >>
> >> On 7/11/22 at 22:59, Jan Vlach wrote:
> >>> Hi,
> >>>
> >>> For what it’s worth, live VM migration of Linux VMs with various Debian
> >>> versions works here just fine. I’m using virtio for networking and virtio
> >>> scsi for disks. (The only version where I had problems was Debian 6, where
> >>> the kernel does not support virtio scsi and a MegaRAID SAS 8708EM2 needs
> >>> to be used. I get a kernel panic in mpt_sas on thaw after migration.)
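> >>>
> >>> (In PVE terms that means setting, in /etc/pve/qemu-server/<vmid>.conf,
> >>>
> >>>   scsihw: megasas
> >>>
> >>> instead of the usual
> >>>
> >>>   scsihw: virtio-scsi-pci
> >>>
> >>> with <vmid> being the VM ID.)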
> >>>
> >>> We’re running 5.15.60-1-pve on a three-node cluster with AMD EPYC 7551P
> >>> 32-core processors. These are Supermicros with the latest BIOS (latest
> >>> microcode?) and BMC firmware.
> >>>
> >>> Storage is a local ZFS pool, backed by SSDs in striped mirrors (4 devices
> >>> on each node). Migration has a dedicated 2x 10GigE LACP bond and a
> >>> dedicated VLAN on the switch stack.
> >>>
> >>> I have more nodes with EPYC3/Milan on the way, so I’ll test those later 
> >>> as well.
> >>>
> >>> What does your cluster look like hardware-wise? What are the problems you
> >>> experienced with VM migration on 5.13 -> 5.19?
> >>>
> >>> Thanks,
> >>> JV
> >
> > Eneko Lacunza
> > Technical Director
> > Binovo IT Human Project
> >
> > Tel. +34 943 569 206 | https://www.binovo.es
> > Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >
> > https://www.youtube.com/user/CANALBINOVO
> > https://www.linkedin.com/company/37269706/
> >
> >
>

_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
