On Tue, Jul 22, 2025 at 06:27:45PM +0800, Xiaoyao Li wrote:
> On 7/22/2025 5:21 PM, Mathias Krause wrote:
> > On 22.07.25 05:45, Xiaoyao Li wrote:
> > > On 6/20/2025 3:42 AM, Mathias Krause wrote:
> > > > KVM has a weird behaviour when a guest executes VMCALL on an AMD system
> > > > or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
> > > > exception (#UD), as they are simply the wrong instruction for the
> > > > given CPU. But instead of forwarding the exception to the guest, KVM
> > > > tries to patch the guest instruction to match the host's actual
> > > > hypercall instruction. That is doomed to fail, as read-only code is
> > > > the norm these days. And rather than abandoning the patching attempt
> > > > and falling back to #UD injection, KVM injects the resulting page fault.
> > > > 
> > > > That's wrong on multiple levels. Not only is a #PF not a valid
> > > > exception for these instructions, which confuses attempts to handle
> > > > them, it also destroys guest state, namely the value of CR2.
> > > > 
> > > > Sean attempted to fix that in KVM[1] but the patch was never applied.
> > > > 
> > > > Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
> > > > conceptually be disabled. Paolo even asked for QEMU to gain this very
> > > > functionality of disabling the quirk [3]. So let's just do it.
> > > > 
> > > > A new property 'hypercall-patching=on|off' is added for the very
> > > > unlikely case that some setups really need the patching. However,
> > > > such setups would be vulnerable to memory corruption attacks that
> > > > can freely overwrite code as they please. So my guess is that there
> > > > are exactly 0 systems out there requiring this quirk.
> > > 
> > > The default behavior has been to patch the hypercall for many years.
> > > 
> > > If you want to change the default behavior, please at least keep it
> > > unchanged for old machine versions, i.e., introduce a compat_property
> > > that sets KVMState->hypercall_patching_enabled to true.
> > 
> > Well, the thing is, KVM's patching is done with the effective
> > permissions of the guest, which means that if the code in question isn't
> > writable from the guest's point of view, KVM's attempt to modify it will
> > fail. This failure isn't transparent for the guest as it sees a #PF
> > instead of a #UD, and that's what I'm trying to fix by disabling the quirk.
> > 
> > The hypercall patching was introduced in Linux commit 7aa81cc04781
> > ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then
> > it was based on a dedicated hypercall page that was handled by KVM to
> > use the proper instruction of the KVM module in use (VMX or SVM).
> > 
> > Patching code was fine back then, but the introduction of DEBUG_RO_DATA
> > made the patching attempts fail and, ultimately, led Paolo to handle
> > this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL
> > vs. VMMCALL if kernel text is read-only").
> > 
> > However, his change still doesn't account for the cross-vendor live
> > migration case (Intel<->AMD), which remains broken and causes the
> > aforementioned bogus #PF. That just leads to misleading Oops reports,
> > confusing the poor souls trying to make sense of them.
> > 
> > IMHO, there is no valid reason for still having the patching in place,
> > as the .text of non-ancient kernels will be write-protected, making
> > patching attempts fail. And, as they fail with a #PF instead of #UD, the
> > guest cannot even handle them appropriately, as there was no memory
> > write attempt from its point of view. Therefore the default should be to
> > disable it, IMO. This won't prevent guests making use of the wrong
> > instruction from trapping, but, at least, now they'll get the correct
> > exception vector and can handle it appropriately.
> 
> But you don't account for the case where the guest kernel is built without
> CONFIG_STRICT_KERNEL_RWX or CONFIG_DEBUG_RO_DATA, or where, for whatever
> reason, the guest's text is not read-only, and the VM needs to be
> migrated between different vendors (Intel <-> AMD).
> 
> Before this patch, the above use case works well. But with this patch, the
> guest will get a #UD after being migrated to a host of a different vendor.
> 
> I heard from some small CSPs that they do want the ability to live
> migrate VMs between Intel and AMD hosts.

Usually CSPs don't have full control over what their customers
are running as a guest. If their customers are running a mainstream,
modern guest OS, CONFIG_STRICT_KERNEL_RWX is pretty likely to be
set, so presumably migration between Intel & AMD will not work
anyway, and this isn't making it worse?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

