On 22.07.25 12:27, Xiaoyao Li wrote: > On 7/22/2025 5:21 PM, Mathias Krause wrote: >> IMHO, there is no valid reason for still having the patching in place as >> the .text of non-ancient kernel's will be write-protected, making >> patching attempts fail. And, as they fail with a #PF instead of #UD, the >> guest cannot even handle them appropriately, as there was no memory >> write attempt from its point of view. Therefore the default should be to >> disable it, IMO. This won't prevent guests making use of the wrong >> instruction from trapping, but, at least, now they'll get the correct >> exception vector and can handle it appropriately. > > But you don't accout for the case that guest kernel is built without > CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or > for whatever reason the guest's text is not readonly, and the VM needs > to be migrated among different vendors (Intel <-> AMD). > > Before this patch, the above usecase works well. But with this patch, > the guest will gets #UD after migrated to different vendors.
I strongly doubt that use case works well, probably doesn't work at all right now. There's so much more to handle on the kernel side for such a thing to actually "work". Think of mitigations like RETPOLINE, RETHUNK, UNRET, SRSO, GDS,... and all the vendor-specific MSRs that need to be switched over. No way this "just works". Not patching VMCALL to VMMCALL or vice versa is the least of your problems in such a scenario. > I heard from some small CSPs that they do want to the ability to live > migrate VMs among Intel and AMD host. Well, they'll have a hard time trying to get it to work, sorry. >> [...] >> >> Ignoring pre-v2.6.25 kernels for a moment, we can assume that KVM will >> do the patching. So the lack of KVM_X86_QUIRK_FIX_HYPERCALL_INSN but >> having 'hypercall_patching_enabled == false' indicates that the user >> wants to disable it but QEMU cannot do so, because KVM lacks the >> extension to do so. This, IMO, legitimizes the warn_report("kvm: >> disabling hypercall patching not supported") -- as it's not supported. > > The minimum supported kernel version is 4.5 (see commit f180e367fce4). > So pre-v2.6.25 kernels is not the case. Perfect! > Surely we can list all the cases of different versions of KVM starting > from v4.5 and draw the conclusion that the semantics of "valid_quirks & > KVM_X86_QUIRK_FIX_HYPERCALL_INSN == 0" means KVM enables the hypercall > patching quirk but don't provide the interface for userspace to disable > it. So the code logic is correct. Thanks for confirming! > My statement of "I think it requires a new cap in KVM to return the > enabled quirks" is more for generic consideration. i.e., QEMU can know > whether a quirk is enabled or not without analysing the detailed history > of KVM. Well, all quirks are enabled by default, only explicitly disabled ones are, well, disabled. And that requires actively doing so via kvm_vm_enable_cap(KVM_CAP_DISABLE_QUIRKS[2]). Surely, QEMU can track these calls on its own if it really wants to. > Of course, it's more of the requirement on KVM to provide new interface > and Current QEMU can do nothing on it. That is, current implementation > of this PATCH is OK to me. Ok, I'll send a v2 incorporating the member name change to 'hypercall_patching'. >>> [...] >>> I think return 0 here is to avoid the warn_report() in the caller. But >>> for the correct semantics, we need to return -1 to indicate that it >>> fails to disable the hypercall patching? >> >> No, returning 0 here is very much on purpose, as you noticed, to avoid >> the warn_report() in the caller. The already issued warn_report() is the >> correct one for this case. > > We can use @Error to pass the error log instead of the trick on return > value. > > e.g., > > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3228,17 +3228,24 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) > return 0; > } > > -static int kvm_vm_disable_hypercall_patching(KVMState *s) > +static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp) > { > int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); > + int ret = -1; > > if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { > - return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > - KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + if (ret) { > + error_setg_errno(errp, -ret, "kvm: failed to disable " > + "hypercall patching quirk: %s", > + strerror(-ret)); > + } > + } else { > + error_setg(errp, "kvm: disabling hypercall patching not > supported"); > } > > - warn_report("kvm: disabling hypercall patching not supported"); > - return 0; > + return ret; > } > > int kvm_arch_init(MachineState *ms, KVMState *s) > @@ -3381,8 +3388,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > > if (s->hypercall_patching_enabled == false) { > - if (kvm_vm_disable_hypercall_patching(s)) { > - warn_report("kvm: failed to disable hypercall patching > quirk"); > + if (kvm_vm_disable_hypercall_patching(s, &local_err)) { > + error_report_err(local_err); > } > } > Ohh, neat. I'll integrate that as well. Thanks! Thanks, Mathias