Re: [PATCH RESEND v3 0/3] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Vitaly Kuznetsov writes:

> Changes since 'RESEND v2':
> - Included 'docs/system: Add recommendations to Hyper-V enlightenments doc'
>   in the set as it also requires a "RESEND")

Ping.

> Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
> used. While 'hv-passthrough' is a debug only feature, this significantly
> limits its usefulness. While debugging the problem, I found that there are
> two loosely connected issues:
> - 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
> - 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.
>
> Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
> 'hv-syndbg' to be turned on for !CONFIG_SYNDBG builds.
>
> Vitaly Kuznetsov (3):
>   i386: Fix conditional CONFIG_SYNDBG enablement
>   i386: Exclude 'hv-syndbg' from 'hv-passthrough'
>   docs/system: Add recommendations to Hyper-V enlightenments doc
>
>  docs/system/i386/hyperv.rst | 43 +
>  target/i386/cpu.c | 2 ++
>  target/i386/kvm/kvm.c | 18 ++--
>  3 files changed, 53 insertions(+), 10 deletions(-)

--
Vitaly
Re: [PATCH RESEND v3 3/3] docs/system: Add recommendations to Hyper-V enlightenments doc
Zhao Liu writes:

> Hi Vitaly,
>
> On Tue, Mar 05, 2024 at 05:42:04PM +0100, Vitaly Kuznetsov wrote:
>> Date: Tue, 5 Mar 2024 17:42:04 +0100
>> From: Vitaly Kuznetsov
>> Subject: [PATCH RESEND v3 3/3] docs/system: Add recommendations to Hyper-V
>>  enlightenments doc
>>
>> While hyperv.rst already has all currently implemented Hyper-V
>> enlightenments documented, it may be unclear what is the recommended set to
>> achieve the best result. Add the corresponding section to the doc.
>>
>> Signed-off-by: Vitaly Kuznetsov
>> ---
>>  docs/system/i386/hyperv.rst | 30 ++
>>  1 file changed, 30 insertions(+)
>>
>> diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
>> index 009947e39141..1c1de77feb65 100644
>> --- a/docs/system/i386/hyperv.rst
>> +++ b/docs/system/i386/hyperv.rst
>> @@ -283,6 +283,36 @@ Supplementary features
>>    feature alters this behavior and only allows the guest to use exposed Hyper-V
>>    enlightenments.
>>
>> +Recommendations
>> +---
>
> This guide is very helpful!
>
>> +To achieve the best performance of Windows and Hyper-V guests and unless there
>> +are any specific requirements (e.g. migration to older QEMU/KVM versions,
>> +emulating specific Hyper-V version, ...), it is recommended to enable all
>> +currently implemented Hyper-V enlightenments with the following exceptions:
>> +
>> +- ``hv-syndbg``, ``hv-passthrough``, ``hv-enforce-cpuid`` should not be enabled
>> +  in production configurations as these are debugging/development features.
>> +- ``hv-reset`` can be avoided as modern Hyper-V versions don't expose it.
>
> Does "Hyper-V versions" mean the Hyper-V guest version or the version of
> Microsoft's Hyper-V hypervisor?
> It would be better to distinguish between Hyper-V guest and Hyper-V hypervisor.
>
> And it would be better to have a clear version number.

This is about QEMU/KVM emulating a certain Hyper-V version, not about the
guest Hyper-V version. To be honest, I'm not sure what was the last
version of Hyper-V which was exposing HV_SYSTEM_RESET_RECOMMENDED. I
don't have anything older than WS2016 around now and the bit is not
there. If I'm not mistaken, it was already missing in 2012R2. I would
appreciate it if anyone has more precise historical info to add here.

>
>> +- ``hv-evmcs`` can (and should) be enabled on Intel CPUs only. While the feature
>> +  is only used in nested configurations (Hyper-V, WSL2), enabling it for regular
>> +  Windows guests should not have any negative effects.
>> +- ``hv-no-nonarch-coresharing`` must only be enabled if vCPUs are properly pinned
>> +  so no non-architectural core sharing is possible.
>> +- ``hv-vendor-id``, ``hv-version-id-build``, ``hv-version-id-major``,
>> +  ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``,
>> +  ``hv-version-id-snumber`` can be left unchanged, guests are not supposed to
>> +  behave differently when different Hyper-V version is presented to them.
>> +- ``hv-crash`` must only be enabled if the crash information is consumed via
>> +  QAPI by higher levels of the virtualization stack. Enabling this feature
>> +  effectively prevents Windows from creating dumps upon crashes.
>> +- ``hv-reenlightenment`` can only be used on hardware which supports TSC
>> +  scaling or when guest migration is not needed.
>> +- ``hv-spinlocks`` should be set to e.g. 0xfff when host CPUs are overcommited
>> +  (meaning there are other scheduled tasks or guests) and can be left unchanged
>> +  from the default value (0x) otherwise.
>> +- ``hv-avic``/``hv-apicv`` should not be enabled if the hardware does not
>> +  support APIC virtualization (Intel APICv, AMD AVIC).
>>
>
> It's also better to add blank lines between paragraphs above.

Np, if I am to re-send this I'll add these (hope it's not an acceptance
blocker, we can always do a follow-up).

>
> BTW, may I ask another Windows question? I understand that Windows such
> as Windows 10 and later is already a virtualized architecture with
> built-in Hyper-V to run the root partition.
>
> So is it true that booting Windows VM via KVM + QEMU is running Windows
> Guest in L2? Or what is the relationship between Hyper-V within Windows
> and Hyper-V enlightenments with QEMU + KVM?

Hyper-V is a role you can enable in various Windows versions, both
server and client. When enabled, you get a hypervisor (which is called
'Microsoft Hypervisor' as I was told) and your Windows becomes the root
partition (similar to Xen Dom0). In case you run this on KVM, Windows
becomes L2. Hyper-V enlightenments provided by KVM/QEMU are consumed by
the hypervisor then.

Note: Hyper-V role is optional, in many cases Windows guests run without
it (no Hyper-V VMs, no WSL2, ...) and thus consume KVM's Hyper-V
enlightenments directly, no nested virt involved.

--
Vitaly
[PATCH RESEND v3 2/3] i386: Exclude 'hv-syndbg' from 'hv-passthrough'
Windows with Hyper-V role enabled doesn't boot with 'hv-passthrough' when no debugger is configured, this significantly limits the usefulness of the feature as there's no support for subtracting Hyper-V features from CPU flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not work). While this is also theoretically fixable, 'hv-syndbg' is likely very special and unneeded in the default set. Genuine Hyper-V doesn't seem to enable it either. Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough' mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if needed. As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change should not have any effect on production environments. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 13 + target/i386/kvm/kvm.c | 7 +-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 2505dc4c86e0..009947e39141 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -262,14 +262,19 @@ Supplementary features ``hv-passthrough`` In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently - supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU - flag. + supported by KVM. Note: ``hv-passthrough`` flag only enables enlightenments which are known to QEMU (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' settings on - the command line. Also, enabling this flag effectively prevents migration as the - list of enabled enlightenments may differ between target and destination hosts. + the command line. 
+ + Note: ``hv-passthrough`` does not enable ``hv-syndbg`` which can prevent certain + Windows guests from booting when used without proper configuration. If needed, + ``hv-syndbg`` can be enabled additionally. + + Note: ``hv-passthrough`` effectively prevents migration as the list of enabled + enlightenments may differ between target and destination hosts. ``hv-enforce-cpuid`` By default, KVM allows the guest to use all currently supported Hyper-V diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index f067e35d35b1..f01d19ad2d51 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -823,6 +823,7 @@ static struct { uint32_t bits; } flags[2]; uint64_t dependencies; +bool skip_passthrough; } kvm_hyperv_properties[] = { [HYPERV_FEAT_RELAXED] = { .desc = "relaxed timing (hv-relaxed)", @@ -951,7 +952,8 @@ static struct { {.func = HV_CPUID_FEATURES, .reg = R_EDX, .bits = HV_FEATURE_DEBUG_MSRS_AVAILABLE} }, -.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) +.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED), +.skip_passthrough = true, }, [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", @@ -1360,7 +1362,8 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) * hv_build_cpuid_leaf() uses this info to build guest CPUIDs. */ for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { -if (hyperv_feature_supported(cs, feat)) { +if (hyperv_feature_supported(cs, feat) && +!kvm_hyperv_properties[feat].skip_passthrough) { cpu->hyperv_features |= BIT(feat); } } -- 2.43.2
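The effect of the 'skip_passthrough' flag can be sketched with a small
stand-alone model. This is only an illustration, not the QEMU code: the
feature enum, the 'host_supported' field (a stand-in for
hyperv_feature_supported()) and expand_passthrough() are hypothetical
names, and it assumes that explicitly requested features are already
present in the mask before the pass-through loop runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical subset of features, for illustration only. */
enum { FEAT_RELAXED, FEAT_SYNIC, FEAT_SYNDBG };

static const struct {
    const char *desc;
    bool host_supported;   /* stand-in for hyperv_feature_supported() */
    bool skip_passthrough; /* excluded from the pass-through set */
} props[] = {
    [FEAT_RELAXED] = { .desc = "hv-relaxed", .host_supported = true },
    [FEAT_SYNIC]   = { .desc = "hv-synic",   .host_supported = true },
    [FEAT_SYNDBG]  = { .desc = "hv-syndbg",  .host_supported = true,
                       .skip_passthrough = true },
};

/*
 * Model of the pass-through expansion: start from the features the user
 * requested explicitly and OR in every supported feature that is not
 * marked skip_passthrough.
 */
static uint64_t expand_passthrough(uint64_t requested)
{
    uint64_t features = requested;

    for (size_t feat = 0; feat < ARRAY_SIZE(props); feat++) {
        if (props[feat].host_supported && !props[feat].skip_passthrough) {
            features |= BIT(feat);
        }
    }
    return features;
}
```

In this model expand_passthrough(0) leaves the FEAT_SYNDBG bit clear,
while expand_passthrough(BIT(FEAT_SYNDBG)) keeps it, which mirrors the
"-cpu host,hv-passthrough,hv-syndbg" case from the commit message.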
[PATCH RESEND v3 0/3] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Changes since 'RESEND v2':
- Included 'docs/system: Add recommendations to Hyper-V enlightenments doc'
  in the set as it also requires a "RESEND")

Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
used. While 'hv-passthrough' is a debug only feature, this significantly
limits its usefulness. While debugging the problem, I found that there are
two loosely connected issues:
- 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
- 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.

Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
'hv-syndbg' to be turned on for !CONFIG_SYNDBG builds.

Vitaly Kuznetsov (3):
  i386: Fix conditional CONFIG_SYNDBG enablement
  i386: Exclude 'hv-syndbg' from 'hv-passthrough'
  docs/system: Add recommendations to Hyper-V enlightenments doc

 docs/system/i386/hyperv.rst | 43 +
 target/i386/cpu.c | 2 ++
 target/i386/kvm/kvm.c | 18 ++--
 3 files changed, 53 insertions(+), 10 deletions(-)

--
2.43.2
[PATCH RESEND v3 3/3] docs/system: Add recommendations to Hyper-V enlightenments doc
While hyperv.rst already has all currently implemented Hyper-V enlightenments documented, it may be unclear what is the recommended set to achieve the best result. Add the corresponding section to the doc. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 30 ++ 1 file changed, 30 insertions(+) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 009947e39141..1c1de77feb65 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -283,6 +283,36 @@ Supplementary features feature alters this behavior and only allows the guest to use exposed Hyper-V enlightenments. +Recommendations +--- + +To achieve the best performance of Windows and Hyper-V guests and unless there +are any specific requirements (e.g. migration to older QEMU/KVM versions, +emulating specific Hyper-V version, ...), it is recommended to enable all +currently implemented Hyper-V enlightenments with the following exceptions: + +- ``hv-syndbg``, ``hv-passthrough``, ``hv-enforce-cpuid`` should not be enabled + in production configurations as these are debugging/development features. +- ``hv-reset`` can be avoided as modern Hyper-V versions don't expose it. +- ``hv-evmcs`` can (and should) be enabled on Intel CPUs only. While the feature + is only used in nested configurations (Hyper-V, WSL2), enabling it for regular + Windows guests should not have any negative effects. +- ``hv-no-nonarch-coresharing`` must only be enabled if vCPUs are properly pinned + so no non-architectural core sharing is possible. +- ``hv-vendor-id``, ``hv-version-id-build``, ``hv-version-id-major``, + ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``, + ``hv-version-id-snumber`` can be left unchanged, guests are not supposed to + behave differently when different Hyper-V version is presented to them. +- ``hv-crash`` must only be enabled if the crash information is consumed via + QAPI by higher levels of the virtualization stack. 
Enabling this feature + effectively prevents Windows from creating dumps upon crashes. +- ``hv-reenlightenment`` can only be used on hardware which supports TSC + scaling or when guest migration is not needed. +- ``hv-spinlocks`` should be set to e.g. 0xfff when host CPUs are overcommited + (meaning there are other scheduled tasks or guests) and can be left unchanged + from the default value (0x) otherwise. +- ``hv-avic``/``hv-apicv`` should not be enabled if the hardware does not + support APIC virtualization (Intel APICv, AMD AVIC). Useful links -- 2.43.2
[PATCH RESEND v3 1/3] i386: Fix conditional CONFIG_SYNDBG enablement
Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in 'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not the highest feature number, the result is an empty (zeroed) entry in the array (and not a skipped entry!). hyperv_feature_supported() is designed to check that all CPUID bits are set but for a zeroed feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host actually supports it. To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in 'kvm_hyperv_properties' array, there's nothing wrong in having it defined even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag is silently skipped in !CONFIG_SYNDBG builds. Leave an 'assert' sentinel in hyperv_feature_supported() making sure there are no 'holes' or improperly defined features in 'kvm_hyperv_properties'. Fixes: d8701185f40c ("hw: hyperv: Initial commit for Synthetic Debugging device") Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.c | 2 ++ target/i386/kvm/kvm.c | 11 +++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 2666ef380891..64ce7c4c8242 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -7866,8 +7866,10 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_TLBFLUSH_DIRECT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), +#ifdef CONFIG_SYNDBG DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, HYPERV_FEAT_SYNDBG, 0), +#endif DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 42970ab046fa..f067e35d35b1 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -945,7 +945,6 @@ static struct { .bits = 
HV_DEPRECATING_AEOI_RECOMMENDED} } }, -#ifdef CONFIG_SYNDBG [HYPERV_FEAT_SYNDBG] = { .desc = "Enable synthetic kernel debugger channel (hv-syndbg)", .flags = { @@ -954,7 +953,6 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) }, -#endif [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", .flags = { @@ -1206,6 +1204,13 @@ static bool hyperv_feature_supported(CPUState *cs, int feature) uint32_t func, bits; int i, reg; +/* + * kvm_hyperv_properties needs to define at least one CPUID flag which + * must be used to detect the feature, it's hard to say whether it is + * supported or not otherwise. + */ +assert(kvm_hyperv_properties[feature].flags[0].func); + for (i = 0; i < ARRAY_SIZE(kvm_hyperv_properties[feature].flags); i++) { func = kvm_hyperv_properties[feature].flags[i].func; @@ -3388,13 +3393,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_add(cpu, HV_X64_MSR_TSC_EMULATION_STATUS, env->msr_hv_tsc_emulation_status); } -#ifdef CONFIG_SYNDBG if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG) && has_msr_hv_syndbg_options) { kvm_msr_entry_add(cpu, HV_X64_MSR_SYNDBG_OPTIONS, hyperv_syndbg_query_options()); } -#endif } if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, -- 2.43.2
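The "hole" described in the commit message can be reproduced with a few
lines of stand-alone C. The sketch below is a simplified model, not the
QEMU code: the feature enum, the CPUID leaf values, host_cpuid() and the
two helper functions are hypothetical. It shows that leaving out a
designated initializer which is not the last index yields a zeroed array
entry, and that a check which skips unused flag slots then reports such
an entry as supported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical feature numbers; FEAT_B plays the role of HYPERV_FEAT_SYNDBG. */
enum { FEAT_A, FEAT_B, FEAT_C };

struct feat_props {
    struct {
        uint32_t func; /* CPUID leaf, 0 means "unused slot" */
        uint32_t bits; /* bits which must be set in that leaf */
    } flags[2];
};

/* FEAT_B is left out on purpose, as if it sat under a disabled #ifdef. */
static const struct feat_props props[] = {
    [FEAT_A] = { .flags = { { .func = 0x40000003, .bits = 0x1 } } },
    [FEAT_C] = { .flags = { { .func = 0x40000004, .bits = 0x2 } } },
};

/* Stand-in for asking KVM: pretend the host reports no bits at all. */
static uint32_t host_cpuid(uint32_t func)
{
    (void)func;
    return 0;
}

/* Check without a sentinel: a zeroed entry has nothing to fail on. */
static bool feature_supported_unchecked(int feature)
{
    for (size_t i = 0; i < ARRAY_SIZE(props[feature].flags); i++) {
        uint32_t func = props[feature].flags[i].func;
        uint32_t bits = props[feature].flags[i].bits;

        if (!func) {
            continue; /* unused slot */
        }
        if ((host_cpuid(func) & bits) != bits) {
            return false;
        }
    }
    return true;
}

/* The condition the added assert() enforces: at least one flag is defined. */
static bool feature_defined(int feature)
{
    return props[feature].flags[0].func != 0;
}
```

Here the array still has three entries, feature_supported_unchecked(FEAT_B)
returns true even though the modeled host supports nothing, and
feature_defined(FEAT_B) is false, which is the situation the new sentinel
is meant to catch.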
Re: [PATCH] target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()
As I'm the addressee of the ping for some reason ... :-)

the fix looks good to me but I'm not sure about all the consequences of
moving kvm_put_vcpu_events() to an earlier stage. Max, Paolo, please
take a look!

Eiichi Tsukata writes:

> Ping.
>
>> On Nov 8, 2023, at 10:12, Eiichi Tsukata wrote:
>>
>> Hi all, appreciate any comments or feedback on the patch.
>>
>> Thanks,
>> Eiichi
>>
>>> On Nov 1, 2023, at 23:04, Vitaly Kuznetsov wrote:
>>>
>>> Eiichi Tsukata writes:
>>>
>>>> FYI: The EINVAL in vmx_set_nested_state() is caused by the following
>>>> condition:
>>>> * vcpu->arch.hflags == 0
>>>> * kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON
>>>
>>> This is a weird state indeed,
>>>
>>> 'vcpu->arch.hflags == 0' means we're not in SMM and not in guest mode
>>> but kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON is a
>>> reflection of vmx->nested.smm.vmxon (see
>>> vmx_get_nested_state()). vmx->nested.smm.vmxon gets set (conditionally)
>>> in vmx_enter_smm() and gets cleared in vmx_leave_smm() which means the
>>> vCPU must be in SMM to have it set.
>>>
>>> In case the vCPU is in SMM upon migration, HF_SMM_MASK must be set from
>>> kvm_vcpu_ioctl_x86_set_vcpu_events() -> kvm_smm_changed() but QEMU's
>>> kvm_arch_put_registers() calls kvm_put_nested_state() _before_
>>> kvm_put_vcpu_events(). This can explain "vcpu->arch.hflags == 0".
>>>
>>> Paolo, Max, any idea how this is supposed to work?
>>>
>>> --
>>> Vitaly
>>>
>>
>

--
Vitaly
[PATCH] docs/system: Add recommendations to Hyper-V enlightenments doc
While hyperv.rst already has all currently implemented Hyper-V enlightenments documented, it may be unclear what is the recommended set to achieve the best result. Add the corresponding section to the doc. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 30 ++ 1 file changed, 30 insertions(+) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 2505dc4c86e0..1c7c4a3981ea 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -278,6 +278,36 @@ Supplementary features feature alters this behavior and only allows the guest to use exposed Hyper-V enlightenments. +Recommendations +--- + +To achieve the best performance of Windows and Hyper-V guests and unless there +are any specific requirements (e.g. migration to older QEMU/KVM versions, +emulating specific Hyper-V version, ...), it is recommended to enable all +currently implemented Hyper-V enlightenments with the following exceptions: + +- ``hv-syndbg``, ``hv-passthrough``, ``hv-enforce-cpuid`` should not be enabled + in production configurations as these are debugging/development features. +- ``hv-reset`` can be avoided as modern Hyper-V versions don't expose it. +- ``hv-evmcs`` can (and should) be enabled on Intel CPUs only. While the feature + is only used in nested configurations (Hyper-V, WSL2), enabling it for regular + Windows guests should not have any negative effects. +- ``hv-no-nonarch-coresharing`` must only be enabled if vCPUs are properly pinned + so no non-architectural core sharing is possible. +- ``hv-vendor-id``, ``hv-version-id-build``, ``hv-version-id-major``, + ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``, + ``hv-version-id-snumber`` can be left unchanged, guests are not supposed to + behave differently when different Hyper-V version is presented to them. +- ``hv-crash`` must only be enabled if the crash information is consumed via + QAPI by higher levels of the virtualization stack. 
Enabling this feature + effectively prevents Windows from creating dumps upon crashes. +- ``hv-reenlightenment`` can only be used on hardware which supports TSC + scaling or when guest migration is not needed. +- ``hv-spinlocks`` should be set to e.g. 0xfff when host CPUs are overcommited + (meaning there are other scheduled tasks or guests) and can be left unchanged + from the default value (0x) otherwise. +- ``hv-avic``/``hv-apicv`` should not be enabled if the hardware does not + support APIC virtualization (Intel APICv, AMD AVIC). Useful links -- 2.41.0
[PATCH RESEND v2 2/2] i386: Exclude 'hv-syndbg' from 'hv-passthrough'
Windows with Hyper-V role enabled doesn't boot with 'hv-passthrough' when no debugger is configured, this significantly limits the usefulness of the feature as there's no support for subtracting Hyper-V features from CPU flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not work). While this is also theoretically fixable, 'hv-syndbg' is likely very special and unneeded in the default set. Genuine Hyper-V doesn't seem to enable it either. Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough' mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if needed. As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change should not have any effect on production environments. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 13 + target/i386/kvm/kvm.c | 7 +-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 2505dc4c86e0..009947e39141 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -262,14 +262,19 @@ Supplementary features ``hv-passthrough`` In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently - supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU - flag. + supported by KVM. Note: ``hv-passthrough`` flag only enables enlightenments which are known to QEMU (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' settings on - the command line. Also, enabling this flag effectively prevents migration as the - list of enabled enlightenments may differ between target and destination hosts. + the command line. 
+ + Note: ``hv-passthrough`` does not enable ``hv-syndbg`` which can prevent certain + Windows guests from booting when used without proper configuration. If needed, + ``hv-syndbg`` can be enabled additionally. + + Note: ``hv-passthrough`` effectively prevents migration as the list of enabled + enlightenments may differ between target and destination hosts. ``hv-enforce-cpuid`` By default, KVM allows the guest to use all currently supported Hyper-V diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 2fcb1f6673d8..0c745562b667 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -823,6 +823,7 @@ static struct { uint32_t bits; } flags[2]; uint64_t dependencies; +bool skip_passthrough; } kvm_hyperv_properties[] = { [HYPERV_FEAT_RELAXED] = { .desc = "relaxed timing (hv-relaxed)", @@ -951,7 +952,8 @@ static struct { {.func = HV_CPUID_FEATURES, .reg = R_EDX, .bits = HV_FEATURE_DEBUG_MSRS_AVAILABLE} }, -.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) +.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED), +.skip_passthrough = true, }, [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", @@ -1360,7 +1362,8 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) * hv_build_cpuid_leaf() uses this info to build guest CPUIDs. */ for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { -if (hyperv_feature_supported(cs, feat)) { +if (hyperv_feature_supported(cs, feat) && +!kvm_hyperv_properties[feat].skip_passthrough) { cpu->hyperv_features |= BIT(feat); } } -- 2.41.0
[PATCH RESEND v2 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Changes since v1/v1 RESEND:
- No changes.

Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
used. While 'hv-passthrough' is a debug only feature, this significantly
limits its usefulness. While debugging the problem, I found that there are
two loosely connected issues:
- 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
- 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.

Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
'hv-syndbg' to be turned on for !CONFIG_SYNDBG builds.

Vitaly Kuznetsov (2):
  i386: Fix conditional CONFIG_SYNDBG enablement
  i386: Exclude 'hv-syndbg' from 'hv-passthrough'

 docs/system/i386/hyperv.rst | 13 +
 target/i386/cpu.c | 2 ++
 target/i386/kvm/kvm.c | 18 --
 3 files changed, 23 insertions(+), 10 deletions(-)

--
2.41.0
[PATCH RESEND v2 1/2] i386: Fix conditional CONFIG_SYNDBG enablement
Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in 'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not the highest feature number, the result is an empty (zeroed) entry in the array (and not a skipped entry!). hyperv_feature_supported() is designed to check that all CPUID bits are set but for a zeroed feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host actually supports it. To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in 'kvm_hyperv_properties' array, there's nothing wrong in having it defined even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag is silently skipped in !CONFIG_SYNDBG builds. Leave an 'assert' sentinel in hyperv_feature_supported() making sure there are no 'holes' or improperly defined features in 'kvm_hyperv_properties'. Fixes: d8701185f40c ("hw: hyperv: Initial commit for Synthetic Debugging device") Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.c | 2 ++ target/i386/kvm/kvm.c | 11 +++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 358d9c0a655a..f5fac3744173 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -7842,8 +7842,10 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_TLBFLUSH_DIRECT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), +#ifdef CONFIG_SYNDBG DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, HYPERV_FEAT_SYNDBG, 0), +#endif DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 11b8177eff21..2fcb1f6673d8 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -945,7 +945,6 @@ static struct { .bits = 
HV_DEPRECATING_AEOI_RECOMMENDED} } }, -#ifdef CONFIG_SYNDBG [HYPERV_FEAT_SYNDBG] = { .desc = "Enable synthetic kernel debugger channel (hv-syndbg)", .flags = { @@ -954,7 +953,6 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) }, -#endif [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", .flags = { @@ -1206,6 +1204,13 @@ static bool hyperv_feature_supported(CPUState *cs, int feature) uint32_t func, bits; int i, reg; +/* + * kvm_hyperv_properties needs to define at least one CPUID flag which + * must be used to detect the feature, it's hard to say whether it is + * supported or not otherwise. + */ +assert(kvm_hyperv_properties[feature].flags[0].func); + for (i = 0; i < ARRAY_SIZE(kvm_hyperv_properties[feature].flags); i++) { func = kvm_hyperv_properties[feature].flags[i].func; @@ -3391,13 +3396,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_add(cpu, HV_X64_MSR_TSC_EMULATION_STATUS, env->msr_hv_tsc_emulation_status); } -#ifdef CONFIG_SYNDBG if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG) && has_msr_hv_syndbg_options) { kvm_msr_entry_add(cpu, HV_X64_MSR_SYNDBG_OPTIONS, hyperv_syndbg_query_options()); } -#endif } if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, -- 2.41.0
Re: [PATCH] target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()
Eiichi Tsukata writes:

> FYI: The EINVAL in vmx_set_nested_state() is caused by the following
> condition:
> * vcpu->arch.hflags == 0
> * kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON

This is a weird state indeed,

'vcpu->arch.hflags == 0' means we're not in SMM and not in guest mode
but kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON is a
reflection of vmx->nested.smm.vmxon (see
vmx_get_nested_state()). vmx->nested.smm.vmxon gets set (conditionally)
in vmx_enter_smm() and gets cleared in vmx_leave_smm() which means the
vCPU must be in SMM to have it set.

In case the vCPU is in SMM upon migration, HF_SMM_MASK must be set from
kvm_vcpu_ioctl_x86_set_vcpu_events() -> kvm_smm_changed() but QEMU's
kvm_arch_put_registers() calls kvm_put_nested_state() _before_
kvm_put_vcpu_events(). This can explain "vcpu->arch.hflags == 0".

Paolo, Max, any idea how this is supposed to work?

--
Vitaly
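The ordering problem discussed above can be modeled in a few lines of C.
This is a sketch under stated assumptions, not kernel or QEMU code: the
structures, the flag values and the helpers set_nested_state(),
set_vcpu_events() and restore() are hypothetical, and only the validation
condition quoted in this thread for vmx_set_nested_state() is modeled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real constants live in the KVM headers. */
#define NESTED_GUEST_MODE  0x1u
#define NESTED_RUN_PENDING 0x2u
#define NESTED_SMM_VMXON   0x2u

struct vcpu_model {
    bool in_smm; /* models HF_SMM_MASK in vcpu->arch.hflags */
};

struct nested_state_model {
    uint32_t flags;     /* models kvm_state->flags */
    uint32_t smm_flags; /* models kvm_state->hdr.vmx.smm.flags */
};

/* The SMM consistency check: outside SMM, SMM flags must be zero. */
static int set_nested_state(const struct vcpu_model *vcpu,
                            const struct nested_state_model *st)
{
    if (vcpu->in_smm
            ? (st->flags & (NESTED_GUEST_MODE | NESTED_RUN_PENDING))
            : st->smm_flags) {
        return -EINVAL;
    }
    return 0;
}

/* Restoring vCPU events is what puts the modeled vCPU back into SMM. */
static void set_vcpu_events(struct vcpu_model *vcpu, bool smm)
{
    vcpu->in_smm = smm;
}

/* Restore a vCPU which was migrated while in SMM with VMXON recorded. */
static int restore(bool events_first)
{
    struct vcpu_model vcpu = { .in_smm = false };
    const struct nested_state_model st = {
        .flags = 0,
        .smm_flags = NESTED_SMM_VMXON,
    };
    int ret;

    if (events_first) {
        set_vcpu_events(&vcpu, true);
    }
    ret = set_nested_state(&vcpu, &st);
    if (!events_first) {
        set_vcpu_events(&vcpu, true);
    }
    return ret;
}
```

In this model restore(false), which applies the nested state while the
vCPU still looks like it is outside SMM, fails with -EINVAL, while
restore(true) succeeds. Whether moving the events restore earlier has
other consequences in the real code is exactly the open question above.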
Re: [PATCH] target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()
Cc'ing Max :-) At first glance the condition in vmx_set_nested_state() is correct so I guess we either have a stale KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside of it... Philippe Mathieu-Daudé writes: > Cc'ing Vitaly. > > On 26/10/23 07:49, Eiichi Tsukata wrote: >> Hi all, >> >> Here is additional details on the issue. >> >> We've found this issue when testing Windows Virtual Secure Mode (VSM) VMs. >> We sometimes saw live migration failures of VSM-enabled VMs. It turned >> out that the issue happens during live migration when VMs change boot related >> EFI variables (ex: BootOrder, Boot0001). >> After some debugging, I've found the race I mentioned in the commit message. >> >> Symptom >> === >> >> When it happnes with the latest Qemu which has commit >> https://github.com/qemu/qemu/commit/7191f24c7fcfbc1216d09 >> Qemu shows the following error message on destination. >> >>qemu-system-x86_64: Failed to put registers after init: Invalid argument >> >> If it happens with older Qemu which doesn't have the commit, then we see >> CPU dump something like this: >> >>KVM internal error. Suberror: 3 >>extra data[0]: 0x8b0e >>extra data[1]: 0x0031 >>extra data[2]: 0x0683 >>extra data[3]: 0x7f809000 >>extra data[4]: 0x0026 >>RAX= RBX= RCX= >> RDX=0f61 >>RSI= RDI= RBP= >> RSP= >>R8 = R9 = R10= >> R11= >>R12= R13= R14= >> R15= >>RIP=fff0 RFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 >>ES =0020 00c09300 DPL=0 DS [-WA] >>CS =0038 00a09b00 DPL=0 CS64 [-RA] >>SS =0020 00c09300 DPL=0 DS [-WA] >>DS =0020 00c09300 DPL=0 DS [-WA] >>FS =0020 00c09300 DPL=0 DS [-WA] >>GS =0020 00c09300 DPL=0 DS [-WA] >>LDT= 00c0 >>TR =0040 7f7df050 00068fff 00808b00 DPL=0 TSS64-busy >>GDT= 7f7df000 004f >>IDT= 7f836000 01ff >>CR0=80010033 CR2=fff0 CR3=7f809000 CR4=0668 >>DR0= DR1= DR2= >> DR3=DR6=0ff0 DR7=0400 >>EFER=0d00 >>Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? >> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? >> ?? 
?? ?? >> >> In the above dump, CR3 is pointing to SMRAM region though SMM=0. >> >> Repro >> = >> >> Repro step is pretty simple. >> >> * Run SMM enabled Linux guest with secure boot enabled OVMF. >> * Run the following script in the guest. >> >>/usr/libexec/qemu-kvm & >>while true >>do >> efibootmgr -n 1 >>done >> >> * Do live migration >> >> On my environment, live migration fails in 20%. >> >> VMX specific >> >> >> This issue is VMX sepcific and SVM is not affected as the validation >> in svm_set_nested_state() is a bit different from VMX one. >> >> VMX: >> >>static int vmx_set_nested_state(struct kvm_vcpu *vcpu, >>struct kvm_nested_state __user >> *user_kvm_nested_state, >>struct kvm_nested_state *kvm_state) >>{ >>.. /* * SMM temporarily disables VMX, so we cannot >> be in guest mode, >> * nor can VMLAUNCH/VMRESUME be pending. Outside SMM, SMM flags >> * must be zero. >> */ if (is_smm(vcpu) ? >> (kvm_state->flags & >> (KVM_STATE_NESTED_GUEST_MODE | >> KVM_STATE_NESTED_RUN_PENDING)) >> : kvm_state->hdr.vmx.smm.flags) >> return -EINVAL; >>.. >> >> SVM: >> >>static int svm_set_nested_state(struct kvm_vcpu *vcpu, >>struct kvm_nested_state __user >> *user_kvm_nested_state, >>struct kvm_nested_state *kvm_state) >>{ >>.. /* SMM temporarily disables SVM, so we cannot be in guest >> mode. */ if (is_smm(vcpu) && (kvm_state->flags & >> KVM_STATE_NESTED_GUEST_MODE)) >> return -EINVAL; >>.. >> >> Thanks, >> >> Eiichi >> >>> On Oct 26, 2023, at 14:42, Eiichi Tsukata >>> wrote: >>> >>> kvm_put_vcpu_events() needs to be called before kvm_put_nested_state() >>> because vCPU's hflag is referred in KVM vmx_get_nested_state() >>> validation. Otherwise kvm_put_nested_state() can fail with -EINVAL when >>> a vCPU is in VMX operation and enters SMM mode. This
[PATCH RESEND 1/2] i386: Fix conditional CONFIG_SYNDBG enablement
Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in 'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not the highest feature number, the result is an empty (zeroed) entry in the array (and not a skipped entry!). hyperv_feature_supported() is designed to check that all CPUID bits are set but for a zeroed feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host actually supports it. To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in 'kvm_hyperv_properties' array, there's nothing wrong in having it defined even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag is silently skipped in !CONFIG_SYNDBG builds. Leave an 'assert' sentinel in hyperv_feature_supported() making sure there are no 'holes' or improperly defined features in 'kvm_hyperv_properties'. Fixes: d8701185f40c ("hw: hyperv: Initial commit for Synthetic Debugging device") Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.c | 2 ++ target/i386/kvm/kvm.c | 11 +++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 2589c8e9294a..01c7e8414408 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -7840,8 +7840,10 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_TLBFLUSH_DIRECT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), +#ifdef CONFIG_SYNDBG DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, HYPERV_FEAT_SYNDBG, 0), +#endif DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index af101fcdf6ff..51b381a2fbbc 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -993,7 +993,6 @@ static struct { .bits = 
HV_DEPRECATING_AEOI_RECOMMENDED} } }, -#ifdef CONFIG_SYNDBG [HYPERV_FEAT_SYNDBG] = { .desc = "Enable synthetic kernel debugger channel (hv-syndbg)", .flags = { @@ -1002,7 +1001,6 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) }, -#endif [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", .flags = { @@ -1254,6 +1252,13 @@ static bool hyperv_feature_supported(CPUState *cs, int feature) uint32_t func, bits; int i, reg; +/* + * kvm_hyperv_properties needs to define at least one CPUID flag which + * must be used to detect the feature, it's hard to say whether it is + * supported or not otherwise. + */ +assert(kvm_hyperv_properties[feature].flags[0].func); + for (i = 0; i < ARRAY_SIZE(kvm_hyperv_properties[feature].flags); i++) { func = kvm_hyperv_properties[feature].flags[i].func; @@ -3483,13 +3488,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_add(cpu, HV_X64_MSR_TSC_EMULATION_STATUS, env->msr_hv_tsc_emulation_status); } -#ifdef CONFIG_SYNDBG if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG) && has_msr_hv_syndbg_options) { kvm_msr_entry_add(cpu, HV_X64_MSR_SYNDBG_OPTIONS, hyperv_syndbg_query_options()); } -#endif } if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, -- 2.41.0
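
The "zeroed entry, not a skipped entry" effect described in the commit message follows from how C designated initializers work, and it can be shown in isolation. The sketch below is an illustration under assumptions, not QEMU code: the enum values, `feat_table`, `host_cpuid_bits()` and `feature_supported()` are hypothetical names, and the CPUID leaf numbers are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature numbers; FEAT_B is deliberately not the highest. */
enum { FEAT_A, FEAT_B, FEAT_C, FEAT_COUNT };

struct feat_desc {
    struct {
        uint32_t func;   /* CPUID leaf to query, 0 means "unused slot" */
        uint32_t bits;   /* bits that must all be set in that leaf */
    } flags[2];
};

/*
 * FEAT_B has no initializer (as if it had been #ifdef'ed out). It is
 * zero-filled rather than skipped, so the array still has FEAT_COUNT entries.
 */
static const struct feat_desc feat_table[] = {
    [FEAT_A] = { .flags = { { .func = 0x40000003, .bits = 1u << 0 } } },
    [FEAT_C] = { .flags = { { .func = 0x40000004, .bits = 1u << 1 } } },
};

/* Toy host: pretend every queried leaf reports no bits at all. */
static uint32_t host_cpuid_bits(uint32_t func)
{
    (void)func;
    return 0;
}

/* Returns true when every used flag slot is satisfied by the host. */
static bool feature_supported(int feature)
{
    for (size_t i = 0; i < 2; i++) {
        uint32_t func = feat_table[feature].flags[i].func;
        uint32_t bits = feat_table[feature].flags[i].bits;

        if (!func) {
            continue;           /* unused slot: nothing to check */
        }
        if ((host_cpuid_bits(func) & bits) != bits) {
            return false;
        }
    }
    return true;                /* vacuously true for an all-zero entry */
}
```

Even though the toy host supports nothing, `feature_supported(FEAT_B)` reports support because there is no CPUID bit left to test. This is the same failure mode the patch guards against with its `assert()` on `flags[0].func`.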
[PATCH RESEND 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
used. While 'hv-passthrough' is a debug-only feature, this significantly
limits its usefulness. While debugging the problem, I found that there are
two loosely connected issues:
- 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
- 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.

Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
turning on 'hv-syndbg' for !CONFIG_SYNDBG builds.

Vitaly Kuznetsov (2):
  i386: Fix conditional CONFIG_SYNDBG enablement
  i386: Exclude 'hv-syndbg' from 'hv-passthrough'

 docs/system/i386/hyperv.rst | 13 +
 target/i386/cpu.c           |  2 ++
 target/i386/kvm/kvm.c       | 18 --
 3 files changed, 23 insertions(+), 10 deletions(-)

-- 
2.41.0
[PATCH RESEND 2/2] i386: Exclude 'hv-syndbg' from 'hv-passthrough'
Windows with Hyper-V role enabled doesn't boot with 'hv-passthrough' when no debugger is configured, this significantly limits the usefulness of the feature as there's no support for subtracting Hyper-V features from CPU flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not work). While this is also theoretically fixable, 'hv-syndbg' is likely very special and unneeded in the default set. Genuine Hyper-V doesn't seem to enable it either. Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough' mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if needed. As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change should not have any effect on production environments. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 13 + target/i386/kvm/kvm.c | 7 +-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 2505dc4c86e0..009947e39141 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -262,14 +262,19 @@ Supplementary features ``hv-passthrough`` In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently - supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU - flag. + supported by KVM. Note: ``hv-passthrough`` flag only enables enlightenments which are known to QEMU (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' settings on - the command line. Also, enabling this flag effectively prevents migration as the - list of enabled enlightenments may differ between target and destination hosts. + the command line. 
+ + Note: ``hv-passthrough`` does not enable ``hv-syndbg`` which can prevent certain + Windows guests from booting when used without proper configuration. If needed, + ``hv-syndbg`` can be enabled additionally. + + Note: ``hv-passthrough`` effectively prevents migration as the list of enabled + enlightenments may differ between target and destination hosts. ``hv-enforce-cpuid`` By default, KVM allows the guest to use all currently supported Hyper-V diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 51b381a2fbbc..cfb24ba87df5 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -871,6 +871,7 @@ static struct { uint32_t bits; } flags[2]; uint64_t dependencies; +bool skip_passthrough; } kvm_hyperv_properties[] = { [HYPERV_FEAT_RELAXED] = { .desc = "relaxed timing (hv-relaxed)", @@ -999,7 +1000,8 @@ static struct { {.func = HV_CPUID_FEATURES, .reg = R_EDX, .bits = HV_FEATURE_DEBUG_MSRS_AVAILABLE} }, -.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) +.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED), +.skip_passthrough = true, }, [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", @@ -1408,7 +1410,8 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) * hv_build_cpuid_leaf() uses this info to build guest CPUIDs. */ for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { -if (hyperv_feature_supported(cs, feat)) { +if (hyperv_feature_supported(cs, feat) && +!kvm_hyperv_properties[feat].skip_passthrough) { cpu->hyperv_features |= BIT(feat); } } -- 2.41.0
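
The effect of the 'skip_passthrough' flag can be sketched with a small stand-alone table. This is a simplified model under assumptions, not the QEMU implementation: `feat_prop`, `props`, `expand_passthrough()` and the three feature numbers are hypothetical, and `host_supported` merely stands in for the result of hyperv_feature_supported().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature numbers used only for this illustration. */
enum { FEAT_RELAXED, FEAT_SYNIC, FEAT_SYNDBG, FEAT_COUNT };

struct feat_prop {
    bool host_supported;    /* stands in for hyperv_feature_supported() */
    bool skip_passthrough;  /* never enabled implicitly by passthrough */
};

static const struct feat_prop props[FEAT_COUNT] = {
    [FEAT_RELAXED] = { .host_supported = true },
    [FEAT_SYNIC]   = { .host_supported = true },
    [FEAT_SYNDBG]  = { .host_supported = true, .skip_passthrough = true },
};

/*
 * Start from the explicitly requested features, then add everything the
 * host supports except entries marked skip_passthrough.
 */
static uint64_t expand_passthrough(uint64_t requested)
{
    uint64_t features = requested;

    for (size_t feat = 0; feat < FEAT_COUNT; feat++) {
        if (props[feat].host_supported && !props[feat].skip_passthrough) {
            features |= UINT64_C(1) << feat;
        }
    }
    return features;
}
```

Because the loop only ORs bits in, a feature that was requested explicitly stays enabled. In the model this corresponds to "-cpu host,hv-passthrough,hv-syndbg" still turning the debugger feature on while plain 'hv-passthrough' leaves it off.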
Re: [PATCH 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Vitaly Kuznetsov writes: > Vitaly Kuznetsov writes: > >> Vitaly Kuznetsov writes: >> >>> Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is >>> used. While 'hv-passthrough' is a debug only feature, this significantly >>> limit its usefullness. While debugging the problem, I found that there are >>> two loosely connected issues: >>> - 'hv-passthrough' enables 'hv-syndbg' and this is undesired. >>> - 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG. >>> >>> Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow >>> to turn on 'hv-syndbg' for !CONFIG_SYNDBG builds. >>> >>> Vitaly Kuznetsov (2): >>> i386: Fix conditional CONFIG_SYNDBG enablement >>> i386: Exclude 'hv-syndbg' from 'hv-passthrough' >>> >>> docs/system/i386/hyperv.rst | 13 + >>> target/i386/cpu.c | 2 ++ >>> target/i386/kvm/kvm.c | 18 -- >>> 3 files changed, 23 insertions(+), 10 deletions(-) > > Monthly ping) Turns out these patches were never merged and honestly I forgot about them myself. Will resend shortly. -- Vitaly
Re: [PATCH 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Vitaly Kuznetsov writes: > Vitaly Kuznetsov writes: > >> Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is >> used. While 'hv-passthrough' is a debug only feature, this significantly >> limit its usefullness. While debugging the problem, I found that there are >> two loosely connected issues: >> - 'hv-passthrough' enables 'hv-syndbg' and this is undesired. >> - 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG. >> >> Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow >> to turn on 'hv-syndbg' for !CONFIG_SYNDBG builds. >> >> Vitaly Kuznetsov (2): >> i386: Fix conditional CONFIG_SYNDBG enablement >> i386: Exclude 'hv-syndbg' from 'hv-passthrough' >> >> docs/system/i386/hyperv.rst | 13 + >> target/i386/cpu.c | 2 ++ >> target/i386/kvm/kvm.c | 18 -- >> 3 files changed, 23 insertions(+), 10 deletions(-) Monthly ping) -- Vitaly
Re: [PATCH 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Vitaly Kuznetsov writes: > Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is > used. While 'hv-passthrough' is a debug only feature, this significantly > limit its usefullness. While debugging the problem, I found that there are > two loosely connected issues: > - 'hv-passthrough' enables 'hv-syndbg' and this is undesired. > - 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG. > > Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow > to turn on 'hv-syndbg' for !CONFIG_SYNDBG builds. > > Vitaly Kuznetsov (2): > i386: Fix conditional CONFIG_SYNDBG enablement > i386: Exclude 'hv-syndbg' from 'hv-passthrough' > > docs/system/i386/hyperv.rst | 13 + > target/i386/cpu.c | 2 ++ > target/i386/kvm/kvm.c | 18 -- > 3 files changed, 23 insertions(+), 10 deletions(-) Ping) -- Vitaly
[PATCH 2/2] i386: Exclude 'hv-syndbg' from 'hv-passthrough'
Windows with Hyper-V role enabled doesn't boot with 'hv-passthrough' when no debugger is configured, this significantly limits the usefulness of the feature as there's no support for subtracting Hyper-V features from CPU flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not work). While this is also theoretically fixable, 'hv-syndbg' is likely very special and unneeded in the default set. Genuine Hyper-V doesn't seem to enable it either. Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough' mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if needed. As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change should not have any effect on production environments. Signed-off-by: Vitaly Kuznetsov --- docs/system/i386/hyperv.rst | 13 + target/i386/kvm/kvm.c | 7 +-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst index 2505dc4c86e0..009947e39141 100644 --- a/docs/system/i386/hyperv.rst +++ b/docs/system/i386/hyperv.rst @@ -262,14 +262,19 @@ Supplementary features ``hv-passthrough`` In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently - supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU - flag. + supported by KVM. Note: ``hv-passthrough`` flag only enables enlightenments which are known to QEMU (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' settings on - the command line. Also, enabling this flag effectively prevents migration as the - list of enabled enlightenments may differ between target and destination hosts. + the command line. 
+ + Note: ``hv-passthrough`` does not enable ``hv-syndbg`` which can prevent certain + Windows guests from booting when used without proper configuration. If needed, + ``hv-syndbg`` can be enabled additionally. + + Note: ``hv-passthrough`` effectively prevents migration as the list of enabled + enlightenments may differ between target and destination hosts. ``hv-enforce-cpuid`` By default, KVM allows the guest to use all currently supported Hyper-V diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 88c75f58f0a6..fbaaacf9877c 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -867,6 +867,7 @@ static struct { uint32_t bits; } flags[2]; uint64_t dependencies; +bool skip_passthrough; } kvm_hyperv_properties[] = { [HYPERV_FEAT_RELAXED] = { .desc = "relaxed timing (hv-relaxed)", @@ -995,7 +996,8 @@ static struct { {.func = HV_CPUID_FEATURES, .reg = R_EDX, .bits = HV_FEATURE_DEBUG_MSRS_AVAILABLE} }, -.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) +.dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED), +.skip_passthrough = true, }, [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", @@ -1404,7 +1406,8 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) * hv_build_cpuid_leaf() uses this info to build guest CPUIDs. */ for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { -if (hyperv_feature_supported(cs, feat)) { +if (hyperv_feature_supported(cs, feat) && +!kvm_hyperv_properties[feat].skip_passthrough) { cpu->hyperv_features |= BIT(feat); } } -- 2.40.1
[PATCH 0/2] i386: Fix Hyper-V Gen1 guests stuck on boot with 'hv-passthrough'
Hyper-V Gen1 guests are getting stuck on boot when 'hv-passthrough' is
used. While 'hv-passthrough' is a debug-only feature, this significantly
limits its usefulness. While debugging the problem, I found that there are
two loosely connected issues:
- 'hv-passthrough' enables 'hv-syndbg' and this is undesired.
- 'hv-syndbg's support by KVM is detected incorrectly when !CONFIG_SYNDBG.

Fix both issues; exclude 'hv-syndbg' from 'hv-passthrough' and don't allow
turning on 'hv-syndbg' for !CONFIG_SYNDBG builds.

Vitaly Kuznetsov (2):
  i386: Fix conditional CONFIG_SYNDBG enablement
  i386: Exclude 'hv-syndbg' from 'hv-passthrough'

 docs/system/i386/hyperv.rst | 13 +
 target/i386/cpu.c           |  2 ++
 target/i386/kvm/kvm.c       | 18 --
 3 files changed, 23 insertions(+), 10 deletions(-)

-- 
2.40.1
[PATCH 1/2] i386: Fix conditional CONFIG_SYNDBG enablement
Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in 'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not the highest feature number, the result is an empty (zeroed) entry in the array (and not a skipped entry!). hyperv_feature_supported() is designed to check that all CPUID bits are set but for a zeroed feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host actually supports it. To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in 'kvm_hyperv_properties' array, there's nothing wrong in having it defined even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag is silently skipped in !CONFIG_SYNDBG builds. Leave an 'assert' sentinel in hyperv_feature_supported() making sure there are no 'holes' or improperly defined features in 'kvm_hyperv_properties'. Fixes: d8701185f40c ("hw: hyperv: Initial commit for Synthetic Debugging device") Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.c | 2 ++ target/i386/kvm/kvm.c | 11 +++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 1242bd541a53..caa207849e9a 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -7564,8 +7564,10 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_TLBFLUSH_DIRECT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), +#ifdef CONFIG_SYNDBG DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, HYPERV_FEAT_SYNDBG, 0), +#endif DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index de531842f6b1..88c75f58f0a6 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -989,7 +989,6 @@ static struct { .bits = 
HV_DEPRECATING_AEOI_RECOMMENDED} } }, -#ifdef CONFIG_SYNDBG [HYPERV_FEAT_SYNDBG] = { .desc = "Enable synthetic kernel debugger channel (hv-syndbg)", .flags = { @@ -998,7 +997,6 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) }, -#endif [HYPERV_FEAT_MSR_BITMAP] = { .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", .flags = { @@ -1250,6 +1248,13 @@ static bool hyperv_feature_supported(CPUState *cs, int feature) uint32_t func, bits; int i, reg; +/* + * kvm_hyperv_properties needs to define at least one CPUID flag which + * must be used to detect the feature, it's hard to say whether it is + * supported or not otherwise. + */ +assert(kvm_hyperv_properties[feature].flags[0].func); + for (i = 0; i < ARRAY_SIZE(kvm_hyperv_properties[feature].flags); i++) { func = kvm_hyperv_properties[feature].flags[i].func; @@ -3474,13 +3479,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_add(cpu, HV_X64_MSR_TSC_EMULATION_STATUS, env->msr_hv_tsc_emulation_status); } -#ifdef CONFIG_SYNDBG if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG) && has_msr_hv_syndbg_options) { kvm_msr_entry_add(cpu, HV_X64_MSR_SYNDBG_OPTIONS, hyperv_syndbg_query_options()); } -#endif } if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, -- 2.40.1
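
The patch places its sentinel inside hyperv_feature_supported(), so a hole is caught when that feature is looked up. A variation on the same idea, shown here only as a sketch and not as what the patch does, is to scan the whole table once and report the first entry that lacks a CPUID leaf. The names `feat_desc` and `first_undefined_feature()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry: one CPUID leaf and its bits. */
struct feat_desc {
    uint32_t func;   /* 0 means the entry was never defined */
    uint32_t bits;
};

/* Returns the index of the first entry without a CPUID leaf, or -1. */
static int first_undefined_feature(const struct feat_desc *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].func == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

A caller could run such a check at start-up and abort with the offending index, which points directly at the missing definition instead of waiting for the first lookup of that feature.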
Re: Expose support for HyperV features via QMP
Alex Bennée writes: > "manish.mishra" writes: > >> Hi Everyone, >> >> Checking if there is any feedback on this. > > I've expanded the CC list to some relevant maintainers and people who > have touched that code in case this was missed. > >> Thanks >> >> Manish Mishra >> >> On 31/01/23 8:17 pm, manish.mishra wrote: >> >> Hi Everyone, >> I hope everyone is doing great. We wanted to check why we do not expose >> support for HyperV features in >> Qemu similar to what we do for normal CPU features via query-cpu-defs or >> cpu-model-expansion QMP >> commands. This support is required for live migration with HyperV features >> as hyperv passthrough is not >> an option. If users had knowledge of what features are supported by source >> and destination, VM can be >> started with an intersection of features supported by both source and >> destination. >> If there is no specific reason for not doing this, does it make sense to >> add a new QMP which expose >> support (internally also validating with KVM or KVM_GET_SUPPORTED_HV_CPUID >> ioctl) for HyperV >> features. >> Apologies in advance if i misunderstood something. >> Thanks for Ccing me. 
Hyper-V features should appear in QMP since

commit 071ce4b03becf9e2df6b758fde9609be8ddf56f1
Author: Vitaly Kuznetsov
Date:   Tue Jun 8 14:08:13 2021 +0200

    i386: expand Hyper-V features during CPU feature expansion time

Also, support for Hyper-V feature discovery was just added to libvirt:

903ea9370d qemu_capabilities: Report Hyper-V Enlightenments in domcapabilities
10f4784864 qemu_capabilities: Query for Hyper-V Enlightenments
ff8731680b qemuMonitorJSONGetCPUModelExpansion: Introduce @hv_passthrough argument
7c12eb2397 qemuMonitorJSONMakeCPUModel: Introduce @hv_passthrough argument
7c1ecfd512 domain_capabilities: Expose Hyper-V Enlightenments
179e45d237 virDomainCapsEnumFormat: Retrun void
a7789d9324 virDomainCapsEnumFormat: Switch to virXMLFormatElement()

In case this is not enough, could you please elaborate on the use case you
have in mind?

-- 
Vitaly
Re: [PATCH] target/i386/cpu: disable PERFCORE for AMD when cpu.pmu is off
Liang Yan writes:

> With cpu.pmu=off, perfctr_core could still be seen in an AMD guest cpuid.
> By further digging, I found cpu.perfctr_core did the trick. However,
> considering the 'enable_pmu' in KVM could work on both Intel and AMD,
> we may add AMD PMU control under 'enable_pmu' in QEMU too.
>
> This change will override the property 'perfctr_ctr' and change the AMD PMU
> to off by default.
>
> Signed-off-by: Liang Yan
> ---
>  target/i386/cpu.c | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 22b681ca37..edf5413c90 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5706,6 +5706,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                  *ecx |= 1 << 1;    /* CmpLegacy bit */
>              }
>          }
> +
> +        if (!cpu->enable_pmu) {
> +            *ecx &= ~CPUID_EXT3_PERFCORE;
> +        }
>          break;
>      case 0x80000002:
>      case 0x80000003:

I may be missing something but my first impression is that this will make
the CPUID_EXT3_PERFCORE bit disappear when a !enable_pmu VM is migrated
from an old QEMU (pre-patch) to a new one. If so, then additional
precautions should be taken against that (e.g. tying the change to
CPU/machine model versions).

-- 
Vitaly
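
One way to picture the precaution mentioned in the reply is to gate the new masking on a compatibility switch that older machine types would turn on. The sketch below is purely illustrative: `cpu_model`, `legacy_perfcore_without_pmu` and `filter_ext3_ecx()` are hypothetical names that do not exist in QEMU, and the bit position of `EXT3_PERFCORE` is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the PerfCtrExtCore bit in CPUID 0x80000001 ECX. */
#define EXT3_PERFCORE (1u << 23)

struct cpu_model {
    bool enable_pmu;
    /* Hypothetical compat knob: set for machine types predating the change. */
    bool legacy_perfcore_without_pmu;
};

/* Clears the PMU-related bit only when no compat behaviour is requested. */
static uint32_t filter_ext3_ecx(const struct cpu_model *cpu, uint32_t ecx)
{
    if (!cpu->enable_pmu && !cpu->legacy_perfcore_without_pmu) {
        ecx &= ~EXT3_PERFCORE;
    }
    return ecx;
}
```

With such a switch, a guest started on an older machine type would keep seeing the bit after migration, while new machine types would get the stricter behaviour.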
Re: [PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check
Paolo Bonzini writes: > Hi, a similar patch is now in. > Indeed, commit c4ef867f2949bf2a2ae18a4e27cf1a34bbc8aecb Author: Ray Zhang Date: Thu Sep 22 18:05:23 2022 +0800 target/i386/kvm: fix kvmclock_current_nsec: Assertion `time.tsc_timestamp <= migration_tsc' failed solves the problem as well. -- Vitaly
Re: [PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check
Vitaly Kuznetsov writes: > Vitaly Kuznetsov writes: > >> KVM commit c68dc1b577ea ("KVM: x86: Report host tsc and realtime values in >> KVM_GET_CLOCK") broke migration of certain workloads, e.g. Win11 + WSL2 >> guest reboots immediately after migration. KVM, however, is not to >> blame this time. When KVM_CAP_ADJUST_CLOCK capability is checked, the >> result is all supported flags (which the above mentioned KVM commit >> enhanced) but kvm_has_adjust_clock_stable() wants it to be >> KVM_CLOCK_TSC_STABLE precisely. The result is that 'clock_is_reliable' >> is not set in vmstate and the saved clock reading is discarded in >> kvmclock_vm_state_change(). >> >> Signed-off-by: Vitaly Kuznetsov >> --- >> target/i386/kvm/kvm.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >> index a1fd1f53791d..c33192a87dcb 100644 >> --- a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -157,7 +157,7 @@ bool kvm_has_adjust_clock_stable(void) >> { >> int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); >> >> -return (ret == KVM_CLOCK_TSC_STABLE); >> +return ret & KVM_CLOCK_TSC_STABLE; >> } >> >> bool kvm_has_adjust_clock(void) > > Ping) This issue seems to introduce major migration issues with KVM >= v5.16 Ping) -- Vitaly
Re: [PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check
Vitaly Kuznetsov writes: > KVM commit c68dc1b577ea ("KVM: x86: Report host tsc and realtime values in > KVM_GET_CLOCK") broke migration of certain workloads, e.g. Win11 + WSL2 > guest reboots immediately after migration. KVM, however, is not to > blame this time. When KVM_CAP_ADJUST_CLOCK capability is checked, the > result is all supported flags (which the above mentioned KVM commit > enhanced) but kvm_has_adjust_clock_stable() wants it to be > KVM_CLOCK_TSC_STABLE precisely. The result is that 'clock_is_reliable' > is not set in vmstate and the saved clock reading is discarded in > kvmclock_vm_state_change(). > > Signed-off-by: Vitaly Kuznetsov > --- > target/i386/kvm/kvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index a1fd1f53791d..c33192a87dcb 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -157,7 +157,7 @@ bool kvm_has_adjust_clock_stable(void) > { > int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); > > -return (ret == KVM_CLOCK_TSC_STABLE); > +return ret & KVM_CLOCK_TSC_STABLE; > } > > bool kvm_has_adjust_clock(void) Ping) This issue seems to introduce major migration issues with KVM >= v5.16 -- Vitaly
[PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check
KVM commit c68dc1b577ea ("KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK") broke migration of certain workloads, e.g. Win11 + WSL2 guest reboots immediately after migration. KVM, however, is not to blame this time. When KVM_CAP_ADJUST_CLOCK capability is checked, the result is all supported flags (which the above mentioned KVM commit enhanced) but kvm_has_adjust_clock_stable() wants it to be KVM_CLOCK_TSC_STABLE precisely. The result is that 'clock_is_reliable' is not set in vmstate and the saved clock reading is discarded in kvmclock_vm_state_change(). Signed-off-by: Vitaly Kuznetsov --- target/i386/kvm/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index a1fd1f53791d..c33192a87dcb 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -157,7 +157,7 @@ bool kvm_has_adjust_clock_stable(void) { int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); -return (ret == KVM_CLOCK_TSC_STABLE); +return ret & KVM_CLOCK_TSC_STABLE; } bool kvm_has_adjust_clock(void) -- 2.37.3
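
The difference between the old equality test and the new mask test can be seen with plain integers. The sketch below is a stand-alone illustration: the `CLOCK_*` values are assumed to mirror the KVM_CLOCK_* flags in the Linux UAPI headers, and the two helper functions are hypothetical names, not QEMU functions.

```c
#include <stdbool.h>

/* Values assumed to mirror the KVM_CLOCK_* flags in the Linux UAPI headers. */
#define CLOCK_TSC_STABLE  (1 << 1)
#define CLOCK_REALTIME    (1 << 2)
#define CLOCK_HOST_TSC    (1 << 3)

/* Old behaviour: true only if TSC_STABLE is the sole flag reported. */
static bool stable_by_equality(int cap)
{
    return cap == CLOCK_TSC_STABLE;
}

/* New behaviour: true whenever TSC_STABLE is among the reported flags. */
static bool stable_by_mask(int cap)
{
    return (cap & CLOCK_TSC_STABLE) != 0;
}
```

A kernel that reports only the stable-TSC flag satisfies both checks. Once the capability also reports additional flags, the equality test silently turns false although the stable-TSC flag is still set, which matches the regression described in the commit message.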
[PATCH v1 1/2] i386: reset KVM nested state upon CPU reset
Make sure env->nested_state is cleaned up when a vCPU is reset: it may be
stale after an incoming migration, and kvm_arch_put_registers() may end up
failing or putting the vCPU in a weird state.

Reviewed-by: Maxim Levitsky
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f148a6d52fa4..4f8dacc1d4b5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1695,6 +1695,30 @@ static void kvm_init_xsave(CPUX86State *env)
                env->xsave_buf_len);
 }
 
+static void kvm_init_nested_state(CPUX86State *env)
+{
+    struct kvm_vmx_nested_state_hdr *vmx_hdr;
+    uint32_t size;
+
+    if (!env->nested_state) {
+        return;
+    }
+
+    size = env->nested_state->size;
+
+    memset(env->nested_state, 0, size);
+    env->nested_state->size = size;
+
+    if (cpu_has_vmx(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
+        vmx_hdr = &env->nested_state->hdr.vmx;
+        vmx_hdr->vmxon_pa = -1ull;
+        vmx_hdr->vmcs12_pa = -1ull;
+    } else if (cpu_has_svm(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2122,19 +2146,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
         assert(max_nested_state_len >= offsetof(struct kvm_nested_state, data));
 
         if (cpu_has_vmx(env) || cpu_has_svm(env)) {
-            struct kvm_vmx_nested_state_hdr *vmx_hdr;
-
             env->nested_state = g_malloc0(max_nested_state_len);
             env->nested_state->size = max_nested_state_len;
 
-            if (cpu_has_vmx(env)) {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
-                vmx_hdr = &env->nested_state->hdr.vmx;
-                vmx_hdr->vmxon_pa = -1ull;
-                vmx_hdr->vmcs12_pa = -1ull;
-            } else {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
-            }
+            kvm_init_nested_state(env);
         }
     }
 
@@ -2199,6 +2214,8 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
     /* enabled by default */
     env->poll_control_msr = 1;
 
+    kvm_init_nested_state(env);
+
     sev_es_set_reset_vector(CPU(cpu));
 }
 
-- 
2.37.1
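
The reset helper in this patch follows a common pattern: remember the buffer size, wipe the whole buffer, then restore the size and the "nothing loaded" markers. The sketch below shows that pattern on a simplified structure. It is an illustration under assumptions: `nested_blob`, `nested_blob_reset()` and `nested_blob_new()` are hypothetical names, and the layout is not the real `struct kvm_nested_state`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the nested state buffer. */
struct nested_blob {
    uint16_t flags;
    uint16_t format;
    uint32_t size;        /* total allocation size, must survive a reset */
    uint64_t vmxon_pa;    /* all-ones means "no VMXON region" */
    uint64_t vmcs12_pa;   /* all-ones means "no current VMCS" */
};

/* Wipes the buffer but keeps its size and re-arms the "unset" markers. */
static void nested_blob_reset(struct nested_blob *s)
{
    uint32_t size;

    if (!s) {
        return;
    }

    size = s->size;
    memset(s, 0, size);
    s->size = size;
    s->vmxon_pa = UINT64_MAX;
    s->vmcs12_pa = UINT64_MAX;
}

/* Allocates a buffer of the given size and puts it into the reset state. */
static struct nested_blob *nested_blob_new(uint32_t size)
{
    struct nested_blob *s;

    if (size < sizeof(*s)) {
        return NULL;
    }

    s = calloc(1, size);
    if (s) {
        s->size = size;
        nested_blob_reset(s);
    }
    return s;
}
```

Sharing one helper between allocation and reset, as the patch does with kvm_init_nested_state(), keeps both paths from drifting apart.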
[PATCH v1 0/2] i386: KVM: Fix 'system_reset' failures when vCPU is in VMX root operation
Changes since RFC: - Call kvm_put_msr_feature_control() before kvm_put_sregs2() to achieve the same result [Paolo]. - Add Maxim's R-b to PATCH1. It was discovered that Windows 11 with WSL2 (Hyper-V) enabled guests fail to reboot when QEMU's 'system_reset' command is issued. The problem appears to be that KVM_SET_SREGS2 fails because zeroed CR4 register value doesn't pass vmx_is_valid_cr4() check in KVM as certain bits can't be zero while in VMX root operation (post-VMXON). kvm_arch_put_registers() does call kvm_put_nested_state() which is supposed to kick vCPU out of VMX root operation, however, it only does so after kvm_put_sregs2() and there's a good reason for that: 'real' nested state requires e.g. EFER.SVME to be set. The root cause of the issue seems to be that QEMU is doing quite a lot to forcefully reset a vCPU as KVM doesn't export kvm_vcpu_reset() (or, rather, it's super-set) yet. While all the numerous existing APIs for setting a vCPU state work fine for a newly created vCPU, using them for vCPU reset is a mess caused by various dependencies between different components of the state (VMX, SMM, MSRs, XCRs, CPUIDs, ...). It would've been possible to allow to set 'inconsistent' state and only validate it upon VCPU_RUN from the very beginning but that ship has long sailed for KVM. A new, dedicated API for vCPU reset is likely the way to go. Resolve the immediate issue by setting MSR_IA32_FEATURE_CONTROL before kvm_put_sregs2() (and kvm_put_nested_state()), this ensures vCPU gets kicked out of VMX root operation. Vitaly Kuznetsov (2): i386: reset KVM nested state upon CPU reset i386: do kvm_put_msr_feature_control() first thing when vCPU is reset target/i386/kvm/kvm.c | 54 +++ 1 file changed, 39 insertions(+), 15 deletions(-) -- 2.37.1
[PATCH v1 2/2] i386: do kvm_put_msr_feature_control() first thing when vCPU is reset
kvm_put_sregs2() fails to reset 'locked' CR4/CR0 bits upon vCPU reset when
it is in VMX root operation. Do kvm_put_msr_feature_control() before
kvm_put_sregs2() to (possibly) kick vCPU out of VMX root operation. It also
seems logical to do kvm_put_msr_feature_control() before
kvm_put_nested_state() and not after it, especially when 'real' nested
state is set.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4f8dacc1d4b5..a1fd1f53791d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4529,6 +4529,18 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /*
+     * Put MSR_IA32_FEATURE_CONTROL first, this ensures the VM gets out of VMX
+     * root operation upon vCPU reset. kvm_put_msr_feature_control() should also
+     * precede kvm_put_nested_state() when 'real' nested state is set.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_msr_feature_control(x86_cpu);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     /* must be before kvm_put_nested_state so that EFER.SVME is set */
     ret = has_sregs2 ? kvm_put_sregs2(x86_cpu) : kvm_put_sregs(x86_cpu);
     if (ret < 0) {
@@ -4540,11 +4552,6 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
         if (ret < 0) {
             return ret;
         }
-
-        ret = kvm_put_msr_feature_control(x86_cpu);
-        if (ret < 0) {
-            return ret;
-        }
     }
 
     if (level == KVM_PUT_FULL_STATE) {
-- 
2.37.1
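
Why the order of the two calls matters can be shown with a toy model. Everything in the sketch below is hypothetical (`toy_vcpu`, `toy_put_sregs()`, `toy_put_feature_control()`, `toy_reset()`); it only encodes the two facts stated in the commit message: setting sregs with a zeroed CR4 fails while the vCPU is in VMX root operation, and putting the feature control MSR with its reset value takes the vCPU out of that mode.

```c
#include <stdbool.h>

#define TOY_CR4_VMXE (1ul << 13)  /* CR4.VMXE */

/* Hypothetical, heavily simplified vCPU state. */
struct toy_vcpu {
    bool vmx_root;        /* models "post-VMXON" */
    unsigned long cr4;
};

/* Models setting sregs: in VMX root operation CR4.VMXE must stay set. */
static int toy_put_sregs(struct toy_vcpu *v, unsigned long cr4)
{
    if (v->vmx_root && !(cr4 & TOY_CR4_VMXE)) {
        return -1;
    }
    v->cr4 = cr4;
    return 0;
}

/* Models writing the reset value of the feature control MSR. */
static int toy_put_feature_control(struct toy_vcpu *v)
{
    v->vmx_root = false;
    return 0;
}

/* Resets the toy vCPU with either call order and returns the first error. */
static int toy_reset(struct toy_vcpu *v, bool feature_control_first)
{
    int ret;

    if (feature_control_first) {
        ret = toy_put_feature_control(v);
        if (ret < 0) {
            return ret;
        }
    }

    ret = toy_put_sregs(v, 0);   /* reset state has CR4 == 0 */
    if (ret < 0) {
        return ret;
    }

    if (!feature_control_first) {
        ret = toy_put_feature_control(v);
    }
    return ret;
}
```

With the old order the reset fails at the sregs step and never reaches the MSR write; with the new order the vCPU has already left VMX root operation by the time CR4 is cleared.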
Re: [PATCH RFC v1 2/2] i386: reorder kvm_put_sregs2() and kvm_put_nested_state() when vCPU is reset
Maxim Levitsky writes:

> On Wed, 2022-08-10 at 16:00 +0200, Vitaly Kuznetsov wrote:
>> Setting nested state upon migration needs to happen after kvm_put_sregs2()
>> to e.g. have EFER.SVME set. This, however, doesn't work for vCPU reset:
>> when vCPU is in VMX root operation, certain CR bits are locked and
>> kvm_put_sregs2() may fail. As nested state is fully cleaned up upon
>> vCPU reset (kvm_arch_reset_vcpu() -> kvm_init_nested_state()), calling
>> kvm_put_nested_state() before kvm_put_sregs2() is OK, this will ensure
>> that vCPU is *not* in VMX root operation.
>>
>> Signed-off-by: Vitaly Kuznetsov
>> ---
>>  target/i386/kvm/kvm.c | 20 ++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 4f8dacc1d4b5..73e3880fa57b 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -4529,18 +4529,34 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>>
>>      assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
>>
>> -    /* must be before kvm_put_nested_state so that EFER.SVME is set */
>> +    /*
>> +     * When resetting a vCPU, make sure to reset nested state first to
>> +     * e.g. clear VMXON state and unlock certain CR4 bits.
>> +     */
>> +    if (level == KVM_PUT_RESET_STATE) {
>> +        ret = kvm_put_nested_state(x86_cpu);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>
> I should have mentioned this, I actually already debugged the same issue while
> trying to reproduce the smm int window bug.
> 100% my fault.
>
> I also share the same feeling that this might be yet another 'whack a mole' and
> break somewhere else, but overall it does make sense.

This certainly *is* a 'whack a mole' and I'm sure there are other cases
when one of the calls in kvm_arch_put_registers() fails. We need to work on
what's missing so we can expose kvm_vcpu_reset() to VMMs.

>
>
> Reviewed-by: Maxim Levitsky
>

Thanks!

-- 
Vitaly
[PATCH RFC v1 1/2] i386: reset KVM nested state upon CPU reset
Make sure env->nested_state is cleaned up when a vCPU is reset: it may be
stale after an incoming migration, in which case kvm_arch_put_registers()
may end up failing or putting the vCPU in a weird state.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f148a6d52fa4..4f8dacc1d4b5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1695,6 +1695,30 @@ static void kvm_init_xsave(CPUX86State *env)
            env->xsave_buf_len);
 }
 
+static void kvm_init_nested_state(CPUX86State *env)
+{
+    struct kvm_vmx_nested_state_hdr *vmx_hdr;
+    uint32_t size;
+
+    if (!env->nested_state) {
+        return;
+    }
+
+    size = env->nested_state->size;
+
+    memset(env->nested_state, 0, size);
+    env->nested_state->size = size;
+
+    if (cpu_has_vmx(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
+        vmx_hdr = &env->nested_state->hdr.vmx;
+        vmx_hdr->vmxon_pa = -1ull;
+        vmx_hdr->vmcs12_pa = -1ull;
+    } else if (cpu_has_svm(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2122,19 +2146,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
         assert(max_nested_state_len >= offsetof(struct kvm_nested_state, data));
 
         if (cpu_has_vmx(env) || cpu_has_svm(env)) {
-            struct kvm_vmx_nested_state_hdr *vmx_hdr;
-
             env->nested_state = g_malloc0(max_nested_state_len);
             env->nested_state->size = max_nested_state_len;
 
-            if (cpu_has_vmx(env)) {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
-                vmx_hdr = &env->nested_state->hdr.vmx;
-                vmx_hdr->vmxon_pa = -1ull;
-                vmx_hdr->vmcs12_pa = -1ull;
-            } else {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
-            }
+            kvm_init_nested_state(env);
         }
     }
 
@@ -2199,6 +2214,8 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
     /* enabled by default */
     env->poll_control_msr = 1;
 
+    kvm_init_nested_state(env);
+
     sev_es_set_reset_vector(CPU(cpu));
 }
 
-- 
2.37.1
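For readers who want to poke at the idea outside of QEMU, here is a rough
Python toy of what kvm_init_nested_state() does: wipe the buffer, keep its
size, then mark "no VMXON region, no current VMCS". The dict layout and the
function name init_nested_state() are made up for illustration; the numeric
format values (VMX = 0, SVM = 1) are taken from the Linux uAPI headers as far
as I remember them, so treat them as an assumption.

```python
# Toy model only: a dict stands in for struct kvm_nested_state.
KVM_STATE_NESTED_FORMAT_VMX = 0   # assumed uAPI value
KVM_STATE_NESTED_FORMAT_SVM = 1   # assumed uAPI value
INVALID_GPA = (1 << 64) - 1       # what the patch spells as -1ull


def init_nested_state(state, has_vmx, has_svm):
    """Reset 'state' in place, preserving only its size (hypothetical helper)."""
    if state is None:
        return
    size = state["size"]
    # Equivalent of memset(..., 0, size) followed by restoring the size field.
    state.clear()
    state.update(size=size, flags=0, format=0, hdr={})
    if has_vmx:
        state["format"] = KVM_STATE_NESTED_FORMAT_VMX
        state["hdr"] = {"vmxon_pa": INVALID_GPA, "vmcs12_pa": INVALID_GPA}
    elif has_svm:
        state["format"] = KVM_STATE_NESTED_FORMAT_SVM


if __name__ == "__main__":
    # Stale state, e.g. left over from an incoming migration.
    stale = {"size": 4096, "flags": 1, "format": 0,
             "hdr": {"vmxon_pa": 0x1000, "vmcs12_pa": 0x2000}}
    init_nested_state(stale, has_vmx=True, has_svm=False)
    print(stale)
```

The point of keeping this a separate helper is that the same routine serves
both the initial allocation and every later reset.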
[PATCH RFC v1 2/2] i386: reorder kvm_put_sregs2() and kvm_put_nested_state() when vCPU is reset
Setting nested state upon migration needs to happen after kvm_put_sregs2()
to e.g. have EFER.SVME set. This, however, doesn't work for vCPU reset:
when vCPU is in VMX root operation, certain CR bits are locked and
kvm_put_sregs2() may fail. As nested state is fully cleaned up upon
vCPU reset (kvm_arch_reset_vcpu() -> kvm_init_nested_state()), calling
kvm_put_nested_state() before kvm_put_sregs2() is OK, this will ensure
that vCPU is *not* in VMX root operation.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4f8dacc1d4b5..73e3880fa57b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4529,18 +4529,34 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
-    /* must be before kvm_put_nested_state so that EFER.SVME is set */
+    /*
+     * When resetting a vCPU, make sure to reset nested state first to
+     * e.g clear VMXON state and unlock certain CR4 bits.
+     */
+    if (level == KVM_PUT_RESET_STATE) {
+        ret = kvm_put_nested_state(x86_cpu);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     ret = has_sregs2 ? kvm_put_sregs2(x86_cpu) : kvm_put_sregs(x86_cpu);
     if (ret < 0) {
         return ret;
     }
 
-    if (level >= KVM_PUT_RESET_STATE) {
+    /*
+     * When putting full CPU state, kvm_put_nested_state() must happen after
+     * kvm_put_sregs{,2} so that e.g. EFER.SVME is already set.
+     */
+    if (level == KVM_PUT_FULL_STATE) {
         ret = kvm_put_nested_state(x86_cpu);
         if (ret < 0) {
             return ret;
         }
+    }
 
+    if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_msr_feature_control(x86_cpu);
         if (ret < 0) {
             return ret;
-- 
2.37.1
[PATCH RFC v1 0/2] i386: KVM: Fix 'system_reset' failures when vCPU is in VMX root operation
It was discovered that Windows 11 guests with WSL2 (Hyper-V) enabled fail
to reboot when QEMU's 'system_reset' command is issued. The problem appears
to be that KVM_SET_SREGS2 fails because the zeroed CR4 register value doesn't
pass the vmx_is_valid_cr4() check in KVM, as certain bits can't be zero while
in VMX root operation (post-VMXON). kvm_arch_put_registers() does call
kvm_put_nested_state(), which is supposed to kick the vCPU out of VMX root
operation; however, it only does so after kvm_put_sregs2() and there's a
good reason for that: 'real' nested state requires e.g. EFER.SVME to be set.
While swapping the kvm_put_sregs2()/kvm_put_nested_state() order in
kvm_arch_put_registers() can't be done in the KVM_PUT_FULL_STATE case, doing
it in the KVM_PUT_RESET_STATE case seems like a reasonable band-aid.

The root cause of the issue seems to be that QEMU is doing quite a lot to
forcefully reset a vCPU as KVM doesn't export kvm_vcpu_reset() (or, rather,
its superset) yet. While all the numerous existing APIs for setting vCPU
state work fine for a newly created vCPU, using them for vCPU reset is a
mess caused by various dependencies between different components of the
state (VMX, SMM, MSRs, XCRs, CPUIDs, ...). It would've been possible to
allow setting 'inconsistent' state and only validate it upon VCPU_RUN from
the very beginning, but that ship has long sailed for KVM. A new, dedicated
API for vCPU reset is likely the way to go.

RFC part: the immediate issue could've probably been solved in KVM too by
dropping the vmx_is_valid_cr4() check from __set_sregs2() and hoping that
someone will check for the resulting inconsistency later. I don't quite
like this option so I didn't explore it in depth.

Vitaly Kuznetsov (2):
  i386: reset KVM nested state upon CPU reset
  i386: reorder kvm_put_sregs2() and kvm_put_nested_state() when vCPU is
    reset

 target/i386/kvm/kvm.c | 57 ++-
 1 file changed, 45 insertions(+), 12 deletions(-)

-- 
2.37.1
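To make the ordering problem easier to see, below is a small Python toy. It
is only a sketch under assumptions: ToyVcpu and put_registers() are invented
names, the "KVM side" is reduced to one rule (CR4.VMXE, bit 13, may not be
cleared while in VMX root operation), and the real vmx_is_valid_cr4() checks
more than that. The level constants mirror QEMU's KVM_PUT_* values as I
understand them.

```python
KVM_PUT_RESET_STATE = 2   # assumed to match QEMU's definition
KVM_PUT_FULL_STATE = 3    # assumed to match QEMU's definition
CR4_VMXE = 1 << 13        # CR4.VMXE
EINVAL = 22


class ToyVcpu:
    """Hypothetical stand-in for the KVM side; not a real API."""

    def __init__(self, in_vmx_root):
        self.in_vmx_root = in_vmx_root
        self.cr4 = CR4_VMXE if in_vmx_root else 0

    def set_sregs(self, cr4):
        # Post-VMXON, a CR4 value with VMXE cleared is rejected.
        if self.in_vmx_root and not (cr4 & CR4_VMXE):
            return -EINVAL
        self.cr4 = cr4
        return 0

    def set_nested_state(self, in_vmx_root):
        # Putting a clean nested state takes the vCPU out of VMX root mode.
        self.in_vmx_root = in_vmx_root
        return 0


def put_registers(vcpu, level, reordered):
    """Push a reset image (CR4 = 0, clean nested state) into the toy vCPU."""
    if reordered and level == KVM_PUT_RESET_STATE:
        ret = vcpu.set_nested_state(False)
        if ret < 0:
            return ret
    ret = vcpu.set_sregs(0)
    if ret < 0:
        return ret
    nested_after = (level == KVM_PUT_FULL_STATE or
                    (not reordered and level >= KVM_PUT_RESET_STATE))
    if nested_after:
        ret = vcpu.set_nested_state(False)
        if ret < 0:
            return ret
    return 0


if __name__ == "__main__":
    print(put_registers(ToyVcpu(True), KVM_PUT_RESET_STATE, reordered=False))
    print(put_registers(ToyVcpu(True), KVM_PUT_RESET_STATE, reordered=True))
```

With the old ordering the sregs write is refused for a vCPU that is still
post-VMXON; with nested state put first on reset, the same write goes through.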
[PATCH v4 6/6] i386: docs: Convert hyperv.txt to rST
rSTify docs/hyperv.txt and link it from docs/system/target-i386.rst. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 303 docs/system/i386/hyperv.rst | 288 ++ docs/system/target-i386.rst | 1 + 3 files changed, 289 insertions(+), 303 deletions(-) delete mode 100644 docs/hyperv.txt create mode 100644 docs/system/i386/hyperv.rst diff --git a/docs/hyperv.txt b/docs/hyperv.txt deleted file mode 100644 index 14a7f449ead9.. --- a/docs/hyperv.txt +++ /dev/null @@ -1,303 +0,0 @@ -Hyper-V Enlightenments -== - - -1. Description -=== -In some cases when implementing a hardware interface in software is slow, KVM -implements its own paravirtualized interfaces. This works well for Linux as -guest support for such features is added simultaneously with the feature itself. -It may, however, be hard-to-impossible to add support for these interfaces to -proprietary OSes, namely, Microsoft Windows. - -KVM on x86 implements Hyper-V Enlightenments for Windows guests. These features -make Windows and Hyper-V guests think they're running on top of a Hyper-V -compatible hypervisor and use Hyper-V specific features. - - -2. Setup -= -No Hyper-V enlightenments are enabled by default by either KVM or QEMU. In -QEMU, individual enlightenments can be enabled through CPU flags, e.g: - - qemu-system-x86_64 --enable-kvm --cpu host,hv_relaxed,hv_vpindex,hv_time, ... - -Sometimes there are dependencies between enlightenments, QEMU is supposed to -check that the supplied configuration is sane. - -When any set of the Hyper-V enlightenments is enabled, QEMU changes hypervisor -identification (CPUID 0x4000..0x400A) to Hyper-V. KVM identification -and features are kept in leaves 0x4100..0x4101. - - -3. Existing enlightenments -=== - -3.1. hv-relaxed - -This feature tells guest OS to disable watchdog timeouts as it is running on a -hypervisor. It is known that some Windows versions will do this even when they -see 'hypervisor' CPU flag. - -3.2. 
hv-vapic -== -Provides so-called VP Assist page MSR to guest allowing it to work with APIC -more efficiently. In particular, this enlightenment allows paravirtualized -(exit-less) EOI processing. - -3.3. hv-spinlocks=xxx -== -Enables paravirtualized spinlocks. The parameter indicates how many times -spinlock acquisition should be attempted before indicating the situation to the -hypervisor. A special value 0x indicates "never notify". - -3.4. hv-vpindex - -Provides HV_X64_MSR_VP_INDEX (0x4002) MSR to the guest which has Virtual -processor index information. This enlightenment makes sense in conjunction with -hv-synic, hv-stimer and other enlightenments which require the guest to know its -Virtual Processor indices (e.g. when VP index needs to be passed in a -hypercall). - -3.5. hv-runtime - -Provides HV_X64_MSR_VP_RUNTIME (0x4010) MSR to the guest. The MSR keeps the -virtual processor run time in 100ns units. This gives guest operating system an -idea of how much time was 'stolen' from it (when the virtual CPU was preempted -to perform some other work). - -3.6. hv-crash -== -Provides HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 (0x4100..0x4105) and -HV_X64_MSR_CRASH_CTL (0x4105) MSRs to the guest. These MSRs are written to -by the guest when it crashes, HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 MSRs -contain additional crash information. This information is outputted in QEMU log -and through QAPI. -Note: unlike under genuine Hyper-V, write to HV_X64_MSR_CRASH_CTL causes guest -to shutdown. This effectively blocks crash dump generation by Windows. - -3.7. hv-time -= -Enables two Hyper-V-specific clocksources available to the guest: MSR-based -Hyper-V clocksource (HV_X64_MSR_TIME_REF_COUNT, 0x4020) and Reference TSC -page (enabled via MSR HV_X64_MSR_REFERENCE_TSC, 0x4021). Both clocksources -are per-guest, Reference TSC page clocksource allows for exit-less time stamp -readings. Using this enlightenment leads to significant speedup of all timestamp -related operations. 
- -3.8. hv-synic -== -Enables Hyper-V Synthetic interrupt controller - an extension of a local APIC. -When enabled, this enlightenment provides additional communication facilities -to the guest: SynIC messages and Events. This is a pre-requisite for -implementing VMBus devices (not yet in QEMU). Additionally, this enlightenment -is needed to enable Hyper-V synthetic timers. SynIC is controlled through MSRs -HV_X64_MSR_SCONTROL..HV_X64_MSR_EOM (0x4080..0x4084) and -HV_X64_MSR_SINT0..HV_X64_MSR_SINT15 (0x4090..0x409F) - -Requires: hv-vpindex - -3.9. hv-stimer -=== -Enables Hyper-V synthetic timers. There are four synthetic timers per virtual -CPU controll
[PATCH v4 5/6] i386: Hyper-V Direct TLB flush hypercall
Hyper-V TLFS allows for L0 and L1 hypervisors to collaborate on L2's TLB flush hypercalls handling. With the correct setup, L2's TLB flush hypercalls can be handled by L0 directly, without the need to exit to L1. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 11 +++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 1 + target/i386/kvm/kvm.c | 8 5 files changed, 23 insertions(+) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 4b132b1c941a..14a7f449ead9 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -262,6 +262,17 @@ Allow for extended GVA ranges to be passed to Hyper-V TLB flush hypercalls Requires: hv-tlbflush +3.25. hv-tlbflush-direct += +The enlightenment is nested specific, it targets Hyper-V on KVM guests. When +enabled, it allows L0 (KVM) to directly handle TLB flush hypercalls from L2 +guest without the need to exit to L1 (Hyper-V) hypervisor. While the feature is +supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires +Enlightened VMCS ('hv-evmcs') feature to also be enabled. + +Requires: hv-vapic +Recommended: hv-evmcs (Intel) + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index a5331e6140fc..dfbf5a65f92f 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6966,6 +6966,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_XMM_INPUT, 0), DEFINE_PROP_BIT64("hv-tlbflush-ext", X86CPU, hyperv_features, HYPERV_FEAT_TLBFLUSH_EXT, 0), +DEFINE_PROP_BIT64("hv-tlbflush-direct", X86CPU, hyperv_features, + HYPERV_FEAT_TLBFLUSH_DIRECT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 5ff48257e513..82004b65b944 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1109,6 +1109,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, #define HYPERV_FEAT_MSR_BITMAP 17 #define HYPERV_FEAT_XMM_INPUT 18 #define HYPERV_FEAT_TLBFLUSH_EXT19 +#define HYPERV_FEAT_TLBFLUSH_DIRECT 20 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index c7854ed6d306..464fbf09e35a 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -90,6 +90,7 @@ /* * HV_CPUID_NESTED_FEATURES.EAX bits */ +#define HV_NESTED_DIRECT_FLUSH (1u << 17) #define HV_NESTED_MSR_BITMAP(1u << 19) /* diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 7bd1b4396e8e..8b58bfd0fd4a 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -995,6 +995,14 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_TLBFLUSH) }, +[HYPERV_FEAT_TLBFLUSH_DIRECT] = { +.desc = "direct TLB flush (hv-tlbflush-direct)", +.flags = { +{.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX, + .bits = HV_NESTED_DIRECT_FLUSH} +}, +.dependencies = BIT(HYPERV_FEAT_VAPIC) +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.3
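As a quick illustration of how the two nested feature bits visible in
hyperv-proto.h end up in HV_CPUID_NESTED_FEATURES.EAX, here is a Python
sketch. The bit positions (17 and 19) come from the patch; the helper names
build_nested_eax() and decode_nested_eax() are invented for this example and
do not exist in QEMU.

```python
HV_NESTED_DIRECT_FLUSH = 1 << 17
HV_NESTED_MSR_BITMAP = 1 << 19

# Property name -> bit in HV_CPUID_NESTED_FEATURES.EAX
NESTED_EAX_BITS = {
    "hv-tlbflush-direct": HV_NESTED_DIRECT_FLUSH,
    "hv-emsr-bitmap": HV_NESTED_MSR_BITMAP,
}


def build_nested_eax(enabled):
    """OR together the EAX bits of every enabled nested enlightenment."""
    eax = 0
    for name, bit in NESTED_EAX_BITS.items():
        if name in enabled:
            eax |= bit
    return eax


def decode_nested_eax(eax):
    """Inverse direction: list the property names whose bit is set."""
    return sorted(name for name, bit in NESTED_EAX_BITS.items() if eax & bit)


if __name__ == "__main__":
    eax = build_nested_eax({"hv-tlbflush-direct", "hv-emsr-bitmap"})
    print(hex(eax), decode_nested_eax(eax))
```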
[PATCH v4 3/6] i386: Hyper-V XMM fast hypercall input feature
Hyper-V specification allows to pass parameters for certain hypercalls using XMM registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows for faster hypercalls processing as KVM can avoid reading guest's memory. KVM supports the feature since v5.14. Rename HV_HYPERCALL_{PARAMS_XMM_AVAILABLE -> XMM_INPUT_AVAILABLE} to comply with KVM. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 6 ++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 2 +- target/i386/kvm/kvm.c | 7 +++ 5 files changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 5d85569b9941..af1b10c0b3d1 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -249,6 +249,12 @@ Enlightened VMCS ('hv-evmcs') feature to also be enabled. Recommended: hv-evmcs (Intel) +3.23. hv-xmm-input +=== +Hyper-V specification allows to pass parameters for certain hypercalls using XMM +registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows +for faster hypercalls processing as KVM can avoid reading guest's memory. + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 5aabf0c12e8d..cb86c11f71d4 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6962,6 +6962,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_AVIC, 0), DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features, HYPERV_FEAT_MSR_BITMAP, 0), +DEFINE_PROP_BIT64("hv-xmm-input", X86CPU, hyperv_features, + HYPERV_FEAT_XMM_INPUT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index c7882857366d..37e95535843b 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1107,6 +1107,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, #define HYPERV_FEAT_AVIC15 #define HYPERV_FEAT_SYNDBG 16 #define HYPERV_FEAT_MSR_BITMAP 17 +#define HYPERV_FEAT_XMM_INPUT 18 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index cea18dbc0e23..f5f16474fa25 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -54,7 +54,7 @@ #define HV_GUEST_DEBUGGING_AVAILABLE(1u << 1) #define HV_PERF_MONITOR_AVAILABLE (1u << 2) #define HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE (1u << 3) -#define HV_HYPERCALL_PARAMS_XMM_AVAILABLE (1u << 4) +#define HV_HYPERCALL_XMM_INPUT_AVAILABLE(1u << 4) #define HV_GUEST_IDLE_STATE_AVAILABLE (1u << 5) #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8) #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 82d1f0275c42..96d6c50ad5d9 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -980,6 +980,13 @@ static struct { .bits = HV_NESTED_MSR_BITMAP} } }, +[HYPERV_FEAT_XMM_INPUT] = { +.desc = "XMM fast hypercall input (hv-xmm-input)", +.flags = { +{.func = HV_CPUID_FEATURES, .reg = R_EDX, + .bits = 
HV_HYPERCALL_XMM_INPUT_AVAILABLE} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.3
[PATCH v4 0/6] i386: Enable newly introduced KVM Hyper-V enlightenments
Changes since v3:
- Rebase, resolve merge conflict with 73d24074078a ("hyperv: Add support to
  process syndbg commands")
- Include "i386: docs: Convert hyperv.txt to rST" patch which was previously
  posted separately.

Original description:

This series enables four new KVM Hyper-V enlightenments:

'XMM fast hypercall input feature' is supported by KVM since v5.14, it allows
for faster Hyper-V hypercall processing.

'Enlightened MSR-Bitmap' is a new nested-specific enlightenment which speeds
up L2 vmexits by avoiding unnecessary updates to L2 MSR-Bitmap. KVM support
for the feature on Intel CPUs is in v5.17 and in v5.18 for AMD CPUs.

'Extended GVA ranges for TLB flush hypercalls' indicates that extended GVA
ranges are allowed to be passed to Hyper-V TLB flush hypercalls.

'Direct TLB flush hypercall' feature allows L0 (KVM) to directly handle
L2's TLB flush hypercalls without the need to exit to L1 (Hyper-V).

The last two features are not merged in KVM yet:
https://lore.kernel.org/kvm/20220525090133.1264239-1-vkuzn...@redhat.com/
however, there's no direct dependency on the kernel part: thanks to
KVM_GET_SUPPORTED_HV_CPUID, no new capabilities are introduced.

Vitaly Kuznetsov (6):
  i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
  i386: Hyper-V Enlightened MSR bitmap feature
  i386: Hyper-V XMM fast hypercall input feature
  i386: Hyper-V Support extended GVA ranges for TLB flush hypercalls
  i386: Hyper-V Direct TLB flush hypercall
  i386: docs: Convert hyperv.txt to rST

 docs/hyperv.txt                | 270 ---
 docs/system/i386/hyperv.rst    | 288 +
 docs/system/target-i386.rst    |   1 +
 target/i386/cpu.c              |   8 +
 target/i386/cpu.h              |   5 +-
 target/i386/kvm/hyperv-proto.h |   9 +-
 target/i386/kvm/kvm.c          |  55 +--
 7 files changed, 354 insertions(+), 282 deletions(-)
 delete mode 100644 docs/hyperv.txt
 create mode 100644 docs/system/i386/hyperv.rst

-- 
2.35.3
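For anyone scripting test runs, the new properties are passed like any other
CPU flag. The Python helper below is a hypothetical convenience (qemu_cpu_args()
is not part of QEMU); it only joins the property names introduced by this
series onto a CPU model, following the command-line form shown in
docs/hyperv.txt. It does not check dependencies between enlightenments.

```python
def qemu_cpu_args(model, features):
    """Return argv fragments enabling the given CPU properties (hypothetical helper)."""
    return ["--enable-kvm", "--cpu", ",".join([model] + list(features))]


if __name__ == "__main__":
    new_enlightenments = [
        "hv-xmm-input",
        "hv-emsr-bitmap",
        "hv-tlbflush-ext",
        "hv-tlbflush-direct",
    ]
    print(" ".join(["qemu-system-x86_64"] + qemu_cpu_args("host", new_enlightenments)))
```

QEMU itself is expected to reject the configuration if a prerequisite of one
of these enlightenments is missing.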
[PATCH v4 4/6] i386: Hyper-V Support extended GVA ranges for TLB flush hypercalls
KVM kind of supported "extended GVA ranges" (up to 4095 additional GFNs per hypercall) since the implementation of Hyper-V PV TLB flush feature (Linux-4.18) as regardless of the request, full TLB flush was always performed. "Extended GVA ranges for TLB flush hypercalls" feature bit wasn't exposed then. Now, as KVM gains support for fine-grained TLB flush handling, exposing this feature starts making sense. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 7 +++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 1 + target/i386/kvm/kvm.c | 8 5 files changed, 19 insertions(+) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index af1b10c0b3d1..4b132b1c941a 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -255,6 +255,13 @@ Hyper-V specification allows to pass parameters for certain hypercalls using XMM registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows for faster hypercalls processing as KVM can avoid reading guest's memory. +3.24. hv-tlbflush-ext += +Allow for extended GVA ranges to be passed to Hyper-V TLB flush hypercalls +(HvFlushVirtualAddressList/HvFlushVirtualAddressListEx). + +Requires: hv-tlbflush + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index cb86c11f71d4..a5331e6140fc 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6964,6 +6964,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_MSR_BITMAP, 0), DEFINE_PROP_BIT64("hv-xmm-input", X86CPU, hyperv_features, HYPERV_FEAT_XMM_INPUT, 0), +DEFINE_PROP_BIT64("hv-tlbflush-ext", X86CPU, hyperv_features, + HYPERV_FEAT_TLBFLUSH_EXT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 37e95535843b..5ff48257e513 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1108,6 +1108,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, #define HYPERV_FEAT_SYNDBG 16 #define HYPERV_FEAT_MSR_BITMAP 17 #define HYPERV_FEAT_XMM_INPUT 18 +#define HYPERV_FEAT_TLBFLUSH_EXT19 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index f5f16474fa25..c7854ed6d306 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -59,6 +59,7 @@ #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8) #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10) #define HV_FEATURE_DEBUG_MSRS_AVAILABLE (1u << 11) +#define HV_EXT_GVA_RANGES_FLUSH_AVAILABLE (1u << 14) #define HV_STIMER_DIRECT_MODE_AVAILABLE (1u << 19) /* diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 96d6c50ad5d9..7bd1b4396e8e 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -987,6 +987,14 @@ static struct { .bits = HV_HYPERCALL_XMM_INPUT_AVAILABLE} } }, +[HYPERV_FEAT_TLBFLUSH_EXT] = { +.desc = "Extended gva ranges for TLB flush hypercalls (hv-tlbflush-ext)", +.flags = { +{.func = HV_CPUID_FEATURES, .reg = R_EDX, + .bits = HV_EXT_GVA_RANGES_FLUSH_AVAILABLE} +}, +.dependencies = BIT(HYPERV_FEAT_TLBFLUSH) +}, }; static struct 
kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.3
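The '.dependencies' field is what lets QEMU refuse an inconsistent set of
enlightenments. A rough Python analogue is sketched below; the table only
lists the "Requires:" relations that appear in this thread (hv-tlbflush-ext,
hv-tlbflush-direct, hv-synic), and missing_dependencies() is an invented
name, not QEMU's actual checker.

```python
# "Requires:" relations as stated in the patches and docs/hyperv.txt.
DEPENDENCIES = {
    "hv-tlbflush-ext": {"hv-tlbflush"},
    "hv-tlbflush-direct": {"hv-vapic"},
    "hv-synic": {"hv-vpindex"},
}


def missing_dependencies(enabled):
    """Map each enabled feature to the prerequisites that are not enabled."""
    enabled = set(enabled)
    problems = {}
    for feat in sorted(enabled):
        missing = DEPENDENCIES.get(feat, set()) - enabled
        if missing:
            problems[feat] = sorted(missing)
    return problems


if __name__ == "__main__":
    print(missing_dependencies({"hv-tlbflush-ext"}))
    print(missing_dependencies({"hv-tlbflush", "hv-tlbflush-ext"}))
```

An empty result means the configuration is consistent as far as this small
table is concerned.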
[PATCH v4 1/6] i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
Previously, HV_CPUID_NESTED_FEATURES.EAX CPUID leaf was handled differently
as it was only used to encode the supported eVMCS version range. In fact,
there are also feature (e.g. Enlightened MSR-Bitmap) bits there. In
preparation to adding these features, move HV_CPUID_NESTED_FEATURES leaf
handling to hv_build_cpuid_leaf() and drop now-unneeded 'hyperv_nested'.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/cpu.h     |  1 -
 target/i386/kvm/kvm.c | 25 +++--
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0d528ac58f32..2e918daf6bef 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1804,7 +1804,6 @@ struct ArchCPU {
     uint32_t hyperv_vendor_id[3];
     uint32_t hyperv_interface_id[4];
     uint32_t hyperv_limits[3];
-    uint32_t hyperv_nested[4];
     bool hyperv_enforce_cpuid;
     uint32_t hyperv_ver_id_build;
     uint16_t hyperv_ver_id_major;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a9ee8eebd76f..93bfefa4a79e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -831,6 +831,8 @@ static bool tsc_is_stable_and_known(CPUX86State *env)
         || env->user_tsc_khz;
 }
 
+#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)
+
 static struct {
     const char *desc;
     struct {
@@ -1254,6 +1256,13 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg)
         }
     }
 
+    /* HV_CPUID_NESTED_FEATURES.EAX also encodes the supported eVMCS range */
+    if (func == HV_CPUID_NESTED_FEATURES && reg == R_EAX) {
+        if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
+            r |= DEFAULT_EVMCS_VERSION;
+        }
+    }
+
     return r;
 }
 
@@ -1384,11 +1393,11 @@ static int hyperv_fill_cpuids(CPUState *cs,
     struct kvm_cpuid_entry2 *c;
     uint32_t signature[3];
     uint32_t cpuid_i = 0, max_cpuid_leaf = 0;
+    uint32_t nested_eax =
+        hv_build_cpuid_leaf(cs, HV_CPUID_NESTED_FEATURES, R_EAX);
 
-    max_cpuid_leaf = HV_CPUID_IMPLEMENT_LIMITS;
-    if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
-        max_cpuid_leaf = MAX(max_cpuid_leaf, HV_CPUID_NESTED_FEATURES);
-    }
+    max_cpuid_leaf = nested_eax ? HV_CPUID_NESTED_FEATURES :
+        HV_CPUID_IMPLEMENT_LIMITS;
 
     if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG)) {
         max_cpuid_leaf =
@@ -1461,7 +1470,7 @@ static int hyperv_fill_cpuids(CPUState *cs,
     c->ecx = cpu->hyperv_limits[1];
     c->edx = cpu->hyperv_limits[2];
 
-    if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
+    if (nested_eax) {
         uint32_t function;
 
         /* Create zeroed 0x4006..0x4009 leaves */
@@ -1473,7 +1482,7 @@ static int hyperv_fill_cpuids(CPUState *cs,
         c = &cpuid_ent[cpuid_i++];
 
         c->function = HV_CPUID_NESTED_FEATURES;
-        c->eax = cpu->hyperv_nested[0];
+        c->eax = nested_eax;
     }
 
     if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNDBG)) {
@@ -1522,8 +1531,6 @@ static bool evmcs_version_supported(uint16_t evmcs_version,
         (max_version <= max_supported_version);
 }
 
-#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)
-
 static int hyperv_init_vcpu(X86CPU *cpu)
 {
     CPUState *cs = CPU(cpu);
@@ -1620,8 +1627,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
                          supported_evmcs_version >> 8);
             return -ENOTSUP;
         }
-
-        cpu->hyperv_nested[0] = evmcs_version;
     }
 
     if (cpu->hyperv_enforce_cpuid) {
-- 
2.35.3
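Since DEFAULT_EVMCS_VERSION packs a version range into 16 bits, a small
Python sketch of the encoding may help. It assumes, as the surrounding code
suggests ('supported_evmcs_version >> 8' being the maximum), that the low
byte is the minimum version and the high byte the maximum. The function
names mirror the C ones, but this is an illustration, not the QEMU code.

```python
DEFAULT_EVMCS_VERSION = (1 << 8) | 1   # max version 1, min version 1


def evmcs_range(version):
    """Split a packed eVMCS version into (min_version, max_version)."""
    return version & 0xFF, (version >> 8) & 0xFF


def evmcs_version_supported(requested, supported):
    """True if the requested range lies within the supported range."""
    min_v, max_v = evmcs_range(requested)
    min_s, max_s = evmcs_range(supported)
    return min_v >= min_s and max_v <= max_s


if __name__ == "__main__":
    print(evmcs_range(DEFAULT_EVMCS_VERSION))
    # A host advertising versions 1..2 can satisfy the default request.
    print(evmcs_version_supported(DEFAULT_EVMCS_VERSION, (2 << 8) | 1))
```

Because the packed value is ORed into EAX, any non-zero nested_eax (version
bits or feature bits) is enough to make QEMU emit the nested features leaf.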
Re: [PATCH v3 0/5] i386: Enable newly introduced KVM Hyper-V enlightenments
Vitaly Kuznetsov writes:

> Paolo Bonzini writes:
>
>>> This series enables four new KVM Hyper-V enlightenments [...]
>>>
>>>  docs/hyperv.txt | 34 ++
>>
>> Queued, thanks.
>
> Thanks!
>

It seems these patches didn't make it upstream yet but there's a (small)
conflict with

commit 73d24074078a2cefb5305047e3bf50b73daa3f98
Author: Jon Doron
Date:   Wed Feb 16 12:24:59 2022 +0200

    hyperv: Add support to process syndbg commands

which did.

>> Would you please convert hyperv.txt to rST in docs/system/i386?
>
> Sure, it's on my TODO list.

I've sent it out some time ago:
https://lore.kernel.org/qemu-devel/20220503144906.3618426-1-vkuzn...@redhat.com/

but it also conflicts with 73d24074078a now because of 'hv-syndbg'. I'm
going to send out 'v4' including the conversion to rST to (hopefully)
facilitate acceptance.

-- 
Vitaly
[PATCH v4 2/6] i386: Hyper-V Enlightened MSR bitmap feature
The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 9 + target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 5 + target/i386/kvm/kvm.c | 7 +++ 5 files changed, 24 insertions(+) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 33588a03961f..5d85569b9941 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -239,6 +239,15 @@ This enlightenment requires a VMBus device (-device vmbus-bridge,irq=15) and the follow enlightenments to work: hv-relaxed,hv_time,hv-vapic,hv-vpindex,hv-synic,hv-runtime,hv-stimer +3.22. hv-emsr-bitmap += +The enlightenment is nested specific, it targets Hyper-V on KVM guests. When +enabled, it allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to +avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. While the protocol is +supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires +Enlightened VMCS ('hv-evmcs') feature to also be enabled. + +Recommended: hv-evmcs (Intel) 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 35c3475e6c90..5aabf0c12e8d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6960,6 +6960,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_STIMER_DIRECT, 0), DEFINE_PROP_BIT64("hv-avic", X86CPU, hyperv_features, HYPERV_FEAT_AVIC, 0), +DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features, + HYPERV_FEAT_MSR_BITMAP, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BIT64("hv-syndbg", X86CPU, hyperv_features, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 2e918daf6bef..c7882857366d 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1106,6 +1106,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, #define HYPERV_FEAT_STIMER_DIRECT 14 #define HYPERV_FEAT_AVIC15 #define HYPERV_FEAT_SYNDBG 16 +#define HYPERV_FEAT_MSR_BITMAP 17 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index e40e59411c83..cea18dbc0e23 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -86,6 +86,11 @@ */ #define HV_SYNDBG_CAP_ALLOW_KERNEL_DEBUGGING(1u << 1) +/* + * HV_CPUID_NESTED_FEATURES.EAX bits + */ +#define HV_NESTED_MSR_BITMAP(1u << 19) + /* * Basic virtualized MSRs */ diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 93bfefa4a79e..82d1f0275c42 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -973,6 +973,13 @@ static struct { .dependencies = BIT(HYPERV_FEAT_SYNIC) | BIT(HYPERV_FEAT_RELAXED) }, #endif +[HYPERV_FEAT_MSR_BITMAP] = { +.desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", +.flags = { +{.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX, + .bits = HV_NESTED_MSR_BITMAP} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.3
Re: [PATCH] vmxcap: add tertiary execution controls
Paolo Bonzini writes:

> Signed-off-by: Paolo Bonzini
> ---
>  scripts/kvm/vmxcap | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/scripts/kvm/vmxcap b/scripts/kvm/vmxcap
> index f140040104..ce27f5e635 100755
> --- a/scripts/kvm/vmxcap
> +++ b/scripts/kvm/vmxcap
> @@ -23,6 +23,7 @@ MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
>  MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
>  MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
>  MSR_IA32_VMX_VMFUNC = 0x491
> +MSR_IA32_VMX_PROCBASED_CTLS3 = 0x492
>
>  class msr(object):
>      def __init__(self):
> @@ -71,6 +72,13 @@ class Control(object):
>                  s = 'yes'
>              print(' %-40s %s' % (self.bits[bit], s))
>
> +# All 64 bits in the tertiary controls MSR are allowed-1
> +class Allowed1Control(Control):
> +    def read2(self, nr):
> +        m = msr()
> +        val = m.read(nr, 0)
> +        return (0, val)
> +
>  class Misc(object):
>      def __init__(self, name, bits, msr):
>          self.name = name
> @@ -135,6 +143,7 @@ controls = [
>              12: 'RDTSC exiting',
>              15: 'CR3-load exiting',
>              16: 'CR3-store exiting',
> +            17: 'Activate tertiary controls',
>              19: 'CR8-load exiting',
>              20: 'CR8-store exiting',
>              21: 'Use TPR shadow',
> @@ -186,6 +195,14 @@ controls = [
>          cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
>          ),
>
> +    Allowed1Control(
> +        name = 'tertiary processor-based controls',
> +        bits = {
> +            4: 'Enable IPI virtualization'
> +            },
> +        cap_msr = MSR_IA32_VMX_PROCBASED_CTLS3,
> +        ),
> +
>      Control(
>          name = 'VM-Exit controls',
>          bits = {

Not sure which particular CPUs are going to implement this (it would be nice
to add this info to the blurb) but this matches the Intel doc
(https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html)
and the "IPI virtualization support for VM" series for KVM, so

Reviewed-by: Vitaly Kuznetsov

-- 
Vitaly
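For reference, the difference between an ordinary VMX capability MSR and the
tertiary one can be shown without touching /dev/cpu. The Python below is a
standalone sketch, not the vmxcap code: split_control_msr(),
split_allowed1_msr() and describe() are invented names, and the 'forced' /
'yes' / 'no' labels are a simplification of what the script prints.

```python
TERTIARY_BITS = {4: 'Enable IPI virtualization'}


def split_control_msr(val):
    """Ordinary capability MSR: low half = must-be-one, high half = may-be-one."""
    return val & 0xFFFFFFFF, val >> 32


def split_allowed1_msr(val):
    """Tertiary controls MSR: all 64 bits are allowed-1, nothing is forced."""
    return 0, val


def describe(bits, must_be_one, may_be_one):
    """Classify each named control bit as 'forced', 'yes' or 'no'."""
    result = {}
    for bit, name in sorted(bits.items()):
        mask = 1 << bit
        if may_be_one & mask:
            result[name] = 'forced' if must_be_one & mask else 'yes'
        else:
            result[name] = 'no'
    return result


if __name__ == "__main__":
    # Pretend MSR 0x492 reads back with only bit 4 set.
    print(describe(TERTIARY_BITS, *split_allowed1_msr(1 << 4)))
```

Returning (0, val) from the allowed-1 variant is what lets the rest of the
reporting logic stay unchanged for the 64-bit tertiary MSR.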
[PATCH] i386: docs: Convert hyperv.txt to rST
rSTify docs/hyperv.txt and link it from docs/system/target-i386.rst.

Signed-off-by: Vitaly Kuznetsov
---
- The patch is supposed to be applied on top of "[PATCH v3 0/5] i386: Enable
  newly introduced KVM Hyper-V enlightenments".
---
 docs/hyperv.txt             | 289
 docs/system/i386/hyperv.rst | 275 ++
 docs/system/target-i386.rst |   1 +
 3 files changed, 276 insertions(+), 289 deletions(-)
 delete mode 100644 docs/hyperv.txt
 create mode 100644 docs/system/i386/hyperv.rst

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
deleted file mode 100644
index 9553e5c03c6b..000000000000
--- a/docs/hyperv.txt
+++ /dev/null
@@ -1,289 +0,0 @@
-Hyper-V Enlightenments
-======================
-
-
-1. Description
-===============
-In some cases when implementing a hardware interface in software is slow, KVM
-implements its own paravirtualized interfaces. This works well for Linux as
-guest support for such features is added simultaneously with the feature itself.
-It may, however, be hard-to-impossible to add support for these interfaces to
-proprietary OSes, namely, Microsoft Windows.
-
-KVM on x86 implements Hyper-V Enlightenments for Windows guests. These features
-make Windows and Hyper-V guests think they're running on top of a Hyper-V
-compatible hypervisor and use Hyper-V specific features.
-
-
-2. Setup
-=========
-No Hyper-V enlightenments are enabled by default by either KVM or QEMU. In
-QEMU, individual enlightenments can be enabled through CPU flags, e.g:
-
-  qemu-system-x86_64 --enable-kvm --cpu host,hv_relaxed,hv_vpindex,hv_time, ...
-
-Sometimes there are dependencies between enlightenments, QEMU is supposed to
-check that the supplied configuration is sane.
-
-When any set of the Hyper-V enlightenments is enabled, QEMU changes hypervisor
-identification (CPUID 0x40000000..0x4000000A) to Hyper-V. KVM identification
-and features are kept in leaves 0x40000100..0x40000101.
-
-
-3. Existing enlightenments
-===========================
-
-3.1. hv-relaxed
-================
-This feature tells guest OS to disable watchdog timeouts as it is running on a
-hypervisor. It is known that some Windows versions will do this even when they
-see 'hypervisor' CPU flag.
-
-3.2. hv-vapic
-==============
-Provides so-called VP Assist page MSR to guest allowing it to work with APIC
-more efficiently. In particular, this enlightenment allows paravirtualized
-(exit-less) EOI processing.
-
-3.3. hv-spinlocks=xxx
-======================
-Enables paravirtualized spinlocks. The parameter indicates how many times
-spinlock acquisition should be attempted before indicating the situation to the
-hypervisor. A special value 0xffffffff indicates "never notify".
-
-3.4. hv-vpindex
-================
-Provides HV_X64_MSR_VP_INDEX (0x40000002) MSR to the guest which has Virtual
-processor index information. This enlightenment makes sense in conjunction with
-hv-synic, hv-stimer and other enlightenments which require the guest to know its
-Virtual Processor indices (e.g. when VP index needs to be passed in a
-hypercall).
-
-3.5. hv-runtime
-================
-Provides HV_X64_MSR_VP_RUNTIME (0x40000010) MSR to the guest. The MSR keeps the
-virtual processor run time in 100ns units. This gives guest operating system an
-idea of how much time was 'stolen' from it (when the virtual CPU was preempted
-to perform some other work).
-
-3.6. hv-crash
-==============
-Provides HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 (0x40000100..0x40000105) and
-HV_X64_MSR_CRASH_CTL (0x40000105) MSRs to the guest. These MSRs are written to
-by the guest when it crashes, HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 MSRs
-contain additional crash information. This information is outputted in QEMU log
-and through QAPI.
-Note: unlike under genuine Hyper-V, write to HV_X64_MSR_CRASH_CTL causes guest
-to shutdown. This effectively blocks crash dump generation by Windows.
-
-3.7. hv-time
-=============
-Enables two Hyper-V-specific clocksources available to the guest: MSR-based
-Hyper-V clocksource (HV_X64_MSR_TIME_REF_COUNT, 0x40000020) and Reference TSC
-page (enabled via MSR HV_X64_MSR_REFERENCE_TSC, 0x40000021). Both clocksources
-are per-guest, Reference TSC page clocksource allows for exit-less time stamp
-readings. Using this enlightenment leads to significant speedup of all timestamp
-related operations.
-
-3.8. hv-synic
-==============
-Enables Hyper-V Synthetic interrupt controller - an extension of a local APIC.
-When enabled, this enlightenment provides additional communication facilities
-to the guest: SynIC messages and Events. This is a pre-requisite for
-implementing VMBus devices (not yet in QEMU). Additionally, this enlightenment
-is needed to enable Hyper-V synthetic timers. SynIC is controlled through MSRs
-HV_X64_MSR_SCONTROL..HV_X64_MSR_EOM (0x40000080..0x40000084) and
-HV_X64_MSR_SINT0..HV_X64_MSR_SINT15 (0x40000090..0x4000009F)
-
-Requires: hv-vpind
Re: [PATCH v3 0/5] i386: Enable newly introduced KVM Hyper-V enlightenments
Paolo Bonzini writes:

>> This series enables four new KVM Hyper-V enlightenments [...]
>>
>>  docs/hyperv.txt                | 34 ++
>
> Queued, thanks.

Thanks!

> Would you please convert hyperv.txt to rST in docs/system/i386?

Sure, it's on my TODO list.

-- 
Vitaly
[PATCH v3 4/5] i386: Hyper-V Support extended GVA ranges for TLB flush hypercalls
KVM kind of supported "extended GVA ranges" (up to 4095 additional GFNs per
hypercall) since the implementation of Hyper-V PV TLB flush feature
(Linux-4.18) as regardless of the request, full TLB flush was always
performed. "Extended GVA ranges for TLB flush hypercalls" feature bit wasn't
exposed then. Now, as KVM gains support for fine-grained TLB flush handling,
exposing this feature starts making sense.

Signed-off-by: Vitaly Kuznetsov
---
 docs/hyperv.txt                | 7 +++++++
 target/i386/cpu.c              | 2 ++
 target/i386/cpu.h              | 1 +
 target/i386/kvm/hyperv-proto.h | 1 +
 target/i386/kvm/kvm.c          | 8 ++++++++
 5 files changed, 19 insertions(+)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index 857268d37d61..acc411eb84cf 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -241,6 +241,13 @@ Hyper-V specification allows to pass parameters for certain hypercalls using XMM
 registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows
 for faster hypercalls processing as KVM can avoid reading guest's memory.
 
+3.23. hv-tlbflush-ext
+=====================
+Allow for extended GVA ranges to be passed to Hyper-V TLB flush hypercalls
+(HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
+
+Requires: hv-tlbflush
+
 4. Supplementary features
 =========================
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c4be8ffe7988..f80db9a403bd 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6929,6 +6929,8 @@ static Property x86_cpu_properties[] = {
                       HYPERV_FEAT_MSR_BITMAP, 0),
     DEFINE_PROP_BIT64("hv-xmm-input", X86CPU, hyperv_features,
                       HYPERV_FEAT_XMM_INPUT, 0),
+    DEFINE_PROP_BIT64("hv-tlbflush-ext", X86CPU, hyperv_features,
+                      HYPERV_FEAT_TLBFLUSH_EXT, 0),
     DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
                             hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
     DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ea561e18f934..ec96b0e7a4cb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1086,6 +1086,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define HYPERV_FEAT_AVIC                15
 #define HYPERV_FEAT_MSR_BITMAP          16
 #define HYPERV_FEAT_XMM_INPUT           17
+#define HYPERV_FEAT_TLBFLUSH_EXT        18
 
 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY
 #define HYPERV_SPINLOCK_NEVER_NOTIFY             0xFFFFFFFF
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index 74d91adb7a16..b3f42ab92051 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -55,6 +55,7 @@
 #define HV_GUEST_IDLE_STATE_AVAILABLE           (1u << 5)
 #define HV_FREQUENCY_MSRS_AVAILABLE             (1u << 8)
 #define HV_GUEST_CRASH_MSR_AVAILABLE            (1u << 10)
+#define HV_EXT_GVA_RANGES_FLUSH_AVAILABLE       (1u << 14)
 #define HV_STIMER_DIRECT_MODE_AVAILABLE         (1u << 19)
 
 /*
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7f752ef4376a..8a71de07f3c7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -980,6 +980,14 @@ static struct {
              .bits = HV_HYPERCALL_XMM_INPUT_AVAILABLE}
         }
     },
+    [HYPERV_FEAT_TLBFLUSH_EXT] = {
+        .desc = "Extended gva ranges for TLB flush hypercalls (hv-tlbflush-ext)",
+        .flags = {
+            {.func = HV_CPUID_FEATURES, .reg = R_EDX,
+             .bits = HV_EXT_GVA_RANGES_FLUSH_AVAILABLE}
+        },
+        .dependencies = BIT(HYPERV_FEAT_TLBFLUSH)
+    },
 };
 
 static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max,
-- 
2.35.1
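
The '.dependencies' field in the patch above ties hv-tlbflush-ext to hv-tlbflush. As a rough illustration of how such a per-feature prerequisite mask can be checked, here is a minimal standalone sketch. It is not QEMU's code: the enum values, the FEAT_BIT macro, the feat_deps table and missing_deps() are all hypothetical names made up for this example.

```c
#include <stdint.h>

/* Hypothetical feature indices; only the two discussed here. */
enum { FEAT_TLBFLUSH = 0, FEAT_TLBFLUSH_EXT = 1, FEAT_COUNT };

#define FEAT_BIT(n) (1ULL << (n))

/* Per-feature mask of prerequisites, mirroring the '.dependencies' idea. */
static const uint64_t feat_deps[FEAT_COUNT] = {
    [FEAT_TLBFLUSH]     = 0,
    [FEAT_TLBFLUSH_EXT] = FEAT_BIT(FEAT_TLBFLUSH),
};

/* Return the mask of prerequisites that are needed but not enabled. */
static uint64_t missing_deps(uint64_t enabled)
{
    uint64_t missing = 0;

    for (int i = 0; i < FEAT_COUNT; i++) {
        if (enabled & FEAT_BIT(i)) {
            missing |= feat_deps[i] & ~enabled;
        }
    }
    return missing;
}
```

A non-zero result means the requested set is inconsistent; a caller could then refuse to start and name the missing feature.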
[PATCH v3 2/5] i386: Hyper-V Enlightened MSR bitmap feature
The newly introduced enlightenment allows L0 (KVM) and L1 (Hyper-V)
hypervisors to collaborate to avoid unnecessary updates to L2
MSR-Bitmap upon vmexits.

Signed-off-by: Vitaly Kuznetsov
---
 docs/hyperv.txt                | 10 ++++++++++
 target/i386/cpu.c              |  2 ++
 target/i386/cpu.h              |  1 +
 target/i386/kvm/hyperv-proto.h |  5 +++++
 target/i386/kvm/kvm.c          |  7 +++++++
 5 files changed, 25 insertions(+)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index 0417c183a3b0..08429124a634 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -225,6 +225,16 @@ default (WS2016).
 Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V
 identification when specified without any other enlightenments.
 
+3.21. hv-emsr-bitmap
+=====================
+The enlightenment is nested specific, it targets Hyper-V on KVM guests. When
+enabled, it allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to
+avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. While the protocol is
+supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires
+Enlightened VMCS ('hv-evmcs') feature to also be enabled.
+
+Recommended: hv-evmcs (Intel)
+
 4. Supplementary features
 =========================
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cb6b5467d067..3f053919685f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6925,6 +6925,8 @@ static Property x86_cpu_properties[] = {
                       HYPERV_FEAT_STIMER_DIRECT, 0),
     DEFINE_PROP_BIT64("hv-avic", X86CPU, hyperv_features,
                       HYPERV_FEAT_AVIC, 0),
+    DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features,
+                      HYPERV_FEAT_MSR_BITMAP, 0),
     DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
                             hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
     DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 73dc387c52f5..9615c330315f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1084,6 +1084,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define HYPERV_FEAT_IPI                 13
 #define HYPERV_FEAT_STIMER_DIRECT       14
 #define HYPERV_FEAT_AVIC                15
+#define HYPERV_FEAT_MSR_BITMAP          16
 
 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY
 #define HYPERV_SPINLOCK_NEVER_NOTIFY             0xFFFFFFFF
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index 89f81afda7c6..38e25468122d 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -72,6 +72,11 @@
 #define HV_ENLIGHTENED_VMCS_RECOMMENDED     (1u << 14)
 #define HV_NO_NONARCH_CORESHARING           (1u << 18)
 
+/*
+ * HV_CPUID_NESTED_FEATURES.EAX bits
+ */
+#define HV_NESTED_MSR_BITMAP                (1u << 19)
+
 /*
  * Basic virtualized MSRs
  */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ff79994faa87..4059b46b9449 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -966,6 +966,13 @@ static struct {
              .bits = HV_DEPRECATING_AEOI_RECOMMENDED}
         }
     },
+    [HYPERV_FEAT_MSR_BITMAP] = {
+        .desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)",
+        .flags = {
+            {.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX,
+             .bits = HV_NESTED_MSR_BITMAP}
+        }
+    },
 };
 
 static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max,
-- 
2.35.1
[PATCH v3 1/5] i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
Previously, HV_CPUID_NESTED_FEATURES.EAX CPUID leaf was handled differently
as it was only used to encode the supported eVMCS version range. In fact,
there are also feature (e.g. Enlightened MSR-Bitmap) bits there. In
preparation to adding these features, move HV_CPUID_NESTED_FEATURES leaf
handling to hv_build_cpuid_leaf() and drop now-unneeded 'hyperv_nested'.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/cpu.h     |  1 -
 target/i386/kvm/kvm.c | 23 +++++++++++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 982c5323537c..73dc387c52f5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1770,7 +1770,6 @@ struct ArchCPU {
     uint32_t hyperv_vendor_id[3];
     uint32_t hyperv_interface_id[4];
     uint32_t hyperv_limits[3];
-    uint32_t hyperv_nested[4];
     bool hyperv_enforce_cpuid;
     uint32_t hyperv_ver_id_build;
     uint16_t hyperv_ver_id_major;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9cf8e036698d..ff79994faa87 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -834,6 +834,8 @@ static bool tsc_is_stable_and_known(CPUX86State *env)
         || env->user_tsc_khz;
 }
 
+#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)
+
 static struct {
     const char *desc;
     struct {
@@ -1241,6 +1243,13 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg)
         }
     }
 
+    /* HV_CPUID_NESTED_FEATURES.EAX also encodes the supported eVMCS range */
+    if (func == HV_CPUID_NESTED_FEATURES && reg == R_EAX) {
+        if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
+            r |= DEFAULT_EVMCS_VERSION;
+        }
+    }
+
     return r;
 }
 
@@ -1370,11 +1379,13 @@ static int hyperv_fill_cpuids(CPUState *cs,
     X86CPU *cpu = X86_CPU(cs);
     struct kvm_cpuid_entry2 *c;
     uint32_t cpuid_i = 0;
+    uint32_t nested_eax =
+        hv_build_cpuid_leaf(cs, HV_CPUID_NESTED_FEATURES, R_EAX);
 
     c = &cpuid_ent[cpuid_i++];
     c->function = HV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
-    c->eax = hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS) ?
-        HV_CPUID_NESTED_FEATURES : HV_CPUID_IMPLEMENT_LIMITS;
+    c->eax = nested_eax ? HV_CPUID_NESTED_FEATURES :
+        HV_CPUID_IMPLEMENT_LIMITS;
     c->ebx = cpu->hyperv_vendor_id[0];
     c->ecx = cpu->hyperv_vendor_id[1];
     c->edx = cpu->hyperv_vendor_id[2];
@@ -1438,7 +1449,7 @@ static int hyperv_fill_cpuids(CPUState *cs,
     c->ecx = cpu->hyperv_limits[1];
     c->edx = cpu->hyperv_limits[2];
 
-    if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
+    if (nested_eax) {
         uint32_t function;
 
         /* Create zeroed 0x40000006..0x40000009 leaves */
@@ -1450,7 +1461,7 @@ static int hyperv_fill_cpuids(CPUState *cs,
 
         c = &cpuid_ent[cpuid_i++];
         c->function = HV_CPUID_NESTED_FEATURES;
-        c->eax = cpu->hyperv_nested[0];
+        c->eax = nested_eax;
     }
 
     return cpuid_i;
@@ -1472,8 +1483,6 @@ static bool evmcs_version_supported(uint16_t evmcs_version,
         (max_version <= max_supported_version);
 }
 
-#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)
-
 static int hyperv_init_vcpu(X86CPU *cpu)
 {
     CPUState *cs = CPU(cpu);
@@ -1577,8 +1586,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
                          supported_evmcs_version >> 8);
             return -ENOTSUP;
         }
-
-        cpu->hyperv_nested[0] = evmcs_version;
     }
 
     if (cpu->hyperv_enforce_cpuid) {
-- 
2.35.1
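
To make the effect of the patch above easier to follow, here is a small standalone sketch of the two decisions it changes: how the nested-features EAX value is composed, and how the highest advertised Hyper-V CPUID leaf follows from it. The helper names nested_features_eax() and max_hv_leaf() are invented for this example, and the numeric leaf values 0x40000005 and 0x4000000A are an assumption on my part (they are not spelled out in the patch); only DEFAULT_EVMCS_VERSION is taken from the diff.

```c
#include <stdint.h>

/* Assumed leaf numbers; the patch only uses the symbolic names. */
#define HV_CPUID_IMPLEMENT_LIMITS 0x40000005u
#define HV_CPUID_NESTED_FEATURES  0x4000000Au

/* Same value as in the patch, written with unsigned literals. */
#define DEFAULT_EVMCS_VERSION     ((1u << 8) | 1u)

/*
 * EAX of the nested-features leaf: feature bits, plus the eVMCS version
 * range when eVMCS is enabled.
 */
static uint32_t nested_features_eax(uint32_t feature_bits, int evmcs_enabled)
{
    uint32_t r = feature_bits;

    if (evmcs_enabled) {
        r |= DEFAULT_EVMCS_VERSION;
    }
    return r;
}

/* The nested leaf is advertised only when it would be non-zero. */
static uint32_t max_hv_leaf(uint32_t nested_eax)
{
    return nested_eax ? HV_CPUID_NESTED_FEATURES : HV_CPUID_IMPLEMENT_LIMITS;
}
```

The point of the change is visible in max_hv_leaf(): a nested feature bit alone, without eVMCS, is now enough to expose the nested-features leaf.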
[PATCH v3 0/5] i386: Enable newly introduced KVM Hyper-V enlightenments
This is a continuation of "[PATCH v2 0/3] i386: Add support for Hyper-V
Enlightened MSR-Bitmap and XMM fast hypercall input features":
https://lore.kernel.org/qemu-devel/20220217142949.297454-1-vkuzn...@redhat.com/
work which wasn't merged for 7.0, thus 'v3'.

This series enables four new KVM Hyper-V enlightenments:

'XMM fast hypercall input feature' is supported by KVM since v5.14;
it allows for faster Hyper-V hypercall processing.

'Enlightened MSR-Bitmap' is a new nested-specific enlightenment which speeds
up L2 vmexits by avoiding unnecessary updates to L2 MSR-Bitmap. KVM support
for the feature on Intel CPUs is in v5.17 and in v5.18 for AMD CPUs.

'Extended GVA ranges for TLB flush hypercalls' indicates that extended GVA
ranges are allowed to be passed to Hyper-V TLB flush hypercalls.

'Direct TLB flush hypercall' feature allows L0 (KVM) to directly handle
L2's TLB flush hypercalls without the need to exit to L1 (Hyper-V).

The last two features are not merged in KVM yet:
https://lore.kernel.org/kvm/20220414132013.1588929-1-vkuzn...@redhat.com/
however, there's no direct dependency on the kernel part: thanks to
KVM_GET_SUPPORTED_HV_CPUID, no new capabilities are introduced.

Vitaly Kuznetsov (5):
  i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
  i386: Hyper-V Enlightened MSR bitmap feature
  i386: Hyper-V XMM fast hypercall input feature
  i386: Hyper-V Support extended GVA ranges for TLB flush hypercalls
  i386: Hyper-V Direct TLB flush hypercall

 docs/hyperv.txt                | 34 ++
 target/i386/cpu.c              |  8 +
 target/i386/cpu.h              |  5 +++-
 target/i386/kvm/hyperv-proto.h |  9 +-
 target/i386/kvm/kvm.c          | 53 +-
 5 files changed, 99 insertions(+), 10 deletions(-)

-- 
2.35.1
[PATCH v3 5/5] i386: Hyper-V Direct TLB flush hypercall
Hyper-V TLFS allows for L0 and L1 hypervisors to collaborate on L2's
TLB flush hypercalls handling. With the correct setup, L2's TLB flush
hypercalls can be handled by L0 directly, without the need to exit to
L1.

Signed-off-by: Vitaly Kuznetsov
---
 docs/hyperv.txt                | 11 +++++++++++
 target/i386/cpu.c              |  2 ++
 target/i386/cpu.h              |  1 +
 target/i386/kvm/hyperv-proto.h |  1 +
 target/i386/kvm/kvm.c          |  8 ++++++++
 5 files changed, 23 insertions(+)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index acc411eb84cf..9553e5c03c6b 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -248,6 +248,17 @@ Allow for extended GVA ranges to be passed to Hyper-V TLB flush hypercalls
 
 Requires: hv-tlbflush
 
+3.24. hv-tlbflush-direct
+=========================
+The enlightenment is nested specific, it targets Hyper-V on KVM guests. When
+enabled, it allows L0 (KVM) to directly handle TLB flush hypercalls from L2
+guest without the need to exit to L1 (Hyper-V) hypervisor. While the feature is
+supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires
+Enlightened VMCS ('hv-evmcs') feature to also be enabled.
+
+Requires: hv-vapic
+Recommended: hv-evmcs (Intel)
+
 4. Supplementary features
 =========================
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f80db9a403bd..e8bbaf24d38d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6931,6 +6931,8 @@ static Property x86_cpu_properties[] = {
                       HYPERV_FEAT_XMM_INPUT, 0),
     DEFINE_PROP_BIT64("hv-tlbflush-ext", X86CPU, hyperv_features,
                       HYPERV_FEAT_TLBFLUSH_EXT, 0),
+    DEFINE_PROP_BIT64("hv-tlbflush-direct", X86CPU, hyperv_features,
+                      HYPERV_FEAT_TLBFLUSH_DIRECT, 0),
     DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
                             hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
     DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ec96b0e7a4cb..2d17d52c00c1 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1087,6 +1087,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define HYPERV_FEAT_MSR_BITMAP          16
 #define HYPERV_FEAT_XMM_INPUT           17
 #define HYPERV_FEAT_TLBFLUSH_EXT        18
+#define HYPERV_FEAT_TLBFLUSH_DIRECT     19
 
 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY
 #define HYPERV_SPINLOCK_NEVER_NOTIFY             0xFFFFFFFF
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index b3f42ab92051..28d7759770e1 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -76,6 +76,7 @@
 /*
  * HV_CPUID_NESTED_FEATURES.EAX bits
  */
+#define HV_NESTED_DIRECT_FLUSH              (1u << 17)
 #define HV_NESTED_MSR_BITMAP                (1u << 19)
 
 /*
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8a71de07f3c7..e966ab467b74 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -988,6 +988,14 @@ static struct {
         },
         .dependencies = BIT(HYPERV_FEAT_TLBFLUSH)
     },
+    [HYPERV_FEAT_TLBFLUSH_DIRECT] = {
+        .desc = "direct TLB flush (hv-tlbflush-direct)",
+        .flags = {
+            {.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX,
+             .bits = HV_NESTED_DIRECT_FLUSH}
+        },
+        .dependencies = BIT(HYPERV_FEAT_VAPIC)
+    },
 };
 
 static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max,
-- 
2.35.1
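
With this patch the nested-features EAX register carries two feature bits, 17 and 19. Purely as an illustration of how an L1 hypervisor (or a test tool) might interpret that register, here is a minimal sketch. The struct nested_feats type and decode_nested_eax() are hypothetical names for this example; only the two bit definitions come from the diffs in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in hyperv-proto.h by this series. */
#define HV_NESTED_DIRECT_FLUSH (1u << 17)
#define HV_NESTED_MSR_BITMAP   (1u << 19)

/* Hypothetical decoded view of HV_CPUID_NESTED_FEATURES.EAX. */
struct nested_feats {
    bool direct_flush;  /* hv-tlbflush-direct exposed */
    bool msr_bitmap;    /* hv-emsr-bitmap exposed */
};

static struct nested_feats decode_nested_eax(uint32_t eax)
{
    struct nested_feats f = {
        .direct_flush = (eax & HV_NESTED_DIRECT_FLUSH) != 0,
        .msr_bitmap   = (eax & HV_NESTED_MSR_BITMAP) != 0,
    };
    return f;
}
```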
[PATCH v3 3/5] i386: Hyper-V XMM fast hypercall input feature
Hyper-V specification allows to pass parameters for certain hypercalls
using XMM registers ("XMM Fast Hypercall Input"). When the feature is
in use, it allows for faster hypercalls processing as KVM can avoid
reading guest's memory.

KVM supports the feature since v5.14.

Rename HV_HYPERCALL_{PARAMS_XMM_AVAILABLE -> XMM_INPUT_AVAILABLE} to
comply with KVM.

Signed-off-by: Vitaly Kuznetsov
---
 docs/hyperv.txt                | 6 ++++++
 target/i386/cpu.c              | 2 ++
 target/i386/cpu.h              | 1 +
 target/i386/kvm/hyperv-proto.h | 2 +-
 target/i386/kvm/kvm.c          | 7 +++++++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index 08429124a634..857268d37d61 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -235,6 +235,12 @@ Enlightened VMCS ('hv-evmcs') feature to also be enabled.
 
 Recommended: hv-evmcs (Intel)
 
+3.22. hv-xmm-input
+===================
+Hyper-V specification allows to pass parameters for certain hypercalls using XMM
+registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows
+for faster hypercalls processing as KVM can avoid reading guest's memory.
+
 4. Supplementary features
 =========================
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3f053919685f..c4be8ffe7988 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6927,6 +6927,8 @@ static Property x86_cpu_properties[] = {
                       HYPERV_FEAT_AVIC, 0),
     DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features,
                       HYPERV_FEAT_MSR_BITMAP, 0),
+    DEFINE_PROP_BIT64("hv-xmm-input", X86CPU, hyperv_features,
+                      HYPERV_FEAT_XMM_INPUT, 0),
     DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
                             hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
     DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9615c330315f..ea561e18f934 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1085,6 +1085,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define HYPERV_FEAT_STIMER_DIRECT       14
 #define HYPERV_FEAT_AVIC                15
 #define HYPERV_FEAT_MSR_BITMAP          16
+#define HYPERV_FEAT_XMM_INPUT           17
 
 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY
 #define HYPERV_SPINLOCK_NEVER_NOTIFY             0xFFFFFFFF
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index 38e25468122d..74d91adb7a16 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -51,7 +51,7 @@
 #define HV_GUEST_DEBUGGING_AVAILABLE            (1u << 1)
 #define HV_PERF_MONITOR_AVAILABLE               (1u << 2)
 #define HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE   (1u << 3)
-#define HV_HYPERCALL_PARAMS_XMM_AVAILABLE       (1u << 4)
+#define HV_HYPERCALL_XMM_INPUT_AVAILABLE        (1u << 4)
 #define HV_GUEST_IDLE_STATE_AVAILABLE           (1u << 5)
 #define HV_FREQUENCY_MSRS_AVAILABLE             (1u << 8)
 #define HV_GUEST_CRASH_MSR_AVAILABLE            (1u << 10)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4059b46b9449..7f752ef4376a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -973,6 +973,13 @@ static struct {
              .bits = HV_NESTED_MSR_BITMAP}
         }
     },
+    [HYPERV_FEAT_XMM_INPUT] = {
+        .desc = "XMM fast hypercall input (hv-xmm-input)",
+        .flags = {
+            {.func = HV_CPUID_FEATURES, .reg = R_EDX,
+             .bits = HV_HYPERCALL_XMM_INPUT_AVAILABLE}
+        }
+    },
 };
 
 static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max,
-- 
2.35.1
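
Both hv-xmm-input and hv-tlbflush-ext are advertised through bits in HV_CPUID_FEATURES.EDX (bit 4 and bit 14 respectively, per the hyperv-proto.h hunks in this series). A minimal sketch of composing that register from the enabled options might look as follows; build_features_edx() is a hypothetical helper for illustration and deliberately ignores every other EDX bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in hyperv-proto.h by this series. */
#define HV_HYPERCALL_XMM_INPUT_AVAILABLE  (1u << 4)
#define HV_EXT_GVA_RANGES_FLUSH_AVAILABLE (1u << 14)

/* Compose the two EDX feature bits from the corresponding CPU options. */
static uint32_t build_features_edx(bool xmm_input, bool tlbflush_ext)
{
    uint32_t edx = 0;

    if (xmm_input) {
        edx |= HV_HYPERCALL_XMM_INPUT_AVAILABLE;
    }
    if (tlbflush_ext) {
        edx |= HV_EXT_GVA_RANGES_FLUSH_AVAILABLE;
    }
    return edx;
}
```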
Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic
Divya Garg writes:

> On 12/04/22 6:18 pm, Vitaly Kuznetsov wrote:
>> Divya Garg writes:
>>
>>> Hi Vitaly Kuznetsov !
>>> I was working on hyperv flags and saw that we introduced new
>>> dependencies some time back
>>> (https://sourcegraph.com/github.com/qemu/qemu/-/commit/c686193072a47032d83cb4e131dc49ae30f9e5d7?visible=1).
>>> After these changes, if we try to live migrate a vm from older qemu to newer
>>> one having these changes, it fails showing dependency issue.
>>>
>>> I was wondering if this is the expected behaviour or if there is any work
>>> around for handling it ? Or something needs to be done to ensure backward
>>> compatibility ?
>> Hi Divya,
>>
>> configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
>> always incorrect as Windows can't use the feature, that's why the
>> dependencies were added. It is true that it doesn't seem to be possible
>> to forward-migrate such VMs to newer QEMU versions. We could've tied
>> these new dependencies to newer machine types I guess (so old machine
>> types would not fail to start) but we didn't do that back in 4.1 and
>> it's been a while since... Not sure whether it would make much sense to
>> introduce something for pre-4.1 machine types now.
>>
>> Out of curiosity, why do such "incorrect" configurations exist? Can you
>> just update them to include missing flags on older QEMU so they migrate
>> to newer ones without issues?
>>
> Hi Vitaly !
>
> Thanks for the response. I understand that these were incorrect
> configurations and should be corrected. Only issue is, we want to avoid
> power cycling those VMs. But yeah I think, since the configurations were
> wrong we should update and power cycle the VM. Just for understanding
> purpose, is it possible to disable the feature by throwing out some
> warning message and update libvirt to mitigate this change and handle
> live migration ?
>

I'm not exactly sure about libvirt; I was under the impression it makes
sure that the QEMU command line is the same on the destination and on the
source. If there's a way to add something, I'd suggest you add the
missing features (hv-time, hv-synic) on the destination rather than
remove 'hv-stimer' as it is probably safer.

> Or maybe update libvirt to not to ask for this feature from qemu during live
> migration and handle different configuration on source and destination
> host ?

You can also modify QEMU locally and throw away these dependencies;
it'll allow these configurations again. Generally speaking, though,
checking that the set of Hyper-V features is exactly the same on the
source and destination is the right thing to do: there are no guarantees
that the guest OS (Windows) will keep behaving sanely when the
corresponding CPUIDs change while it's running, all sorts of things are
possible I believe.

-- 
Vitaly
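
For anyone auditing existing configurations before a migration, the rule from this thread (hv-stimer needs both hv-synic and hv-time) can be checked with something as small as the sketch below. This is not QEMU or libvirt code: has_flag() and stimer_config_incomplete() are hypothetical helpers, and the parser is deliberately simplified. It only understands bare, comma-separated flag names in the 'hv-' spelling and would not recognise forms such as 'hv_stimer' or 'hv-stimer=on'.

```c
#include <stdbool.h>
#include <string.h>

/* True if 'flag' appears as a whole comma-separated token in 'opts'. */
static bool has_flag(const char *opts, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = opts;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == n && strncmp(p, flag, n) == 0) {
            return true;
        }
        if (!end) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* True if hv-stimer is requested without both of its prerequisites. */
static bool stimer_config_incomplete(const char *opts)
{
    return has_flag(opts, "hv-stimer") &&
           !(has_flag(opts, "hv-synic") && has_flag(opts, "hv-time"));
}
```

A configuration flagged by stimer_config_incomplete() is one of the "incorrect" ones discussed above; per the advice in the reply, the safer repair is to add the missing flags rather than to drop hv-stimer.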
Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic
Divya Garg writes:

> Hi Vitaly Kuznetsov !
> I was working on hyperv flags and saw that we introduced new
> dependencies some time back
> (https://sourcegraph.com/github.com/qemu/qemu/-/commit/c686193072a47032d83cb4e131dc49ae30f9e5d7?visible=1).
> After these changes, if we try to live migrate a vm from older qemu to newer
> one having these changes, it fails showing dependency issue.
>
> I was wondering if this is the expected behaviour or if there is any work
> around for handing it ? Or something needs to be done to ensure backward
> compatibility ?

Hi Divya,

configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
always incorrect as Windows can't use the feature, that's why the
dependencies were added. It is true that it doesn't seem to be possible
to forward-migrate such VMs to newer QEMU versions. We could've tied
these new dependencies to newer machine types I guess (so old machine
types would not fail to start) but we didn't do that back in 4.1 and
it's been awhile since... Not sure whether it would make much sense to
introduce something for pre-4.1 machine types now.

Out of curiosity, why do such "incorrect" configurations exist? Can you
just update them to include missing flags on older QEMU so they migrate
to newer ones without issues?

-- 
Vitaly
Re: [PATCH v2 0/3] i386: Add support for Hyper-V Enlightened MSR-Bitmap and XMM fast hypercall input features
Vitaly Kuznetsov writes:

> 'XMM fast hypercall input feature' is supported by KVM since v5.14,
> it allows for faster Hyper-V hypercall processing.
>
> 'Enlightened MSR-Bitmap' is a new nested specific enlightenment speeds up
> L2 vmexits by avoiding unnecessary updates to L2 MSR-Bitmap. KVM support
> for the feature on Intel CPUs is coming in v5.17 and is queued for 5.18 for
> AMD CPUs.
>

Gentle ping) It seems the time is running out to get this in 7.0...

-- 
Vitaly
[PATCH 1/2] i386: Add Icelake-Server-v6 CPU model with 5-level EPT support
Windows 11 with WSL2 enabled (Hyper-V) fails to boot with Icelake-Server
{-v5} CPU model but boots well with '-cpu host'. Apparently, it expects
5-level paging and 5-level EPT support to come as a pair but QEMU's
Icelake-Server CPU model lacks the latter. Introduce 'Icelake-Server-v6'
CPU model with 'vmx-page-walk-5' enabled by default.

Signed-off-by: Vitaly Kuznetsov
---
 target/i386/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e6368004c..6e25d1333971 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3505,6 +3505,14 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                     { /* end of list */ }
                 },
             },
+            {
+                .version = 6,
+                .note = "5-level EPT",
+                .props = (PropValue[]) {
+                    { "vmx-page-walk-5", "on" },
+                    { /* end of list */ }
+                },
+            },
             { /* end of list */ }
         }
     },
-- 
2.35.1
[PATCH 2/2] vmxcap: Add 5-level EPT bit
5-level EPT is present in Icelake Server CPUs and is supported by QEMU
('vmx-page-walk-5').

Signed-off-by: Vitaly Kuznetsov
---
 scripts/kvm/vmxcap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/kvm/vmxcap b/scripts/kvm/vmxcap
index 6fe66d5f5753..f140040104bf 100755
--- a/scripts/kvm/vmxcap
+++ b/scripts/kvm/vmxcap
@@ -249,6 +249,7 @@ controls = [
         bits = {
             0: 'Execute-only EPT translations',
             6: 'Page-walk length 4',
+            7: 'Page-walk length 5',
             8: 'Paging-structure memory type UC',
             14: 'Paging-structure memory type WB',
             16: '2MB EPT pages',
-- 
2.35.1
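
The vmxcap table touched by this patch maps bit numbers of an EPT capability value to human-readable names. As an illustration only, the same mapping can be sketched in C as below. The struct cap_bit type, the ept_caps table and the two helpers are hypothetical names for this example; the bit numbers and strings are the ones visible in the hunk, and the 64-bit capability value is assumed to have been read elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

struct cap_bit {
    unsigned bit;
    const char *name;
};

/* Bit-to-name table, limited to the entries visible in the hunk above. */
static const struct cap_bit ept_caps[] = {
    { 0,  "Execute-only EPT translations" },
    { 6,  "Page-walk length 4" },
    { 7,  "Page-walk length 5" },
    { 8,  "Paging-structure memory type UC" },
    { 14, "Paging-structure memory type WB" },
    { 16, "2MB EPT pages" },
};

/* Return the description of a capability bit, or NULL if not listed. */
static const char *ept_cap_name(unsigned bit)
{
    for (size_t i = 0; i < sizeof(ept_caps) / sizeof(ept_caps[0]); i++) {
        if (ept_caps[i].bit == bit) {
            return ept_caps[i].name;
        }
    }
    return NULL;
}

/* Non-zero if the capability value reports 5-level EPT (bit 7). */
static int ept_supports_5level(uint64_t cap)
{
    return (int)((cap >> 7) & 1);
}
```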
[PATCH v2 3/3] i386: Hyper-V XMM fast hypercall input feature
Hyper-V specification allows to pass parameters for certain hypercalls using XMM registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows for faster hypercalls processing as KVM can avoid reading guest's memory. KVM supports the feature since v5.14. Rename HV_HYPERCALL_{PARAMS_XMM_AVAILABLE -> XMM_INPUT_AVAILABLE} to comply with KVM. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 6 ++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 2 +- target/i386/kvm/kvm.c | 7 +++ 5 files changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 08429124a634..857268d37d61 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -235,6 +235,12 @@ Enlightened VMCS ('hv-evmcs') feature to also be enabled. Recommended: hv-evmcs (Intel) +3.22. hv-xmm-input +=== +Hyper-V specification allows to pass parameters for certain hypercalls using XMM +registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows +for faster hypercalls processing as KVM can avoid reading guest's memory. + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index f7405fdf4fa5..0b171db1d046 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6841,6 +6841,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_AVIC, 0), DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features, HYPERV_FEAT_MSR_BITMAP, 0), +DEFINE_PROP_BIT64("hv-xmm-input", X86CPU, hyperv_features, + HYPERV_FEAT_XMM_INPUT, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d6ae9e60a9a0..da251d165d13 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1061,6 +1061,7 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS]; #define HYPERV_FEAT_STIMER_DIRECT 14 #define HYPERV_FEAT_AVIC15 #define HYPERV_FEAT_MSR_BITMAP 16 +#define HYPERV_FEAT_XMM_INPUT 17 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index 38e25468122d..74d91adb7a16 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -51,7 +51,7 @@ #define HV_GUEST_DEBUGGING_AVAILABLE(1u << 1) #define HV_PERF_MONITOR_AVAILABLE (1u << 2) #define HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE (1u << 3) -#define HV_HYPERCALL_PARAMS_XMM_AVAILABLE (1u << 4) +#define HV_HYPERCALL_XMM_INPUT_AVAILABLE(1u << 4) #define HV_GUEST_IDLE_STATE_AVAILABLE (1u << 5) #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8) #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index f719ef3f8384..8279b116fac6 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -941,6 +941,13 @@ static struct { .bits = HV_NESTED_MSR_BITMAP} } }, +[HYPERV_FEAT_XMM_INPUT] = { +.desc = "XMM fast hypercall input (hv-xmm-input)", +.flags = { +{.func = HV_CPUID_FEATURES, .reg = R_EDX, + .bits = 
HV_HYPERCALL_XMM_INPUT_AVAILABLE} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.1
[PATCH v2 2/3] i386: Hyper-V Enlightened MSR bitmap feature
The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 10 ++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 5 + target/i386/kvm/kvm.c | 7 +++ 5 files changed, 25 insertions(+) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 0417c183a3b0..08429124a634 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -225,6 +225,16 @@ default (WS2016). Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V identification when specified without any other enlightenments. +3.21. hv-emsr-bitmap += +The enlightenment is nested specific, it targets Hyper-V on KVM guests. When +enabled, it allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to +avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. While the protocol is +supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires +Enlightened VMCS ('hv-evmcs') feature to also be enabled. + +Recommended: hv-evmcs (Intel) + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index aa9e6368004c..f7405fdf4fa5 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6839,6 +6839,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_STIMER_DIRECT, 0), DEFINE_PROP_BIT64("hv-avic", X86CPU, hyperv_features, HYPERV_FEAT_AVIC, 0), +DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features, + HYPERV_FEAT_MSR_BITMAP, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 537479d24928..d6ae9e60a9a0 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1060,6 +1060,7 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS]; #define HYPERV_FEAT_IPI 13 #define HYPERV_FEAT_STIMER_DIRECT 14 #define HYPERV_FEAT_AVIC15 +#define HYPERV_FEAT_MSR_BITMAP 16 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index 89f81afda7c6..38e25468122d 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -72,6 +72,11 @@ #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14) #define HV_NO_NONARCH_CORESHARING (1u << 18) +/* + * HV_CPUID_NESTED_FEATURES.EAX bits + */ +#define HV_NESTED_MSR_BITMAP(1u << 19) + /* * Basic virtualized MSRs */ diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index ceb331db8963..f719ef3f8384 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -934,6 +934,13 @@ static struct { .bits = HV_DEPRECATING_AEOI_RECOMMENDED} } }, +[HYPERV_FEAT_MSR_BITMAP] = { +.desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", +.flags = { +{.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX, + .bits = HV_NESTED_MSR_BITMAP} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.35.1
[PATCH v2 0/3] i386: Add support for Hyper-V Enlightened MSR-Bitmap and XMM fast hypercall input features
'XMM fast hypercall input feature' has been supported by KVM since v5.14; it allows for faster Hyper-V hypercall processing. 'Enlightened MSR-Bitmap' is a new nested specific enlightenment which speeds up L2 vmexits by avoiding unnecessary updates to L2 MSR-Bitmap. KVM support for the feature on Intel CPUs is coming in v5.17 and is queued for 5.18 for AMD CPUs. Vitaly Kuznetsov (3): i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES i386: Hyper-V Enlightened MSR bitmap feature i386: Hyper-V XMM fast hypercall input feature docs/hyperv.txt| 16 +++ target/i386/cpu.c | 4 target/i386/cpu.h | 3 ++- target/i386/kvm/hyperv-proto.h | 7 ++- target/i386/kvm/kvm.c | 37 ++ 5 files changed, 57 insertions(+), 10 deletions(-) -- 2.35.1
[PATCH v2 1/3] i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
Previously, HV_CPUID_NESTED_FEATURES.EAX CPUID leaf was handled differently as it was only used to encode the supported eVMCS version range. In fact, there are also feature (e.g. Enlightened MSR-Bitmap) bits there. In preparation to adding these features, move HV_CPUID_NESTED_FEATURES leaf handling to hv_build_cpuid_leaf() and drop now-unneeded 'hyperv_nested'. No functional change intended. Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.h | 1 - target/i386/kvm/kvm.c | 23 +++ 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 9911d7c8711b..537479d24928 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1725,7 +1725,6 @@ struct X86CPU { uint32_t hyperv_vendor_id[3]; uint32_t hyperv_interface_id[4]; uint32_t hyperv_limits[3]; -uint32_t hyperv_nested[4]; bool hyperv_enforce_cpuid; uint32_t hyperv_ver_id_build; uint16_t hyperv_ver_id_major; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 2c8feb4a6f7b..ceb331db8963 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -802,6 +802,8 @@ static bool tsc_is_stable_and_known(CPUX86State *env) || env->user_tsc_khz; } +#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1) + static struct { const char *desc; struct { @@ -1209,6 +1211,13 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg) } } +/* HV_CPUID_NESTED_FEATURES.EAX also encodes the supported eVMCS range */ +if (func == HV_CPUID_NESTED_FEATURES && reg == R_EAX) { +if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) { +r |= DEFAULT_EVMCS_VERSION; +} +} + return r; } @@ -1338,11 +1347,13 @@ static int hyperv_fill_cpuids(CPUState *cs, X86CPU *cpu = X86_CPU(cs); struct kvm_cpuid_entry2 *c; uint32_t cpuid_i = 0; +uint32_t nested_eax = +hv_build_cpuid_leaf(cs, HV_CPUID_NESTED_FEATURES, R_EAX); c = _ent[cpuid_i++]; c->function = HV_CPUID_VENDOR_AND_MAX_FUNCTIONS; -c->eax = hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS) ? 
-HV_CPUID_NESTED_FEATURES : HV_CPUID_IMPLEMENT_LIMITS; +c->eax = nested_eax ? HV_CPUID_NESTED_FEATURES : +HV_CPUID_IMPLEMENT_LIMITS; c->ebx = cpu->hyperv_vendor_id[0]; c->ecx = cpu->hyperv_vendor_id[1]; c->edx = cpu->hyperv_vendor_id[2]; @@ -1406,7 +1417,7 @@ static int hyperv_fill_cpuids(CPUState *cs, c->ecx = cpu->hyperv_limits[1]; c->edx = cpu->hyperv_limits[2]; -if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) { +if (nested_eax) { uint32_t function; /* Create zeroed 0x4006..0x4009 leaves */ @@ -1418,7 +1429,7 @@ static int hyperv_fill_cpuids(CPUState *cs, c = _ent[cpuid_i++]; c->function = HV_CPUID_NESTED_FEATURES; -c->eax = cpu->hyperv_nested[0]; +c->eax = nested_eax; } return cpuid_i; @@ -1440,8 +1451,6 @@ static bool evmcs_version_supported(uint16_t evmcs_version, (max_version <= max_supported_version); } -#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1) - static int hyperv_init_vcpu(X86CPU *cpu) { CPUState *cs = CPU(cpu); @@ -1545,8 +1554,6 @@ static int hyperv_init_vcpu(X86CPU *cpu) supported_evmcs_version >> 8); return -ENOTSUP; } - -cpu->hyperv_nested[0] = evmcs_version; } if (cpu->hyperv_enforce_cpuid) { -- 2.35.1
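To illustrate what the refactoring above buys: once HV_CPUID_NESTED_FEATURES.EAX is built from feature bits, the leaf is exposed whenever the result is non-zero, not only when eVMCS is on. The sketch below shows that composition under stated assumptions. 'nested_features_eax' and 'max_hv_leaf' are hypothetical helpers, the two leaf numbers are assumed from the Hyper-V TLFS, and the macros use unsigned constants where the patch uses plain ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted from the patches in this thread. */
#define DEFAULT_EVMCS_VERSION ((1u << 8) | 1u)   /* eVMCS range: min 1, max 1 */
#define HV_NESTED_MSR_BITMAP  (1u << 19)

/* Leaf numbers: assumed from the Hyper-V TLFS, not spelled out in the patch. */
#define HV_CPUID_IMPLEMENT_LIMITS 0x40000005u
#define HV_CPUID_NESTED_FEATURES  0x4000000Au

/* Hypothetical helper: compose HV_CPUID_NESTED_FEATURES.EAX. */
static uint32_t nested_features_eax(bool evmcs, bool emsr_bitmap)
{
    uint32_t eax = 0;

    /* The version range and the feature bit must not overlap. */
    assert((DEFAULT_EVMCS_VERSION & HV_NESTED_MSR_BITMAP) == 0);

    if (emsr_bitmap) {
        eax |= HV_NESTED_MSR_BITMAP;
    }
    if (evmcs) {
        eax |= DEFAULT_EVMCS_VERSION;  /* low 16 bits carry the eVMCS range */
    }
    return eax;
}

/* Hypothetical helper: highest Hyper-V CPUID leaf to advertise. */
static uint32_t max_hv_leaf(uint32_t nested_eax)
{
    return nested_eax ? HV_CPUID_NESTED_FEATURES : HV_CPUID_IMPLEMENT_LIMITS;
}
```

With neither feature enabled the result is 0 and the maximum leaf stays at HV_CPUID_IMPLEMENT_LIMITS, which matches the "no functional change" claim for the eVMCS-only case.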
Re: [PATCH 0/2] i386: Add support for Hyper-V Enlightened MSR-Bitmap feature
Vitaly Kuznetsov writes: > The new nested specific enlightenment speeds up L2 vmexits by avoiding > unnecessary updates to L2 MSR-Bitmap. Support for both VMX and SVM is > coming to KVM: > https://lore.kernel.org/kvm/20211129094704.326635-1-vkuzn...@redhat.com/ > https://lore.kernel.org/kvm/20211220152139.418372-1-vkuzn...@redhat.com/ > Ping) VMX part made it to KVM in v5.17-rc1: commit 502d2bf5f2fd7c05adc2d4f057910bd5d4c4c63e Author: Vitaly Kuznetsov Date: Mon Nov 29 10:47:04 2021 +0100 KVM: nVMX: Implement Enlightened MSR Bitmap feature SVM part is still pending, will likely go to 5.18. QEMU enablement code is, however, the same. > Vitaly Kuznetsov (2): > i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES > i386: Hyper-V Enlightened MSR bitmap feature > > docs/hyperv.txt| 10 ++ > target/i386/cpu.c | 2 ++ > target/i386/cpu.h | 2 +- > target/i386/kvm/hyperv-proto.h | 5 + > target/i386/kvm/kvm.c | 30 ++ > 5 files changed, 40 insertions(+), 9 deletions(-) -- Vitaly
[PATCH 1/2] i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
Previously, HV_CPUID_NESTED_FEATURES.EAX CPUID leaf was handled differently as it was only used to encode the supported eVMCS version range. In fact, there are also feature (e.g. Enlightened MSR-Bitmap) bits there. In preparation to adding these features, move HV_CPUID_NESTED_FEATURES leaf handling to hv_build_cpuid_leaf() and drop now-unneeded 'hyperv_nested'. No functional change intended. Signed-off-by: Vitaly Kuznetsov --- target/i386/cpu.h | 1 - target/i386/kvm/kvm.c | 23 +++ 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 04f2b790c9fa..a1165215d972 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1722,7 +1722,6 @@ struct X86CPU { uint32_t hyperv_vendor_id[3]; uint32_t hyperv_interface_id[4]; uint32_t hyperv_limits[3]; -uint32_t hyperv_nested[4]; bool hyperv_enforce_cpuid; uint32_t hyperv_ver_id_build; uint16_t hyperv_ver_id_major; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 13f8e30c2a54..c8f4956a4e0e 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -801,6 +801,8 @@ static bool tsc_is_stable_and_known(CPUX86State *env) || env->user_tsc_khz; } +#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1) + static struct { const char *desc; struct { @@ -1208,6 +1210,13 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg) } } +/* HV_CPUID_NESTED_FEATURES.EAX also encodes the supported eVMCS range */ +if (func == HV_CPUID_NESTED_FEATURES && reg == R_EAX) { +if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) { +r |= DEFAULT_EVMCS_VERSION; +} +} + return r; } @@ -1337,11 +1346,13 @@ static int hyperv_fill_cpuids(CPUState *cs, X86CPU *cpu = X86_CPU(cs); struct kvm_cpuid_entry2 *c; uint32_t cpuid_i = 0; +uint32_t nested_eax = +hv_build_cpuid_leaf(cs, HV_CPUID_NESTED_FEATURES, R_EAX); c = _ent[cpuid_i++]; c->function = HV_CPUID_VENDOR_AND_MAX_FUNCTIONS; -c->eax = hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS) ? 
-HV_CPUID_NESTED_FEATURES : HV_CPUID_IMPLEMENT_LIMITS; +c->eax = nested_eax ? HV_CPUID_NESTED_FEATURES : +HV_CPUID_IMPLEMENT_LIMITS; c->ebx = cpu->hyperv_vendor_id[0]; c->ecx = cpu->hyperv_vendor_id[1]; c->edx = cpu->hyperv_vendor_id[2]; @@ -1405,7 +1416,7 @@ static int hyperv_fill_cpuids(CPUState *cs, c->ecx = cpu->hyperv_limits[1]; c->edx = cpu->hyperv_limits[2]; -if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) { +if (nested_eax) { uint32_t function; /* Create zeroed 0x4006..0x4009 leaves */ @@ -1417,7 +1428,7 @@ static int hyperv_fill_cpuids(CPUState *cs, c = _ent[cpuid_i++]; c->function = HV_CPUID_NESTED_FEATURES; -c->eax = cpu->hyperv_nested[0]; +c->eax = nested_eax; } return cpuid_i; @@ -1439,8 +1450,6 @@ static bool evmcs_version_supported(uint16_t evmcs_version, (max_version <= max_supported_version); } -#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1) - static int hyperv_init_vcpu(X86CPU *cpu) { CPUState *cs = CPU(cpu); @@ -1544,8 +1553,6 @@ static int hyperv_init_vcpu(X86CPU *cpu) supported_evmcs_version >> 8); return -ENOTSUP; } - -cpu->hyperv_nested[0] = evmcs_version; } if (cpu->hyperv_enforce_cpuid) { -- 2.33.1
[PATCH 0/2] i386: Add support for Hyper-V Enlightened MSR-Bitmap feature
The new nested specific enlightenment speeds up L2 vmexits by avoiding unnecessary updates to L2 MSR-Bitmap. Support for both VMX and SVM is coming to KVM: https://lore.kernel.org/kvm/20211129094704.326635-1-vkuzn...@redhat.com/ https://lore.kernel.org/kvm/20211220152139.418372-1-vkuzn...@redhat.com/ Vitaly Kuznetsov (2): i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES i386: Hyper-V Enlightened MSR bitmap feature docs/hyperv.txt| 10 ++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 2 +- target/i386/kvm/hyperv-proto.h | 5 + target/i386/kvm/kvm.c | 30 ++ 5 files changed, 40 insertions(+), 9 deletions(-) -- 2.33.1
[PATCH 2/2] i386: Hyper-V Enlightened MSR bitmap feature
The newly introduced enlightenment allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 10 ++ target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 5 + target/i386/kvm/kvm.c | 7 +++ 5 files changed, 25 insertions(+) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 0417c183a3b0..08429124a634 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -225,6 +225,16 @@ default (WS2016). Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V identification when specified without any other enlightenments. +3.21. hv-emsr-bitmap += +The enlightenment is nested specific, it targets Hyper-V on KVM guests. When +enabled, it allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to +avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. While the protocol is +supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires +Enlightened VMCS ('hv-evmcs') feature to also be enabled. + +Recommended: hv-evmcs (Intel) + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index aa9e6368004c..f7405fdf4fa5 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6839,6 +6839,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_STIMER_DIRECT, 0), DEFINE_PROP_BIT64("hv-avic", X86CPU, hyperv_features, HYPERV_FEAT_AVIC, 0), +DEFINE_PROP_BIT64("hv-emsr-bitmap", X86CPU, hyperv_features, + HYPERV_FEAT_MSR_BITMAP, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index a1165215d972..04e3b38abf25 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1060,6 +1060,7 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS]; #define HYPERV_FEAT_IPI 13 #define HYPERV_FEAT_STIMER_DIRECT 14 #define HYPERV_FEAT_AVIC15 +#define HYPERV_FEAT_MSR_BITMAP 16 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0x diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index 89f81afda7c6..38e25468122d 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -72,6 +72,11 @@ #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14) #define HV_NO_NONARCH_CORESHARING (1u << 18) +/* + * HV_CPUID_NESTED_FEATURES.EAX bits + */ +#define HV_NESTED_MSR_BITMAP(1u << 19) + /* * Basic virtualized MSRs */ diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index c8f4956a4e0e..2baa9609e181 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -933,6 +933,13 @@ static struct { .bits = HV_DEPRECATING_AEOI_RECOMMENDED} } }, +[HYPERV_FEAT_MSR_BITMAP] = { +.desc = "enlightened MSR-Bitmap (hv-emsr-bitmap)", +.flags = { +{.func = HV_CPUID_NESTED_FEATURES, .reg = R_EAX, + .bits = HV_NESTED_MSR_BITMAP} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, -- 2.33.1
Re: [PATCH v3] i386: docs: Briefly describe KVM PV features
Igor Mammedov writes: > On Mon, 4 Oct 2021 16:04:45 +0200 > Vitaly Kuznetsov wrote: > Thanks for the review! As I can see, the patch already made it to 'master': commit 7f7c8d0ce3630849a4df3d627b11de354fcb3bb0 Author: Vitaly Kuznetsov Date: Mon Oct 4 16:04:45 2021 +0200 i386: docs: Briefly describe KVM PV features we can send follow-ups, of course. >> KVM PV features don't seem to be documented anywhere, in particular, the >> fact that some of the features are enabled by default and some are not can >> only be figured out from the code. >> >> Signed-off-by: Vitaly Kuznetsov >> --- >> Changes since "[PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature >> improvements" [Paolo Bonzini]: >> - Convert to 'rst' and move to docs/system/i386/kvm-pv.rst. >> - Add information about the version of Linux that introduced the particular >> PV feature. >> --- >> docs/system/i386/kvm-pv.rst | 100 >> docs/system/target-i386.rst | 1 + >> 2 files changed, 101 insertions(+) >> create mode 100644 docs/system/i386/kvm-pv.rst >> >> diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst >> new file mode 100644 >> index ..1e5a9923ef45 >> --- /dev/null >> +++ b/docs/system/i386/kvm-pv.rst >> @@ -0,0 +1,100 @@ >> +Paravirtualized KVM features >> + >> + >> +Description >> +--- >> + >> +In some cases when implementing hardware interfaces in software is slow, >> ``KVM`` >> +implements its own paravirtualized interfaces. >> + >> +Setup >> +- >> + >> +Paravirtualized ``KVM`` features are represented as CPU flags. The following >> +features are enabled by default for any CPU model when ``KVM`` acceleration >> is >> +enabled: > > /if host kernel supports them > It does as QEMU requires linux >= 4.5. I'm not sure what happens if it doesn't, maybe it won't start. >> + >> +- ``kvmclock`` >> +- ``kvm-nopiodelay`` > >> +- ``kvm-asyncpf`` > > later you say it's not enabled by default since x.y and something else > should be used instead The situation is a bit weird. 
QEMU will still be enabling kvm-asyncpf by default. This, however, has no effect currently as KVM dropped support for this feature (in favor of kvm-asyncpf-int, but this one is *not* enabled by default). > > maybe add a kernel version for each item in this list aka: (since: ... > [,till]) > >> +- ``kvm-steal-time`` >> +- ``kvm-pv-eoi`` >> +- ``kvmclock-stable-bit`` >> + >> +``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with >> split >> +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic"). > > >> +Note: when CPU model ``host`` is used, QEMU passes through all supported >> +paravirtualized ``KVM`` features to the guest. > > Is it true in case of kvm-pv-enforce-cpuid=on ? Yes, I believe these two things are orthogonal: CPU model 'host' will give you everything supported by the kernel, 'kvm-pv-enforce-cpuid' will tell KVM to forbid using features not exposed in guest visible CPUIDs; but combined with 'host' this is going to be an empty set as all features are enabled. > > Also I'd s/passes through/enables/ > on the grounds that host CPUID simply doesn't have such CPUIDs > so it's a bit confusing. I meant to say 'passes through' from KVM, not from pCPU but I see why this is not clear. > > >> +Existing features >> +- >> + >> +``kvmclock`` >> + Expose a ``KVM`` specific paravirtualized clocksource to the guest. >> Supported >> + since Linux v2.6.26. >> + >> +``kvm-nopiodelay`` >> + The guest doesn't need to perform delays on PIO operations. Supported >> since >> + Linux v2.6.26. >> + >> +``kvm-mmu`` >> + This feature is deprecated. >> + >> +``kvm-asyncpf`` >> + Enable asynchronous page fault mechanism. Supported since Linux v2.6.38. >> + Note: since Linux v5.10 the feature is deprecated and not enabled by >> ``KVM``. > >> + Use ``kvm-asyncpf-int`` instead. > 'Use' or 'Used' by default? > 'kvm-asyncpf' is a dead feature now so in case users want to get Asynchronous Page Faults they need to enable 'kvm-asyncpf-int' manually, thus 'use'. 
> >> +``kvm-steal-time`` >> + Enable stolen (when guest vCPU is not running) time accounting. Supported >> + since Linux v3.1. >> + >> +``kvm-pv-eoi`` >> + Enable paravirtualized end-of-interrupt signaling. Supported since Linux >> + v3.10. >> + >&
[PATCH v3] i386: docs: Briefly describe KVM PV features
KVM PV features don't seem to be documented anywhere, in particular, the fact that some of the features are enabled by default and some are not can only be figured out from the code. Signed-off-by: Vitaly Kuznetsov --- Changes since "[PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature improvements" [Paolo Bonzini]: - Convert to 'rst' and move to docs/system/i386/kvm-pv.rst. - Add information about the version of Linux that introduced the particular PV feature. --- docs/system/i386/kvm-pv.rst | 100 docs/system/target-i386.rst | 1 + 2 files changed, 101 insertions(+) create mode 100644 docs/system/i386/kvm-pv.rst diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst new file mode 100644 index ..1e5a9923ef45 --- /dev/null +++ b/docs/system/i386/kvm-pv.rst @@ -0,0 +1,100 @@ +Paravirtualized KVM features + + +Description +--- + +In some cases when implementing hardware interfaces in software is slow, ``KVM`` +implements its own paravirtualized interfaces. + +Setup +- + +Paravirtualized ``KVM`` features are represented as CPU flags. The following +features are enabled by default for any CPU model when ``KVM`` acceleration is +enabled: + +- ``kvmclock`` +- ``kvm-nopiodelay`` +- ``kvm-asyncpf`` +- ``kvm-steal-time`` +- ``kvm-pv-eoi`` +- ``kvmclock-stable-bit`` + +``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with split +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic"). + +Note: when CPU model ``host`` is used, QEMU passes through all supported +paravirtualized ``KVM`` features to the guest. + +Existing features +- + +``kvmclock`` + Expose a ``KVM`` specific paravirtualized clocksource to the guest. Supported + since Linux v2.6.26. + +``kvm-nopiodelay`` + The guest doesn't need to perform delays on PIO operations. Supported since + Linux v2.6.26. + +``kvm-mmu`` + This feature is deprecated. + +``kvm-asyncpf`` + Enable asynchronous page fault mechanism. Supported since Linux v2.6.38. 
+ Note: since Linux v5.10 the feature is deprecated and not enabled by ``KVM``. + Use ``kvm-asyncpf-int`` instead. + +``kvm-steal-time`` + Enable stolen (when guest vCPU is not running) time accounting. Supported + since Linux v3.1. + +``kvm-pv-eoi`` + Enable paravirtualized end-of-interrupt signaling. Supported since Linux + v3.10. + +``kvm-pv-unhalt`` + Enable paravirtualized spinlocks support. Supported since Linux v3.12. + +``kvm-pv-tlb-flush`` + Enable paravirtualized TLB flush mechanism. Supported since Linux v4.16. + +``kvm-pv-ipi`` + Enable paravirtualized IPI mechanism. Supported since Linux v4.19. + +``kvm-poll-control`` + Enable host-side polling on HLT control from the guest. Supported since Linux + v5.10. + +``kvm-pv-sched-yield`` + Enable paravirtualized sched yield feature. Supported since Linux v5.10. + +``kvm-asyncpf-int`` + Enable interrupt based asynchronous page fault mechanism. Supported since Linux + v5.10. + +``kvm-msi-ext-dest-id`` + Support 'Extended Destination ID' for external interrupts. The feature allows + to use up to 32768 CPUs without IRQ remapping (but other limits may apply making + the number of supported vCPUs for a given configuration lower). Supported since + Linux v5.10. + +``kvmclock-stable-bit`` + Tell the guest that guest visible TSC value can be fully trusted for kvmclock + computations and no warps are expected. Supported since Linux v2.6.35. + +Supplementary features +-- + +``kvm-pv-enforce-cpuid`` + Limit the supported paravirtualized feature set to the exposed features only. + Note, by default, ``KVM`` allows the guest to use all currently supported + paravirtualized features even when they were not announced in guest visible + CPUIDs. Supported since Linux v5.10. + + +Useful links + + +Please refer to Documentation/virt/kvm in Linux for additional details. 
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst index 6a86d638633a..4daa53c35d8f 100644 --- a/docs/system/target-i386.rst +++ b/docs/system/target-i386.rst @@ -26,6 +26,7 @@ Architectural features :maxdepth: 1 i386/cpu + i386/kvm-pv i386/sgx .. _pcsys_005freq: -- 2.31.1
Re: [PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature improvements
Paolo Bonzini writes: > On 02/09/21 11:35, Vitaly Kuznetsov wrote: >> This is a continuation of "[PATCH 0/3] i386/kvm: Paravirtualized features >> usage >> enforcement" series, thus v2. >> >> This series implements several unrelated features but as there are code >> dependencies between them I'm sending it as one series. >> >> PATCH1 adds empty 6.2 machine types and the required compat infrastructure >> (to be used by PATCH8) >> PATCH2 adds documentation for KVM PV features >> PATCH3 adds support for KVM_CAP_ENFORCE_PV_FEATURE_CPUID >> PATCH4 adds support for KVM_CAP_HYPERV_ENFORCE_CPUID >> PATCHes5-6 add 'hv-avic' feature >> PATCH7 makes Hyper-V version info settable >> PATCH8 changes the default Hyper-V version to 2016 >> >> Vitaly Kuznetsov (8): >>i386: Add 6.2 machine types >>i386: docs: Briefly describe KVM PV features >>i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID >>i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID >>i386: Move HV_APIC_ACCESS_RECOMMENDED bit setting to >> hyperv_fill_cpuids() >>i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment >>i386: Make Hyper-V version id configurable >>i386: Change the default Hyper-V version to match WS2016 >> >> docs/hyperv.txt| 41 +++-- >> docs/kvm-pv.txt| 103 + >> hw/core/machine.c | 3 + >> hw/i386/pc.c | 7 +++ >> hw/i386/pc_piix.c | 14 - >> hw/i386/pc_q35.c | 13 - >> include/hw/boards.h| 3 + >> include/hw/i386/pc.h | 3 + >> target/i386/cpu.c | 22 +-- >> target/i386/cpu.h | 12 +++- >> target/i386/kvm/hyperv-proto.h | 1 + >> target/i386/kvm/kvm.c | 62 +++- >> 12 files changed, 260 insertions(+), 24 deletions(-) >> create mode 100644 docs/kvm-pv.txt >> > > Queued patches 3-8, thanks. Patch3 with the hunk to docs/kvm-pv.txt dropped I suppose (as PATCH2 introducing the file is not queued)? I can include it in the next submission then. Thanks! -- Vitaly
Re: [PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature improvements
Vitaly Kuznetsov writes: > This is a continuation of "[PATCH 0/3] i386/kvm: Paravirtualized features > usage > enforcement" series, thus v2. > > This series implements several unrelated features but as there are code > dependencies between them I'm sending it as one series. > > PATCH1 adds empty 6.2 machine types and the required compat infrastructure > (to be used by PATCH8) > PATCH2 adds documentation for KVM PV features > PATCH3 adds support for KVM_CAP_ENFORCE_PV_FEATURE_CPUID > PATCH4 adds support for KVM_CAP_HYPERV_ENFORCE_CPUID > PATCHes5-6 add 'hv-avic' feature > PATCH7 makes Hyper-V version info settable > PATCH8 changes the default Hyper-V version to 2016 Eduardo, Paolo, all, any comments? It seems patches can still be applied to 'master' with no issues. -- Vitaly
[PATCH v2 5/8] i386: Move HV_APIC_ACCESS_RECOMMENDED bit setting to hyperv_fill_cpuids()
In preparation to enabling Hyper-V + APICv/AVIC move HV_APIC_ACCESS_RECOMMENDED setting out of kvm_hyperv_properties[]: the 'real' feature bit for the vAPIC features is HV_APIC_ACCESS_AVAILABLE, HV_APIC_ACCESS_RECOMMENDED is a recommendation to use the feature which we may not always want to give. Signed-off-by: Vitaly Kuznetsov --- target/i386/kvm/kvm.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index bd0b53416315..430007c2691a 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -821,9 +821,7 @@ static struct { .desc = "virtual APIC (hv-vapic)", .flags = { {.func = HV_CPUID_FEATURES, .reg = R_EAX, - .bits = HV_APIC_ACCESS_AVAILABLE}, -{.func = HV_CPUID_ENLIGHTMENT_INFO, .reg = R_EAX, - .bits = HV_APIC_ACCESS_RECOMMENDED} + .bits = HV_APIC_ACCESS_AVAILABLE} } }, [HYPERV_FEAT_TIME] = { @@ -1366,6 +1364,7 @@ static int hyperv_fill_cpuids(CPUState *cs, c->ebx |= HV_POST_MESSAGES | HV_SIGNAL_EVENTS; } + /* Not exposed by KVM but needed to make CPU hotplug in Windows work */ c->edx |= HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE; @@ -1374,6 +1373,10 @@ static int hyperv_fill_cpuids(CPUState *cs, c->eax = hv_build_cpuid_leaf(cs, HV_CPUID_ENLIGHTMENT_INFO, R_EAX); c->ebx = cpu->hyperv_spinlock_attempts; +if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { +c->eax |= HV_APIC_ACCESS_RECOMMENDED; +} + if (cpu->hyperv_no_nonarch_cs == ON_OFF_AUTO_ON) { c->eax |= HV_NO_NONARCH_CORESHARING; } else if (cpu->hyperv_no_nonarch_cs == ON_OFF_AUTO_AUTO) { -- 2.31.1
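The distinction drawn in the commit message above (a capability bit versus a hint to use it) can be sketched as follows. This is an illustration only: the bit positions are assumptions taken from the Hyper-V TLFS since the patch only names the macros, and 'vapic_bits' with its 'recommend' policy flag is a hypothetical helper. In the patch itself the recommendation is given whenever hv-vapic is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions from the Hyper-V TLFS. */
#define HV_APIC_ACCESS_AVAILABLE   (1u << 4)  /* HV_CPUID_FEATURES.EAX */
#define HV_APIC_ACCESS_RECOMMENDED (1u << 3)  /* HV_CPUID_ENLIGHTMENT_INFO.EAX */

struct vapic_cpuid {
    uint32_t features_eax;       /* what the guest may use */
    uint32_t enlightenment_eax;  /* what the guest is advised to use */
};

/*
 * Hypothetical helper: the capability bit follows the feature, the
 * recommendation bit follows a separate policy decision.
 */
static struct vapic_cpuid vapic_bits(bool vapic_enabled, bool recommend)
{
    struct vapic_cpuid c = { 0, 0 };

    /* Recommending a feature that is not available makes no sense. */
    assert(vapic_enabled || !recommend);

    if (vapic_enabled) {
        c.features_eax |= HV_APIC_ACCESS_AVAILABLE;
        if (recommend) {
            c.enlightenment_eax |= HV_APIC_ACCESS_RECOMMENDED;
        }
    }
    return c;
}
```

Keeping the two bits in separate code paths is what lets a later change withhold the hint without hiding the capability.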
[PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature improvements
This is a continuation of "[PATCH 0/3] i386/kvm: Paravirtualized features usage enforcement" series, thus v2. This series implements several unrelated features but as there are code dependencies between them I'm sending it as one series. PATCH1 adds empty 6.2 machine types and the required compat infrastructure (to be used by PATCH8) PATCH2 adds documentation for KVM PV features PATCH3 adds support for KVM_CAP_ENFORCE_PV_FEATURE_CPUID PATCH4 adds support for KVM_CAP_HYPERV_ENFORCE_CPUID PATCHes5-6 add 'hv-avic' feature PATCH7 makes Hyper-V version info settable PATCH8 changes the default Hyper-V version to 2016 Vitaly Kuznetsov (8): i386: Add 6.2 machine types i386: docs: Briefly describe KVM PV features i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID i386: Move HV_APIC_ACCESS_RECOMMENDED bit setting to hyperv_fill_cpuids() i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment i386: Make Hyper-V version id configurable i386: Change the default Hyper-V version to match WS2016 docs/hyperv.txt| 41 +++-- docs/kvm-pv.txt| 103 + hw/core/machine.c | 3 + hw/i386/pc.c | 7 +++ hw/i386/pc_piix.c | 14 - hw/i386/pc_q35.c | 13 - include/hw/boards.h| 3 + include/hw/i386/pc.h | 3 + target/i386/cpu.c | 22 +-- target/i386/cpu.h | 12 +++- target/i386/kvm/hyperv-proto.h | 1 + target/i386/kvm/kvm.c | 62 +++- 12 files changed, 260 insertions(+), 24 deletions(-) create mode 100644 docs/kvm-pv.txt -- 2.31.1
[PATCH v2 3/8] i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
By default, KVM allows the guest to use all currently supported PV features even when they were not announced in guest visible CPUIDs. Introduce a new "kvm-pv-enforce-cpuid" flag to limit the supported feature set to the exposed features. The feature is supported by Linux >= 5.10 and is not enabled by default in QEMU. Signed-off-by: Vitaly Kuznetsov --- docs/kvm-pv.txt | 13 - target/i386/cpu.c | 2 ++ target/i386/cpu.h | 3 +++ target/i386/kvm/kvm.c | 10 ++ 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt index 84ad7fa60f8d..d1aac533feea 100644 --- a/docs/kvm-pv.txt +++ b/docs/kvm-pv.txt @@ -87,6 +87,17 @@ the number of supported vCPUs for a given configuration lower). Tells the guest that guest visible TSC value can be fully trusted for kvmclock computations and no warps are expected. -4. Useful links +4. Supplementary features += + +4.1. kvm-pv-enforce-cpuid += +By default, KVM allows the guest to use all currently supported PV features even +when they were not announced in guest visible CPUIDs. 'kvm-pv-enforce-cpuid' +feature alters this behavior and limits the supported feature set to the +exposed features only. + + +5. Useful links Please refer to Documentation/virt/kvm in Linux for additional detail. 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 97e250e8760d..a70038f172d9 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6691,6 +6691,8 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true), DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration, false), +DEFINE_PROP_BOOL("kvm-pv-enforce-cpuid", X86CPU, kvm_pv_enforce_cpuid, + false), DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true), DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true), DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 6c50d3ab4f1d..20273a8069dd 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1782,6 +1782,9 @@ struct X86CPU { /* Stop SMI delivery for migration compatibility with old machines */ bool kvm_no_smi_migration; +/* Forcefully disable KVM PV features not exposed in guest CPUIDs */ +bool kvm_pv_enforce_cpuid; + /* Number of physical address bits supported */ uint32_t phys_bits; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 500d2e0e686f..49f97f345069 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1629,6 +1629,16 @@ int kvm_arch_init_vcpu(CPUState *cs) cpu_x86_cpuid(env, 0, 0, , , , ); +if (cpu->kvm_pv_enforce_cpuid) { +r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1); +if (r < 0) { +fprintf(stderr, +"failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s", +strerror(-r)); +abort(); +} +} + for (i = 0; i <= limit; i++) { if (cpuid_i == KVM_MAX_CPUID_ENTRIES) { fprintf(stderr, "unsupported level value: 0x%x\n", limit); -- 2.31.1
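The semantics described in the commit message above can be modelled in a few lines. This is a sketch of the behaviour as described, not of KVM's implementation: 'usable_pv_features' and 'pv_bit' are hypothetical helpers and the bit numbers passed to them are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask for PV feature number n (illustrative numbering). */
static uint32_t pv_bit(unsigned n)
{
    assert(n < 32);
    return 1u << n;
}

/*
 * Model of the documented behaviour: without enforcement the guest may use
 * every feature the host kernel supports; with enforcement only the features
 * that are also exposed in guest visible CPUIDs remain usable.
 */
static uint32_t usable_pv_features(uint32_t supported, uint32_t exposed,
                                   bool enforce)
{
    return enforce ? (supported & exposed) : supported;
}
```

When 'exposed' equals 'supported', as with CPU model 'host', enforcement forbids nothing, which is the point made in the v3 docs review elsewhere in this thread.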
[PATCH v2 8/8] i386: Change the default Hyper-V version to match WS2016
KVM implements some Hyper-V 2016 functions so providing WS2008R2 version is somewhat incorrect. While generally guests shouldn't care about it and always check feature bits, it is known that some tools in Windows actually check version info. For compatibility reasons make the change for 6.2 machine types only. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 2 +- hw/i386/pc.c | 6 +- target/i386/cpu.c | 6 +++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 7803495468b7..5d99fd9a72b8 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -214,7 +214,7 @@ exposing correct vCPU topology and vCPU pinning. 3.20. hv-version-id-{build,major,minor,spack,sbranch,snumber} = This changes Hyper-V version identification in CPUID 0x4002.EAX-EDX from the -default (WS2008R2). +default (WS2016). - hv-version-id-build sets 'Build Number' (32 bits) - hv-version-id-major sets 'Major Version' (16 bits) - hv-version-id-minor sets 'Minor Version' (16 bits) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 1276bfeee456..b2e4eef9d211 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -93,7 +93,11 @@ #include "trace.h" #include CONFIG_DEVICES -GlobalProperty pc_compat_6_1[] = {}; +GlobalProperty pc_compat_6_1[] = { +{ TYPE_X86_CPU, "hv-version-id-build", "0x1bbc" }, +{ TYPE_X86_CPU, "hv-version-id-major", "0x0006" }, +{ TYPE_X86_CPU, "hv-version-id-minor", "0x0001" }, +}; const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1); GlobalProperty pc_compat_6_0[] = { diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 5766e720093d..569840deaf93 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6669,11 +6669,11 @@ static Property x86_cpu_properties[] = { /* WS2008R2 identify by default */ DEFINE_PROP_UINT32("hv-version-id-build", X86CPU, hyperv_ver_id_build, - 0x1bbc), + 0x3839), DEFINE_PROP_UINT16("hv-version-id-major", X86CPU, hyperv_ver_id_major, - 0x0006), + 0x000A), DEFINE_PROP_UINT16("hv-version-id-minor", X86CPU, 
hyperv_ver_id_minor, - 0x0001), + 0x), DEFINE_PROP_UINT32("hv-version-id-spack", X86CPU, hyperv_ver_id_sp, 0), DEFINE_PROP_UINT8("hv-version-id-sbranch", X86CPU, hyperv_ver_id_sb, 0), DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU, hyperv_ver_id_sn, 0), -- 2.31.1
[PATCH v2 1/8] i386: Add 6.2 machine types
Introduce 6.2 machine types and the required infrastructure for adding compat properties to pre-6.2 machine types. Signed-off-by: Vitaly Kuznetsov --- hw/core/machine.c| 3 +++ hw/i386/pc.c | 3 +++ hw/i386/pc_piix.c| 14 +- hw/i386/pc_q35.c | 13 - include/hw/boards.h | 3 +++ include/hw/i386/pc.h | 3 +++ 6 files changed, 37 insertions(+), 2 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index 54e040587dd3..9d0d1194e1ef 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -46,6 +46,9 @@ GlobalProperty hw_compat_6_0[] = { }; const size_t hw_compat_6_0_len = G_N_ELEMENTS(hw_compat_6_0); +GlobalProperty hw_compat_6_1[] = {}; +const size_t hw_compat_6_1_len = G_N_ELEMENTS(hw_compat_6_1); + GlobalProperty hw_compat_5_2[] = { { "ICH9-LPC", "smm-compat", "on"}, { "PIIX4_PM", "smm-compat", "on"}, diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 102b22394689..1276bfeee456 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -93,6 +93,9 @@ #include "trace.h" #include CONFIG_DEVICES +GlobalProperty pc_compat_6_1[] = {}; +const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1); + GlobalProperty pc_compat_6_0[] = { { "qemu64" "-" TYPE_X86_CPU, "family", "6" }, { "qemu64" "-" TYPE_X86_CPU, "model", "6" }, diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 1bc30167acc0..c5da7739cef7 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -412,7 +412,7 @@ static void pc_i440fx_machine_options(MachineClass *m) machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE); } -static void pc_i440fx_6_1_machine_options(MachineClass *m) +static void pc_i440fx_6_2_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_i440fx_machine_options(m); @@ -421,6 +421,18 @@ static void pc_i440fx_6_1_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_I440FX_MACHINE(v6_2, "pc-i440fx-6.2", NULL, + pc_i440fx_6_2_machine_options); + +static void pc_i440fx_6_1_machine_options(MachineClass *m) +{ +pc_i440fx_6_2_machine_options(m); 
+m->alias = NULL; +m->is_default = false; +compat_props_add(m->compat_props, hw_compat_6_1, hw_compat_6_1_len); +compat_props_add(m->compat_props, pc_compat_6_1, pc_compat_6_1_len); +} + DEFINE_I440FX_MACHINE(v6_1, "pc-i440fx-6.1", NULL, pc_i440fx_6_1_machine_options); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index eeb0b185b118..565fadce540c 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -354,7 +354,7 @@ static void pc_q35_machine_options(MachineClass *m) m->max_cpus = 288; } -static void pc_q35_6_1_machine_options(MachineClass *m) +static void pc_q35_6_2_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_machine_options(m); @@ -362,6 +362,17 @@ static void pc_q35_6_1_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_Q35_MACHINE(v6_2, "pc-q35-6.2", NULL, + pc_q35_6_2_machine_options); + +static void pc_q35_6_1_machine_options(MachineClass *m) +{ +pc_q35_6_2_machine_options(m); +m->alias = NULL; +compat_props_add(m->compat_props, hw_compat_6_1, hw_compat_6_1_len); +compat_props_add(m->compat_props, pc_compat_6_1, pc_compat_6_1_len); +} + DEFINE_Q35_MACHINE(v6_1, "pc-q35-6.1", NULL, pc_q35_6_1_machine_options); diff --git a/include/hw/boards.h b/include/hw/boards.h index accd6eff35ab..463a5514f97d 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -353,6 +353,9 @@ struct MachineState { } \ type_init(machine_initfn##_register_types) +extern GlobalProperty hw_compat_6_1[]; +extern const size_t hw_compat_6_1_len; + extern GlobalProperty hw_compat_6_0[]; extern const size_t hw_compat_6_0_len; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 88dffe751724..97b4ab79b534 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -196,6 +196,9 @@ void pc_system_parse_ovmf_flash(uint8_t *flash_ptr, size_t flash_size); void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid, const CPUArchIdList *apic_ids, GArray *entry); +extern GlobalProperty pc_compat_6_1[]; +extern const 
size_t pc_compat_6_1_len; + extern GlobalProperty pc_compat_6_0[]; extern const size_t pc_compat_6_0_len; -- 2.31.1
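The layering this patch sets up (the 6.1 options call the 6.2 options, then drop the "latest machine" attributes and append the 6.1 compat arrays) can be modelled outside QEMU roughly as follows. Everything here is a simplified stand-in rather than QEMU code: the struct definitions, compat_add(), the "pc" alias value and the single example entry in hw_compat_6_1 are assumptions made for illustration, and the real arrays are still empty at this point in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's GlobalProperty and MachineClass. */
typedef struct {
    const char *driver, *property, *value;
} GlobalProperty;

typedef struct {
    const char *alias;
    bool is_default;
    const GlobalProperty *compat[16];
    size_t n_compat;
} MachineClass;

/* Hypothetical entry; the arrays added by the patch are still empty. */
static const GlobalProperty hw_compat_6_1[] = {
    { "example-device", "example-prop", "off" },
};
static const size_t hw_compat_6_1_len =
    sizeof(hw_compat_6_1) / sizeof(hw_compat_6_1[0]);

/* Stand-in for compat_props_add(): append every entry of one array. */
static void compat_add(MachineClass *m, const GlobalProperty *props, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        assert(m->n_compat < sizeof(m->compat) / sizeof(m->compat[0]));
        m->compat[m->n_compat++] = &props[i];
    }
}

/* Newest machine type: carries the alias and the default flag. */
static void machine_6_2_options(MachineClass *m)
{
    m->alias = "pc";        /* assumed value, set by the common options */
    m->is_default = true;
}

/* Previous machine type: inherit, drop "latest" attributes, add compat. */
static void machine_6_1_options(MachineClass *m)
{
    machine_6_2_options(m);
    m->alias = NULL;
    m->is_default = false;
    compat_add(m, hw_compat_6_1, hw_compat_6_1_len);
}
```

If older machine types chain through the 6.1 options in the same way, they would pick up the 6.1 compat entries without further changes.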
[PATCH v2 4/8] i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID
By default, KVM allows the guest to use all currently supported Hyper-V enlightenments when Hyper-V CPUID interface was exposed, regardless of if some features were not announced in guest visible CPUIDs. hv-enforce-cpuid feature alters this behavior and only allows the guest to use exposed Hyper-V enlightenments. The feature is supported by Linux >= 5.14 and is not enabled by default in QEMU. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 17 ++--- target/i386/cpu.c | 1 + target/i386/cpu.h | 1 + target/i386/kvm/kvm.c | 9 + 4 files changed, 25 insertions(+), 3 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 000638a2fd38..072709a68f47 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -203,8 +203,11 @@ When the option is set to 'on' QEMU will always enable the feature, regardless of host setup. To keep guests secure, this can only be used in conjunction with exposing correct vCPU topology and vCPU pinning. -4. Development features - +4. Supplementary features += + +4.1. hv-passthrough +=== In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU @@ -215,8 +218,16 @@ values from KVM to QEMU. "hv-passthrough" overrides all other "hv-*" settings on the command line. Also, enabling this flag effectively prevents migration as the list of enabled enlightenments may differ between target and destination hosts. +4.2. hv-enforce-cpuid += +By default, KVM allows the guest to use all currently supported Hyper-V +enlightenments when Hyper-V CPUID interface was exposed, regardless of if +some features were not announced in guest visible CPUIDs. 'hv-enforce-cpuid' +feature alters this behavior and only allows the guest to use exposed Hyper-V +enlightenments. + -4. Useful links +5. 
Useful links Hyper-V Top Level Functional specification and other information: https://github.com/MicrosoftDocs/Virtualization-Documentation diff --git a/target/i386/cpu.c b/target/i386/cpu.c index a70038f172d9..36e1b6ec9c9b 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6665,6 +6665,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), +DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 20273a8069dd..8822bea5c9a4 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1699,6 +1699,7 @@ struct X86CPU { uint32_t hyperv_version_id[4]; uint32_t hyperv_limits[3]; uint32_t hyperv_nested[4]; +bool hyperv_enforce_cpuid; bool check_cpuid; bool enforce_cpuid; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 49f97f345069..bd0b53416315 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1531,6 +1531,15 @@ static int hyperv_init_vcpu(X86CPU *cpu) cpu->hyperv_nested[0] = evmcs_version; } +if (cpu->hyperv_enforce_cpuid) { +ret = kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_ENFORCE_CPUID, 0, 1); +if (ret < 0) { +error_report("failed to enable KVM_CAP_HYPERV_ENFORCE_CPUID: %s", + strerror(-ret)); +return ret; +} +} + return 0; } -- 2.31.1
[PATCH v2 7/8] i386: Make Hyper-V version id configurable
Currently, we hardcode Hyper-V version id (CPUID 0x40000002) to WS2008R2 and it is known that certain tools in Windows check this. It seems useful to provide some flexibility by making it possible to change this info at will. CPUID information is defined in TLFS as:
EAX: Build Number
EBX Bits 31-16: Major Version
    Bits 15-0: Minor Version
ECX Service Pack
EDX Bits 31-24: Service Branch
    Bits 23-0: Service Number
Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 14 ++ target/i386/cpu.c | 15 +++ target/i386/cpu.h | 7 ++- target/i386/kvm/kvm.c | 26 -- 4 files changed, 47 insertions(+), 15 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index cd1ea3bbe9d7..7803495468b7 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -211,6 +211,20 @@ When the option is set to 'on' QEMU will always enable the feature, regardless of host setup. To keep guests secure, this can only be used in conjunction with exposing correct vCPU topology and vCPU pinning. +3.20. hv-version-id-{build,major,minor,spack,sbranch,snumber} += +This changes Hyper-V version identification in CPUID 0x40000002.EAX-EDX from the +default (WS2008R2). +- hv-version-id-build sets 'Build Number' (32 bits) +- hv-version-id-major sets 'Major Version' (16 bits) +- hv-version-id-minor sets 'Minor Version' (16 bits) +- hv-version-id-spack sets 'Service Pack' (32 bits) +- hv-version-id-sbranch sets 'Service Branch' (8 bits) +- hv-version-id-snumber sets 'Service Number' (24 bits) + +Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V +identification when specified without any other enlightenments. + 4. 
Supplementary features = diff --git a/target/i386/cpu.c b/target/i386/cpu.c index a695e200d409..5766e720093d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6093,10 +6093,6 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu) cpu->hyperv_interface_id[2] = 0; cpu->hyperv_interface_id[3] = 0; -/* Hypervisor system identity */ -cpu->hyperv_version_id[0] = 0x1bbc; -cpu->hyperv_version_id[1] = 0x00060001; - /* Hypervisor implementation limits */ cpu->hyperv_limits[0] = 64; cpu->hyperv_limits[1] = 0; @@ -6671,6 +6667,17 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), +/* WS2008R2 identify by default */ +DEFINE_PROP_UINT32("hv-version-id-build", X86CPU, hyperv_ver_id_build, + 0x1bbc), +DEFINE_PROP_UINT16("hv-version-id-major", X86CPU, hyperv_ver_id_major, + 0x0006), +DEFINE_PROP_UINT16("hv-version-id-minor", X86CPU, hyperv_ver_id_minor, + 0x0001), +DEFINE_PROP_UINT32("hv-version-id-spack", X86CPU, hyperv_ver_id_sp, 0), +DEFINE_PROP_UINT8("hv-version-id-sbranch", X86CPU, hyperv_ver_id_sb, 0), +DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU, hyperv_ver_id_sn, 0), + DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), DEFINE_PROP_BOOL("x-force-features", X86CPU, force_features, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d22a8d259967..5c2bf1079745 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1697,10 +1697,15 @@ struct X86CPU { OnOffAuto hyperv_no_nonarch_cs; uint32_t hyperv_vendor_id[3]; uint32_t hyperv_interface_id[4]; -uint32_t hyperv_version_id[4]; uint32_t hyperv_limits[3]; uint32_t hyperv_nested[4]; bool hyperv_enforce_cpuid; +uint32_t hyperv_ver_id_build; +uint16_t hyperv_ver_id_major; +uint16_t hyperv_ver_id_minor; +uint32_t hyperv_ver_id_sp; +uint8_t hyperv_ver_id_sb; +uint32_t hyperv_ver_id_sn; bool check_cpuid; bool 
enforce_cpuid; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 0f3cb61a9cfd..918472905e73 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1258,14 +1258,18 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) cpu->hyperv_interface_id[3] = hv_cpuid_get_host(cs, HV_CPUID_INTERFACE, R_EDX); -cpu->hyperv_version_id[0] = +cpu->hyperv_ver_id_build = hv_cpuid_get_host(cs, HV_CPUID_VERSION, R_EAX); -cpu->hyperv_version_id[1] = -hv_cpuid_get_host(cs, HV_CPUID_VERSION, R_EBX); -cpu->hyperv_version_id[2] = +cpu->hyperv_ver_id_major = +hv_cpuid_get_host(cs, HV_CPUID_VERSION, R_EBX) >> 16; +cpu->hyperv_ver_id_minor = +hv_cpuid_get_host(cs, HV_CPUI
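The register layout quoted from the TLFS in the commit message can be sketched as a small packing helper. This is only an illustration, not the QEMU implementation: the struct and function names (hv_version_id, hv_pack_version) are made up for this note, and the field widths follow the list in the docs/hyperv.txt hunk of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the six hv-version-id-* properties. */
struct hv_version_id {
    uint32_t build;    /* 'Build Number',   32 bits */
    uint16_t major;    /* 'Major Version',  16 bits */
    uint16_t minor;    /* 'Minor Version',  16 bits */
    uint32_t spack;    /* 'Service Pack',   32 bits */
    uint8_t  sbranch;  /* 'Service Branch',  8 bits */
    uint32_t snumber;  /* 'Service Number', 24 bits */
};

/* Pack the fields into EAX..EDX (regs[0]..regs[3]) of the version leaf. */
static void hv_pack_version(const struct hv_version_id *v, uint32_t regs[4])
{
    assert(v->snumber <= 0xffffff);   /* only 24 bits fit into EDX[23:0] */

    regs[0] = v->build;
    regs[1] = ((uint32_t)v->major << 16) | v->minor;
    regs[2] = v->spack;
    regs[3] = ((uint32_t)v->sbranch << 24) | (v->snumber & 0xffffff);
}
```

With the defaults from the patch (build 0x1bbc, major 0x0006, minor 0x0001, everything else zero) this yields EAX = 0x1bbc and EBX = 0x00060001, which are the two hardcoded values the patch removes from x86_cpu_hyperv_realize().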
[PATCH v2 6/8] i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment
The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC enabled. Normally, Hyper-V SynIC disables these hardware features and suggests the guest to use paravirtualized AutoEOI feature. Linux-4.15 gains support for conditional APICv/AVIC disablement, the feature stays on until the guest tries to use AutoEOI feature with SynIC. With 'HV_DEPRECATING_AEOI_RECOMMENDED' bit exposed, modern enough Windows/ Hyper-V versions should follow the recommendation and not use the (unwanted) feature. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt| 10 +- target/i386/cpu.c | 4 target/i386/cpu.h | 1 + target/i386/kvm/hyperv-proto.h | 1 + target/i386/kvm/kvm.c | 10 +- 5 files changed, 24 insertions(+), 2 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 072709a68f47..cd1ea3bbe9d7 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -189,7 +189,15 @@ enabled. Requires: hv-vpindex, hv-synic, hv-time, hv-stimer -3.17. hv-no-nonarch-coresharing=on/off/auto +3.18. hv-avic (hv-apicv) +=== +The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC enabled. +Normally, Hyper-V SynIC disables these hardware feature and suggests the guest +to use paravirtualized AutoEOI feature. +Note: enabling this feature on old hardware (without APICv/AVIC support) may +have negative effect on guest's performace. + +3.19. hv-no-nonarch-coresharing=on/off/auto === This enlightenment tells guest OS that virtual processors will never share a physical core unless they are reported as sibling SMT threads. 
This information diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 36e1b6ec9c9b..a695e200d409 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6477,6 +6477,8 @@ static void x86_cpu_initfn(Object *obj) object_property_add_alias(obj, "sse4_1", obj, "sse4.1"); object_property_add_alias(obj, "sse4_2", obj, "sse4.2"); +object_property_add_alias(obj, "hv-apicv", obj, "hv-avic"); + if (xcc->model) { x86_cpu_load_model(cpu, xcc->model); } @@ -6662,6 +6664,8 @@ static Property x86_cpu_properties[] = { HYPERV_FEAT_IPI, 0), DEFINE_PROP_BIT64("hv-stimer-direct", X86CPU, hyperv_features, HYPERV_FEAT_STIMER_DIRECT, 0), +DEFINE_PROP_BIT64("hv-avic", X86CPU, hyperv_features, + HYPERV_FEAT_AVIC, 0), DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 8822bea5c9a4..d22a8d259967 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1038,6 +1038,7 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS]; #define HYPERV_FEAT_EVMCS 12 #define HYPERV_FEAT_IPI 13 #define HYPERV_FEAT_STIMER_DIRECT 14 +#define HYPERV_FEAT_AVIC 15 #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY #define HYPERV_SPINLOCK_NEVER_NOTIFY 0xFFFFFFFF diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h index 5fbb385cc136..89f81afda7c6 100644 --- a/target/i386/kvm/hyperv-proto.h +++ b/target/i386/kvm/hyperv-proto.h @@ -66,6 +66,7 @@ #define HV_APIC_ACCESS_RECOMMENDED (1u << 3) #define HV_SYSTEM_RESET_RECOMMENDED (1u << 4) #define HV_RELAXED_TIMING_RECOMMENDED (1u << 5) +#define HV_DEPRECATING_AEOI_RECOMMENDED (1u << 9) #define HV_CLUSTER_IPI_RECOMMENDED (1u << 10) #define HV_EX_PROCESSOR_MASKS_RECOMMENDED (1u << 11) #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 430007c2691a..0f3cb61a9cfd 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ 
-924,6 +924,13 @@ static struct { }, .dependencies = BIT(HYPERV_FEAT_STIMER) }, +[HYPERV_FEAT_AVIC] = { +.desc = "AVIC/APICv support (hv-avic/hv-apicv)", +.flags = { +{.func = HV_CPUID_ENLIGHTMENT_INFO, .reg = R_EAX, + .bits = HV_DEPRECATING_AEOI_RECOMMENDED} +} +}, }; static struct kvm_cpuid2 *try_get_hv_cpuid(CPUState *cs, int max, @@ -1373,7 +1380,8 @@ static int hyperv_fill_cpuids(CPUState *cs, c->eax = hv_build_cpuid_leaf(cs, HV_CPUID_ENLIGHTMENT_INFO, R_EAX); c->ebx = cpu->hyperv_spinlock_attempts; -if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC)) { +if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC) && +!hyperv_feat_enabled(cpu, HYPERV_FEAT_AVIC)) { c->eax |= HV_APIC_ACCESS_RECOMMENDED; } -- 2.31.1
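The last two hunks change which recommendation bits end up in EAX of the enlightenment-info leaf. A minimal standalone sketch of that decision, using the two bit values from hyperv-proto.h in the diff, might look like this. The function name and the boolean parameters are invented for this note; the actual code goes through hyperv_feat_enabled() and hv_build_cpuid_leaf() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in target/i386/kvm/hyperv-proto.h in the patch. */
#define HV_APIC_ACCESS_RECOMMENDED      (1u << 3)
#define HV_DEPRECATING_AEOI_RECOMMENDED (1u << 9)

/*
 * Hypothetical helper: compute the APIC-related recommendation bits
 * for the given hv-vapic / hv-avic settings.
 */
static uint32_t hv_apic_recommendations(bool vapic, bool avic)
{
    uint32_t eax = 0;

    if (avic) {
        /* Ask the guest not to use AutoEOI so APICv/AVIC can stay on. */
        eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
    }
    if (vapic && !avic) {
        /* Recommend the PV APIC MSRs only when hv-avic is not requested. */
        eax |= HV_APIC_ACCESS_RECOMMENDED;
    }

    /* With hv-avic set, the PV APIC access bit must never be present. */
    assert(!(avic && (eax & HV_APIC_ACCESS_RECOMMENDED)));
    return eax;
}
```

The point of the second condition is that enabling 'hv-avic' together with 'hv-vapic' suppresses the APIC-access recommendation rather than adding to it.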
[PATCH v2 2/8] i386: docs: Briefly describe KVM PV features
KVM PV features don't seem to be documented anywhere, in particular, the fact that some of the features are enabled by default and some are not can only be figured out from the code. Signed-off-by: Vitaly Kuznetsov --- docs/kvm-pv.txt | 92 + 1 file changed, 92 insertions(+) create mode 100644 docs/kvm-pv.txt diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt new file mode 100644 index ..84ad7fa60f8d --- /dev/null +++ b/docs/kvm-pv.txt @@ -0,0 +1,92 @@ +KVM paravirtualized features + + + +1. Description +=== +In some cases when implementing a hardware interface in software is slow, KVM +implements its own paravirtualized interfaces. + +2. Setup += +KVM PV features are represented as CPU flags. The following features are enabled +by default for any CPU model when KVM is enabled: + kvmclock + kvm-nopiodelay + kvm-asyncpf + kvm-steal-time + kvm-pv-eoi + kvmclock-stable-bit + +'kvm-msi-ext-dest-id' feature is enabled by default in x2apic mode with split +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic"). + +Note: when cpu model 'host' is used, QEMU passes through all KVM PV features +exposed by KVM to the guest. + +3. Existing features + + +3.1. kvmclock + +This feature exposes KVM specific PV clocksource to the guest. + +3.2. kvm-nopiodelay +=== +The guest doesn't need to perform delays on PIO operations. + +3.3. kvm-mmu + +This feature is deprecated. + +3.4. kvm-asyncpf + +Enables asynchronous page fault mechanism. Note: since Linux-5.10 the feature is +deprecated and not enabled by KVM. Use "kvm-asyncpf-int" instead. + +3.5. kvm-steal-time +=== +Enables stolen (when guest vCPU is not running) time accounting. + +3.6. kvm-pv-eoi +=== +Enables paravirtualized end-of-interrupt signaling. + +3.7. kvm-pv-unhalt +== +Enables paravirtualized spinlocks support. + +3.8. kvm-pv-tlb-flush += +Enables paravirtualized TLB flush mechanism. + +3.9. kvm-pv-ipi +=== +Enables paravirtualized IPI mechanism. + +3.10. 
kvm-poll-control +== +Enables host-side polling on HLT control from the guest. + +3.11. kvm-pv-sched-yield + +Enables paravirtualized sched yield feature. + +3.12. kvm-asyncpf-int += +Enables interrupt based asynchronous page fault mechanism. + +3.13. kvm-msi-ext-dest-id += +Support 'Extended Destination ID' for external interrupts. The feature allows +to use up to 32768 CPUs without IRQ remapping (but other limits may apply making +the number of supported vCPUs for a given configuration lower). + +3.14. kvmclock-stable-bit += +Tells the guest that guest visible TSC value can be fully trusted for kvmclock +computations and no warps are expected. + +4. Useful links + +Please refer to Documentation/virt/kvm in Linux for additional detail. -- 2.31.1
[PATCH 1/3] docs: Briefly describe KVM PV features
KVM PV features don't seem to be documented anywhere, in particular, the fact that some of the features are enabled by default and some are not can only be figured out from the code. Signed-off-by: Vitaly Kuznetsov --- docs/kvm-pv.txt | 92 + 1 file changed, 92 insertions(+) create mode 100644 docs/kvm-pv.txt diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt new file mode 100644 index ..84ad7fa60f8d --- /dev/null +++ b/docs/kvm-pv.txt @@ -0,0 +1,92 @@ +KVM paravirtualized features + + + +1. Description +=== +In some cases when implementing a hardware interface in software is slow, KVM +implements its own paravirtualized interfaces. + +2. Setup += +KVM PV features are represented as CPU flags. The following features are enabled +by default for any CPU model when KVM is enabled: + kvmclock + kvm-nopiodelay + kvm-asyncpf + kvm-steal-time + kvm-pv-eoi + kvmclock-stable-bit + +'kvm-msi-ext-dest-id' feature is enabled by default in x2apic mode with split +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic"). + +Note: when cpu model 'host' is used, QEMU passes through all KVM PV features +exposed by KVM to the guest. + +3. Existing features + + +3.1. kvmclock + +This feature exposes KVM specific PV clocksource to the guest. + +3.2. kvm-nopiodelay +=== +The guest doesn't need to perform delays on PIO operations. + +3.3. kvm-mmu + +This feature is deprecated. + +3.4. kvm-asyncpf + +Enables asynchronous page fault mechanism. Note: since Linux-5.10 the feature is +deprecated and not enabled by KVM. Use "kvm-asyncpf-int" instead. + +3.5. kvm-steal-time +=== +Enables stolen (when guest vCPU is not running) time accounting. + +3.6. kvm-pv-eoi +=== +Enables paravirtualized end-of-interrupt signaling. + +3.7. kvm-pv-unhalt +== +Enables paravirtualized spinlocks support. + +3.8. kvm-pv-tlb-flush += +Enables paravirtualized TLB flush mechanism. + +3.9. kvm-pv-ipi +=== +Enables paravirtualized IPI mechanism. + +3.10. 
kvm-poll-control +== +Enables host-side polling on HLT control from the guest. + +3.11. kvm-pv-sched-yield + +Enables paravirtualized sched yield feature. + +3.12. kvm-asyncpf-int += +Enables interrupt based asynchronous page fault mechanism. + +3.13. kvm-msi-ext-dest-id += +Support 'Extended Destination ID' for external interrupts. The feature allows +to use up to 32768 CPUs without IRQ remapping (but other limits may apply making +the number of supported vCPUs for a given configuration lower). + +3.14. kvmclock-stable-bit += +Tells the guest that guest visible TSC value can be fully trusted for kvmclock +computations and no warps are expected. + +4. Useful links + +Please refer to Documentation/virt/kvm in Linux for additional detail. -- 2.31.1
[PATCH 3/3] i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID
By default, KVM allows the guest to use all currently supported Hyper-V enlightenments when Hyper-V CPUID interface was exposed, regardless of if some features were not announced in guest visible CPUIDs. hv-enforce-cpuid feature alters this behavior and only allows the guest to use exposed Hyper-V enlightenments. The feature is supported by Linux >= 5.14 and is not enabled by default in QEMU. Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 17 ++--- target/i386/cpu.c | 1 + target/i386/cpu.h | 1 + target/i386/kvm/kvm.c | 9 + 4 files changed, 25 insertions(+), 3 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index 000638a2fd38..072709a68f47 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -203,8 +203,11 @@ When the option is set to 'on' QEMU will always enable the feature, regardless of host setup. To keep guests secure, this can only be used in conjunction with exposing correct vCPU topology and vCPU pinning. -4. Development features - +4. Supplementary features += + +4.1. hv-passthrough +=== In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU @@ -215,8 +218,16 @@ values from KVM to QEMU. "hv-passthrough" overrides all other "hv-*" settings on the command line. Also, enabling this flag effectively prevents migration as the list of enabled enlightenments may differ between target and destination hosts. +4.2. hv-enforce-cpuid += +By default, KVM allows the guest to use all currently supported Hyper-V +enlightenments when Hyper-V CPUID interface was exposed, regardless of if +some features were not announced in guest visible CPUIDs. 'hv-enforce-cpuid' +feature alters this behavior and only allows the guest to use exposed Hyper-V +enlightenments. + -4. Useful links +5. 
Useful links Hyper-V Top Level Functional specification and other information: https://github.com/MicrosoftDocs/Virtualization-Documentation diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 0a0d2cddc9d2..1d4c44c8b762 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6642,6 +6642,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU, hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF), DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false), +DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false), DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 31f1f7caf116..9539f57199fa 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1685,6 +1685,7 @@ struct X86CPU { uint32_t hyperv_version_id[4]; uint32_t hyperv_limits[3]; uint32_t hyperv_nested[4]; +bool hyperv_enforce_cpuid; bool check_cpuid; bool enforce_cpuid; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 452b04f469b5..ccbea88080fc 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1519,6 +1519,15 @@ static int hyperv_init_vcpu(X86CPU *cpu) cpu->hyperv_nested[0] = evmcs_version; } +if (cpu->hyperv_enforce_cpuid) { +ret = kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_ENFORCE_CPUID, 0, 1); +if (ret < 0) { +error_report("failed to enable KVM_CAP_HYPERV_ENFORCE_CPUID: %s", + strerror(-ret)); +return ret; +} +} + return 0; } -- 2.31.1
[PATCH 2/3] i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
By default, KVM allows the guest to use all currently supported PV features even when they were not announced in guest visible CPUIDs. Introduce a new "kvm-pv-enforce-cpuid" flag to limit the supported feature set to the exposed features. The feature is supported by Linux >= 5.10 and is not enabled by default in QEMU. Signed-off-by: Vitaly Kuznetsov --- docs/kvm-pv.txt | 13 - target/i386/cpu.c | 2 ++ target/i386/cpu.h | 3 +++ target/i386/kvm/kvm.c | 10 ++ 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt index 84ad7fa60f8d..d1aac533feea 100644 --- a/docs/kvm-pv.txt +++ b/docs/kvm-pv.txt @@ -87,6 +87,17 @@ the number of supported vCPUs for a given configuration lower). Tells the guest that guest visible TSC value can be fully trusted for kvmclock computations and no warps are expected. -4. Useful links +4. Supplementary features += + +4.1. kvm-pv-enforce-cpuid += +By default, KVM allows the guest to use all currently supported PV features even +when they were not announced in guest visible CPUIDs. 'kvm-pv-enforce-cpuid' +feature alters this behavior and limits the supported feature set to the +exposed features only. + + +5. Useful links Please refer to Documentation/virt/kvm in Linux for additional detail. 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 48b55ebd0a67..0a0d2cddc9d2 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6668,6 +6668,8 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true), DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration, false), +DEFINE_PROP_BOOL("kvm-pv-enforce-cpuid", X86CPU, kvm_pv_enforce_cpuid, + false), DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true), DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true), DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 5d98a4e7c025..31f1f7caf116 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1768,6 +1768,9 @@ struct X86CPU { /* Stop SMI delivery for migration compatibility with old machines */ bool kvm_no_smi_migration; +/* Forcefully disable KVM PV features not exposed in guest CPUIDs */ +bool kvm_pv_enforce_cpuid; + /* Number of physical address bits supported */ uint32_t phys_bits; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 59ed8327ac13..452b04f469b5 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1617,6 +1617,16 @@ int kvm_arch_init_vcpu(CPUState *cs) cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused); +if (cpu->kvm_pv_enforce_cpuid) { +r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1); +if (r < 0) { +fprintf(stderr, +"failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s", +strerror(-r)); +abort(); +} +} + for (i = 0; i <= limit; i++) { if (cpuid_i == KVM_MAX_CPUID_ENTRIES) { fprintf(stderr, "unsupported level value: 0x%x\n", limit); -- 2.31.1
[PATCH 0/3] i386/kvm: Paravirtualized features usage enforcement
[I know this is probably too late for 6.1, but maybe the first patch of the series is still acceptable as it only adds missing documentation?]

By default, KVM doesn't limit the usage of paravirtualized features (neither native KVM nor Hyper-V ones) to what was exposed to the guest in CPUIDs, making it possible to use all of them. The KVM_CAP_HYPERV_ENFORCE_CPUID and KVM_CAP_ENFORCE_PV_FEATURE_CPUID capabilities were recently introduced, making it possible to limit the available features to what was actually exposed. Add support for these to QEMU. While at it, document all currently supported KVM PV features in docs/kvm-pv.txt.

Vitaly Kuznetsov (3):
  docs: Briefly describe KVM PV features
  i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
  i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID

 docs/hyperv.txt       | 17 +--
 docs/kvm-pv.txt       | 103 ++
 target/i386/cpu.c     | 3 ++
 target/i386/cpu.h     | 4 ++
 target/i386/kvm/kvm.c | 19
 5 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 docs/kvm-pv.txt

--
2.31.1
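The behavioural difference this series is after can be stated as a tiny model. This is not KVM or QEMU code, only a sketch of the semantics described in the cover letter: 'supported' stands for what the hypervisor implements, 'exposed' for what the guest sees in CPUID, and the function name and the bitmask representation are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: may the guest use PV feature number 'feat'?
 *  - supported: bitmask of features the hypervisor implements
 *  - exposed:   bitmask of features announced in guest-visible CPUID
 *  - enforce:   state of kvm-pv-enforce-cpuid / hv-enforce-cpuid
 */
static bool guest_may_use(uint64_t supported, uint64_t exposed,
                          bool enforce, unsigned int feat)
{
    assert(feat < 64);
    uint64_t bit = UINT64_C(1) << feat;

    if (!(supported & bit)) {
        return false;            /* not implemented at all */
    }
    /* Without enforcement, anything supported works even if not announced. */
    return enforce ? (exposed & bit) != 0 : true;
}
```

In this model, turning enforcement on only ever removes features that were not announced; features that were both supported and exposed behave the same either way.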
Re: [PATCH] qtest/hyperv: Introduce a simple hyper-v test
Andrew Jones writes: > On Fri, Jul 16, 2021 at 02:55:28PM +0200, Vitaly Kuznetsov wrote: >> For the beginning, just test 'hv-passthrough' and a couple of custom >> Hyper-V enlightenments configurations through QMP. Later, it would >> be great to complement this by checking CPUID values from within the >> guest. >> >> Signed-off-by: Vitaly Kuznetsov >> --- >> - Changes since "[PATCH v8 0/9] i386: KVM: expand Hyper-V features early": >> make the test SKIP correctly when KVM is not present. >> --- >> MAINTAINERS | 1 + >> tests/qtest/hyperv-test.c | 228 ++ >> tests/qtest/meson.build | 3 +- >> 3 files changed, 231 insertions(+), 1 deletion(-) >> create mode 100644 tests/qtest/hyperv-test.c >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 148153d74f5b..c1afd744edca 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -1576,6 +1576,7 @@ F: hw/isa/apm.c >> F: include/hw/isa/apm.h >> F: tests/unit/test-x86-cpuid.c >> F: tests/qtest/test-x86-cpuid-compat.c >> +F: tests/qtest/hyperv-test.c >> >> PC Chipset >> M: Michael S. Tsirkin >> diff --git a/tests/qtest/hyperv-test.c b/tests/qtest/hyperv-test.c >> new file mode 100644 >> index ..2155e5d90970 >> --- /dev/null >> +++ b/tests/qtest/hyperv-test.c >> @@ -0,0 +1,228 @@ >> +/* >> + * Hyper-V emulation CPU feature test cases >> + * >> + * Copyright (c) 2021 Red Hat Inc. >> + * Authors: >> + * Vitaly Kuznetsov >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >> + * See the COPYING file in the top-level directory. 
>> + */ >> +#include >> +#include >> + >> +#include "qemu/osdep.h" >> +#include "qemu/bitops.h" >> +#include "libqos/libqtest.h" >> +#include "qapi/qmp/qdict.h" >> +#include "qapi/qmp/qjson.h" >> + >> +#define MACHINE_KVM "-machine pc-q35-5.2 -accel kvm " >> +#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \ >> +" 'arguments': { 'type': 'full', " >> +#define QUERY_TAIL "}}" >> + >> +static bool kvm_enabled(QTestState *qts) >> +{ >> +QDict *resp, *qdict; >> +bool enabled; >> + >> +resp = qtest_qmp(qts, "{ 'execute': 'query-kvm' }"); >> +g_assert(qdict_haskey(resp, "return")); >> +qdict = qdict_get_qdict(resp, "return"); >> +g_assert(qdict_haskey(qdict, "enabled")); >> +enabled = qdict_get_bool(qdict, "enabled"); >> +qobject_unref(resp); >> + >> +return enabled; >> +} >> + >> +static bool kvm_has_cap(int cap) >> +{ >> +int fd = open("/dev/kvm", O_RDWR); >> +int ret; >> + >> +if (fd < 0) { >> +return false; >> +} >> + >> +ret = ioctl(fd, KVM_CHECK_EXTENSION, cap); >> + >> +close(fd); >> + >> +return ret > 0; >> +} >> + >> +static QDict *do_query_no_props(QTestState *qts, const char *cpu_type) >> +{ >> +return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }" >> + QUERY_TAIL, cpu_type); >> +} >> + >> +static bool resp_has_props(QDict *resp) >> +{ >> +QDict *qdict; >> + >> +g_assert(resp); >> + >> +if (!qdict_haskey(resp, "return")) { >> +return false; >> +} >> +qdict = qdict_get_qdict(resp, "return"); >> + >> +if (!qdict_haskey(qdict, "model")) { >> +return false; >> +} >> +qdict = qdict_get_qdict(qdict, "model"); >> + >> +return qdict_haskey(qdict, "props"); >> +} >> + >> +static QDict *resp_get_props(QDict *resp) >> +{ >> +QDict *qdict; >> + >> +g_assert(resp); >> +g_assert(resp_has_props(resp)); >> + >> +qdict = qdict_get_qdict(resp, "return"); >> +qdict = qdict_get_qdict(qdict, "model"); >> +qdict = qdict_get_qdict(qdict, "props"); >> + >> +return qdict; >> +} >> + >> +static bool resp_get_feature(QDict *resp, const char *feature) >> +{ >> +QDict 
*props; >> + >> +g_assert(resp); >> +g_assert(resp_has_props(resp)); >> +props = resp_get
[PATCH] qtest/hyperv: Introduce a simple hyper-v test
For the beginning, just test 'hv-passthrough' and a couple of custom Hyper-V enlightenments configurations through QMP. Later, it would be great to complement this by checking CPUID values from within the guest. Signed-off-by: Vitaly Kuznetsov --- - Changes since "[PATCH v8 0/9] i386: KVM: expand Hyper-V features early": make the test SKIP correctly when KVM is not present. --- MAINTAINERS | 1 + tests/qtest/hyperv-test.c | 228 ++ tests/qtest/meson.build | 3 +- 3 files changed, 231 insertions(+), 1 deletion(-) create mode 100644 tests/qtest/hyperv-test.c diff --git a/MAINTAINERS b/MAINTAINERS index 148153d74f5b..c1afd744edca 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1576,6 +1576,7 @@ F: hw/isa/apm.c F: include/hw/isa/apm.h F: tests/unit/test-x86-cpuid.c F: tests/qtest/test-x86-cpuid-compat.c +F: tests/qtest/hyperv-test.c PC Chipset M: Michael S. Tsirkin diff --git a/tests/qtest/hyperv-test.c b/tests/qtest/hyperv-test.c new file mode 100644 index ..2155e5d90970 --- /dev/null +++ b/tests/qtest/hyperv-test.c @@ -0,0 +1,228 @@ +/* + * Hyper-V emulation CPU feature test cases + * + * Copyright (c) 2021 Red Hat Inc. + * Authors: + * Vitaly Kuznetsov + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ +#include +#include + +#include "qemu/osdep.h" +#include "qemu/bitops.h" +#include "libqos/libqtest.h" +#include "qapi/qmp/qdict.h" +#include "qapi/qmp/qjson.h" + +#define MACHINE_KVM "-machine pc-q35-5.2 -accel kvm " +#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \ +" 'arguments': { 'type': 'full', " +#define QUERY_TAIL "}}" + +static bool kvm_enabled(QTestState *qts) +{ +QDict *resp, *qdict; +bool enabled; + +resp = qtest_qmp(qts, "{ 'execute': 'query-kvm' }"); +g_assert(qdict_haskey(resp, "return")); +qdict = qdict_get_qdict(resp, "return"); +g_assert(qdict_haskey(qdict, "enabled")); +enabled = qdict_get_bool(qdict, "enabled"); +qobject_unref(resp); + +return enabled; +} + +static bool kvm_has_cap(int cap) +{ +int fd = open("/dev/kvm", O_RDWR); +int ret; + +if (fd < 0) { +return false; +} + +ret = ioctl(fd, KVM_CHECK_EXTENSION, cap); + +close(fd); + +return ret > 0; +} + +static QDict *do_query_no_props(QTestState *qts, const char *cpu_type) +{ +return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }" + QUERY_TAIL, cpu_type); +} + +static bool resp_has_props(QDict *resp) +{ +QDict *qdict; + +g_assert(resp); + +if (!qdict_haskey(resp, "return")) { +return false; +} +qdict = qdict_get_qdict(resp, "return"); + +if (!qdict_haskey(qdict, "model")) { +return false; +} +qdict = qdict_get_qdict(qdict, "model"); + +return qdict_haskey(qdict, "props"); +} + +static QDict *resp_get_props(QDict *resp) +{ +QDict *qdict; + +g_assert(resp); +g_assert(resp_has_props(resp)); + +qdict = qdict_get_qdict(resp, "return"); +qdict = qdict_get_qdict(qdict, "model"); +qdict = qdict_get_qdict(qdict, "props"); + +return qdict; +} + +static bool resp_get_feature(QDict *resp, const char *feature) +{ +QDict *props; + +g_assert(resp); +g_assert(resp_has_props(resp)); +props = resp_get_props(resp); +g_assert(qdict_get(props, feature)); +return qdict_get_bool(props, feature); +} + +#define assert_has_feature(qts, cpu_type, feature) \ +({ \ +QDict *_resp = 
do_query_no_props(qts, cpu_type); \ +g_assert(_resp); \ +g_assert(resp_has_props(_resp)); \ +g_assert(qdict_get(resp_get_props(_resp), feature)); \ +qobject_unref(_resp); \ +}) + +#define resp_assert_feature(resp, feature, expected_value) \ +({ \ +QDict *_props; \ + \ +g_assert(_resp); \ +g_assert(resp_has_props(_resp)); \ +_props = resp_get_props(_resp);\ +g_assert(qdict_get(_props, feature));
Re: [PATCH v8 9/9] qtest/hyperv: Introduce a simple hyper-v test
Igor Mammedov writes:

> On Thu, 8 Jul 2021 17:02:22 -0400
> Eduardo Habkost wrote:
>
>> On Tue, Jun 08, 2021 at 02:08:17PM +0200, Vitaly Kuznetsov wrote:
>> > For the beginning, just test 'hv-passthrough' and a couple of custom
>> > Hyper-V enlightenments configurations through QMP. Later, it would
>> > be great to complement this by checking CPUID values from within the
>> > guest.
>> >
>> > Signed-off-by: Vitaly Kuznetsov
>> [...]
>> > +static bool kvm_has_sys_hyperv_cpuid(void)
>> > +{
>> > +    int fd = open("/dev/kvm", O_RDWR);
>> > +    int ret;
>> > +
>> > +    g_assert(fd > 0);
>> g_assert() was an overkill, just 'return false' would do.
>> This crashes when /dev/kvm doesn't exist. See:
>> https://gitlab.com/ehabkost/qemu/-/jobs/1404084459
>
> maybe reuse qtest_has_accel()
> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg06864.html
>
> instead of open-coding it.

The purpose of this function is to check whether KVM_CAP_SYS_HYPERV_CPUID is supported by KVM. It is certainly unsupported when KVM is not present :-) but an ioctl() is needed when it is. We already have a similar check in tests/qtest/migration-test.c, where we test for KVM_CAP_DIRTY_LOG_RING. Maybe we can create a library function, but we don't seem to have any KVM-specific stuff in qtest at this moment ...

>> I'm removing it from the queue.

I'll fix g_assert() and send it as a separate patch if that's fine.

-- 
Vitaly
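The probe described in the reply above can be sketched as follows. This is only an illustration, assuming a Linux host with kernel headers installed; kvm_dev_has_cap() and its 'path' parameter are hypothetical names and are not part of libqtest. The point is that a missing or inaccessible device node is reported as "capability unsupported" instead of aborting the test.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not a libqtest API): returns true only when the
 * KVM device node at 'path' can be opened and KVM_CHECK_EXTENSION
 * reports 'cap' as supported.
 */
static bool kvm_dev_has_cap(const char *path, int cap)
{
    int fd = open(path, O_RDWR);
    int ret;

    if (fd < 0) {
        /* No KVM device (or no permission): treat the cap as unsupported. */
        return false;
    }

    ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);
    close(fd);

    return ret > 0;
}
```

A test could call kvm_dev_has_cap("/dev/kvm", KVM_CAP_SYS_HYPERV_CPUID) and skip the early-expansion cases when it returns false.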
[PATCH 2/2] i386: Fix coding style in kvm_hyperv_expand_features()
QEMU coding style requires braces around bodies of ifs.

Reported-by: Peter Maydell
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e69abe48e3f8..28ca682b1089 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1219,8 +1219,9 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
     Error *local_err = NULL;
     int feat;
 
-    if (!hyperv_enabled(cpu))
+    if (!hyperv_enabled(cpu)) {
         return true;
+    }
 
     /*
      * When kvm_hyperv_expand_features is called at CPU feature expansion
@@ -1228,8 +1229,9 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
      * when KVM_CAP_SYS_HYPERV_CPUID is supported.
      */
     if (!cs->kvm_state &&
-        !kvm_check_extension(kvm_state, KVM_CAP_SYS_HYPERV_CPUID))
+        !kvm_check_extension(kvm_state, KVM_CAP_SYS_HYPERV_CPUID)) {
         return true;
+    }
 
     if (cpu->hyperv_passthrough) {
         cpu->hyperv_vendor_id[0] =
-- 
2.31.1
[PATCH 1/2] i386: assert 'cs->kvm_state' is not null
Coverity reports potential NULL pointer dereference in get_supported_hv_cpuid_legacy() when 'cs->kvm_state' is NULL. While 'cs->kvm_state' can indeed be NULL in hv_cpuid_get_host(), kvm_hyperv_expand_features() makes sure that it only happens when KVM_CAP_SYS_HYPERV_CPUID is supported and KVM_CAP_SYS_HYPERV_CPUID implies KVM_CAP_HYPERV_CPUID so get_supported_hv_cpuid_legacy() is never really called. Add asserts to strengthen the protection against broken KVM behavior. Coverity: CID 1458243 Signed-off-by: Vitaly Kuznetsov --- target/i386/kvm/kvm.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 59ed8327ac13..e69abe48e3f8 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -974,6 +974,12 @@ static struct kvm_cpuid2 *get_supported_hv_cpuid(CPUState *cs) do_sys_ioctl = kvm_check_extension(kvm_state, KVM_CAP_SYS_HYPERV_CPUID) > 0; +/* + * Non-empty KVM context is needed when KVM_CAP_SYS_HYPERV_CPUID is + * unsupported, kvm_hyperv_expand_features() checks for that. + */ +assert(do_sys_ioctl || cs->kvm_state); + /* * When the buffer is too small, KVM_GET_SUPPORTED_HV_CPUID fails with * -E2BIG, however, it doesn't report back the right size. Keep increasing @@ -1105,6 +,14 @@ static uint32_t hv_cpuid_get_host(CPUState *cs, uint32_t func, int reg) if (kvm_check_extension(kvm_state, KVM_CAP_HYPERV_CPUID) > 0) { cpuid = get_supported_hv_cpuid(cs); } else { +/* + * 'cs->kvm_state' may be NULL when Hyper-V features are expanded + * before KVM context is created but this is only done when + * KVM_CAP_SYS_HYPERV_CPUID is supported and it implies + * KVM_CAP_HYPERV_CPUID. + */ +assert(cs->kvm_state); + cpuid = get_supported_hv_cpuid_legacy(cs); } hv_cpuid_cache = cpuid; -- 2.31.1
Re: [PULL 04/11] i386: expand Hyper-V features during CPU feature expansion time
Peter Maydell writes:

> On Tue, 13 Jul 2021 at 17:19, Eduardo Habkost wrote:
>>
>> From: Vitaly Kuznetsov
>>
>> To make Hyper-V features appear in e.g. QMP query-cpu-model-expansion we
>> need to expand and set the corresponding CPUID leaves early. Modify
>> x86_cpu_get_supported_feature_word() to call newly introduced Hyper-V
>> specific kvm_hv_get_supported_cpuid() instead of
>> kvm_arch_get_supported_cpuid(). We can't use kvm_arch_get_supported_cpuid()
>> as Hyper-V specific CPUID leaves intersect with KVM's.
>>
>> Note, early expansion will only happen when KVM supports system wide
>> KVM_GET_SUPPORTED_HV_CPUID ioctl (KVM_CAP_SYS_HYPERV_CPUID).
>>
>> Reviewed-by: Eduardo Habkost
>> Signed-off-by: Vitaly Kuznetsov
>> Message-Id: <20210608120817.1325125-6-vkuzn...@redhat.com>
>> Signed-off-by: Eduardo Habkost
>
> Hi; Coverity reports an issue in this code (CID 1458243):
>
>> -static bool hyperv_expand_features(CPUState *cs, Error **errp)
>> +bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
>>  {
>> -    X86CPU *cpu = X86_CPU(cs);
>> +    CPUState *cs = CPU(cpu);
>>
>>      if (!hyperv_enabled(cpu))
>>          return true;
>>
>> +    /*
>> +     * When kvm_hyperv_expand_features is called at CPU feature expansion
>> +     * time per-CPU kvm_state is not available yet so we can only proceed
>> +     * when KVM_CAP_SYS_HYPERV_CPUID is supported.
>> +     */
>> +    if (!cs->kvm_state &&
>> +        !kvm_check_extension(kvm_state, KVM_CAP_SYS_HYPERV_CPUID))
>> +        return true;
>
> Here we check whether cs->kvm_state is NULL, but even if it is
> NULL we can still continue execution further through the function.
>
> Later in the function we call hv_cpuid_get_host(), which in turn
> can call get_supported_hv_cpuid_legacy(), which can dereference
> cs->kvm_state without checking it.

get_supported_hv_cpuid_legacy() is only called when KVM_CAP_HYPERV_CPUID is not supported, and this is not possible when KVM_CAP_SYS_HYPERV_CPUID is supported. Coverity, of course, can't know that.

>
> So either the check on cs->kvm_state above is unnecessary, or we
> need to handle it being NULL in some way other than falling through.

It seems an assert(cs) before calling get_supported_hv_cpuid_legacy() (with a proper comment) should do the job.

>
> Side note: this change isn't in line with our coding style, which
> requires braces around the body of the if().

My bad, will fix.

-- 
Vitaly
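To make the argument above easier to follow, here is a small stand-alone model of the decision logic. It is a sketch only, not QEMU code: the enum, hv_cpuid_source() and legacy_reachable_without_state() are hypothetical names, and the model assumes (as stated in the thread) that KVM never reports KVM_CAP_SYS_HYPERV_CPUID without KVM_CAP_HYPERV_CPUID.

```c
#include <stdbool.h>

/* Where the supported Hyper-V CPUID data would come from (illustrative). */
enum hv_cpuid_source {
    HV_SRC_NONE,        /* expansion is skipped for now */
    HV_SRC_SYS_IOCTL,   /* system-wide KVM_GET_SUPPORTED_HV_CPUID */
    HV_SRC_VCPU_IOCTL,  /* per-vCPU KVM_GET_SUPPORTED_HV_CPUID */
    HV_SRC_LEGACY       /* legacy probing, dereferences cs->kvm_state */
};

/* Hypothetical model of the checks discussed in this thread. */
static enum hv_cpuid_source hv_cpuid_source(bool have_kvm_state,
                                            bool cap_sys, bool cap_vcpu)
{
    /* Early return: no per-CPU state yet and no system-wide ioctl. */
    if (!have_kvm_state && !cap_sys) {
        return HV_SRC_NONE;
    }
    /* Without KVM_CAP_HYPERV_CPUID only the legacy path remains. */
    if (!cap_vcpu) {
        return HV_SRC_LEGACY;
    }
    return cap_sys ? HV_SRC_SYS_IOCTL : HV_SRC_VCPU_IOCTL;
}

/*
 * Walks every capability combination that a sane KVM can report and
 * tells whether the legacy path is reachable with a NULL kvm_state.
 */
static bool legacy_reachable_without_state(void)
{
    for (int cap_sys = 0; cap_sys <= 1; cap_sys++) {
        for (int cap_vcpu = 0; cap_vcpu <= 1; cap_vcpu++) {
            if (cap_sys && !cap_vcpu) {
                continue; /* assumed impossible: SYS cap implies vCPU cap */
            }
            if (hv_cpuid_source(false, cap_sys, cap_vcpu) == HV_SRC_LEGACY) {
                return true;
            }
        }
    }
    return false;
}
```

The only input that reaches the legacy path without per-CPU state is the combination excluded by the assumption, which is exactly the case an assert would guard against.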
Re: [PATCH v8 3/9] i386: hardcode supported eVMCS version to '1'
Eduardo Habkost writes:

> On Tue, Jun 08, 2021 at 02:08:11PM +0200, Vitaly Kuznetsov wrote:
>> Currently, the only eVMCS version, supported by KVM (and described in TLFS)
>> is '1'. When Enlightened VMCS feature is enabled, QEMU takes the supported
>> eVMCS version range (from KVM_CAP_HYPERV_ENLIGHTENED_VMCS enablement) and
>> puts it to guest visible CPUIDs. When (and if) eVMCS ver.2 appears a
>> problem on migration is expected: it doesn't seem to be possible to migrate
>> from a host supporting eVMCS ver.2 to a host, which only support eVMCS
>> ver.1.
>
> Should we rewrite this as "it wouldn't be possible to migrate",
> as this patch fixes the problem and makes it possible?

Yes, no problem with such an amendment. Currently, there's no issue as EVMCSv2 just doesn't exist. We, however, expect it to appear some time in the future, and this change allows us to re-use KVM_CAP_HYPERV_ENLIGHTENED_VMCS in KVM without (potentially) breaking migrations.

Note: the migration will only be broken when we migrate to a KVM/QEMU which does not support EVMCSv2 *and* when the guest is already using it. As we expose the range of supported versions, it is possible that guests (esp. older Hyper-V versions) will stick to 'v1' even when 'v2' is supported.

>
>>
>> Hardcode eVMCS ver.1 as the result of 'hv-evmcs' enablement for now. Newer
>> eVMCS versions will have to have their own enablement options (e.g.
>> 'hv-evmcs=2').
>>
>> Signed-off-by: Vitaly Kuznetsov
>
> Reviewed-by: Eduardo Habkost

Thanks! Please let me know if you expect a v9 with the amended commit message or if you're able to alter it upon commit.

-- 
Vitaly
[PATCH v8 9/9] qtest/hyperv: Introduce a simple hyper-v test
For the beginning, just test 'hv-passthrough' and a couple of custom Hyper-V enlightenments configurations through QMP. Later, it would be great to complement this by checking CPUID values from within the guest. Signed-off-by: Vitaly Kuznetsov --- MAINTAINERS | 1 + tests/qtest/hyperv-test.c | 221 ++ tests/qtest/meson.build | 3 +- 3 files changed, 224 insertions(+), 1 deletion(-) create mode 100644 tests/qtest/hyperv-test.c diff --git a/MAINTAINERS b/MAINTAINERS index 7d9cd2904264..6345bad461e8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1545,6 +1545,7 @@ F: hw/isa/apm.c F: include/hw/isa/apm.h F: tests/unit/test-x86-cpuid.c F: tests/qtest/test-x86-cpuid-compat.c +F: tests/qtest/hyperv-test.c PC Chipset M: Michael S. Tsirkin diff --git a/tests/qtest/hyperv-test.c b/tests/qtest/hyperv-test.c new file mode 100644 index ..88f7a19e4a85 --- /dev/null +++ b/tests/qtest/hyperv-test.c @@ -0,0 +1,221 @@ +/* + * Hyper-V emulation CPU feature test cases + * + * Copyright (c) 2021 Red Hat Inc. + * Authors: + * Vitaly Kuznetsov + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ +#include +#include + +#include "qemu/osdep.h" +#include "qemu/bitops.h" +#include "libqos/libqtest.h" +#include "qapi/qmp/qdict.h" +#include "qapi/qmp/qjson.h" + +#define MACHINE_KVM "-machine pc-q35-5.2 -accel kvm " +#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \ +" 'arguments': { 'type': 'full', " +#define QUERY_TAIL "}}" + +static bool kvm_enabled(QTestState *qts) +{ +QDict *resp, *qdict; +bool enabled; + +resp = qtest_qmp(qts, "{ 'execute': 'query-kvm' }"); +g_assert(qdict_haskey(resp, "return")); +qdict = qdict_get_qdict(resp, "return"); +g_assert(qdict_haskey(qdict, "enabled")); +enabled = qdict_get_bool(qdict, "enabled"); +qobject_unref(resp); + +return enabled; +} + +static bool kvm_has_sys_hyperv_cpuid(void) +{ +int fd = open("/dev/kvm", O_RDWR); +int ret; + +g_assert(fd > 0); + +ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SYS_HYPERV_CPUID); + +close(fd); + +return ret > 0; +} + +static QDict *do_query_no_props(QTestState *qts, const char *cpu_type) +{ +return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }" + QUERY_TAIL, cpu_type); +} + +static bool resp_has_props(QDict *resp) +{ +QDict *qdict; + +g_assert(resp); + +if (!qdict_haskey(resp, "return")) { +return false; +} +qdict = qdict_get_qdict(resp, "return"); + +if (!qdict_haskey(qdict, "model")) { +return false; +} +qdict = qdict_get_qdict(qdict, "model"); + +return qdict_haskey(qdict, "props"); +} + +static QDict *resp_get_props(QDict *resp) +{ +QDict *qdict; + +g_assert(resp); +g_assert(resp_has_props(resp)); + +qdict = qdict_get_qdict(resp, "return"); +qdict = qdict_get_qdict(qdict, "model"); +qdict = qdict_get_qdict(qdict, "props"); + +return qdict; +} + +static bool resp_get_feature(QDict *resp, const char *feature) +{ +QDict *props; + +g_assert(resp); +g_assert(resp_has_props(resp)); +props = resp_get_props(resp); +g_assert(qdict_get(props, feature)); +return qdict_get_bool(props, feature); +} + +#define assert_has_feature(qts, cpu_type, feature) \ +({ \ +QDict *_resp 
= do_query_no_props(qts, cpu_type); \ +g_assert(_resp); \ +g_assert(resp_has_props(_resp)); \ +g_assert(qdict_get(resp_get_props(_resp), feature)); \ +qobject_unref(_resp); \ +}) + +#define resp_assert_feature(resp, feature, expected_value) \ +({ \ +QDict *_props; \ + \ +g_assert(_resp); \ +g_assert(resp_has_props(_resp)); \ +_props = resp_get_props(_resp);\ +g_assert(qdict_get(_props, feature)); \ +g_assert(qdict_get_bool(_props, feature) == (expected_value)); \ +}) + +#define assert_feature(qts, cpu_type, feature, expected_value) \ +({
[PATCH v8 8/9] i386: Hyper-V SynIC requires POST_MESSAGES/SIGNAL_EVENTS privileges
When Hyper-V SynIC is enabled, we may need to allow Windows guests to make hypercalls (POST_MESSAGES/SIGNAL_EVENTS). No issue is currently observed because KVM is very permissive, allowing these hypercalls regardless of guest visible CPUID bits.

Reviewed-by: Eduardo Habkost
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/hyperv-proto.h | 6 ++
 target/i386/kvm/kvm.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index e30d64b4ade4..5fbb385cc136 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -38,6 +38,12 @@
 #define HV_ACCESS_FREQUENCY_MSRS (1u << 11)
 #define HV_ACCESS_REENLIGHTENMENTS_CONTROL (1u << 13)
 
+/*
+ * HV_CPUID_FEATURES.EBX bits
+ */
+#define HV_POST_MESSAGES (1u << 4)
+#define HV_SIGNAL_EVENTS (1u << 5)
+
 /*
  * HV_CPUID_FEATURES.EDX bits
  */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 33830117fa31..260c563d59a3 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1343,6 +1343,12 @@ static int hyperv_fill_cpuids(CPUState *cs,
     /* Unconditionally required with any Hyper-V enlightenment */
     c->eax |= HV_HYPERCALL_AVAILABLE;
 
+    /* SynIC and Vmbus devices require messages/signals hypercalls */
+    if (hyperv_feat_enabled(cpu, HYPERV_FEAT_SYNIC) &&
+        !cpu->hyperv_synic_kvm_only) {
+        c->ebx |= HV_POST_MESSAGES | HV_SIGNAL_EVENTS;
+    }
+
     /* Not exposed by KVM but needed to make CPU hotplug in Windows work */
     c->edx |= HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE;
 
-- 
2.31.1
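As a stand-alone illustration of the condition added by this patch, the sketch below computes the EBX privilege bits contributed by SynIC. The two bit definitions are copied from the hyperv-proto.h hunk; synic_ebx_privileges() and its boolean parameters are hypothetical stand-ins for the QEMU state checks and do not exist in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* HV_CPUID_FEATURES.EBX bits, as defined by the patch. */
#define HV_POST_MESSAGES (1u << 4)
#define HV_SIGNAL_EVENTS (1u << 5)

/*
 * Hypothetical helper: EBX privilege bits to expose for SynIC.
 * 'synic_kvm_only' models cpu->hyperv_synic_kvm_only.
 */
static uint32_t synic_ebx_privileges(bool synic_enabled, bool synic_kvm_only)
{
    uint32_t ebx = 0;

    if (synic_enabled && !synic_kvm_only) {
        ebx |= HV_POST_MESSAGES | HV_SIGNAL_EVENTS;
    }

    return ebx;
}
```

With SynIC enabled and not in KVM-only mode, both bits are set (0x30); in every other case nothing is added to EBX.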
[PATCH v8 3/9] i386: hardcode supported eVMCS version to '1'
Currently, the only eVMCS version supported by KVM (and described in TLFS) is '1'. When the Enlightened VMCS feature is enabled, QEMU takes the supported eVMCS version range (from KVM_CAP_HYPERV_ENLIGHTENED_VMCS enablement) and puts it into guest visible CPUIDs. When (and if) eVMCS ver.2 appears, a problem on migration is expected: it doesn't seem to be possible to migrate from a host supporting eVMCS ver.2 to a host which only supports eVMCS ver.1.

Hardcode eVMCS ver.1 as the result of 'hv-evmcs' enablement for now. Newer eVMCS versions will have to have their own enablement options (e.g. 'hv-evmcs=2').

Signed-off-by: Vitaly Kuznetsov
---
 docs/hyperv.txt | 2 +-
 target/i386/kvm/kvm.c | 39 +++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index a51953daa833..000638a2fd38 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -170,7 +170,7 @@ Recommended: hv-frequencies
 3.16. hv-evmcs
 ===
 The enlightenment is nested specific, it targets Hyper-V on KVM guests. When
-enabled, it provides Enlightened VMCS feature to the guest. The feature
+enabled, it provides Enlightened VMCS version 1 feature to the guest. The feature
 implements paravirtualized protocol between L0 (KVM) and L1 (Hyper-V)
 hypervisors making L2 exits to the hypervisor faster. The feature is Intel-only.
 Note: some virtualization features (e.g. Posted Interrupts) are disabled when
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c676ee8b38a7..13d63f576b88 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1406,6 +1406,21 @@ static int hyperv_fill_cpuids(CPUState *cs,
 static Error *hv_passthrough_mig_blocker;
 static Error *hv_no_nonarch_cs_mig_blocker;
 
+/* Checks that the exposed eVMCS version range is supported by KVM */
+static bool evmcs_version_supported(uint16_t evmcs_version,
+                                    uint16_t supported_evmcs_version)
+{
+    uint8_t min_version = evmcs_version & 0xff;
+    uint8_t max_version = evmcs_version >> 8;
+    uint8_t min_supported_version = supported_evmcs_version & 0xff;
+    uint8_t max_supported_version = supported_evmcs_version >> 8;
+
+    return (min_version >= min_supported_version) &&
+        (max_version <= max_supported_version);
+}
+
+#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)
+
 static int hyperv_init_vcpu(X86CPU *cpu)
 {
     CPUState *cs = CPU(cpu);
@@ -1485,17 +1500,33 @@ static int hyperv_init_vcpu(X86CPU *cpu)
     }
 
     if (hyperv_feat_enabled(cpu, HYPERV_FEAT_EVMCS)) {
-        uint16_t evmcs_version;
+        uint16_t evmcs_version = DEFAULT_EVMCS_VERSION;
+        uint16_t supported_evmcs_version;
 
         ret = kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_ENLIGHTENED_VMCS, 0,
-                                  (uintptr_t)&evmcs_version);
+                                  (uintptr_t)&supported_evmcs_version);
 
+        /*
+         * KVM is required to support EVMCS ver.1. as that's what 'hv-evmcs'
+         * option sets. Note: we hardcode the maximum supported eVMCS version
+         * to '1' as well so 'hv-evmcs' feature is migratable even when (and if)
+         * ver.2 is implemented. A new option (e.g. 'hv-evmcs=2') will then have
+         * to be added.
+         */
         if (ret < 0) {
-            fprintf(stderr, "Hyper-V %s is not supported by kernel\n",
-                    kvm_hyperv_properties[HYPERV_FEAT_EVMCS].desc);
+            error_report("Hyper-V %s is not supported by kernel",
+                         kvm_hyperv_properties[HYPERV_FEAT_EVMCS].desc);
             return ret;
         }
 
+        if (!evmcs_version_supported(evmcs_version, supported_evmcs_version)) {
+            error_report("eVMCS version range [%d..%d] is not supported by "
+                         "kernel (supported: [%d..%d])", evmcs_version & 0xff,
+                         evmcs_version >> 8, supported_evmcs_version & 0xff,
+                         supported_evmcs_version >> 8);
+            return -ENOTSUP;
+        }
+
         cpu->hyperv_nested[0] = evmcs_version;
     }
 
-- 
2.31.1
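For readers unfamiliar with the encoding, the version range check from this patch can be tried out in isolation. The sketch below copies evmcs_version_supported() and DEFAULT_EVMCS_VERSION from the diff; evmcs_range() is a hypothetical convenience helper added only for this example (low byte is the minimum version, high byte is the maximum).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_EVMCS_VERSION ((1 << 8) | 1)

/* Hypothetical helper: pack a [min..max] eVMCS version range. */
static uint16_t evmcs_range(uint8_t min, uint8_t max)
{
    return (uint16_t)((max << 8) | min);
}

/* Checks that the exposed eVMCS version range is supported by KVM */
static bool evmcs_version_supported(uint16_t evmcs_version,
                                    uint16_t supported_evmcs_version)
{
    uint8_t min_version = evmcs_version & 0xff;
    uint8_t max_version = evmcs_version >> 8;
    uint8_t min_supported_version = supported_evmcs_version & 0xff;
    uint8_t max_supported_version = supported_evmcs_version >> 8;

    return (min_version >= min_supported_version) &&
        (max_version <= max_supported_version);
}
```

Under this encoding the hardcoded range [1..1] stays acceptable on a future kernel reporting [1..2], but would be rejected by a kernel reporting only [2..2], which is the situation a separate option such as 'hv-evmcs=2' is meant to cover.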
[PATCH v8 7/9] i386: HV_HYPERCALL_AVAILABLE privilege bit is always needed
According to TLFS, Hyper-V guest is supposed to check HV_HYPERCALL_AVAILABLE privilege bit before accessing HV_X64_MSR_GUEST_OS_ID/HV_X64_MSR_HYPERCALL MSRs but at least some Windows versions ignore that. As KVM is very permissive and allows accessing these MSRs unconditionally, no issue is observed. We may, however, want to tighten the checks eventually. Conforming to the spec is probably also a good idea. Enable HV_HYPERCALL_AVAILABLE bit unconditionally. Reviewed-by: Eduardo Habkost Signed-off-by: Vitaly Kuznetsov --- target/i386/kvm/kvm.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 1cce0969067e..33830117fa31 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -810,8 +810,6 @@ static struct { [HYPERV_FEAT_RELAXED] = { .desc = "relaxed timing (hv-relaxed)", .flags = { -{.func = HV_CPUID_FEATURES, .reg = R_EAX, - .bits = HV_HYPERCALL_AVAILABLE}, {.func = HV_CPUID_ENLIGHTMENT_INFO, .reg = R_EAX, .bits = HV_RELAXED_TIMING_RECOMMENDED} } @@ -820,7 +818,7 @@ static struct { .desc = "virtual APIC (hv-vapic)", .flags = { {.func = HV_CPUID_FEATURES, .reg = R_EAX, - .bits = HV_HYPERCALL_AVAILABLE | HV_APIC_ACCESS_AVAILABLE}, + .bits = HV_APIC_ACCESS_AVAILABLE}, {.func = HV_CPUID_ENLIGHTMENT_INFO, .reg = R_EAX, .bits = HV_APIC_ACCESS_RECOMMENDED} } @@ -829,8 +827,7 @@ static struct { .desc = "clocksources (hv-time)", .flags = { {.func = HV_CPUID_FEATURES, .reg = R_EAX, - .bits = HV_HYPERCALL_AVAILABLE | HV_TIME_REF_COUNT_AVAILABLE | - HV_REFERENCE_TSC_AVAILABLE} + .bits = HV_TIME_REF_COUNT_AVAILABLE | HV_REFERENCE_TSC_AVAILABLE} } }, [HYPERV_FEAT_CRASH] = { @@ -1343,6 +1340,9 @@ static int hyperv_fill_cpuids(CPUState *cs, c->ebx = hv_build_cpuid_leaf(cs, HV_CPUID_FEATURES, R_EBX); c->edx = hv_build_cpuid_leaf(cs, HV_CPUID_FEATURES, R_EDX); +/* Unconditionally required with any Hyper-V enlightenment */ +c->eax |= HV_HYPERCALL_AVAILABLE; + /* Not exposed by KVM but needed to make CPU 
hotplug in Windows work */ c->edx |= HV_CPU_DYNAMIC_PARTITIONING_AVAILABLE; -- 2.31.1
[PATCH v8 6/9] i386: kill off hv_cpuid_check_and_set()
hv_cpuid_check_and_set() does too much: - Checks if the feature is supported by KVM; - Checks if all dependencies are enabled; - Sets the feature bit in cpu->hyperv_features for 'passthrough' mode. To reduce the complexity, move all the logic except for dependencies check out of it. Also, in 'passthrough' mode we don't really need to check dependencies because KVM is supposed to provide a consistent set anyway. Reviewed-by: Eduardo Habkost Signed-off-by: Vitaly Kuznetsov --- target/i386/kvm/kvm.c | 104 +++--- 1 file changed, 36 insertions(+), 68 deletions(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index b679dfdfc655..1cce0969067e 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1145,16 +1145,12 @@ static bool hyperv_feature_supported(CPUState *cs, int feature) return true; } -static int hv_cpuid_check_and_set(CPUState *cs, int feature, Error **errp) +/* Checks that all feature dependencies are enabled */ +static bool hv_feature_check_deps(X86CPU *cpu, int feature, Error **errp) { -X86CPU *cpu = X86_CPU(cs); uint64_t deps; int dep_feat; -if (!hyperv_feat_enabled(cpu, feature) && !cpu->hyperv_passthrough) { -return 0; -} - deps = kvm_hyperv_properties[feature].dependencies; while (deps) { dep_feat = ctz64(deps); @@ -1162,26 +1158,12 @@ static int hv_cpuid_check_and_set(CPUState *cs, int feature, Error **errp) error_setg(errp, "Hyper-V %s requires Hyper-V %s", kvm_hyperv_properties[feature].desc, kvm_hyperv_properties[dep_feat].desc); -return 1; +return false; } deps &= ~(1ull << dep_feat); } -if (!hyperv_feature_supported(cs, feature)) { -if (hyperv_feat_enabled(cpu, feature)) { -error_setg(errp, "Hyper-V %s is not supported by kernel", - kvm_hyperv_properties[feature].desc); -return 1; -} else { -return 0; -} -} - -if (cpu->hyperv_passthrough) { -cpu->hyperv_features |= BIT(feature); -} - -return 0; +return true; } static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg) @@ -1220,6 +1202,8 @@ static uint32_t 
hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg) bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) { CPUState *cs = CPU(cpu); +Error *local_err = NULL; +int feat; if (!hyperv_enabled(cpu)) return true; @@ -1275,53 +1259,37 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp) cpu->hyperv_spinlock_attempts = hv_cpuid_get_host(cs, HV_CPUID_ENLIGHTMENT_INFO, R_EBX); -} -/* Features */ -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RELAXED, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_VAPIC, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_TIME, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_CRASH, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RESET, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_VPINDEX, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RUNTIME, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_SYNIC, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_STIMER, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_FREQUENCIES, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_REENLIGHTENMENT, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_TLBFLUSH, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_EVMCS, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_IPI, errp)) { -return false; -} -if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_STIMER_DIRECT, errp)) { -return false; +/* + * Mark feature as enabled in 'cpu->hyperv_features' as + * hv_build_cpuid_leaf() uses this info to build guest CPUIDs. 
+ */ +for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { +if (hyperv_feature_supported(cs, feat)) { +cpu->hyperv_features |= BIT(feat); +} +} +} else { +/* Check features availability and dependencies */ +for (feat = 0; feat < ARRAY_SIZE(kvm_hyperv_properties); feat++) { +/* If the feature was not requested skip it. */ +if (!hyperv_feat_enabled(cpu, feat)) { +continue; +
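The dependency walk kept in hv_feature_check_deps() can be illustrated outside of QEMU. This is a sketch under stated assumptions: the feature indices and the dependency table below are made up for the example and are not QEMU's HYPERV_FEAT_* values, first_missing_dep() is a hypothetical name, and __builtin_ctzll() (a GCC/Clang builtin) stands in for QEMU's ctz64().

```c
#include <stdint.h>

/* Illustrative feature indices; not QEMU's real HYPERV_FEAT_* values. */
enum { FEAT_TIME, FEAT_SYNIC, FEAT_STIMER, FEAT_STIMER_DIRECT, FEAT_COUNT };

/* Illustrative dependency bitmasks, one per feature. */
static const uint64_t feat_deps[FEAT_COUNT] = {
    [FEAT_STIMER] = (1ull << FEAT_TIME) | (1ull << FEAT_SYNIC),
    [FEAT_STIMER_DIRECT] = 1ull << FEAT_STIMER,
};

/*
 * Returns the index of the first dependency of 'feature' that is not set
 * in the 'enabled' bitmask, or -1 when all dependencies are satisfied.
 */
static int first_missing_dep(uint64_t enabled, int feature)
{
    uint64_t deps = feat_deps[feature];

    while (deps) {
        int dep = __builtin_ctzll(deps);   /* lowest pending dependency */

        if (!(enabled & (1ull << dep))) {
            return dep;
        }
        deps &= ~(1ull << dep);            /* dependency checked, clear it */
    }

    return -1;
}
```

Clearing the lowest set bit on each pass guarantees the loop terminates after at most one iteration per dependency, which is the same pattern the patch keeps in the QEMU function.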
[PATCH v8 5/9] i386: expand Hyper-V features during CPU feature expansion time
To make Hyper-V features appear in e.g. QMP query-cpu-model-expansion we need to expand and set the corresponding CPUID leaves early. Modify x86_cpu_get_supported_feature_word() to call newly introduced Hyper-V specific kvm_hv_get_supported_cpuid() instead of kvm_arch_get_supported_cpuid(). We can't use kvm_arch_get_supported_cpuid() as Hyper-V specific CPUID leaves intersect with KVM's.

Note, early expansion will only happen when KVM supports system wide KVM_GET_SUPPORTED_HV_CPUID ioctl (KVM_CAP_SYS_HYPERV_CPUID).

Reviewed-by: Eduardo Habkost
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/cpu.c | 4
 target/i386/kvm/kvm-stub.c | 5 +
 target/i386/kvm/kvm.c | 24
 target/i386/kvm/kvm_i386.h | 1 +
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f8ae45be0d53..c5d19216787c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5990,6 +5990,10 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
     if (env->cpuid_xlevel2 == UINT32_MAX) {
         env->cpuid_xlevel2 = env->cpuid_min_xlevel2;
     }
+
+    if (kvm_enabled()) {
+        kvm_hyperv_expand_features(cpu, errp);
+    }
 }
 
 /*
diff --git a/target/i386/kvm/kvm-stub.c b/target/i386/kvm/kvm-stub.c
index 92f49121b8fa..f6e7e4466e1a 100644
--- a/target/i386/kvm/kvm-stub.c
+++ b/target/i386/kvm/kvm-stub.c
@@ -39,3 +39,8 @@ bool kvm_hv_vpindex_settable(void)
 {
     return false;
 }
+
+bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
+{
+    abort();
+}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 1e6f3c483e28..b679dfdfc655 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1217,13 +1217,22 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg)
  * of 'hv_passthrough' mode and fills the environment with all supported
  * Hyper-V features.
  */
-static bool hyperv_expand_features(CPUState *cs, Error **errp)
+bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp)
 {
-    X86CPU *cpu = X86_CPU(cs);
+    CPUState *cs = CPU(cpu);
 
     if (!hyperv_enabled(cpu))
         return true;
 
+    /*
+     * When kvm_hyperv_expand_features is called at CPU feature expansion
+     * time per-CPU kvm_state is not available yet so we can only proceed
+     * when KVM_CAP_SYS_HYPERV_CPUID is supported.
+     */
+    if (!cs->kvm_state &&
+        !kvm_check_extension(kvm_state, KVM_CAP_SYS_HYPERV_CPUID))
+        return true;
+
     if (cpu->hyperv_passthrough) {
         cpu->hyperv_vendor_id[0] =
             hv_cpuid_get_host(cs, HV_CPUID_VENDOR_AND_MAX_FUNCTIONS, R_EBX);
@@ -1590,8 +1599,15 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
 
-    /* Paravirtualization CPUIDs */
-    if (!hyperv_expand_features(cs, &local_err)) {
+    /*
+     * kvm_hyperv_expand_features() is called here for the second time in case
+     * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly handle
+     * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
+     * check which Hyper-V enlightenments are supported and which are not, we
+     * can still proceed and check/expand Hyper-V enlightenments here so legacy
+     * behavior is preserved.
+     */
+    if (!kvm_hyperv_expand_features(cpu, &local_err)) {
         error_report_err(local_err);
         return -ENOSYS;
     }
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index dc725083891c..54667b35f09c 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -47,6 +47,7 @@
 bool kvm_has_x2apic_api(void);
 bool kvm_has_waitpkg(void);
 
 bool kvm_hv_vpindex_settable(void);
+bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
 
 uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
 
-- 
2.31.1
[PATCH v8 2/9] i386: clarify 'hv-passthrough' behavior
Clarify the fact that 'hv-passthrough' only enables features which are already known to QEMU and that it overrides all other 'hv-*' settings. Reviewed-by: Eduardo Habkost Signed-off-by: Vitaly Kuznetsov --- docs/hyperv.txt | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/hyperv.txt b/docs/hyperv.txt index e53c581f4586..a51953daa833 100644 --- a/docs/hyperv.txt +++ b/docs/hyperv.txt @@ -209,8 +209,11 @@ In some cases (e.g. during development) it may make sense to use QEMU in 'pass-through' mode and give Windows guests all enlightenments currently supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU flag. -Note: enabling this flag effectively prevents migration as supported features -may differ between target and destination. +Note: "hv-passthrough" flag only enables enlightenments which are known to QEMU +(have corresponding "hv-*" flag) and copies "hv-spinlocks="/"hv-vendor-id=" +values from KVM to QEMU. "hv-passthrough" overrides all other "hv-*" settings on +the command line. Also, enabling this flag effectively prevents migration as the +list of enabled enlightenments may differ between target and destination hosts. 4. Useful links -- 2.31.1
[PATCH v8 0/9] i386: KVM: expand Hyper-V features early
Changes since v7: - Make eVMCS version check future proof [Eduardo] - Collect R-b tags [Eduardo] - Drop 'if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64"))' check from qtest [Eduardo] - s/priviliges/privileges/ [Eric] The last two functional patches are inspired by 'Fine-grained access check to Hyper-V hypercalls and MSRs' work for KVM: https://lore.kernel.org/kvm/20210521095204.2161214-1-vkuzn...@redhat.com/ Original description: Upper layer tools like libvirt want to figure out which Hyper-V features are supported by the underlying stack (QEMU/KVM) but currently they are unable to do so. We have a nice 'hv_passthrough' CPU flag supported by QEMU but it has no effect on e.g. QMP's query-cpu-model-expansion type=full model={"name":"host","props":{"hv-passthrough":true}} command as we parse Hyper-V features after creating KVM vCPUs and not at feature expansion time. To support the use-case we first need to make KVM_GET_SUPPORTED_HV_CPUID ioctl a system-wide ioctl as the existing vCPU version can't be used that early. This is what KVM part does. With that done, we can make early Hyper-V feature expansion (this series). Vitaly Kuznetsov (9): i386: avoid hardcoding '12' as 'hyperv_vendor_id' length i386: clarify 'hv-passthrough' behavior i386: hardcode supported eVMCS version to '1' i386: make hyperv_expand_features() return bool i386: expand Hyper-V features during CPU feature expansion time i386: kill off hv_cpuid_check_and_set() i386: HV_HYPERCALL_AVAILABLE privilege bit is always needed i386: Hyper-V SynIC requires POST_MESSAGES/SIGNAL_EVENTS privileges qtest/hyperv: Introduce a simple hyper-v test MAINTAINERS| 1 + docs/hyperv.txt| 9 +- target/i386/cpu.c | 13 +- target/i386/kvm/hyperv-proto.h | 6 + target/i386/kvm/kvm-stub.c | 5 + target/i386/kvm/kvm.c | 189 +++- target/i386/kvm/kvm_i386.h | 1 + tests/qtest/hyperv-test.c | 221 + tests/qtest/meson.build| 3 +- 9 files changed, 357 insertions(+), 91 deletions(-) create mode 100644 tests/qtest/hyperv-test.c -- 2.31.1
[PATCH v8 4/9] i386: make hyperv_expand_features() return bool
Return 'false' when hyperv_expand_features() sets an error.

No functional change intended.

Reviewed-by: Eduardo Habkost
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm/kvm.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13d63f576b88..1e6f3c483e28 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1217,12 +1217,12 @@ static uint32_t hv_build_cpuid_leaf(CPUState *cs, uint32_t func, int reg)
  * of 'hv_passthrough' mode and fills the environment with all supported
  * Hyper-V features.
  */
-static void hyperv_expand_features(CPUState *cs, Error **errp)
+static bool hyperv_expand_features(CPUState *cs, Error **errp)
 {
     X86CPU *cpu = X86_CPU(cs);
 
     if (!hyperv_enabled(cpu))
-        return;
+        return true;
 
     if (cpu->hyperv_passthrough) {
         cpu->hyperv_vendor_id[0] =
@@ -1270,49 +1270,49 @@ static void hyperv_expand_features(CPUState *cs, Error **errp)
     /* Features */
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RELAXED, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_VAPIC, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_TIME, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_CRASH, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RESET, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_VPINDEX, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_RUNTIME, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_SYNIC, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_STIMER, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_FREQUENCIES, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_REENLIGHTENMENT, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_TLBFLUSH, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_EVMCS, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_IPI, errp)) {
-        return;
+        return false;
     }
     if (hv_cpuid_check_and_set(cs, HYPERV_FEAT_STIMER_DIRECT, errp)) {
-        return;
+        return false;
     }
 
     /* Additional dependencies not covered by kvm_hyperv_properties[] */
@@ -1322,7 +1322,10 @@ static void hyperv_expand_features(CPUState *cs, Error **errp)
         error_setg(errp, "Hyper-V %s requires Hyper-V %s",
                    kvm_hyperv_properties[HYPERV_FEAT_SYNIC].desc,
                    kvm_hyperv_properties[HYPERV_FEAT_VPINDEX].desc);
+        return false;
     }
+
+    return true;
 }
 
 /*
@@ -1588,8 +1591,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
 
     /* Paravirtualization CPUIDs */
-    hyperv_expand_features(cs, &local_err);
-    if (local_err) {
+    if (!hyperv_expand_features(cs, &local_err)) {
         error_report_err(local_err);
         return -ENOSYS;
     }
-- 
2.31.1
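
For readers outside the QEMU tree, the convention this patch adopts (return 'true' on success, 'false' once an error has been set) can be sketched in isolation. This is only an illustration: DemoError, demo_error_set() and demo_expand_features() are hypothetical stand-ins, not QEMU's Error API, and the single dependency check is modeled on the SynIC/VP_INDEX check visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an error object; NOT QEMU's Error type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Hypothetical helper: record a message if the caller passed an object. */
static void demo_error_set(DemoError *err, const char *msg)
{
    assert(msg != NULL);
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
    }
}

/*
 * Returns true on success and false after setting an error, so the
 * caller can test the return value instead of inspecting the error.
 */
static bool demo_expand_features(bool synic, bool vpindex, DemoError *err)
{
    if (synic && !vpindex) {
        demo_error_set(err, "SynIC requires VP_INDEX");
        return false;
    }
    return true;
}
```

With this shape a caller can write 'if (!demo_expand_features(..., &err))' in one step, which is the same simplification the last hunk makes in kvm_arch_init_vcpu().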
[PATCH v8 1/9] i386: avoid hardcoding '12' as 'hyperv_vendor_id' length
While this is very unlikely to change, let's avoid hardcoding '12' as
'hyperv_vendor_id' length.

No functional change intended.

Reviewed-by: Eduardo Habkost
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/cpu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a9fe1662d392..f8ae45be0d53 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6057,11 +6057,12 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
                                 &error_abort);
     }
     len = strlen(cpu->hyperv_vendor);
-    if (len > 12) {
-        warn_report("hv-vendor-id truncated to 12 characters");
-        len = 12;
+    if (len > sizeof(cpu->hyperv_vendor_id)) {
+        warn_report("hv-vendor-id truncated to %ld characters",
+                    sizeof(cpu->hyperv_vendor_id));
+        len = sizeof(cpu->hyperv_vendor_id);
     }
-    memset(cpu->hyperv_vendor_id, 0, 12);
+    memset(cpu->hyperv_vendor_id, 0, sizeof(cpu->hyperv_vendor_id));
     memcpy(cpu->hyperv_vendor_id, cpu->hyperv_vendor, len);
 
     /* 'Hv#1' interface identification*/
-- 
2.31.1
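
The sizeof-based truncation in this patch can be tried standalone with the sketch below. It rests on assumptions: struct demo_cpu and demo_set_vendor_id() are hypothetical names, the field is modeled as a plain 12-byte char array, and fprintf() stands in for warn_report(). The sketch prints the size with %zu, the conversion for size_t, where the patch uses %ld.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct; only the 12-byte field size mirrors the patch. */
struct demo_cpu {
    char vendor_id[12];
};

/*
 * Copy at most sizeof(vendor_id) bytes of 'vendor' into the field,
 * zero-padding the rest. Returns the number of bytes copied.
 */
static size_t demo_set_vendor_id(struct demo_cpu *cpu, const char *vendor)
{
    size_t len;

    assert(cpu != NULL && vendor != NULL);
    len = strlen(vendor);
    if (len > sizeof(cpu->vendor_id)) {
        /* fprintf() is a stand-in for QEMU's warn_report() here. */
        fprintf(stderr, "vendor id truncated to %zu characters\n",
                sizeof(cpu->vendor_id));
        len = sizeof(cpu->vendor_id);
    }
    memset(cpu->vendor_id, 0, sizeof(cpu->vendor_id));
    memcpy(cpu->vendor_id, vendor, len);
    return len;
}
```

Because every bound comes from sizeof on the field itself, changing the array size in the struct is the only edit needed if the length ever changes, which is the point of the patch.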
Re: [PATCH v7 3/9] i386: hardcode supported eVMCS version to '1'
Eduardo Habkost writes:

> On Fri, Jun 04, 2021 at 09:28:15AM +0200, Vitaly Kuznetsov wrote:
>> Eduardo Habkost writes:
>>
>> > On Thu, Jun 03, 2021 at 01:48:29PM +0200, Vitaly Kuznetsov wrote:
>> >> Currently, the only eVMCS version, supported by KVM (and described in TLFS)
>> >> is '1'. When Enlightened VMCS feature is enabled, QEMU takes the supported
>> >> eVMCS version range (from KVM_CAP_HYPERV_ENLIGHTENED_VMCS enablement) and
>> >> puts it to guest visible CPUIDs. When (and if) eVMCS ver.2 appears a
>> >> problem on migration is expected: it doesn't seem to be possible to migrate
>> >> from a host supporting eVMCS ver.2 to a host, which only support eVMCS
>> >> ver.1.
>> >
>> > Isn't it possible and safe to expose eVMCS ver.1 to the guest on
>> > a host that supports ver.2?
>>
>> We expose the supported range, guest is free to use any eVMCS version in
>> the range (see below):
>
> Oh, I didn't notice the returned value was a range.
>
>>
>> >
>> >>
>> >> Hardcode eVMCS ver.1 as the result of 'hv-evmcs' enablement for now. Newer
>> >> eVMCS versions will have to have their own enablement options (e.g.
>> >> 'hv-evmcs=2').
>> >>
>> >> Signed-off-by: Vitaly Kuznetsov
>> >> ---
>> >>  docs/hyperv.txt       |  2 +-
>> >>  target/i386/kvm/kvm.c | 16 +++-
>> >>  2 files changed, 12 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/docs/hyperv.txt b/docs/hyperv.txt
>> >> index a51953daa833..000638a2fd38 100644
>> >> --- a/docs/hyperv.txt
>> >> +++ b/docs/hyperv.txt
>> >> @@ -170,7 +170,7 @@ Recommended: hv-frequencies
>> >>  3.16. hv-evmcs
>> >>  ===
>> >>  The enlightenment is nested specific, it targets Hyper-V on KVM guests. When
>> >> -enabled, it provides Enlightened VMCS feature to the guest. The feature
>> >> +enabled, it provides Enlightened VMCS version 1 feature to the guest. The feature
>> >>  implements paravirtualized protocol between L0 (KVM) and L1 (Hyper-V)
>> >>  hypervisors making L2 exits to the hypervisor faster. The feature is Intel-only.
>> >>  Note: some virtualization features (e.g. Posted Interrupts) are disabled when
>> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> >> index c676ee8b38a7..d57eede5dc81 100644
>> >> --- a/target/i386/kvm/kvm.c
>> >> +++ b/target/i386/kvm/kvm.c
>> >> @@ -1490,13 +1490,19 @@ static int hyperv_init_vcpu(X86CPU *cpu)
>> >>          ret = kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_ENLIGHTENED_VMCS, 0,
>> >>                                    (uintptr_t)&evmcs_version);
>> >>
>> >> -        if (ret < 0) {
>> >> -            fprintf(stderr, "Hyper-V %s is not supported by kernel\n",
>> >> -                    kvm_hyperv_properties[HYPERV_FEAT_EVMCS].desc);
>> >> +        /*
>> >> +         * KVM is required to support EVMCS ver.1. as that's what 'hv-evmcs'
>> >> +         * option sets. Note: we hardcode the maximum supported eVMCS version
>> >> +         * to '1' as well so 'hv-evmcs' feature is migratable even when (and if)
>> >> +         * ver.2 is implemented. A new option (e.g. 'hv-evmcs=2') will then have
>> >> +         * to be added.
>> >> +         */
>> >> +        if (ret < 0 || (uint8_t)evmcs_version > 1) {
>> >
>> > Wait, do you really want to get a fatal error every time, after a
>> > kernel upgrade?
>> >
>>
>> Here, evmcs_version (returned by kvm_vcpu_enable_cap()) represents a
>> *range* of supported eVMCS versions:
>>
>> (evmcs_highest_supported_version << 8) | evmcs_lowest_supported_version
>>
>> Currently, this is 0x101 [1..1] range.
>>
>> The '(uint8_t)evmcs_version > 1' check here means 'eVMCS v1' is no
>> longer supported by KVM. This is not going to happen any time soon, but
>> I can imagine in 10 years or so we'll be dropping v1 so the range (in
>> theory) can be [10..2] -- which would mean eVMCS ver. 1 is NOT
>> supported. And we can't proceed then.
>
> Where is this documented? The only reference to
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS I've found in linux/Documentatio