On 1/7/2026 12:05 AM, Dongli Zhang wrote:
> Hi Zide,
>
> On 1/6/26 1:03 PM, Chen, Zide wrote:
>>
>>
>> On 1/5/2026 12:21 PM, Dongli Zhang wrote:
>>> Hi Zide,
>>>
>>> On 1/2/26 2:59 PM, Chen, Zide wrote:
>>>>
>>>>
>>>> On 12/29/2025 11:42 PM, Dongli Zhang wrote:
>>>
>>> [snip]
>>>
>>>>>
>>>>> static struct kvm_cpuid2 *cpuid_cache;
>>>>> static struct kvm_cpuid2 *hv_cpuid_cache;
>>>>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error
>>>>> **errp)
>>>>> if (first) {
>>>>> first = false;
>>>>>
>>>>> - /*
>>>>> - * Since Linux v5.18, KVM provides a VM-level capability to
>>>>> easily
>>>>> - * disable PMUs; however, QEMU has been providing PMU property
>>>>> per
>>>>> - * CPU since v1.6. In order to accommodate both, have to
>>>>> configure
>>>>> - * the VM-level capability here.
>>>>> - *
>>>>> - * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>>> - * behavior on Intel platform because current "pmu" property
>>>>> works
>>>>> - * as expected.
>>>>> - */
>>>>> - if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
>>>>> !X86_CPU(cpu)->enable_pmu) {
>>>>> - ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>>>> - KVM_PMU_CAP_DISABLE);
>>>>> - if (ret < 0) {
>>>>> - error_setg_errno(errp, -ret,
>>>>> - "Failed to set KVM_PMU_CAP_DISABLE");
>>>>> - return ret;
>>>>> + if (X86_CPU(cpu)->enable_pmu) {
>>>>> + if (kvm_pmu_disabled) {
>>>>> + warn_report("Failed to enable PMU since "
>>>>> + "KVM's enable_pmu parameter is disabled");
>>>>
>>>> I'm wondering about the intended value of this patch?
>>>>
>>>> If enable_pmu is true in QEMU but the corresponding KVM parameter is
>>>> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
>>>> tell that the PMU feature is not supported by host.
>>>>
>>>> The logic implemented in this patch seems somewhat redundant.
>>>
>>> For Intel, the QEMU userspace can determine if the vPMU is disabled by KVM
>>> through the use of KVM_GET_SUPPORTED_CPUID.
>>>
>>> However, this approach does not apply to AMD. Unlike Intel, AMD does not
>>> rely on
>>> CPUID to detect whether PMU is supported. By default, we can assume that
>>> PMU is
>>> always available, except for the recent PerfMonV2 feature.
>>>
>>> The main objective of this PATCH 4/7 is to introduce the variable
>>> 'kvm_pmu_disabled', which will be reused in PATCH 5/7 to skip any PMU
>>> initialization if the parameter is set to 'N'.
>>>
>>> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
>>> +{
>>> + CPUX86State *env = &cpu->env;
>>> +
>>> + /*
>>> + * The PMU virtualization is disabled by kvm.enable_pmu=N.
>>> + */
>>> + if (kvm_pmu_disabled) {
>>> + return;
>>> + }
>>
>> Thanks for explanation.
>>
>>> The 'kvm_pmu_disabled' variable is used to differentiate between the
>>> following
>>> two scenarios on AMD:
>>>
>>> (1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled
>>> via
>>> the KVM parameter ('N').
>>>
>>> (2) An older KVM without KVM_CAP_PMU_CAPABILITY support.
>>>
>>> In both cases, the call to KVM_CAP_PMU_CAPABILITY extension support check
>>> may
>>> return 0.
>>>
>>> By reading the file "/sys/module/kvm/parameters/enable_pmu", we can
>>> distinguish
>>> between these two scenarios.
>>
>> As described in PATCH 1/7, without issuing KVM_PMU_CAP_DISABLE, KVM has
>> no way to know that userspace does not intend to enable vPMU in AMD
>> platforms, and therefore does not fault guest accesses to PMU MSRs.
>>
>> My understanding is that the issue being addressed here is basically the
>> opposite: QEMU does not know that vPMU is disabled by KVM.
>
> Exactly.
>
> Otherwise, QEMU issues unwanted MSR writes for every vCPU during QEMU reset.
>
>>
>> IIUC, one difference between Intel and AMD is that AMD lacks a CPUID
>> leaf to indicate the availability of PMU version 1. But Intel
>> potentially could be in the same situation that KVM advertises PMU
>> availability but it's not actually supported. (e.g. kvm->arch.enable_pmu
>> is false while modules parameter enable_pmu is true).
>>
>> From the guest’s point of view, it probes PMU MSRs to determine whether
>> PMU support is present and it's fine in this situation.
>>
>> In userspace, QEMU may issue KVM_SET_MSRS / KVM_GET_MSRS to KVM without
>> knowing that vPMU has been disabled by KVM. I think these IOCTLs should
>> not fail, since KVM states that “Userspace is allowed to read MSRs, and
>> write ‘0’ to MSRs, that KVM advertises to userspace, even if an MSR
>> isn’t fully supported.”
>>
>> My current understanding is that AMD should be fine even without
>> kvm_pmu_disabled, but I may be missing some context here.
>>
>> The bottom line is this patch doesn't handle the cases that KVM still
>> could disable vPMU support even if enable_pmu is true.
>
> Yes. There are still unwanted PMU MSR writes from QEMU. This just seems odd.
>
> The concern with unwanted MSR writes was initially raised by Maksim Davydov:
>
> https://lore.kernel.org/qemu-devel/[email protected]/
>
> As shown below on the v6.0 KVM hypervisor (AMD), while there are no errors
> from
> QEMU, numerous annoying warnings are generated. (If I recall correctly, this
> can
> also be triggered from the VM itself.)
>
> However, here the logs are not only due to vcpu0, but indeed every vcpu.
>
> [ 280.802976] kvm_set_msr_common: 1910 callbacks suppressed
> [ 280.802981] kvm [18411]: vcpu0, guest rIP: 0xffffffffa4c97844 disabled
> perfctr wrmsr: 0xc0010007 data 0xffff
> [ 295.345747] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [ 295.355379] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [ 295.364997] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010006 data 0x0
> [ 295.374618] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010007 data 0x0
> [ 295.385048] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [ 295.394694] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [ 295.404317] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010006 data 0x0
> [ 295.413928] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010007 data 0x0
> [ 295.424319] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [ 295.433963] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [ 301.966571] kvm_set_msr_common: 1910 callbacks suppressed
> [ 301.966577] kvm [18411]: vcpu0, guest rIP: 0xffffffff8ac97844 disabled
> perfctr wrmsr: 0xc0010007 data 0xffff
In e76ae52747a8 ("KVM: x86/pmu: Gate all "unimplemented MSR" prints on
report_ignored_msrs"), vcpu_unimpl() is no longer forced for counter MSRs
in the "disabled perfctr wrmsr" case, so most of the above warnings go away.
The remaining warnings go through vcpu_unimpl(), which is ratelimited, and
all of these logs can be suppressed by setting report_ignored_msrs=false.
So it should not be that bad now.
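
For what it's worth, checking whether a host still prints these messages
only needs the same kind of sysfs read that this patch does for enable_pmu.
Below is a minimal standalone sketch in plain C (no GLib). The helper names
parse_bool_param() and read_bool_param() are made up for illustration and
are not existing QEMU functions; it assumes the kernel shows bool module
parameters as 'Y' or 'N' followed by a newline.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the text of a boolean module parameter as shown in sysfs.
 * Returns 1 for 'Y', 0 for 'N', and -1 for anything else (including
 * an empty string). Hypothetical helper, not part of QEMU.
 */
static int parse_bool_param(const char *text)
{
    assert(text != NULL);

    switch (text[0]) {
    case 'Y':
        return 1;
    case 'N':
        return 0;
    default:
        return -1;
    }
}

/*
 * Read a boolean parameter file such as
 * /sys/module/kvm/parameters/report_ignored_msrs.
 * Returns -1 if the file is missing or unreadable, for example on a
 * kernel that predates the parameter.
 */
static int read_bool_param(const char *path)
{
    char buf[8] = "";
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        return -1;
    }
    if (fgets(buf, sizeof(buf), f) == NULL) {
        buf[0] = '\0';
    }
    fclose(f);

    return parse_bool_param(buf);
}
```

A caller would pass the full parameter path and treat -1 as "don't know"
rather than as "disabled".
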
>>
>>
>>> As you mentioned, another approach would be to use KVM_GET_MSRS to
>>> specifically
>>> probe for AMD during QEMU initialization. In this case, we can set
>>> 'kvm_pmu_disabled' to true if reading the AMD PMU MSR registers fails.
>>>
>>> To implement this, we may need to:
>>>
>>> 1. Turn this patch to be AMD specific by probing the AMD PMU registers
>>> during
>>> initialization. We may need to create a new function in QEMU to use
>>> KVM_GET_MSRS
>>> for probing only, or we may re-use kvm_arch_get_supported_msr_feature() or
>>> kvm_get_one_msr(). I may change in the next version.
>>>
>>> 2. Limit the usage of 'kvm_pmu_disabled' to be AMD specific in PATCH 5/7.
>>
>> I guess this might make things more complicated.
>>
>>>>
>>>> Additionally, in this scenario — where the user intends to enable a
>>>> feature but the host cannot support it — normally no warning is emitted
>>>> by QEMU.
>>>
>>> According to the usage of QEMU, may I assume QEMU already prints warning
>>> logs
>>> for unsupported features? The below is an example.
>>>
>>> QEMU 10.2.50 monitor - type 'help' for more information
>>> qemu-system-x86_64: warning: host doesn't support requested feature:
>>> CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
>>> qemu-system-x86_64: warning: host doesn't support requested feature:
>>> CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]
>>>
>>>>
>>>>> + }
>>>>> + } else {
>>>>> + /*
>>>>> + * Since Linux v5.18, KVM provides a VM-level capability to
>>>>> easily
>>>>> + * disable PMUs; however, QEMU has been providing PMU
>>>>> property per
>>>>> + * CPU since v1.6. In order to accommodate both, have to
>>>>> configure
>>>>> + * the VM-level capability here.
>>>>> + *
>>>>> + * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>>> + * behavior on Intel platform because current "pmu" property
>>>>> works
>>>>> + * as expected.
>>>>> + */
>>>>> + if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>>>>> + ret = kvm_vm_enable_cap(kvm_state,
>>>>> KVM_CAP_PMU_CAPABILITY, 0,
>>>>> + KVM_PMU_CAP_DISABLE);
>>>>> + if (ret < 0) {
>>>>> + error_setg_errno(errp, -ret,
>>>>> + "Failed to set
>>>>> KVM_PMU_CAP_DISABLE");
>>>>> + return ret;
>>>>> + }
>>>>> }
>>>>> }
>>>>> }
>>>>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>> int ret;
>>>>> struct utsname utsname;
>>>>> Error *local_err = NULL;
>>>>> + g_autofree char *kvm_enable_pmu;
>>>>>
>>>>> /*
>>>>> * Initialize confidential guest (SEV/TDX) context, if required
>>>>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>>
>>>>> pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>>>>
>>>>> + /*
>>>>> + * The enable_pmu parameter is introduced since Linux v5.17,
>>>>> + * give a chance to provide more information about vPMU
>>>>> + * enablement.
>>>>> + *
>>>>> + * The kvm.enable_pmu's permission is 0444. It does not change
>>>>> + * until a reload of the KVM module.
>>>>> + */
>>>>> + if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>>>>> + &kvm_enable_pmu, NULL, NULL)) {
>>>>> + if (*kvm_enable_pmu == 'N') {
>>>>> + kvm_pmu_disabled = true;
>>>>
>>>> It’s generally better not to rely on KVM’s internal implementation
>>>> unless really necessary.
>>>>
>>>> For example, in the new mediated vPMU framework, even if the KVM module
>>>> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
>>>> still be cleared.
>>>>
>>>> In such a case, the logic here might not be correct.
>>>
>>> Would the Mediated vPMU set KVM_PMU_CAP_DISABLE to clear per-VM enable_pmu
>>> even
>>> when the global KVM parameter enable_pmu=N is set?
>>>
>>> In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the
>>> value of
>>> "/sys/module/kvm/parameters/enable_pmu" is not equal to N.
>>>
>>> Can I assume that this will work with Mediated vPMU?
>>>
>>>
>>> Is there any possibility to follow the current approach before Mediated
>>> vPMU is
>>> finalized for mainline, and later introduce an incremental change using
>>> KVM_GET_MSRS probing? The current approach is straightforward and can work
>>> with
>>> existing Linux kernel source code.
>>
>> Apologies for the incorrect statement I made earlier regarding mediated
>> vPMU.
>>
>> According to the mediated vPMU v6, the only behavior specific to
>> mediated vPMU is that kvm->arch.enable_pmu may be cleared when
>> irqchip_in_kernel() is not true:
>> https://lore.kernel.org/all/[email protected]/
>>
>>
>> However, this does not imply that mediated vPMU requires any special
>> handling here. In theory, KVM could clear kvm->arch.enable_pmu in the
>> future for other reasons.
>>
>
> While "KVM could clear kvm->arch.enable_pmu in the future," I don't think KVM
> may set kvm->arch.enable_pmu if the global enable_pmu is set to 'N'.
>
> Taking Intel VMX EPT as an example, once
> "/sys/module/kvm_intel/parameters/ept"
> is globally disabled, there's no way within KVM software to enable it for any
> guest VM **after** 'ept' is set to N.
>
> Similarly, "/sys/module/kvm/parameters/enable_pmu=N" indicates that this KVM
> host will not support PMU virtualization in any way. Therefore, there should
> be
> no way to enable vPMU for any guest VM if the global parameter is set to 'N'.
> Here we read from this parameter only during QEMU initialization.
>
> That's why I believe it's reliable to trust the setting when
> "/sys/module/kvm/parameters/enable_pmu=N".
>
> In this way, we can avoid many unnecessary MSR writes, especially in cases
> where
> a VM has 300+ vCPUs, even though these may be equivalent to NOPs with
> optimizations in more recent KVM versions.
>
> The objective isn't to improve performance. Minimizing the number of unwanted
> MSR writes from QEMU reduces the chances of failure (e.g., due to any QEMU
> software bug). We can simply avoid those unwanted MSR/NOPs by reading from a
> KVM
> parameter.
>
> From a user's perspective, this just seems odd.
> "/sys/module/kvm/parameters/enable_pmu=N" is a reliable setting. If there's a
> configuration mismatch between QEMU and KVM, a warning could alert the user.
>
> I can remove this patch, along with the 'kvm_pmu_disabled' variable.
Even with /sys/module/kvm/parameters/enable_pmu=Y, it is theoretically
possible for kvm->arch.enable_pmu to be false. In that case, vPMU
could still be advertised, and QEMU doesn't know that vPMU is not
supported by KVM, on either Intel or AMD platforms.
Anyway, this is likely only a theoretical scenario and may not actually
happen in practice.
> Thank you very much!
>
> Dongli Zhang