Hi Zide,

On 1/2/26 2:59 PM, Chen, Zide wrote:
> 
> 
> On 12/29/2025 11:42 PM, Dongli Zhang wrote:

[snip]

>>  
>>  static struct kvm_cpuid2 *cpuid_cache;
>>  static struct kvm_cpuid2 *hv_cpuid_cache;
>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>>      if (first) {
>>          first = false;
>>  
>> -        /*
>> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
>> -         * disable PMUs; however, QEMU has been providing PMU property per
>> -         * CPU since v1.6. In order to accommodate both, have to configure
>> -         * the VM-level capability here.
>> -         *
>> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
>> -         * behavior on Intel platform because current "pmu" property works
>> -         * as expected.
>> -         */
>> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
>> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> -                                    KVM_PMU_CAP_DISABLE);
>> -            if (ret < 0) {
>> -                error_setg_errno(errp, -ret,
>> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
>> -                return ret;
>> +        if (X86_CPU(cpu)->enable_pmu) {
>> +            if (kvm_pmu_disabled) {
>> +                warn_report("Failed to enable PMU since "
>> +                            "KVM's enable_pmu parameter is disabled");
> 
> I'm wondering about the intended value of this patch?
> 
> If enable_pmu is true in QEMU but the corresponding KVM parameter is
> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
> tell that the PMU feature is not supported by host.
> 
> The logic implemented in this patch seems somewhat redundant.

For Intel, QEMU userspace can determine whether the vPMU has been disabled by KVM
through KVM_GET_SUPPORTED_CPUID.

However, this approach does not apply to AMD. Unlike Intel, AMD does not rely on
CPUID to enumerate whether a PMU is present. By default, we can assume that the
PMU is always available; the recent PerfMonV2 feature is the exception.

The main objective of PATCH 4/7 is to introduce the variable 'kvm_pmu_disabled',
which PATCH 5/7 reuses to skip all PMU initialization when the parameter is set
to 'N'.

+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }

The 'kvm_pmu_disabled' variable is used to differentiate between the following
two scenarios on AMD:

(1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled via
the KVM parameter ('N').

(2) An older KVM without KVM_CAP_PMU_CAPABILITY support.

In both cases, checking for the KVM_CAP_PMU_CAPABILITY extension may return 0.

By reading the file "/sys/module/kvm/parameters/enable_pmu", we can distinguish
between these two scenarios.
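A minimal sketch of that distinction, written in plain C instead of the GLib
g_file_get_contents() call used in the patch. The helper names
read_enable_pmu_param() and classify_pmu_cap() and both enums are hypothetical;
pmu_cap stands for the value returned by the KVM_CAP_PMU_CAPABILITY check, and
the sketch assumes the parameter file reads back as "Y" or "N", which is how
bool module parameters are shown in sysfs.

```c
#include <assert.h>
#include <stdio.h>

/* State of the kvm.enable_pmu module parameter as seen from userspace. */
enum pmu_param_state {
    PMU_PARAM_UNKNOWN,   /* file missing or unreadable, e.g. kernel older than v5.17 */
    PMU_PARAM_ENABLED,   /* sysfs reports 'Y' */
    PMU_PARAM_DISABLED,  /* sysfs reports 'N' */
};

/* Hypothetical helper: read a bool module parameter, printed as "Y\n" or "N\n". */
static enum pmu_param_state read_enable_pmu_param(const char *path)
{
    FILE *f;
    int c;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f) {
        return PMU_PARAM_UNKNOWN;
    }
    c = fgetc(f);
    fclose(f);

    if (c == 'N') {
        return PMU_PARAM_DISABLED;
    }
    if (c == 'Y') {
        return PMU_PARAM_ENABLED;
    }
    return PMU_PARAM_UNKNOWN;
}

/* Why the KVM_CAP_PMU_CAPABILITY check returned what it returned. */
enum pmu_cap_reason {
    PMU_CAP_AVAILABLE,         /* non-zero: capability supported and usable */
    PMU_CAP_DISABLED_BY_PARAM, /* scenario (1): newer KVM, enable_pmu=N */
    PMU_CAP_NOT_SUPPORTED,     /* scenario (2): older KVM without the capability */
};

/* Hypothetical helper: consult the module parameter before trusting pmu_cap. */
static enum pmu_cap_reason classify_pmu_cap(int pmu_cap, const char *param_path)
{
    if (read_enable_pmu_param(param_path) == PMU_PARAM_DISABLED) {
        return PMU_CAP_DISABLED_BY_PARAM;
    }
    if (pmu_cap == 0) {
        return PMU_CAP_NOT_SUPPORTED;
    }
    return PMU_CAP_AVAILABLE;
}
```

The parameter is checked before pmu_cap because both scenarios report
pmu_cap == 0; only the sysfs value tells them apart.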

As you mentioned, another approach would be to use KVM_GET_MSRS to probe the AMD
PMU specifically during QEMU initialization. In that case, we could set
'kvm_pmu_disabled' to true if reading the AMD PMU MSRs fails.

To implement this, we may need to:

1. Make this patch AMD-specific by probing the AMD PMU registers during
initialization. We may need to create a new QEMU function that uses KVM_GET_MSRS
for probing only, or we may reuse kvm_arch_get_supported_msr_feature() or
kvm_get_one_msr(). I may make this change in the next version.

2. Restrict the use of 'kvm_pmu_disabled' to AMD in PATCH 5/7.

> 
> Additionally, in this scenario — where the user intends to enable a
> feature but the host cannot support it — normally no warning is emitted
> by QEMU.

May I assume that QEMU already prints warnings for unsupported features? Here is
an example:

QEMU 10.2.50 monitor - type 'help' for more information
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]

> 
>> +            }
>> +        } else {
>> +            /*
>> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
>> +             * disable PMUs; however, QEMU has been providing PMU property per
>> +             * CPU since v1.6. In order to accommodate both, have to configure
>> +             * the VM-level capability here.
>> +             *
>> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
>> +             * behavior on Intel platform because current "pmu" property works
>> +             * as expected.
>> +             */
>> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> +                                        KVM_PMU_CAP_DISABLE);
>> +                if (ret < 0) {
>> +                    error_setg_errno(errp, -ret,
>> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
>> +                    return ret;
>> +                }
>>              }
>>          }
>>      }
>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      int ret;
>>      struct utsname utsname;
>>      Error *local_err = NULL;
>> +    g_autofree char *kvm_enable_pmu;
>>  
>>      /*
>>       * Initialize confidential guest (SEV/TDX) context, if required
>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  
>>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>  
>> +    /*
>> +     * The enable_pmu parameter is introduced since Linux v5.17,
>> +     * give a chance to provide more information about vPMU
>> +     * enablement.
>> +     *
>> +     * The kvm.enable_pmu's permission is 0444. It does not change
>> +     * until a reload of the KVM module.
>> +     */
>> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>> +                            &kvm_enable_pmu, NULL, NULL)) {
>> +        if (*kvm_enable_pmu == 'N') {
>> +            kvm_pmu_disabled = true;
> 
> It’s generally better not to rely on KVM’s internal implementation
> unless really necessary.
> 
> For example, in the new mediated vPMU framework, even if the KVM module
> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
> still be cleared.
> 
> In such a case, the logic here might not be correct.

Would the mediated vPMU use KVM_PMU_CAP_DISABLE to clear the per-VM enable_pmu
even when the global KVM parameter enable_pmu=N is set?

In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the value of
"/sys/module/kvm/parameters/enable_pmu" is not equal to N.

Can I assume that this will work with the mediated vPMU?
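The rule above can be written as a small decision function. This is a
hypothetical sketch, not code from the patch: choose_pmu_action() and the
pmu_action enum are made-up names, want_pmu stands for X86_CPU(cpu)->enable_pmu,
module_pmu_disabled for 'kvm_pmu_disabled', and KVM_PMU_CAP_DISABLE is defined
locally with its <linux/kvm.h> value so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef KVM_PMU_CAP_DISABLE
#define KVM_PMU_CAP_DISABLE (1 << 0)  /* same value as in <linux/kvm.h> */
#endif

enum pmu_action {
    PMU_ACTION_NONE,        /* nothing to do */
    PMU_ACTION_WARN,        /* PMU requested, but kvm.enable_pmu=N */
    PMU_ACTION_CAP_DISABLE, /* enable KVM_CAP_PMU_CAPABILITY with KVM_PMU_CAP_DISABLE */
};

/*
 * Hypothetical helper.
 * want_pmu:            the per-CPU "pmu" property
 * module_pmu_disabled: true if /sys/module/kvm/parameters/enable_pmu is 'N'
 * pmu_cap:             result of the KVM_CAP_PMU_CAPABILITY check
 */
static enum pmu_action choose_pmu_action(bool want_pmu,
                                         bool module_pmu_disabled,
                                         int pmu_cap)
{
    /* The capability check is expected to yield a bitmask, never a negative value. */
    assert(pmu_cap >= 0);

    if (want_pmu) {
        return module_pmu_disabled ? PMU_ACTION_WARN : PMU_ACTION_NONE;
    }
    /*
     * The guest should have no PMU. With enable_pmu=N the PMU is already off
     * for every VM, so the per-VM disable is only requested when the module
     * parameter has not already done the job.
     */
    if (!module_pmu_disabled && (pmu_cap & KVM_PMU_CAP_DISABLE)) {
        return PMU_ACTION_CAP_DISABLE;
    }
    return PMU_ACTION_NONE;
}
```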


Would it be possible to keep the current approach until the mediated vPMU is
finalized for mainline, and later introduce an incremental change based on
KVM_GET_MSRS probing? The current approach is straightforward and works with
existing Linux kernels.

For a long time, QEMU has lacked support for disabling or resetting AMD PMU
registers. Adding this feature before the mediated vPMU is finalized would
benefit many existing kernel versions. This patchset fixes bugs encountered in
production.


Please let me know your thoughts. In the meantime, I will start working on the
next version.

Thank you very much!

Dongli Zhang


