RE: RE: [PATCH][v5] KVM: X86: support APERF/MPERF registers
> IMO, if we really want to ensure the correctness of userspace-provided CPUID
> settings, we need to return an error to userspace instead of fixing it silently.
>
> - Xiaoyao

OK, I will make it return an error.

Thanks

-Li
Re: RE: [PATCH][v5] KVM: X86: support APERF/MPERF registers
On 5/31/2020 10:08 AM, Li,Rongqing wrote:
> -----Original Message-----
> From: Xiaoyao Li [mailto:xiaoyao...@intel.com]
> Sent: May 30, 2020 18:40
> To: Li,Rongqing; linux-kernel@vger.kernel.org; k...@vger.kernel.org;
>     x...@kernel.org; h...@zytor.com; b...@alien8.de; mi...@redhat.com;
>     t...@linutronix.de; jmatt...@google.com; wanpen...@tencent.com;
>     vkuzn...@redhat.com; sean.j.christopher...@intel.com;
>     pbonz...@redhat.com; wei.hua...@amd.com
> Subject: Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
>
>> On 5/30/2020 12:35 PM, Li RongQing wrote:
>>> The guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which
>>> is confusing to users when turbo is enabled. aperf/mperf can be used to
>>> show the current CPU frequency since commit 7d5905dc14a
>>> ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
>>> so the guest should support the aperf/mperf capability.
>>>
>>> This patch implements aperf/mperf in three modes: none, software
>>> emulation, and pass-through.
>>>
>>> None: default mode; the guest does not support aperf/mperf.
>>>
>>> Software emulation: the aperf/mperf counts that elapse in guest mode
>>> are accumulated as the emulated value.
>>>
>>> Pass-through: only suitable for KVM_HINTS_REALTIME, because that hint
>>> guarantees a 1:1 vCPU:CPU binding and no over-commit.
>>>
>>> A per-VM capability is added to configure the aperfmperf mode.
>>>
>> [...]
>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index cd708b0b460a..c960dda4251b 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>>  					   MSR_IA32_MISC_ENABLE_MWAIT);
>>>  	}
>>>
>>> +	best = kvm_find_cpuid_entry(vcpu, 6, 0);
>>> +	if (best) {
>>> +		if (guest_has_aperfmperf(vcpu->kvm) &&
>>> +			boot_cpu_has(X86_FEATURE_APERFMPERF))
>>> +			best->ecx |= 1;
>>> +		else
>>> +			best->ecx &= ~1;
>>> +	}
>>
>> In my understanding, KVM allows userspace to set a CPUID feature bit for
>> the guest even if the hardware doesn't support the feature.
>>
>> So what makes X86_FEATURE_APERFMPERF different here? Is there any
>> concern I am missing?
>>
>> -Xiaoyao
>
> Both software emulation and pass-through of aperf/mperf depend on the
> host CPU having the aperf/mperf feature.
>
> Software emulation: the aperf/mperf counts that elapse in guest mode
> are accumulated as the emulated value.

I understand that you want to ensure that aperf/mperf is exposed
correctly. But there are many features other than aperf/mperf for which
KVM reports the supported settings through KVM_GET_SUPPORTED_CPUID
without checking or enforcing the correctness of userspace input. That
is, KVM allows userspace to set bogus CPUID settings as long as they
don't break KVM (the host kernel).

Indeed, bogus CPUID settings more than likely break the guest. But that
is not KVM's fault. KVM just does what userspace wants.

IMO, if we really want to ensure the correctness of userspace-provided
CPUID settings, we need to return an error to userspace instead of
fixing it silently.

- Xiaoyao
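The two policies discussed in the reply above can be contrasted in a small user-space sketch. This is only an illustration, not KVM code: `fixup_cpuid6_ecx`, `check_cpuid6_ecx`, and `CPUID6_ECX_APERFMPERF` are hypothetical names, and the rule for when to reject is an assumption. The only architectural fact used is that CPUID.06H:ECX bit 0 advertises APERF/MPERF.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:ECX bit 0 advertises the APERF/MPERF MSRs. */
#define CPUID6_ECX_APERFMPERF (1u << 0)

/*
 * Policy 1 (hypothetical helper): silently rewrite the bit, mirroring
 * the shape of the posted kvm_update_cpuid() hunk.
 */
static uint32_t fixup_cpuid6_ecx(uint32_t ecx, bool vm_mode_enabled,
				 bool host_has)
{
	if (vm_mode_enabled && host_has)
		return ecx | CPUID6_ECX_APERFMPERF;
	return ecx & ~CPUID6_ECX_APERFMPERF;
}

/*
 * Policy 2 (hypothetical helper): leave the userspace value untouched
 * and report an error if it asks for something that cannot be backed.
 */
static int check_cpuid6_ecx(uint32_t ecx, bool vm_mode_enabled,
			    bool host_has)
{
	bool wants = (ecx & CPUID6_ECX_APERFMPERF) != 0;

	if (wants && !(vm_mode_enabled && host_has))
		return -EINVAL;
	return 0;
}
```

With the second policy, a VMM that sets the bit without enabling the per-VM mode sees a failure at CPUID-setting time instead of a guest whose CPUID quietly differs from what was requested.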
RE: [PATCH][v5] KVM: X86: support APERF/MPERF registers
> -----Original Message-----
> From: Xiaoyao Li [mailto:xiaoyao...@intel.com]
> Sent: May 30, 2020 18:40
> To: Li,Rongqing; linux-kernel@vger.kernel.org; k...@vger.kernel.org;
>     x...@kernel.org; h...@zytor.com; b...@alien8.de; mi...@redhat.com;
>     t...@linutronix.de; jmatt...@google.com; wanpen...@tencent.com;
>     vkuzn...@redhat.com; sean.j.christopher...@intel.com;
>     pbonz...@redhat.com; wei.hua...@amd.com
> Subject: Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
>
> On 5/30/2020 12:35 PM, Li RongQing wrote:
> > The guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which
> > is confusing to users when turbo is enabled. aperf/mperf can be used to
> > show the current CPU frequency since commit 7d5905dc14a
> > ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
> > so the guest should support the aperf/mperf capability.
> >
> > This patch implements aperf/mperf in three modes: none, software
> > emulation, and pass-through.
> >
> > None: default mode; the guest does not support aperf/mperf.
> >
> > Software emulation: the aperf/mperf counts that elapse in guest mode
> > are accumulated as the emulated value.
> >
> > Pass-through: only suitable for KVM_HINTS_REALTIME, because that hint
> > guarantees a 1:1 vCPU:CPU binding and no over-commit.
> >
> > A per-VM capability is added to configure the aperfmperf mode.
> >
> [...]
>
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index cd708b0b460a..c960dda4251b 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> >  					   MSR_IA32_MISC_ENABLE_MWAIT);
> >  	}
> >
> > +	best = kvm_find_cpuid_entry(vcpu, 6, 0);
> > +	if (best) {
> > +		if (guest_has_aperfmperf(vcpu->kvm) &&
> > +			boot_cpu_has(X86_FEATURE_APERFMPERF))
> > +			best->ecx |= 1;
> > +		else
> > +			best->ecx &= ~1;
> > +	}
>
> In my understanding, KVM allows userspace to set a CPUID feature bit for
> the guest even if the hardware doesn't support the feature.
>
> So what makes X86_FEATURE_APERFMPERF different here? Is there any
> concern I am missing?
>
> -Xiaoyao

Both software emulation and pass-through of aperf/mperf depend on the
host CPU having the aperf/mperf feature.

Software emulation: the aperf/mperf counts that elapse in guest mode
are accumulated as the emulated value.

-Li
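The software-emulation mode described in the reply above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the patch's x86.c code (which is not shown in this thread): `struct vcpu_counters`, `guest_enter_sample()`, and `guest_exit_accumulate()` are hypothetical names, and the host counter values are passed in as parameters instead of being read with rdmsr. Only the `v_aperf`/`v_mperf` field names come from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical per-vCPU state. v_aperf/v_mperf mirror the fields the
 * patch adds to struct kvm_vcpu_arch; the entry_* fields are assumed.
 */
struct vcpu_counters {
	uint64_t v_aperf;	/* emulated APERF value seen by the guest */
	uint64_t v_mperf;	/* emulated MPERF value seen by the guest */
	uint64_t entry_aperf;	/* host APERF sampled at guest entry */
	uint64_t entry_mperf;	/* host MPERF sampled at guest entry */
};

/* Remember the host counters just before switching to guest mode. */
static void guest_enter_sample(struct vcpu_counters *c,
			       uint64_t host_aperf, uint64_t host_mperf)
{
	c->entry_aperf = host_aperf;
	c->entry_mperf = host_mperf;
}

/*
 * After leaving guest mode, add only the counts that elapsed while the
 * guest was running. Unsigned subtraction stays correct across a
 * counter wraparound.
 */
static void guest_exit_accumulate(struct vcpu_counters *c,
				  uint64_t host_aperf, uint64_t host_mperf)
{
	c->v_aperf += host_aperf - c->entry_aperf;
	c->v_mperf += host_mperf - c->entry_mperf;
}
```

Because only in-guest deltas are added, time the vCPU spends descheduled or in the host does not inflate the emulated counters, which is why this mode does not need the 1:1 vCPU:CPU binding that pass-through relies on.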
Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
On 5/30/2020 12:35 PM, Li RongQing wrote:
> The guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which
> is confusing to users when turbo is enabled. aperf/mperf can be used to
> show the current CPU frequency since commit 7d5905dc14a
> ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
> so the guest should support the aperf/mperf capability.
>
> This patch implements aperf/mperf in three modes: none, software
> emulation, and pass-through.
>
> None: default mode; the guest does not support aperf/mperf.
>
> Software emulation: the aperf/mperf counts that elapse in guest mode
> are accumulated as the emulated value.
>
> Pass-through: only suitable for KVM_HINTS_REALTIME, because that hint
> guarantees a 1:1 vCPU:CPU binding and no over-commit.
>
> A per-VM capability is added to configure the aperfmperf mode.
>
[...]

> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index cd708b0b460a..c960dda4251b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>  					   MSR_IA32_MISC_ENABLE_MWAIT);
>  	}
>
> +	best = kvm_find_cpuid_entry(vcpu, 6, 0);
> +	if (best) {
> +		if (guest_has_aperfmperf(vcpu->kvm) &&
> +			boot_cpu_has(X86_FEATURE_APERFMPERF))
> +			best->ecx |= 1;
> +		else
> +			best->ecx &= ~1;
> +	}

In my understanding, KVM allows userspace to set a CPUID feature bit for
the guest even if the hardware doesn't support the feature.

So what makes X86_FEATURE_APERFMPERF different here? Is there any
concern I am missing?

-Xiaoyao

>  	/* Note, maxphyaddr must be updated before tdp_level. */
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>  	vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);
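For context on why a guest wants these MSRs: the current frequency is derived from the ratio of the APERF and MPERF deltas over an interval, scaled by the base frequency. The sketch below shows only that arithmetic. `effective_khz()` is a hypothetical helper, not a kernel function, and returning 0 when MPERF has not advanced is an assumption made to avoid a division by zero.

```c
#include <stdint.h>

/*
 * Effective frequency in kHz over a sampling interval.
 *
 * base_khz:    nominal (non-turbo) frequency
 * aperf_delta: APERF increase over the interval (actual cycles)
 * mperf_delta: MPERF increase over the interval (reference cycles)
 *
 * Note: the multiplication can overflow 64 bits for very large deltas;
 * a real implementation would bound the sampling interval.
 */
static uint64_t effective_khz(uint64_t base_khz, uint64_t aperf_delta,
			      uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;
	return base_khz * aperf_delta / mperf_delta;
}
```

A ratio above 1 means the core ran faster than nominal (turbo), which is exactly the case where a fixed value in /proc/cpuinfo is misleading.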
[PATCH][v5] KVM: X86: support APERF/MPERF registers
The guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which
is confusing to users when turbo is enabled. aperf/mperf can be used to
show the current CPU frequency since commit 7d5905dc14a
("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
so the guest should support the aperf/mperf capability.

This patch implements aperf/mperf in three modes: none, software
emulation, and pass-through.

None: default mode; the guest does not support aperf/mperf.

Software emulation: the aperf/mperf counts that elapse in guest mode
are accumulated as the emulated value.

Pass-through: only suitable for KVM_HINTS_REALTIME, because that hint
guarantees a 1:1 vCPU:CPU binding and no over-commit.

A per-VM capability is added to configure the aperfmperf mode.

Signed-off-by: Li RongQing
Signed-off-by: Chai Wen
Signed-off-by: Jia Lina
---
diff v4: fix maybe-uninitialized warning
diff v3: fix interception of MSR_IA32_MPERF/APERF in svm
diff v2: support aperfmperf pass-through
         move common code to kvm_get_msr_common
diff v1: 1. support AMD, but not tested
         2. support per-vm capability to enable

 Documentation/virt/kvm/api.rst  | 10 ++++++++++
 arch/x86/include/asm/kvm_host.h | 11 +++++++++++
 arch/x86/kvm/cpuid.c            | 13 ++++++++++++-
 arch/x86/kvm/svm/svm.c          |  8 ++++++++
 arch/x86/kvm/vmx/vmx.c          |  6 ++++++
 arch/x86/kvm/x86.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              | 15 +++++++++++++++
 include/uapi/linux/kvm.h        |  1 +
 8 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d871dacb984e..f854f4da6fd8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6126,3 +6126,13 @@ KVM can therefore start protected VMs.
 This capability governs the KVM_S390_PV_COMMAND ioctl and the
 KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
 guests when the state change is invalid.
+
+8.23 KVM_CAP_APERFMPERF
+-----------------------
+
+:Architectures: x86
+:Parameters: args[0] is aperfmperf mode;
+             0 for not support, 1 for software emulation, 2 for pass-through
+:Returns: 0 on success; -1 on error
+
+This capability indicates that KVM supports APERF and MPERF MSR registers
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fd78bd44b2d6..14643f8af9c4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -824,6 +824,9 @@ struct kvm_vcpu_arch {
 
 	/* AMD MSRC001_0015 Hardware Configuration */
 	u64 msr_hwcr;
+
+	u64 v_mperf;
+	u64 v_aperf;
 };
 
 struct kvm_lpage_info {
@@ -889,6 +892,12 @@ enum kvm_irqchip_mode {
 	KVM_IRQCHIP_SPLIT,        /* created with KVM_CAP_SPLIT_IRQCHIP */
 };
 
+enum kvm_aperfmperf_mode {
+	KVM_APERFMPERF_NONE,
+	KVM_APERFMPERF_SOFT,      /* software emulate aperfmperf */
+	KVM_APERFMPERF_PT,        /* pass-through aperfmperf to guest */
+};
+
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
 #define APICV_INHIBIT_REASON_NESTED     2
@@ -986,6 +995,8 @@ struct kvm_arch {
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
+
+	enum kvm_aperfmperf_mode aperfmperf_mode;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cd708b0b460a..c960dda4251b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 					   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
 
+	best = kvm_find_cpuid_entry(vcpu, 6, 0);
+	if (best) {
+		if (guest_has_aperfmperf(vcpu->kvm) &&
+			boot_cpu_has(X86_FEATURE_APERFMPERF))
+			best->ecx |= 1;
+		else
+			best->ecx &= ~1;
+	}
 	/* Note, maxphyaddr must be updated before tdp_level. */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 	vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);
@@ -557,7 +565,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	case 6: /* Thermal management */
 		entry->eax = 0x4; /* allow ARAT */
 		entry->ebx = 0;
-		entry->ecx = 0;
+		if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+			entry->ecx = 0x1;
+		else
+			entry->ecx = 0x0;
 		entry->edx = 0;
 		break;
 	/* function 7 has additional index. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e9c0fb68387d..1d38fe3afc0d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1200,6 +1200,14 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm->msrpm = page_address(msrpm_pages);