答复: 答复: [PATCH][v5] KVM: X86: support APERF/MPERF registers

2020-06-04 Thread Li,Rongqing
> IMO, If we really want to ensure the correctness of userspace provided CPUID
> settings, we need to return ERROR to userspace instead of fixing it siliently.
> 

Ok , I will make it return a error

Thanks

-Li


> - Xiaoyao


Re: 答复: [PATCH][v5] KVM: X86: support APERF/MPERF registers

2020-05-30 Thread Xiaoyao Li

On 5/31/2020 10:08 AM, Li,Rongqing wrote:




-邮件原件-
发件人: Xiaoyao Li [mailto:xiaoyao...@intel.com]
发送时间: 2020年5月30日 18:40
收件人: Li,Rongqing ; linux-kernel@vger.kernel.org;
k...@vger.kernel.org; x...@kernel.org; h...@zytor.com; b...@alien8.de;
mi...@redhat.com; t...@linutronix.de; jmatt...@google.com;
wanpen...@tencent.com; vkuzn...@redhat.com;
sean.j.christopher...@intel.com; pbonz...@redhat.com;
wei.hua...@amd.com
主题: Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers

On 5/30/2020 12:35 PM, Li RongQing wrote:

Guest kernel reports a fixed cpu frequency in /proc/cpuinfo, this is
confused to user when turbo is enable, and aperf/mperf can be used to
show current cpu frequency after 7d5905dc14a
"(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
so guest should support aperf/mperf capability

This patch implements aperf/mperf by three mode: none, software
emulation, and pass-through

None: default mode, guest does not support aperf/mperf

Software emulation: the period of aperf/mperf in guest mode are
accumulated as emulated value

Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because that
hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no
over-commit.

And a per-VM capability is added to configure aperfmperf mode



[...]


diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index
cd708b0b460a..c960dda4251b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
   MSR_IA32_MISC_ENABLE_MWAIT);
}

+   best = kvm_find_cpuid_entry(vcpu, 6, 0);
+   if (best) {
+   if (guest_has_aperfmperf(vcpu->kvm) &&
+   boot_cpu_has(X86_FEATURE_APERFMPERF))
+   best->ecx |= 1;
+   else
+   best->ecx &= ~1;
+   }


In my understanding, KVM allows userspace to set a CPUID feature bit for
guest even if hardware doesn't support the feature.

So what makes X86_FEATURE_APERFMPERF different here? Is there any
concern I miss?

-Xiaoyao


Whether software emulation for aperf/mperf or pass-through depends on host cpu 
aperf/mperf feature.
  
Software emulation: the period of aperf/mperf in guest mode are accumulated as emulated value




I know it that you want to ensure the correctness of exposure of 
aperf/mperf.


But there are so many features other than aperf/mperf that KVM reports 
the supported settings of them through KVM_GET_SUPPORTED_CPUID, but 
doesn't check nor force the correctness of userspace input. i.e., KVM 
allows userspace to set bogus CPUID settings as long as it doesn't break 
KVM (host kernel).


Indeed, bogus CPUID settings more than likely breaks the guest. But it's 
not KVM's fault. KVM just do what userspace wants.


IMO, If we really want to ensure the correctness of userspace provided 
CPUID settings, we need to return ERROR to userspace instead of fixing 
it siliently.


- Xiaoyao


答复: [PATCH][v5] KVM: X86: support APERF/MPERF registers

2020-05-30 Thread Li,Rongqing


> -邮件原件-
> 发件人: Xiaoyao Li [mailto:xiaoyao...@intel.com]
> 发送时间: 2020年5月30日 18:40
> 收件人: Li,Rongqing ; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; x...@kernel.org; h...@zytor.com; b...@alien8.de;
> mi...@redhat.com; t...@linutronix.de; jmatt...@google.com;
> wanpen...@tencent.com; vkuzn...@redhat.com;
> sean.j.christopher...@intel.com; pbonz...@redhat.com;
> wei.hua...@amd.com
> 主题: Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
> 
> On 5/30/2020 12:35 PM, Li RongQing wrote:
> > Guest kernel reports a fixed cpu frequency in /proc/cpuinfo, this is
> > confused to user when turbo is enable, and aperf/mperf can be used to
> > show current cpu frequency after 7d5905dc14a
> > "(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
> > so guest should support aperf/mperf capability
> >
> > This patch implements aperf/mperf by three mode: none, software
> > emulation, and pass-through
> >
> > None: default mode, guest does not support aperf/mperf
> >
> > Software emulation: the period of aperf/mperf in guest mode are
> > accumulated as emulated value
> >
> > Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because that
> > hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no
> > over-commit.
> >
> > And a per-VM capability is added to configure aperfmperf mode
> >
> 
> [...]
> 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index
> > cd708b0b460a..c960dda4251b 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> >MSR_IA32_MISC_ENABLE_MWAIT);
> > }
> >
> > +   best = kvm_find_cpuid_entry(vcpu, 6, 0);
> > +   if (best) {
> > +   if (guest_has_aperfmperf(vcpu->kvm) &&
> > +   boot_cpu_has(X86_FEATURE_APERFMPERF))
> > +   best->ecx |= 1;
> > +   else
> > +   best->ecx &= ~1;
> > +   }
> 
> In my understanding, KVM allows userspace to set a CPUID feature bit for
> guest even if hardware doesn't support the feature.
> 
> So what makes X86_FEATURE_APERFMPERF different here? Is there any
> concern I miss?
> 
> -Xiaoyao

Whether software emulation for aperf/mperf or pass-through depends on host cpu 
aperf/mperf feature.
 
Software emulation: the period of aperf/mperf in guest mode are accumulated as 
emulated value

-Li


Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers

2020-05-30 Thread Xiaoyao Li

On 5/30/2020 12:35 PM, Li RongQing wrote:

Guest kernel reports a fixed cpu frequency in /proc/cpuinfo,
this is confused to user when turbo is enable, and aperf/mperf
can be used to show current cpu frequency after 7d5905dc14a
"(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
so guest should support aperf/mperf capability

This patch implements aperf/mperf by three mode: none, software
emulation, and pass-through

None: default mode, guest does not support aperf/mperf

Software emulation: the period of aperf/mperf in guest mode are
accumulated as emulated value

Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because
that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed
no over-commit.

And a per-VM capability is added to configure aperfmperf mode



[...]


diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cd708b0b460a..c960dda4251b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
   MSR_IA32_MISC_ENABLE_MWAIT);
}
  
+	best = kvm_find_cpuid_entry(vcpu, 6, 0);

+   if (best) {
+   if (guest_has_aperfmperf(vcpu->kvm) &&
+   boot_cpu_has(X86_FEATURE_APERFMPERF))
+   best->ecx |= 1;
+   else
+   best->ecx &= ~1;
+   }


In my understanding, KVM allows userspace to set a CPUID feature bit for 
guest even if hardware doesn't support the feature.


So what makes X86_FEATURE_APERFMPERF different here? Is there any 
concern I miss?


-Xiaoyao


/* Note, maxphyaddr must be updated before tdp_level. */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);


[PATCH][v5] KVM: X86: support APERF/MPERF registers

2020-05-29 Thread Li RongQing
Guest kernel reports a fixed cpu frequency in /proc/cpuinfo,
this is confused to user when turbo is enable, and aperf/mperf
can be used to show current cpu frequency after 7d5905dc14a
"(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
so guest should support aperf/mperf capability

This patch implements aperf/mperf by three mode: none, software
emulation, and pass-through

None: default mode, guest does not support aperf/mperf

Software emulation: the period of aperf/mperf in guest mode are
accumulated as emulated value

Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because
that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed
no over-commit.

And a per-VM capability is added to configure aperfmperf mode

Signed-off-by: Li RongQing 
Signed-off-by: Chai Wen 
Signed-off-by: Jia Lina 
---
diff v4:
fix maybe-uninitialized warning

diff v3:
fix interception of MSR_IA32_MPERF/APERF in svm

diff v2:
support aperfmperf pass though
move common codes to kvm_get_msr_common

diff v1:
1. support AMD, but not test
2. support per-vm capability to enable

 Documentation/virt/kvm/api.rst  | 10 ++
 arch/x86/include/asm/kvm_host.h | 11 +++
 arch/x86/kvm/cpuid.c| 13 -
 arch/x86/kvm/svm/svm.c  |  8 
 arch/x86/kvm/vmx/vmx.c  |  6 ++
 arch/x86/kvm/x86.c  | 42 +
 arch/x86/kvm/x86.h  | 15 +++
 include/uapi/linux/kvm.h|  1 +
 8 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d871dacb984e..f854f4da6fd8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6126,3 +6126,13 @@ KVM can therefore start protected VMs.
 This capability governs the KVM_S390_PV_COMMAND ioctl and the
 KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
 guests when the state change is invalid.
+
+8.23 KVM_CAP_APERFMPERF
+
+
+:Architectures: x86
+:Parameters: args[0] is aperfmperf mode;
+ 0 for not support, 1 for software emulation, 2 for pass-through
+:Returns: 0 on success; -1 on error
+
+This capability indicates that KVM supports APERF and MPERF MSR registers
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fd78bd44b2d6..14643f8af9c4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -824,6 +824,9 @@ struct kvm_vcpu_arch {
 
/* AMD MSRC001_0015 Hardware Configuration */
u64 msr_hwcr;
+
+   u64 v_mperf;
+   u64 v_aperf;
 };
 
 struct kvm_lpage_info {
@@ -889,6 +892,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT,/* created with KVM_CAP_SPLIT_IRQCHIP */
 };
 
+enum kvm_aperfmperf_mode {
+   KVM_APERFMPERF_NONE,
+   KVM_APERFMPERF_SOFT,  /* software emulate aperfmperf */
+   KVM_APERFMPERF_PT,/* pass-through aperfmperf to guest */
+};
+
 #define APICV_INHIBIT_REASON_DISABLE0
 #define APICV_INHIBIT_REASON_HYPERV 1
 #define APICV_INHIBIT_REASON_NESTED 2
@@ -986,6 +995,8 @@ struct kvm_arch {
 
struct kvm_pmu_event_filter *pmu_event_filter;
struct task_struct *nx_lpage_recovery_thread;
+
+   enum kvm_aperfmperf_mode aperfmperf_mode;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cd708b0b460a..c960dda4251b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
   MSR_IA32_MISC_ENABLE_MWAIT);
}
 
+   best = kvm_find_cpuid_entry(vcpu, 6, 0);
+   if (best) {
+   if (guest_has_aperfmperf(vcpu->kvm) &&
+   boot_cpu_has(X86_FEATURE_APERFMPERF))
+   best->ecx |= 1;
+   else
+   best->ecx &= ~1;
+   }
/* Note, maxphyaddr must be updated before tdp_level. */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);
@@ -557,7 +565,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array 
*array, u32 function)
case 6: /* Thermal management */
entry->eax = 0x4; /* allow ARAT */
entry->ebx = 0;
-   entry->ecx = 0;
+   if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+   entry->ecx = 0x1;
+   else
+   entry->ecx = 0x0;
entry->edx = 0;
break;
/* function 7 has additional index. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e9c0fb68387d..1d38fe3afc0d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1200,6 +1200,14 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
svm->msrpm = page_address(msrpm_pages);