Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
On 4/17/2021 1:30 AM, Sean Christopherson wrote:
> On Fri, Apr 16, 2021, Kirill A. Shutemov wrote:
> [...]
>> index fadaccb95a4c..cd2374802702 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -436,6 +436,8 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>>  }
>>  #endif
>>
>> +#define KVM_NR_SHARED_RANGES 32
>> +
>>  /*
>>   * Note:
>>   * memslots are not sorted by id anymore, please use id_to_memslot()
>> @@ -513,6 +515,10 @@ struct kvm {
>>      pid_t userspace_pid;
>>      unsigned int max_halt_poll_ns;
>>      u32 dirty_ring_size;
>> +    bool mem_protected;
>> +    void *id;
>> +    int nr_shared_ranges;
>> +    struct range shared_ranges[KVM_NR_SHARED_RANGES];
>
> Hard no for me. IMO, anything that requires KVM to track shared/pinned
> pages in a separate tree/array is a non-starter. More specific to TDX
> #MCs, KVM should not be the canonical reference for the state of a page.

Do you mean something in struct page to track whether the page is shared or private?
Re: [PATCH v2 1/3] KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW
On 2/3/2021 12:05 AM, Paolo Bonzini wrote:
> On 02/02/21 16:02, Xiaoyao Li wrote:
>> On 2/2/2021 10:49 PM, Paolo Bonzini wrote:
>>> On 02/02/21 10:04, Chenyi Qiang wrote:
>>>>  #define DR6_FIXED_1	0xfffe0ff0
>>>> -#define DR6_INIT	0xffff0ff0
>>>> +/*
>>>> + * DR6_ACTIVE_LOW is actually the result of DR6_FIXED_1 | ACTIVE_LOW_BITS.
>>>> + * We can regard all the current FIXED_1 bits as active-low bits even
>>>> + * though in no case will they be turned into 0. But if they are defined
>>>> + * in the future, it will require no code change.
>>>> + * At the same time, it can serve as the init/reset value for DR6.
>>>> + */
>>>> +#define DR6_ACTIVE_LOW	0xffff0ff0
>>>>  #define DR6_VOLATILE	0x0001e00f
>>> Committed with some changes in the wording of the comment.
>> Also, DR6_FIXED_1 is (DR6_ACTIVE_LOW & ~DR6_VOLATILE). Maybe we can add
>> a BUILD_BUG_ON() to make sure of the correctness? We can even
>> #define DR6_FIXED_1 (DR6_ACTIVE_LOW & ~DR6_VOLATILE) directly.
> I have pushed this patch to kvm/queue, but the other two will have to
> wait for Fenghua's bare metal support.

Hi Paolo,

Fenghua's bare metal support is in the tip tree now.

https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua...@intel.com/

Will the rest of the KVM patches get into 5.13 together?
Re: [PATCH v1 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD
On 3/27/2021 8:18 AM, Kuppuswamy Sathyanarayanan wrote:
> In non-root TDX guest mode, MWAIT, MONITOR and WBINVD instructions are
> not supported. So handle #VE due to these instructions as no-ops.
>
> Signed-off-by: Kuppuswamy Sathyanarayanan
> Reviewed-by: Andi Kleen
> ---
> Changes since previous series:
>  * Suppressed MWAIT feature as per Andi's comment.
>  * Added warning debug log for MWAIT #VE exception.
>
>  arch/x86/kernel/tdx.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index e936b2f88bf6..fb7d22b846fc 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -308,6 +308,9 @@ void __init tdx_early_init(void)
>
>      setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>
> +    /* MWAIT is not supported in TDX platform, so suppress it */
> +    setup_clear_cpu_cap(X86_FEATURE_MWAIT);

In fact, the MWAIT bit returned by the CPUID instruction is zero for a TD guest. This is enforced by the SEAM module. Do we still need to safeguard it with setup_clear_cpu_cap() here?

> +
>      tdg_get_info();
>
>      pv_ops.irq.safe_halt = tdg_safe_halt;
Re: [PATCH v2 1/3] KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW
On 2/2/2021 10:49 PM, Paolo Bonzini wrote:
> On 02/02/21 10:04, Chenyi Qiang wrote:
>>  #define DR6_FIXED_1	0xfffe0ff0
>> -#define DR6_INIT	0xffff0ff0
>> +/*
>> + * DR6_ACTIVE_LOW is actually the result of DR6_FIXED_1 | ACTIVE_LOW_BITS.
>> + * We can regard all the current FIXED_1 bits as active-low bits even
>> + * though in no case will they be turned into 0. But if they are defined
>> + * in the future, it will require no code change.
>> + * At the same time, it can serve as the init/reset value for DR6.
>> + */
>> +#define DR6_ACTIVE_LOW	0xffff0ff0
>>  #define DR6_VOLATILE	0x0001e00f
> Committed with some changes in the wording of the comment.

Also, DR6_FIXED_1 is (DR6_ACTIVE_LOW & ~DR6_VOLATILE). Maybe we can add a BUILD_BUG_ON() to make sure of the correctness?

> Paolo
Re: [RESEND PATCH 1/2] KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
On 1/28/2021 3:25 PM, Paolo Bonzini wrote:
> On 28/01/21 08:17, Xiaoyao Li wrote:
>>> "Active low" means that the bit is usually 1 and goes to 0 when the
>>> condition (such as RTM or bus lock) happens. For almost all those DR6
>>> bits the value is in fact always 1, but if they are defined in the
>>> future it will require no code change.
>> Why not keep using DR6_INIT or DR6_RESET_VALUE, or any other better
>> name? It's just the default clear value of DR6 when no debug condition
>> is hit.
>
> I preferred "DR6_ACTIVE_LOW" because the value is used only once or
> twice to initialize dr6, and many times to invert those bits. For
> example:
>
>     vcpu->arch.dr6 &= ~DR_TRAP_BITS;
>     vcpu->arch.dr6 |= DR6_ACTIVE_LOW;
>
>     vcpu->arch.dr6 |= payload;
>     vcpu->arch.dr6 ^= payload & DR6_ACTIVE_LOW;
>
>     payload = vcpu->arch.dr6;
>     payload &= ~DR6_BT;
>     payload ^= DR6_ACTIVE_LOW;
>
> The name conveys that it's not just the initialization value; it's also
> the XOR mask between the #DB exit qualification (which we also use as
> the "payload") and DR6.

Fair enough.
Re: [RESEND PATCH 1/2] KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
On 1/27/2021 6:04 PM, Paolo Bonzini wrote:
> On 27/01/21 04:41, Xiaoyao Li wrote:
>> On 1/27/2021 12:31 AM, Paolo Bonzini wrote:
>>> On 08/01/21 07:49, Chenyi Qiang wrote:
>>>> To avoid breaking the CPUs without bus lock detection, activate the
>>>> DR6_BUS_LOCK bit (bit 11) conditionally in DR6_FIXED_1 bits. The
>>>> set/clear of DR6_BUS_LOCK is similar to the DR6_RTM in the DR6
>>>> register. The processor clears DR6_BUS_LOCK when the bus lock debug
>>>> exception is generated. (For all other #DB the processor sets this
>>>> bit to 1.) The software #DB handler should set this bit before
>>>> returning to the interrupted task. For a VM exit caused by a debug
>>>> exception, bit 11 of the exit qualification is set to indicate that
>>>> a bus lock debug exception condition was detected. The VMM should
>>>> emulate the exception by clearing bit 11 of the guest DR6.
>>> Please rename DR6_INIT to DR6_ACTIVE_LOW, and then a lot of changes
>>> become simpler:
>> Paolo,
>> What do you want to convey with the new name DR6_ACTIVE_LOW? To be
>> honest, the new name is confusing to me.
> "Active low" means that the bit is usually 1 and goes to 0 when the
> condition (such as RTM or bus lock) happens. For almost all those DR6
> bits the value is in fact always 1, but if they are defined in the
> future it will require no code change.

Why not keep using DR6_INIT or DR6_RESET_VALUE, or any other better name? It's just the default clear value of DR6 when no debug condition is hit.

> Paolo
>>> -	dr6 |= DR6_BD | DR6_RTM;
>>> +	dr6 |= DR6_BD | DR6_RTM | DR6_BUS_LOCK;
>>>
>>> 	dr6 |= DR6_BD | DR6_ACTIVE_LOW;
Re: [RESEND PATCH 1/2] KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
On 1/27/2021 12:31 AM, Paolo Bonzini wrote:
> On 08/01/21 07:49, Chenyi Qiang wrote:
>> To avoid breaking the CPUs without bus lock detection, activate the
>> DR6_BUS_LOCK bit (bit 11) conditionally in DR6_FIXED_1 bits. The
>> set/clear of DR6_BUS_LOCK is similar to the DR6_RTM in the DR6
>> register. The processor clears DR6_BUS_LOCK when the bus lock debug
>> exception is generated. (For all other #DB the processor sets this bit
>> to 1.) The software #DB handler should set this bit before returning
>> to the interrupted task. For a VM exit caused by a debug exception,
>> bit 11 of the exit qualification is set to indicate that a bus lock
>> debug exception condition was detected. The VMM should emulate the
>> exception by clearing bit 11 of the guest DR6.
> Please rename DR6_INIT to DR6_ACTIVE_LOW, and then a lot of changes
> become simpler:

Paolo,

What do you want to convey with the new name DR6_ACTIVE_LOW? To be honest, the new name is confusing to me.

> -	dr6 |= DR6_BD | DR6_RTM;
> +	dr6 |= DR6_BD | DR6_RTM | DR6_BUS_LOCK;
>
> 	dr6 |= DR6_BD | DR6_ACTIVE_LOW;
Re: [PATCH RFC v3 2/4] x86/bus_lock: Handle warn and fatal in #DB for bus lock
On 10/31/2020 8:27 AM, Fenghua Yu wrote:
> ...
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3c70fb34028b..1c3442000972 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -953,6 +953,13 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
>          goto out_irq;
>      }
>
> +    /*
> +     * Handle bus lock. #DB for bus lock can only be triggered from
> +     * userspace.
> +     */
> +    if (!(dr6 & DR_BUS_LOCK))

It should be

	if (dr6 & DR_BUS_LOCK)

since you keep DR6 bit 11 reserved in this version: bit 11 of the value returned by debug_read_clear_dr6() being set to 1 means a bus lock was detected.

> +        handle_bus_lock(regs);
> +
>      /* Add the virtual_dr6 bits for signals. */
>      dr6 |= current->thread.virtual_dr6;
>      if (dr6 & (DR_STEP | DR_TRAP_BITS) || icebp)
Re: [PATCH] KVM: VMX: Enable Notify VM exit
On 11/3/2020 2:08 PM, Tao Xu wrote:
> On 11/3/20 12:43 AM, Andy Lutomirski wrote:
>> On Sun, Nov 1, 2020 at 10:14 PM Tao Xu wrote:
>> ...
>>> +static int handle_notify(struct kvm_vcpu *vcpu)
>>> +{
>>> +    unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>> +
>>> +    /*
>>> +     * Notify VM exit happened while executing iret from NMI,
>>> +     * "blocked by NMI" bit has to be set before next VM entry.
>>> +     */
>>> +    if (exit_qualification & NOTIFY_VM_CONTEXT_VALID) {
>>> +        if (enable_vnmi &&
>>> +            (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>>> +            vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>>> +                          GUEST_INTR_STATE_NMI);
>> This needs actual documentation in the SDM or at least ISE please.

Hi Andy,

Do you mean the SDM or ISE should call out that the VMM needs to restore "blocked by NMI" if bit 12 of the exit qualification is set and it decides to re-enter the guest? You can refer to Section 27.2.3 "Information About NMI Unblocking Due to IRET" in the latest SDM (325462-072US).

> Notify VM exit is defined in the ISE, chapter 9.2:
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> I will add this information to the commit message. Thank you for
> reminding me.
Re: [PATCH] KVM: VMX: Enable Notify VM exit
On 11/3/2020 2:25 AM, Paolo Bonzini wrote:
> On 02/11/20 19:01, Andy Lutomirski wrote:
>> What's the point? Surely the kernel should reliably mitigate the flaw,
>> and the kernel should decide how to do so.
> There is some slowdown in trapping #DB and #AC unconditionally. Though
> for these two cases nobody should care, so I agree with keeping the
> code simple and keeping the workaround.

OK.

>> Also, why would this trigger after more than a few hundred cycles,
>> something like the length of the longest microcode loop? HZ*10 seems
>> like a very generous estimate already.

As Sean said in another mail, 1/10 of a tick is just a placeholder. Glad to see all of you think it should be smaller. We'll come up with a more reasonable value once we can test on real silicon.
Re: [PATCH] KVM: VMX: Enable Notify VM exit
On 11/3/2020 2:12 PM, Tao Xu wrote:
> On 11/3/20 6:53 AM, Jim Mattson wrote:
>> On Sun, Nov 1, 2020 at 10:14 PM Tao Xu wrote:
>>> There are some cases where a malicious virtual machine can cause the
>>> CPU to get stuck (event windows don't open up), e.g., an infinite
>>> loop in microcode when a nested #AC is hit (CVE-2015-5307). No event
>>> window obviously means no events: NMIs, SMIs, and IRQs will all be
>>> blocked, which may make the affected hardware CPU unusable by the
>>> host or other VMs. To resolve those cases, a notify VM exit can be
>>> enabled: it triggers if no event window occurs in VMX non-root mode
>>> for a specified amount of time (the notify window). Expose a module
>>> param for setting the notify window; by default it is set to 1/10 of
>>> a periodic tick, and the user can set it to 0 to disable this feature.
>>>
>>> TODO:
>>> 1. The appropriate value of the notify window.
>>> 2. Another patch to disable interception of #DB and #AC when notify
>>>    VM exiting is enabled.
>>>
>>> Co-developed-by: Xiaoyao Li
>>> Signed-off-by: Tao Xu
>>> Signed-off-by: Xiaoyao Li
>>
>> Do you have test cases?

Yes, we have. The nested #AC (CVE-2015-5307) is a known test case, though we need to tweak KVM to disable interception of #AC for it.

> Not yet, because we are waiting for real silicon to do some tests. I
> should add RFC next time, before I test it on hardware.
Re: [PATCH] KVM: X86: Expose KVM_HINTS_REALTIME in KVM_GET_SUPPORTED_CPUID
On 10/22/2020 10:06 PM, Paolo Bonzini wrote:
> On 22/10/20 15:31, Xiaoyao Li wrote:
>>> It's common for userspace to copy all supported CPUID bits to
>>> KVM_SET_CPUID2; I don't think this is the right behavior for
>>> KVM_HINTS_REALTIME. It reminds me of X86_FEATURE_WAITPKG, which was
>>> added to supported CPUID recently as a fix but QEMU exposes it to the
>>> guest only with "-overcommit cpu-pm".
> WAITPKG is not included in KVM_GET_SUPPORTED_CPUID either. QEMU detects
> it through the MSR_IA32_UMWAIT register.

Doesn't 0abcc8f65cc2 ("KVM: VMX: enable X86_FEATURE_WAITPKG in KVM capabilities") add WAITPKG to KVM_GET_SUPPORTED_CPUID?

> Paolo
Re: [PATCH] KVM: X86: Expose KVM_HINTS_REALTIME in KVM_GET_SUPPORTED_CPUID
On 10/22/2020 9:02 PM, Paolo Bonzini wrote:
> On 22/10/20 03:34, Wanpeng Li wrote:
>> From: Wanpeng Li
>>
>> Per the KVM_GET_SUPPORTED_CPUID ioctl documentation:
>>
>>   This ioctl returns x86 cpuid features which are supported by both
>>   the hardware and kvm in its default configuration.
>>
>> A well-behaved userspace should not set the bit if it is not supported.
>>
>> Suggested-by: Jim Mattson
>> Signed-off-by: Wanpeng Li
> It's common for userspace to copy all supported CPUID bits to
> KVM_SET_CPUID2; I don't think this is the right behavior for
> KVM_HINTS_REALTIME. It reminds me of X86_FEATURE_WAITPKG, which was
> added to supported CPUID recently as a fix but QEMU exposes it to the
> guest only with "-overcommit cpu-pm".
>
> (But maybe this was discussed already; if so, please point me to the
> previous discussion.)
>
> Paolo
Re: [RFC v2 2/2] KVM: VMX: Enable bus lock VM exit
On 9/3/2020 6:44 AM, Sean Christopherson wrote:
> On Tue, Sep 01, 2020 at 10:43:12AM +0200, Vitaly Kuznetsov wrote:
>>> @@ -6809,6 +6824,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>      if (unlikely(vmx->exit_reason.failed_vmentry))
>>>          return EXIT_FASTPATH_NONE;
>>>
>>> +    /*
>>> +     * check the exit_reason to see if there is a bus lock
>>> +     * happened in guest.
>>> +     */
>>> +    if (kvm_bus_lock_exit_enabled(vmx->vcpu.kvm)) {
>>> +        if (vmx->exit_reason.bus_lock_detected) {
>>> +            vcpu->stat.bus_locks++;
>
> Why bother with stats? Every bus lock exits to userspace, having quick
> stats doesn't seem all that interesting.

OK. We will remove it.

>>> +            vcpu->arch.bus_lock_detected = true;
>>> +        } else {
>>> +            vcpu->arch.bus_lock_detected = false;
>>
>> This is a fast path so I'm wondering if we can move bus_lock_detected
>> clearing somewhere else.
>
> Why even snapshot vmx->exit_reason.bus_lock_detected? I don't see any
> reason why vcpu_enter_guest() needs to handle the exit to userspace,
> e.g. it's just as easily handled in VMX code.

Because we want to handle the exit to userspace in only one place, i.e., after kvm_x86_ops.handle_exit(vcpu, exit_fastpath). Otherwise, we would have to check vmx->exit_reason.bus_lock_detected in every other handler, at least in those that can theoretically preempt the bus lock VM exit.

>>> +        }
>>> +    }
>>> +
>>>      vmx->loaded_vmcs->launched = 1;
>>>      vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>>>
>>> @@ -8060,6 +8088,9 @@ static __init int hardware_setup(void)
>>>          kvm_tsc_scaling_ratio_frac_bits = 48;
>>>      }
>>>
>>> +    if (cpu_has_vmx_bus_lock_detection())
>>> +        kvm_has_bus_lock_exit = true;
>>> +
>>>      set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>>>
>>>      if (enable_ept)
>> ...
>>> @@ -4990,6 +4996,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>>          kvm->arch.exception_payload_enabled = cap->args[0];
>>>          r = 0;
>>>          break;
>>> +    case KVM_CAP_X86_BUS_LOCK_EXIT:
>>> +        if (!kvm_has_bus_lock_exit)
>>> +            return -EINVAL;
>
> ... because userspace can check for -EINVAL when enabling the cap.
>
>> Or we can return e.g. -EOPNOTSUPP here. I don't have a strong opinion
>> on the matter...
>
>>> +        kvm->arch.bus_lock_exit = cap->args[0];
>
> Assuming we even want to make this per-VM, I think it'd make sense to
> make args[0] a bit mask, e.g. to provide "off" and "exit" (this
> behavior) while allowing for future modes, e.g. log-only.

Good idea, will do it in the next version.

>>> +        r = 0;
>>> +        break;
>>>      default:
>>>          r = -EINVAL;
>>>          break;
>>> @@ -7732,12 +7744,23 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled
On 8/28/2020 4:56 PM, Chenyi Qiang wrote:
> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS controls, but they are not exposed
> during the setup of the nested VMX controls MSRs.
>
> Signed-off-by: Chenyi Qiang

Reviewed-by: Xiaoyao Li

> ---
>  arch/x86/kvm/vmx/nested.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 23b58c28a1c9..6e0e71f4d45f 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6310,7 +6310,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>  #ifdef CONFIG_X86_64
>          VM_EXIT_HOST_ADDR_SPACE_SIZE |
>  #endif
> -        VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> +        VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> +        VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
>      msrs->exit_ctls_high |=
>          VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>          VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
> @@ -6329,7 +6330,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>  #ifdef CONFIG_X86_64
>          VM_ENTRY_IA32E_MODE |
>  #endif
> -        VM_ENTRY_LOAD_IA32_PAT;
> +        VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
> +        VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
>      msrs->entry_ctls_high |=
>          (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);
Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled
On 8/29/2020 9:49 AM, Chenyi Qiang wrote:
> On 8/29/2020 1:43 AM, Jim Mattson wrote:
>> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang wrote:
>>> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL
>>> and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS controls, but they are not
>>> exposed during the setup of the nested VMX controls MSRs.
>> Aren't these features added conditionally in
>> nested_vmx_entry_exit_ctls_update() and
>> nested_vmx_pmu_entry_exit_ctls_update()?
> Yes, but I assume vmcs_config.nested should reflect the global
> capability of the VMX MSRs. KVM supports these two controls, so they
> should be exposed here.

No. I prefer to say they are removed conditionally in nested_vmx_entry_exit_ctls_update() and nested_vmx_pmu_entry_exit_ctls_update().

Userspace calls vmx_get_msr_feature() to query what KVM supports for these VMX MSRs. In vmx_get_msr_feature(), it returns the value of vmcs_config.nested. As KVM supports these two bits, we should advertise them in vmcs_config.nested and report them to userspace.
[PATCH v10 2/9] x86/split_lock: Remove bogus case in handle_guest_split_lock()
The bogus case can never happen: when sld_state == sld_off, the guest won't trigger a split lock #AC, so handle_guest_split_lock() will never be called. Besides, dropping the bogus case also makes it easier for a future patch to remove sld_state, once we agree that it must be sld_warn or sld_fatal when handle_guest_split_lock() is called.

Signed-off-by: Xiaoyao Li
---
The alternative would be to remove the "SLD enabled" check from KVM so that a truly unexpected/bogus #AC would generate a WARN. It's not clear whether or not calling handle_guest_split_lock() iff SLD is enabled was intended in the long term.
---
 arch/x86/kernel/cpu/intel.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c48b3267c141..5dab842ba7e1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1104,9 +1104,8 @@ bool handle_guest_split_lock(unsigned long ip)
         return true;
     }
 
-    pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-                 current->comm, current->pid,
-                 sld_state == sld_fatal ? "fatal" : "bogus", ip);
+    pr_warn_once("#AC: %s/%d fatal split_lock trap at address: 0x%lx\n",
+                 current->comm, current->pid, ip);
 
     current->thread.error_code = 0;
     current->thread.trap_nr = X86_TRAP_AC;
-- 
2.18.4
[PATCH v10 9/9] x86/split_lock: Enable split lock detection initialization when running as a guest on KVM
When running as a guest, enumerating the split lock detection feature through the CPU model is not easy, since the CPU model is configurable by the host VMM. If running on KVM, it can be enumerated through KVM_FEATURE_SPLIT_LOCK_DETECT, and if KVM_HINTS_SLD_FATAL is set, it needs to be set to sld_fatal mode.

Signed-off-by: Xiaoyao Li
---
 arch/x86/include/asm/cpu.h  |  2 ++
 arch/x86/kernel/cpu/intel.c | 12 ++--
 arch/x86/kernel/kvm.c       |  3 +++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 2971a29d5094..5520cc1cbb68 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -42,12 +42,14 @@ unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void __init split_lock_setup(bool fatal);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
 extern bool split_lock_virt_switch(bool on);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void __init split_lock_setup(bool fatal) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b16f3dd9b9c2..8cadd2fd8be6 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1011,12 +1011,18 @@ static bool split_lock_verify_msr(bool on)
     return ctrl == tmp;
 }
 
-static void __init split_lock_setup(void)
+void __init split_lock_setup(bool fatal)
 {
     enum split_lock_detect_state state = sld_warn;
     char arg[20];
     int i, ret;
 
+    if (fatal) {
+        state = sld_fatal;
+        pr_info("forced on, sending SIGBUS on user-space split_locks\n");
+        goto set_cap;
+    }
+
     if (!split_lock_verify_msr(false)) {
         pr_info("MSR access failed: Disabled\n");
         return;
@@ -1052,6 +1058,7 @@ static void __init split_lock_setup(void)
         return;
     }
 
+set_cap:
     cpu_model_supports_sld = true;
     setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
     if (state == sld_fatal)
@@ -1183,6 +1190,7 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
     const struct x86_cpu_id *m;
     u64 ia32_core_caps;
 
+    /* Note, paravirt support can enable SLD, e.g., see kvm_guest_init(). */
     if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
         return;
@@ -1204,5 +1212,5 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
         return;
     }
 
-    split_lock_setup();
+    split_lock_setup(false);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 08320b0b2b27..25cd5d7f1e51 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -687,6 +687,9 @@ static void __init kvm_guest_init(void)
      * overcommitted.
      */
     hardlockup_detector_disable();
+
+    if (kvm_para_has_feature(KVM_FEATURE_SPLIT_LOCK_DETECT))
+        split_lock_setup(kvm_para_has_hint(KVM_HINTS_SLD_FATAL));
 }
 
 static noinline uint32_t __kvm_cpuid_base(void)
-- 
2.18.4
[PATCH v10 7/9] KVM: VMX: virtualize split lock detection
TEST_CTRL MSR is per-core scope, i.e., sibling threads in the same physical core share the same MSR. This requires an additional constraint when exposing it to a guest.

1) When the host SLD state is sld_off (no X86_FEATURE_SPLIT_LOCK_DETECT), split lock detection is unsupported/disabled. It cannot be exposed to the guest.

2) When the host SLD state is sld_warn (X86_FEATURE_SPLIT_LOCK_DETECT but no X86_FEATURE_SLD_FATAL), split lock detection can be exposed to the guest only with nosmt, due to the per-core scope. In this case, the guest's setting can be propagated into the real hardware MSR. Further, to avoid the potential MSR_TEST_CTRL.SLD toggling overhead on every VM-enter and VM-exit, the guest's SLD setting is loaded and kept while in the vcpu run loop and guest_state_loaded, i.e., between vmx_prepare_switch_to_guest() and vmx_prepare_switch_to_host().

3) When the host SLD state is sld_fatal (X86_FEATURE_SLD_FATAL), split lock detection can be exposed to the guest regardless of SMT, but KVM_HINTS_SLD_FATAL needs to be set. In this case, the guest can still set and clear the MSR_TEST_CTRL.SLD bit, but the bit value is never propagated to the real MSR; KVM always keeps the SLD bit turned on for the guest vcpu. The reason for not forcing the guest MSR_TEST_CTRL.SLD bit to 1 is that the guest needs to set this bit to 1 itself to tell KVM it's SLD-aware.
Signed-off-by: Xiaoyao Li
---
 arch/x86/kvm/cpuid.c   |  6 ++++++
 arch/x86/kvm/vmx/vmx.c | 68 ++++++++++++++++++++++++++++++++++++------
 arch/x86/kvm/vmx/vmx.h |  3 +++
 arch/x86/kvm/x86.c     |  6 +++-
 arch/x86/kvm/x86.h     | 16 ++++++++++
 5 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3fd6eec202d7..58d902fb12c0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -751,9 +751,15 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
         if (sched_info_on())
             entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 
+        if (kvm_split_lock_detect_supported())
+            entry->eax |= (1 << KVM_FEATURE_SPLIT_LOCK_DETECT);
+
         entry->ebx = 0;
         entry->ecx = 0;
         entry->edx = 0;
+
+        if (boot_cpu_has(X86_FEATURE_SLD_FATAL))
+            entry->edx |= (1 << KVM_HINTS_SLD_FATAL);
         break;
     case 0x80000000:
         entry->eax = min(entry->eax, 0x8000001f);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e113c9171bcc..e806c2855c9e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1123,6 +1123,29 @@ void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
     }
 }
 
+static inline u64 vmx_msr_test_ctrl_valid_bits(struct vcpu_vmx *vmx)
+{
+    u64 valid_bits = 0;
+
+    if (vmx->guest_has_sld)
+        valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+    return valid_bits;
+}
+
+static inline bool guest_sld_on(struct vcpu_vmx *vmx)
+{
+    return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+}
+
+static inline void vmx_update_guest_sld(struct vcpu_vmx *vmx)
+{
+    preempt_disable();
+    if (vmx->guest_state_loaded)
+        split_lock_set_guest(guest_sld_on(vmx));
+    preempt_enable();
+}
+
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1191,6 +1214,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 #endif
 
     vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
+
+    if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld)
+        vmx->host_sld_on = split_lock_set_guest(guest_sld_on(vmx));
+
     vmx->guest_state_loaded = true;
 }
 
@@ -1229,6 +1256,10 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
         wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
     load_fixmap_gdt(raw_smp_processor_id());
+
+    if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld)
+        split_lock_restore_host(vmx->host_sld_on);
+
     vmx->guest_state_loaded = false;
     vmx->guest_msrs_ready = false;
 }
@@ -1833,7 +1864,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
     switch (msr_info->index) {
     case MSR_TEST_CTRL:
-        msr_info->data = 0;
+        msr_info->data = vmx->msr_test_ctrl;
         break;
 #ifdef CONFIG_X86_64
     case MSR_FS_BASE:
@@ -1999,9 +2030,12 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
     switch (msr_index) {
     case MSR_TEST_CTRL:
-        if (data)
+        if (data & ~vmx_msr_test_ctrl_valid_bits(vmx))
             return 1;
[PATCH v10 6/9] KVM: VMX: Enable MSR TEST_CTRL for guest
Unconditionally allow the guest to read and zero-write MSR TEST_CTRL.

This matches the fact that most Intel CPUs support MSR TEST_CTRL, and it also alleviates the effort to handle wrmsr/rdmsr when split lock detection is exposed to the guest in a future patch.

Signed-off-by: Xiaoyao Li
---
 arch/x86/kvm/vmx/vmx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..e113c9171bcc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1832,6 +1832,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
     u32 index;
 
     switch (msr_info->index) {
+    case MSR_TEST_CTRL:
+        msr_info->data = 0;
+        break;
 #ifdef CONFIG_X86_64
     case MSR_FS_BASE:
         msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1995,6 +1998,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
     u32 index;
 
     switch (msr_index) {
+    case MSR_TEST_CTRL:
+        if (data)
+            return 1;
+
+        break;
     case MSR_EFER:
         ret = kvm_set_msr_common(vcpu, msr_info);
         break;
-- 
2.18.4
[PATCH v10 8/9] x86/split_lock: Set cpu_model_supports_sld to true after the existence of split lock detection is verified on BSP
It should be more reasonable to set cpu_model_supports_sld to true after the BSP is verified to have the split lock detection feature. It also avoids externing cpu_model_supports_sld when enumerating split lock detection in a guest via a PV interface in a future patch.

Signed-off-by: Xiaoyao Li
---
 arch/x86/kernel/cpu/intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5f44e0de04b9..b16f3dd9b9c2 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1052,6 +1052,7 @@ static void __init split_lock_setup(void)
         return;
     }
 
+    cpu_model_supports_sld = true;
     setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
     if (state == sld_fatal)
         setup_force_cpu_cap(X86_FEATURE_SLD_FATAL);
@@ -1203,6 +1204,5 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
         return;
     }
 
-    cpu_model_supports_sld = true;
     split_lock_setup();
 }
-- 
2.18.4
[PATCH v10 5/9] x86/kvm: Introduce paravirt split lock detection enumeration
Introduce KVM_FEATURE_SPLIT_LOCK_DETECT, with which guests running on KVM can enumerate the availability of the split lock detection feature.

Introduce KVM_HINTS_SLD_FATAL, which tells whether the host is in sld_fatal mode, i.e., whether split lock detection is forced on for the guest vcpu.

Signed-off-by: Xiaoyao Li
---
 Documentation/virt/kvm/cpuid.rst     | 29 ++++++++++++++---------
 arch/x86/include/uapi/asm/kvm_para.h |  8 +++---
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..c214f1c70703 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,12 @@ KVM_FEATURE_ASYNC_PF_INT           14      guest checks this feature bit
                                            async pf acknowledgment msr
                                            0x4b564d07.
 
+KVM_FEATURE_SPLIT_LOCK_DETECT      15      guest checks this feature bit for
+                                           availability of split lock detection.
+
+                                           KVM doesn't support enumerating
+                                           split lock detection via CPU model
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24      host will warn if no guest-side
                                            per-cpu warps are expected in
                                            kvmclock
@@ -103,11 +109,18 @@
 Where ``flag`` here is defined as below:
 
-==================== =========== ==================================
-flag                 value       meaning
-==================== =========== ==================================
-KVM_HINTS_REALTIME   0           guest checks this feature bit to
-                                 determine that vCPUs are never
-                                 preempted for an unlimited time
-                                 allowing optimizations
-==================== =========== ==================================
+==================== =========== ==================================
+flag                 value       meaning
+==================== =========== ==================================
+KVM_HINTS_REALTIME   0           guest checks this feature bit to
+                                 determine that vCPUs are never
+                                 preempted for an unlimited time
+                                 allowing optimizations
+
+KVM_HINTS_SLD_FATAL  1           set if split lock detection is
+                                 forced on in the host, in which
+                                 case KVM will kill the guest if it
+                                 generates a split lock #AC with
+                                 SLD disabled from guest's
+                                 perspective
+==================== =========== ==================================
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..328ddfaacd7b 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,14 +32,16 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
-
-#define KVM_HINTS_REALTIME	0
-
+#define KVM_FEATURE_SPLIT_LOCK_DETECT	15
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
  */
 #define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT	24
 
+/* KVM feature hints in CPUID.0x40000001.EDX */
+#define KVM_HINTS_REALTIME	0
+#define KVM_HINTS_SLD_FATAL	1
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
-- 
2.18.4
[PATCH v10 3/9] x86/split_lock: Introduce flag X86_FEATURE_SLD_FATAL and drop sld_state
Introduce a synthetic feature flag X86_FEATURE_SLD_FATAL, which, when set, means the kernel is in sld_fatal mode. sld_state is no longer needed, since the state of SLD can be inferred from X86_FEATURE_SPLIT_LOCK_DETECT and X86_FEATURE_SLD_FATAL. Suggested-by: Sean Christopherson Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/intel.c| 16 ++-- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 2901d5df4366..caf9f4e3e876 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -288,6 +288,7 @@ #define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */ #define X86_FEATURE_FENCE_SWAPGS_KERNEL(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */ #define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */ +#define X86_FEATURE_SLD_FATAL (11*32+ 7) /* split lock detection in fatal mode */ /* Intel-defined CPU features, CPUID level 0x0007:1 (EAX), word 12 */ #define X86_FEATURE_AVX512_BF16(12*32+ 5) /* AVX512 BFLOAT16 instructions */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 5dab842ba7e1..06de03974e66 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -42,12 +42,6 @@ enum split_lock_detect_state { sld_fatal, }; -/* - * Default to sld_off because most systems do not support split lock detection - * split_lock_setup() will switch this to sld_warn on systems that support - * split lock detect, unless there is a command line override.
- */ -static enum split_lock_detect_state sld_state __ro_after_init = sld_off; static u64 msr_test_ctrl_cache __ro_after_init; /* @@ -1058,8 +1052,9 @@ static void __init split_lock_setup(void) return; } - sld_state = state; setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT); + if (state == sld_fatal) + setup_force_cpu_cap(X86_FEATURE_SLD_FATAL); } /* @@ -1080,7 +1075,7 @@ static void sld_update_msr(bool on) static void split_lock_init(void) { if (cpu_model_supports_sld) - split_lock_verify_msr(sld_state != sld_off); + split_lock_verify_msr(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)); } static void split_lock_warn(unsigned long ip) @@ -1099,7 +1094,7 @@ static void split_lock_warn(unsigned long ip) bool handle_guest_split_lock(unsigned long ip) { - if (sld_state == sld_warn) { + if (!boot_cpu_has(X86_FEATURE_SLD_FATAL)) { split_lock_warn(ip); return true; } @@ -1116,7 +,8 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock); bool handle_user_split_lock(struct pt_regs *regs, long error_code) { - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal) + if ((regs->flags & X86_EFLAGS_AC) || + boot_cpu_has(X86_FEATURE_SLD_FATAL)) return false; split_lock_warn(regs->ip); return true; -- 2.18.4
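The mapping the patch relies on — two boolean feature flags encoding the former tri-state `sld_state` — can be sketched in a few lines of plain C. This is a model for illustration only (`sld_state_from_flags` is a made-up name, and the real kernel queries `boot_cpu_has()` rather than taking booleans):

```c
#include <stdbool.h>

enum sld_state { sld_off, sld_warn, sld_fatal };

/* Infer the old tri-state from the two synthetic feature flags:
 * no SPLIT_LOCK_DETECT  -> off
 * SPLIT_LOCK_DETECT only -> warn
 * both flags set         -> fatal
 */
enum sld_state sld_state_from_flags(bool split_lock_detect, bool sld_fatal_flag)
{
	if (!split_lock_detect)
		return sld_off;
	return sld_fatal_flag ? sld_fatal : sld_warn;
}
```

Note the encoding is lossless, which is why the patch can drop the `sld_state` variable entirely.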
[PATCH v10 4/9] x86/split_lock: Introduce split_lock_virt_switch() and two wrappers
Introduce split_lock_virt_switch(), which is used to toggle the split lock detection setting as well as update the TIF_SLD_DISABLED flag, keeping them consistent. Note, it can only be used in sld warn mode, i.e., X86_FEATURE_SPLIT_LOCK_DETECT && !X86_FEATURE_SLD_FATAL. The FATAL check is handled by the wrappers, split_lock_set_guest() and split_lock_restore_host(), that will be used by KVM when virtualizing split lock detection for guests in the future. Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/cpu.h | 34 ++ arch/x86/kernel/cpu/intel.c | 20 2 files changed, 54 insertions(+) diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index da78ccbd493b..2971a29d5094 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -45,6 +45,7 @@ extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c); extern void switch_to_sld(unsigned long tifn); extern bool handle_user_split_lock(struct pt_regs *regs, long error_code); extern bool handle_guest_split_lock(unsigned long ip); +extern bool split_lock_virt_switch(bool on); #else static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {} static inline void switch_to_sld(unsigned long tifn) {} @@ -57,7 +58,40 @@ static inline bool handle_guest_split_lock(unsigned long ip) { return false; } + +static inline bool split_lock_virt_switch(bool on) { return false; } #endif + +/** + * split_lock_set_guest - Set SLD state for a guest + * @guest_sld_on: If SLD is on in the guest + * + * returns: %true if SLD was enabled in the task + * + * Must be called when X86_FEATURE_SPLIT_LOCK_DETECT is available. + */ +static inline bool split_lock_set_guest(bool guest_sld_on) +{ + if (static_cpu_has(X86_FEATURE_SLD_FATAL)) + return true; + + return split_lock_virt_switch(guest_sld_on); +} + +/** + * split_lock_restore_host - Restore host SLD state + * @host_sld_on: If SLD is on in the host + * + * Must be called when X86_FEATURE_SPLIT_LOCK_DETECT is available.
+ */ +static inline void split_lock_restore_host(bool host_sld_on) +{ + if (static_cpu_has(X86_FEATURE_SLD_FATAL)) + return; + + split_lock_virt_switch(host_sld_on); +} + #ifdef CONFIG_IA32_FEAT_CTL void init_ia32_feat_ctl(struct cpuinfo_x86 *c); #else diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 06de03974e66..5f44e0de04b9 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1078,6 +1078,26 @@ static void split_lock_init(void) split_lock_verify_msr(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)); } +/* + * This function should never be called directly; use split_lock_set_guest() + * and split_lock_restore_host() instead. + * + * The caller needs to be in a preemption-disabled context to keep the + * MSR state and the TIF_SLD_DISABLED state consistent. + */ +bool split_lock_virt_switch(bool on) +{ + bool was_on = !test_thread_flag(TIF_SLD_DISABLED); + + if (on != was_on) { + sld_update_msr(on); + update_thread_flag(TIF_SLD_DISABLED, !on); + } + + return was_on; +} +EXPORT_SYMBOL_GPL(split_lock_virt_switch); + static void split_lock_warn(unsigned long ip) { pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n", -- 2.18.4
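The contract of split_lock_virt_switch() — toggle only on change, and return the *previous* state so the caller can restore it later — can be modeled outside the kernel. In this sketch the per-thread TIF flag and the MSR write are replaced with plain globals and a stub; all names are illustrative, not kernel APIs:

```c
#include <stdbool.h>

/* Stand-ins for the per-thread flag and the MSR-backed hardware state. */
bool tif_sld_disabled; /* false = SLD on for this thread */
bool msr_sld_on;

void sld_update_msr_stub(bool on) { msr_sld_on = on; }

/* Mirrors split_lock_virt_switch(): only touch the "MSR" when the
 * requested state differs, and hand back the state we replaced. */
bool split_lock_virt_switch_model(bool on)
{
	bool was_on = !tif_sld_disabled;

	if (on != was_on) {
		sld_update_msr_stub(on);
		tif_sld_disabled = !on;
	}
	return was_on;
}
```

The returned previous state is what lets the split_lock_set_guest()/split_lock_restore_host() pair save the host setting on entry and put it back on exit.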
[PATCH v10 0/9] KVM: Add virtualization support of split lock detection
Hi maintainers, Please help review this new version and give comments. There is only one minor change from the previous v9, i.e., adding patch 8. --- This series aims to add the virtualization of split lock detection in KVM. Because split lock detection is tightly coupled with the CPU model, and the CPU model is configurable by the host VMM, we elect to use a paravirt method to expose and enumerate it for the guest. Changes in v10 - rebase to v5.9-rc1; - Add one patch to move the initialization of cpu_model_supports_sld to split_lock_setup(); Changes in v9 https://lkml.kernel.org/r/20200509110542.8159-1-xiaoyao...@intel.com - rebase to v5.7-rc4 - Add one patch to rename TIF_SLD to TIF_SLD_DISABLED; - Add one patch to remove bogus case in handle_guest_split_lock; - Introduce flag X86_FEATURE_SPLIT_LOCK_DETECT_FATAL and thus drop sld_state; - Use X86_FEATURE_SPLIT_LOCK_DETECT and X86_FEATURE_SPLIT_LOCK_DETECT_FATAL to determine the SLD state of the host; - Introduce split_lock_virt_switch() and two wrappers for KVM instead of sld_update_to(); - Use paravirt to expose and enumerate split lock detection for the guest; - Split lock detection can be exposed to the guest when the host is sld_fatal, even when SMT is available on the host. Changes in v8: https://lkml.kernel.org/r/20200414063129.133630-1-xiaoyao...@intel.com - rebase to v5.7-rc1. - basic enabling of split lock detection already merged. - When the host is sld_warn and nosmt, load the guest's sld bit when in KVM context, i.e., between vmx_prepare_switch_to_guest() and vmx_prepare_switch_to_host(), KVM uses the guest's sld setting. Changes in v7: https://lkml.kernel.org/r/20200325030924.132881-1-xiaoyao...@intel.com - only pick patch 1 and patch 2, and hold all the rest. - Update SLD bit on each processor based on sld_state. Changes in v6: https://lkml.kernel.org/r/20200324151859.31068-1-xiaoyao...@intel.com - Drop the sld_not_exist flag and use X86_FEATURE_SPLIT_LOCK_DETECT to check whether split lock detection needs to be initialized.
[tglx] - Use tglx's method to verify the existence of split lock detection. - small optimization of sld_update_msr(): the default value of msr_test_ctrl_cache has the split_lock_detect bit cleared. - Drop patch 3 of v5, which introduced a kvm_only option. [tglx] - Rebase patch4-8 to kvm/queue. - use the new kvm-cpu-cap to expose X86_FEATURE_CORE_CAPABILITIES in Patch 6. Changes in v5: https://lkml.kernel.org/r/20200315050517.127446-1-xiaoyao...@intel.com - Use X86_FEATURE_SPLIT_LOCK_DETECT flag in kvm to ensure split lock detection is really supported. - Add and export sld-related helper functions in the kvm patches that use them. Xiaoyao Li (9): x86/split_lock: Rename TIF_SLD to TIF_SLD_DISABLED x86/split_lock: Remove bogus case in handle_guest_split_lock() x86/split_lock: Introduce flag X86_FEATURE_SLD_FATAL and drop sld_state x86/split_lock: Introduce split_lock_virt_switch() and two wrappers x86/kvm: Introduce paravirt split lock detection enumeration KVM: VMX: Enable MSR TEST_CTRL for guest KVM: VMX: virtualize split lock detection x86/split_lock: Set cpu_model_supports_sld to true after the existence of split lock detection is verified BSP x86/split_lock: Enable split lock detection initialization when running as a guest on KVM Documentation/virt/kvm/cpuid.rst | 29 +++ arch/x86/include/asm/cpu.h | 36 ++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/thread_info.h | 6 +-- arch/x86/include/uapi/asm/kvm_para.h | 8 ++-- arch/x86/kernel/cpu/intel.c | 61 +++ arch/x86/kernel/kvm.c| 3 ++ arch/x86/kernel/process.c| 2 +- arch/x86/kvm/cpuid.c | 6 +++ arch/x86/kvm/vmx/vmx.c | 72 +--- arch/x86/kvm/vmx/vmx.h | 3 ++ arch/x86/kvm/x86.c | 6 ++- arch/x86/kvm/x86.h | 16 +++ 13 files changed, 207 insertions(+), 42 deletions(-) -- 2.18.4
[PATCH v10 1/9] x86/split_lock: Rename TIF_SLD to TIF_SLD_DISABLED
TIF_SLD can only be set if a user space thread hits a split lock and sld_state == sld_warn. This flag is set to indicate that SLD (split lock detection) is turned off for the thread, so rename it to TIF_SLD_DISABLED, which is self-explanatory. Suggested-by: Sean Christopherson Suggested-by: Thomas Gleixner Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/thread_info.h | 6 +++--- arch/x86/kernel/cpu/intel.c| 6 +++--- arch/x86/kernel/process.c | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 267701ae3d86..fdb458a74557 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -92,7 +92,7 @@ struct thread_info { #define TIF_NOCPUID15 /* CPUID is not accessible in userland */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* IA32 compatibility process */ -#define TIF_SLD18 /* Restore split lock detection on context switch */ +#define TIF_SLD_DISABLED 18 /* split lock detection is turned off */ #define TIF_MEMDIE 20 /* is terminating due to OOM killer */ #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */ #define TIF_IO_BITMAP 22 /* uses I/O bitmap */ @@ -122,7 +122,7 @@ struct thread_info { #define _TIF_NOCPUID (1 << TIF_NOCPUID) #define _TIF_NOTSC (1 << TIF_NOTSC) #define _TIF_IA32 (1 << TIF_IA32) -#define _TIF_SLD (1 << TIF_SLD) +#define _TIF_SLD_DISABLED (1 << TIF_SLD_DISABLED) #define _TIF_POLLING_NRFLAG(1 << TIF_POLLING_NRFLAG) #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP) #define _TIF_FORCED_TF (1 << TIF_FORCED_TF) @@ -136,7 +136,7 @@ struct thread_info { /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW_BASE \ (_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP | \ -_TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD) +_TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD_DISABLED) /* * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 59a1e3ce3f14..c48b3267c141 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1090,11 +1090,11 @@ static void split_lock_warn(unsigned long ip) /* * Disable the split lock detection for this task so it can make -* progress and set TIF_SLD so the detection is re-enabled via +* progress and set TIF_SLD_DISABLED so the detection is re-enabled via * switch_to_sld() when the task is scheduled out. */ sld_update_msr(false); - set_tsk_thread_flag(current, TIF_SLD); + set_tsk_thread_flag(current, TIF_SLD_DISABLED); } bool handle_guest_split_lock(unsigned long ip) @@ -1132,7 +1132,7 @@ bool handle_user_split_lock(struct pt_regs *regs, long error_code) */ void switch_to_sld(unsigned long tifn) { - sld_update_msr(!(tifn & _TIF_SLD)); + sld_update_msr(!(tifn & _TIF_SLD_DISABLED)); } /* diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 994d8393f2f7..e9265c6b9256 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -641,7 +641,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) __speculation_ctrl_update(~tifn, tifn); } - if ((tifp ^ tifn) & _TIF_SLD) + if ((tifp ^ tifn) & _TIF_SLD_DISABLED) switch_to_sld(tifn); } -- 2.18.4
Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
On 7/23/2020 9:21 AM, Sean Christopherson wrote: On Wed, Jul 01, 2020 at 04:49:49PM +0200, Vitaly Kuznetsov wrote: Xiaoyao Li writes: So you want an exit to userspace for every bus lock and leave it all to userspace. Yes, it's doable. In some cases we may not even want to have a VM exit: think e.g. the real-time/partitioning case, where even in case of a bus lock we may not want to add additional latency just to count such events. Hmm, I suspect this isn't all that useful for real-time cases because they'd probably want to prevent the split lock in the first place, e.g. would prefer to use the #AC variant in fatal mode. Of course, the availability of split lock #AC is a whole other can of worms. But anyways, I 100% agree that this needs either an off-by-default module param or an opt-in per-VM capability. Maybe on-by-default or an opt-out per-VM capability? Turning it on introduces no overhead if no bus lock happens in the guest but gives KVM the capability to track every potential bus lock. If the user doesn't want the extra latency due to bus lock VM exits, it's better to try to fix the bus lock, which also incurs high latency. I'd suggest we make the new capability tri-state: - disabled (no vmexit, default) - stats only (what this patch does) - userspace exit But maybe this is overkill, I'd like to hear what others think. Userspace exit would also be interesting for debug. Another throttling option would be schedule() or cond_resched(), though that's probably getting into overkill territory. We're going to leverage the host's policy, i.e., calling handle_user_bus_lock(), for throttling, as proposed in https://lkml.kernel.org/r/1595021700-68460-1-git-send-email-fenghua...@intel.com
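The tri-state capability Vitaly proposes can be sketched as a simple dispatch on a bus-lock VM exit. This is a model of the proposal only — the enum values, function name, and return convention are all illustrative and not the eventual KVM uapi:

```c
/* Proposed tri-state policy for the bus lock VM exit capability. */
enum bus_lock_policy {
	BUS_LOCK_DISABLED,   /* no vmexit taken (default) */
	BUS_LOCK_STATS_ONLY, /* count the event, resume the guest */
	BUS_LOCK_USER_EXIT,  /* count it and bounce to userspace */
};

/* Return 0 to resume the guest, 1 to exit to userspace. */
int handle_bus_lock_exit(enum bus_lock_policy policy, unsigned long *stat_counter)
{
	switch (policy) {
	case BUS_LOCK_DISABLED:
		return 0;
	case BUS_LOCK_STATS_ONLY:
		(*stat_counter)++;
		return 0;
	case BUS_LOCK_USER_EXIT:
		(*stat_counter)++;
		return 1;
	}
	return 0;
}
```

The point of the middle state is exactly what the thread discusses: visibility into bus locks without forcing a round trip to userspace on every one.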
Re: kvm crash on 5.7-rc1 and later
On 7/12/2020 2:21 AM, Peter Zijlstra wrote: On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote: I am observing a 100% reproducible kvm crash on kernels starting with 5.7-rc1, always with the same opcode . It happens during wake up from the host suspended state. Worked OK on 5.6 and older. The host is based on Debian testing, Thinkpad T440, i5 cpu. [ 61.576664] kernel BUG at arch/x86/kvm/x86.c:387! [ 61.576672] invalid opcode: [#1] PREEMPT SMP NOPTI [ 61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0 [ 61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020 [ 61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm] Crash results in a dead kvm and occasionally a very unstable system. Bisecting the problem between v5.6 and v5.7-rc1 points to commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7 Author: Peter Zijlstra (Intel) Date: Sun Jan 26 12:05:35 2020 -0800 x86/split_lock: Enable split lock detection by kernel Reverting that patch seems to actually "cure" the issue. The problem is present in all kernels past 5.7-rc1, however the patch does not revert cleanly in later source trees, so I can not retest the logic on recent kernels. Peter, would you have an idea how to debug that (or even better - would you happen to know the fix)? I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0 and 5.8-rc3 I have no clue about kvm. Nor do I actually have hardware with SLD on. I've Cc'ed a bunch of folks who might have more ideas. I think this bug is the same as the one found by Sean, and is already fixed in 5.8-rc4. https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/
[PATCH v4 0/5] Refactor handling flow of KVM_SET_CPUID*
Four patches of v3 have been queued into the kvm/queue branch. This v4 contains the rest, refactoring the flow of KVM_SET_CPUID* as: 1. cpuid check: check if userspace provides legal CPUID settings; 2. cpuid update: Update userspace provided CPUID settings. It currently only contains kvm_update_cpuid_runtime, which updates special CPUID bits based on the vcpu state, e.g., OSXSAVE, OSPKE. In the future, we can re-introduce kvm_update_cpuid() if KVM needs to force on/off some bits. 3. update vcpu states: Update vcpu states/settings based on the final updated CPUID settings. v4: - remove 4 queued patches - rebased to kvm/queue: c16ced9cc67a "x86/kvm/vmx: Use native read/write_cr2()" - fix one bug in v3 to call kvfree(cpuid_entries) in kvm_vcpu_ioctl_set_cpuid() - rename "update_vcpu_model" to "vcpu_after_set_cpuid" [Paolo] - Add a new patch to extract kvm_update_cpuid_runtime() v3: https://lkml.kernel.org/r/20200708065054.19713-1-xiaoyao...@intel.com - Add a note in the KVM api doc to state that the previous CPUID configuration is not reliable if the current KVM_SET_CPUID* fails [Jim] - Adjust Patch 2 to reduce code churn [Sean] - Refine commit messages to add more justification [Sean] - Add a new patch 7 v2: https://lkml.kernel.org/r/20200623115816.24132-1-xiaoyao...@intel.com - rebase to kvm/queue: a037ff353ba6 ("Merge branch 'kvm-master' into HEAD") - change the name of kvm_update_state_based_on_cpuid() to kvm_update_vcpu_model() [Sean] - Add patch 5 to rename kvm_x86_ops.cpuid_update() to kvm_x86_ops.update_vcpu_model() v1: https://lkml.kernel.org/r/20200529085545.29242-1-xiaoyao...@intel.com Xiaoyao Li (5): KVM: x86: Introduce kvm_check_cpuid() KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid() KVM: x86: Rename kvm_update_cpuid() to kvm_vcpu_after_set_cpuid() KVM: x86: Rename cpuid_update() callback to vcpu_after_set_cpuid() KVM: x86: Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 99 +
arch/x86/kvm/cpuid.h| 2 +- arch/x86/kvm/lapic.c| 2 +- arch/x86/kvm/svm/svm.c | 4 +- arch/x86/kvm/vmx/nested.c | 3 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/x86.c | 10 ++-- 8 files changed, 76 insertions(+), 50 deletions(-) -- 2.18.4
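The three-step flow the cover letter describes — check first, then update runtime CPUID bits, then derive vcpu state — can be sketched with toy stand-ins. Everything here is illustrative (the `toy_` names and the trivial check are mine, and -22 simply mimics -EINVAL); the real KVM code operates on `struct kvm_vcpu` and `kvm_cpuid_entry2` arrays:

```c
/* Toy vcpu holding only the fields the flow touches. */
struct toy_vcpu {
	int cpuid_nent;
	int runtime_updated;
	int state_updated;
};

/* Stand-in for kvm_check_cpuid(): reject "illegal" input. */
int toy_check_cpuid(const struct toy_vcpu *v)
{
	return v->cpuid_nent > 0 ? 0 : -22; /* mimic -EINVAL */
}

/* Mirrors the refactored order: check, bail (resetting nent) on error,
 * then update runtime CPUID bits, then derive vcpu state from them. */
int toy_set_cpuid(struct toy_vcpu *v, int nent)
{
	int r;

	v->cpuid_nent = nent;
	r = toy_check_cpuid(v);
	if (r) {
		v->cpuid_nent = 0; /* no stale entries survive a failed ioctl */
		return r;
	}
	v->runtime_updated = 1; /* step 2: kvm_update_cpuid_runtime() */
	v->state_updated = 1;   /* step 3: kvm_vcpu_after_set_cpuid() */
	return 0;
}
```

The key property the refactor buys is visible in the error path: once the check fails, neither update step runs and the vcpu is left with zero CPUID entries rather than a half-applied configuration.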
[PATCH v4 1/5] KVM: x86: Introduce kvm_check_cpuid()
Use kvm_check_cpuid() to validate if userspace provides legal cpuid settings, and call it before KVM takes any action to update CPUID or update vcpu states based on the given CPUID settings. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 55 arch/x86/kvm/cpuid.h | 2 +- 2 files changed, 36 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index eebd66f86abe..1a053022a961 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -54,7 +54,26 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted) #define F feature_bit -int kvm_update_cpuid(struct kvm_vcpu *vcpu) +static int kvm_check_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + /* +* The existing code assumes virtual address is 48-bit or 57-bit in the +* canonical address checks; exit if it is ever changed. +*/ + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + if (best) { + int vaddr_bits = (best->eax & 0xff00) >> 8; + + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) + return -EINVAL; + } + + return 0; +} + +void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; struct kvm_lapic *apic = vcpu->arch.apic; @@ -98,18 +117,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_has(best, X86_FEATURE_XSAVEC))) best->ebx = xstate_required_size(vcpu->arch.xcr0, true); - /* -* The existing code assumes virtual address is 48-bit or 57-bit in the -* canonical address checks; exit if it is ever changed.
-*/ - best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); - if (best) { - int vaddr_bits = (best->eax & 0xff00) >> 8; - - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) - return -EINVAL; - } - best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0); if (kvm_hlt_in_guest(vcpu->kvm) && best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) @@ -131,7 +138,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_pmu_refresh(vcpu); vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(guest_cpuid_has, vcpu); - return 0; } static int is_efer_nx(void) @@ -206,11 +212,16 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_entries[i].padding[2] = 0; } vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + kvfree(cpuid_entries); + goto out; + } + cpuid_fix_nx_cap(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); kvfree(cpuid_entries); out: @@ -231,10 +242,14 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, cpuid->nent * sizeof(struct kvm_cpuid_entry2))) goto out; vcpu->arch.cpuid_nent = cpuid->nent; - kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) + r = kvm_check_cpuid(vcpu); + if (r) { vcpu->arch.cpuid_nent = 0; + goto out; + } + + kvm_x86_ops.cpuid_update(vcpu); + kvm_update_cpuid(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 05434cd9342f..f136de1debad 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -9,7 +9,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); -int kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_cpuid(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, -- 2.18.4
[PATCH v4 4/5] KVM: x86: Rename cpuid_update() callback to vcpu_after_set_cpuid()
The name of the callback cpuid_update() is misleading: it's not about updating the CPUID settings of the vcpu, but about updating the configuration of the vcpu based on the CPUIDs. So rename it to vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 4 ++-- arch/x86/kvm/svm/svm.c | 4 ++-- arch/x86/kvm/vmx/nested.c | 3 ++- arch/x86/kvm/vmx/vmx.c | 4 ++-- 5 files changed, 9 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 16461a674406..3d7d818a282c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1052,7 +1052,7 @@ struct kvm_x86_ops { void (*hardware_unsetup)(void); bool (*cpu_has_accelerated_tpr)(void); bool (*has_emulated_msr)(u32 index); - void (*cpuid_update)(struct kvm_vcpu *vcpu); + void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index b602c0c9078e..832a24c1334e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -228,7 +228,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.vcpu_after_set_cpuid(vcpu); kvm_update_cpuid_runtime(vcpu); kvm_vcpu_after_set_cpuid(vcpu); @@ -257,7 +257,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.vcpu_after_set_cpuid(vcpu); kvm_update_cpuid_runtime(vcpu); kvm_vcpu_after_set_cpuid(vcpu); out: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 6d49afa3063f..535ad311ad02 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3595,7 +3595,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) return 0; } -static void svm_cpuid_update(struct kvm_vcpu *vcpu) +static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@
-4095,7 +4095,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .get_exit_info = svm_get_exit_info, - .cpuid_update = svm_cpuid_update, + .vcpu_after_set_cpuid = svm_vcpu_after_set_cpuid, .has_wbinvd_exit = svm_has_wbinvd_exit, diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7693d41a2446..e4080ab2df21 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6354,7 +6354,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) /* * secondary cpu-based controls. Do not include those that -* depend on CPUID bits, they are added later by vmx_cpuid_update. +* depend on CPUID bits, they are added later by +* vmx_vcpu_after_set_cpuid. */ if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0cdc78f2f2f4..2b41d987b101 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7282,7 +7282,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } -static void vmx_cpuid_update(struct kvm_vcpu *vcpu) +static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7940,7 +7940,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .get_exit_info = vmx_get_exit_info, - .cpuid_update = vmx_cpuid_update, + .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, -- 2.18.4
[PATCH v4 3/5] KVM: x86: Rename kvm_update_cpuid() to kvm_vcpu_after_set_cpuid()
Now that kvm_update_cpuid() no longer updates any CPUID bits, rename it to kvm_vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0ed3b343c44e..b602c0c9078e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -116,7 +116,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) } } -static void kvm_update_cpuid(struct kvm_vcpu *vcpu) +static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; @@ -230,7 +230,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid_runtime(vcpu); - kvm_update_cpuid(vcpu); + kvm_vcpu_after_set_cpuid(vcpu); kvfree(cpuid_entries); out: @@ -259,7 +259,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid_runtime(vcpu); - kvm_update_cpuid(vcpu); + kvm_vcpu_after_set_cpuid(vcpu); out: return r; } -- 2.18.4
[PATCH v4 5/5] KVM: x86: Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid()
kvm_x86_ops.vcpu_after_set_cpuid() is used to update vmx/svm-specific vcpu settings based on the updated CPUID settings, so it's supposed to be called after the CPUIDs are updated, i.e., after kvm_update_cpuid_runtime(). Currently, kvm_update_cpuid_runtime() only updates the CPUID bits of OSXSAVE, APIC, OSPKE, MWAIT, KVM_FEATURE_PV_UNHALT, CPUID(0xD, 0).ebx and CPUID(0xD, 1).ebx. None of them is consumed by vmx/svm's vcpu_after_set_cpuid(), so there is no dependency between them, and moving kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() is more reasonable. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 832a24c1334e..edbed4f522f2 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -121,6 +121,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; + kvm_x86_ops.vcpu_after_set_cpuid(vcpu); + best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; @@ -228,7 +230,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_x86_ops.vcpu_after_set_cpuid(vcpu); kvm_update_cpuid_runtime(vcpu); kvm_vcpu_after_set_cpuid(vcpu); @@ -257,7 +258,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_x86_ops.vcpu_after_set_cpuid(vcpu); kvm_update_cpuid_runtime(vcpu); kvm_vcpu_after_set_cpuid(vcpu); out: -- 2.18.4
[PATCH v4 2/5] KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()
Besides being called in kvm_vcpu_ioctl_set_cpuid*(), kvm_update_cpuid() is also called in 5 other places in x86.c and 1 other place in lapic.c. All those 6 places only need the part that updates guest CPUIDs (OSXSAVE, OSPKE, APIC, KVM_FEATURE_PV_UNHALT, ...) based on the runtime vcpu state, so extract that part into a separate kvm_update_cpuid_runtime(). Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 44 +++- arch/x86/kvm/cpuid.h | 2 +- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/x86.c | 10 +- 4 files changed, 34 insertions(+), 24 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1a053022a961..0ed3b343c44e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -73,10 +73,9 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) return 0; } -void kvm_update_cpuid(struct kvm_vcpu *vcpu) +void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -89,28 +88,14 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); } - if (best && apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - - kvm_apic_set_version(vcpu); - } - best = kvm_find_cpuid_entry(vcpu, 7, 0); if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) cpuid_entry_change(best, X86_FEATURE_OSPKE, kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -129,6 +114,29 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr
& MSR_IA32_MISC_ENABLE_MWAIT); } +} + +static void kvm_update_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + + kvm_apic_set_version(vcpu); + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -221,6 +229,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_x86_ops.cpuid_update(vcpu); + kvm_update_cpuid_runtime(vcpu); kvm_update_cpuid(vcpu); kvfree(cpuid_entries); @@ -249,6 +258,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_x86_ops.cpuid_update(vcpu); + kvm_update_cpuid_runtime(vcpu); kvm_update_cpuid(vcpu); out: return r; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..3a923ae15f2f 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -9,7 +9,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); -void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e5dbb7ebae78..47801a44cfa6 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2230,7 +2230,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) vcpu->arch.apic_base = value; if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) - kvm_
Re: [PATCH v3 0/8] Refactor handling flow of KVM_SET_CPUID*
On 7/8/2020 8:10 PM, Paolo Bonzini wrote: On 08/07/20 08:50, Xiaoyao Li wrote: This serial is the extended version of https://lkml.kernel.org/r/20200528151927.14346-1-xiaoyao...@intel.com First two patches are bug fixing, and the others aim to refactor the flow of SET_CPUID* as: 1. cpuid check: check if userspace provides legal CPUID settings; 2. cpuid update: Update some special CPUID bits based on current vcpu state, e.g., OSXSAVE, OSPKE, ... 3. update vcpu model: Update vcpu model (settings) based on the final CPUID settings. v3: - Add a note in KVM api doc to state the previous CPUID configuration is not reliable if current KVM_SET_CPUID* fails [Jim] - Adjust Patch 2 to reduce code churn [Sean] - Commit message refine to add more justification [Sean] - Add a new patch (7) v2: https://lkml.kernel.org/r/20200623115816.24132-1-xiaoyao...@intel.com - rebase to kvm/queue: a037ff353ba6 ("Merge branch 'kvm-master' into HEAD") - change the name of kvm_update_state_based_on_cpuid() to kvm_update_vcpu_model() [Sean] - Add patch 5 to rename kvm_x86_ops.cpuid_date() to kvm_x86_ops.update_vcpu_model() v1: https://lkml.kernel.org/r/20200529085545.29242-1-xiaoyao...@intel.com Xiaoyao Li (8): KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent KVM: X86: Introduce kvm_check_cpuid() KVM: X86: Split kvm_update_cpuid() KVM: X86: Rename cpuid_update() to update_vcpu_model() KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model() KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version() KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model() Documentation/virt/kvm/api.rst | 4 ++ arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 107 arch/x86/kvm/cpuid.h| 3 +- arch/x86/kvm/lapic.c| 4 +- arch/x86/kvm/svm/svm.c | 4 +- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/x86.c | 1 + 9 files changed, 81 insertions(+), 50 deletions(-) Queued patches 
1/2/3/7/8, thanks. Paolo, I notice that you queued patch 8 into kvm/queue branch as commit 84dd4897524e "KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model()" Can you change the subject of that commit to "KVM: X86: Move kvm_apic_set_version() to kvm_update_cpuid()" ? Paolo
Re: [PATCH v3 4/8] KVM: X86: Split kvm_update_cpuid()
On 7/8/2020 8:41 PM, Paolo Bonzini wrote: On 08/07/20 14:33, Xiaoyao Li wrote: On 7/8/2020 8:06 PM, Paolo Bonzini wrote: On 08/07/20 08:50, Xiaoyao Li wrote: Split the part of updating vcpu model out of kvm_update_cpuid(), and put it into a new kvm_update_vcpu_model(). So it's more clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_vcpu_model() is to update vcpu model (settings) based on the updated CPUID settings. Signed-off-by: Xiaoyao Li I would prefer to keep the kvm_update_cpuid name for what you called kvm_update_vcpu_model(), and rename the rest to kvm_update_cpuid_runtime(). But there is no CPUID being updated in kvm_update_cpuid(), after kvm_update_cpuid_runtime() is split out. This is confusing, IMO. Then what about kvm_vcpu_after_set_cpuid()? It's the "model" that is not clear. I'm OK with kvm_vcpu_after_set_cpuid(). BTW, there is one open question for me regarding enter_smm(): it currently calls kvm_update_cpuid(), and I'm not sure which part it really needs, updating CPUID, updating vcpu state based on CPUID, or both. So in this patch, after splitting kvm_update_cpuid(), I keep both functions there to ensure no functional change in enter_smm(). Using the name "kvm_vcpu_after_set_cpuid" therefore seems odd in that function.
Thanks, Paolo Paolo --- arch/x86/kvm/cpuid.c | 38 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index a825878b7f84..001f5a94880e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -76,7 +76,6 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -89,26 +88,14 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); } - if (best && apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - } - best = kvm_find_cpuid_entry(vcpu, 7, 0); if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) cpuid_entry_change(best, X86_FEATURE_OSPKE, kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -127,6 +114,27 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } +} + +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + 
vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -218,6 +226,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvfree(cpuid_entries); out: @@ -247,6 +256,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..45e3643e2fba 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -10,6 +10,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_vcpu_model(struct kvm_vcp
Re: [PATCH v3 4/8] KVM: X86: Split kvm_update_cpuid()
On 7/8/2020 8:06 PM, Paolo Bonzini wrote: On 08/07/20 08:50, Xiaoyao Li wrote: Split the part of updating vcpu model out of kvm_update_cpuid(), and put it into a new kvm_update_vcpu_model(). So it's more clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_vcpu_model() is to update vcpu model (settings) based on the updated CPUID settings. Signed-off-by: Xiaoyao Li I would prefer to keep the kvm_update_cpuid name for what you called kvm_update_vcpu_model(), and rename the rest to kvm_update_cpuid_runtime(). But there is no CPUID being updated in kvm_update_cpuid(), after kvm_update_cpuid_runtime() is split out. This is confusing, IMO. Paolo --- arch/x86/kvm/cpuid.c | 38 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index a825878b7f84..001f5a94880e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -76,7 +76,6 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -89,26 +88,14 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); } - if (best && apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - } - best = kvm_find_cpuid_entry(vcpu, 7, 0); if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) cpuid_entry_change(best, X86_FEATURE_OSPKE, kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = 
kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -127,6 +114,27 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } +} + +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -218,6 +226,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvfree(cpuid_entries); out: @@ -247,6 +256,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..45e3643e2fba 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -10,6 +10,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 09ee54f5e385..6f376392e6e6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8184,6 +8184,7 @@ static void 
enter_smm(struct kvm_vcpu *vcpu) #endif kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvm_mmu_reset_context(vcpu); }
Re: [PATCH v3 3/8] KVM: X86: Introduce kvm_check_cpuid()
On 7/8/2020 2:50 PM, Xiaoyao Li wrote: Use kvm_check_cpuid() to validate if userspace provides legal cpuid settings and call it before KVM updates CPUID. Signed-off-by: Xiaoyao Li [...] @@ -202,12 +208,16 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_entries[i].padding[2] = 0; } vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; Paolo, a kvfree(cpuid_entries) is missing here before the goto. Can you help fix it? Apologies for the oversight. + goto out; + } + cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); kvfree(cpuid_entries); out:
Re: [PATCH v2] x86/speculation/l1tf: Add KConfig for setting the L1D cache flush mode
On 7/3/2020 5:38 AM, Abhishek Bhardwaj wrote: This change adds a new kernel configuration that sets the l1d cache flush setting at compile time rather than at run time. Signed-off-by: Abhishek Bhardwaj --- Changes in v2: - Fix typo in the help of the new KConfig. arch/x86/kernel/cpu/bugs.c | 8 arch/x86/kvm/Kconfig | 17 + 2 files changed, 25 insertions(+) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 0b71970d2d3d2..1dcc875cf5547 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1406,7 +1406,15 @@ enum l1tf_mitigations l1tf_mitigation __ro_after_init = L1TF_MITIGATION_FLUSH; #if IS_ENABLED(CONFIG_KVM_INTEL) EXPORT_SYMBOL_GPL(l1tf_mitigation); #endif +#if (CONFIG_KVM_VMENTRY_L1D_FLUSH == 1) +enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NEVER; +#elif (CONFIG_KVM_VMENTRY_L1D_FLUSH == 2) +enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_COND; +#elif (CONFIG_KVM_VMENTRY_L1D_FLUSH == 3) +enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_ALWAYS; +#else enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; +#endif how about enum vmx_l1d_flush_state l1tf_vmx_mitigation = #if (CONFIG_KVM_VMENTRY_L1D_FLUSH == 1) VMENTER_L1D_FLUSH_NEVER; #elif (CONFIG_KVM_VMENTRY_L1D_FLUSH == 2) VMENTER_L1D_FLUSH_COND; #elif (CONFIG_KVM_VMENTRY_L1D_FLUSH == 3) VMENTER_L1D_FLUSH_ALWAYS; #else VMENTER_L1D_FLUSH_AUTO; #endif
[PATCH v3 4/8] KVM: X86: Split kvm_update_cpuid()
Split the part of updating vcpu model out of kvm_update_cpuid(), and put it into a new kvm_update_vcpu_model(). So it's more clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_vcpu_model() is to update vcpu model (settings) based on the updated CPUID settings. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 38 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index a825878b7f84..001f5a94880e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -76,7 +76,6 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -89,26 +88,14 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); } - if (best && apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - } - best = kvm_find_cpuid_entry(vcpu, 7, 0); if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7) cpuid_entry_change(best, X86_FEATURE_OSPKE, kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -127,6 +114,27 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } +} + +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best 
= kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -218,6 +226,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvfree(cpuid_entries); out: @@ -247,6 +256,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..45e3643e2fba 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -10,6 +10,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 09ee54f5e385..6f376392e6e6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8184,6 +8184,7 @@ static void enter_smm(struct kvm_vcpu *vcpu) #endif kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvm_mmu_reset_context(vcpu); } -- 2.18.4
[PATCH v3 8/8] KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model()
There are no dependencies between kvm_apic_set_version() and kvm_update_cpuid(), because kvm_apic_set_version() queries the X2APIC CPUID bit, which is not touched or changed by kvm_update_cpuid(). kvm_apic_set_version() clearly belongs to the category of updating the vcpu model. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 89ffd9dccfc6..c183a11dbcff 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -129,6 +129,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; + + kvm_apic_set_version(vcpu); } best = kvm_find_cpuid_entry(vcpu, 0xD, 0); @@ -225,7 +227,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +255,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.4
[PATCH v3 6/8] KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model()
kvm_x86_ops.update_vcpu_model() is used to update vmx/svm vcpu settings based on updated CPUID settings, so it's supposed to be called after CPUIDs are fully updated, i.e., after kvm_update_cpuid(). Currently, kvm_update_cpuid() only updates the CPUID bits of OSXSAVE, APIC, OSPKE, MWAIT, KVM_FEATURE_PV_UNHALT, and the ebx of (leaf 0xD, subleaf 0x0) and of (leaf 0xD, subleaf 0x1). None of them is consumed by vmx/svm's update_vcpu_model(), so there is no dependency between kvm_x86_ops.update_vcpu_model() and kvm_update_cpuid(). Moving kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model() makes the flow more reasonable. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index d2f93823f9fd..89ffd9dccfc6 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -121,6 +121,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; + kvm_x86_ops.update_vcpu_model(vcpu); + best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) @@ -224,7 +226,6 @@ int kvm_vcpu_ioctl_set_cpuid, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +255,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.4
[PATCH v3 7/8] KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version()
Only code cleanup and no functional change. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/lapic.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 5bf72fc86a8e..e5dbb7ebae78 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -354,7 +354,6 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val) void kvm_apic_set_version(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; - struct kvm_cpuid_entry2 *feat; u32 v = APIC_VERSION; if (!lapic_in_kernel(vcpu)) @@ -367,8 +366,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu) * version first and level-triggered interrupts never get EOIed in * IOAPIC. */ - feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0); - if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))) && + if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) && !ioapic_in_kernel(vcpu->kvm)) v |= APIC_LVR_DIRECTED_EOI; kvm_lapic_set_reg(apic, APIC_LVR, v); -- 2.18.4
[PATCH v3 3/8] KVM: X86: Introduce kvm_check_cpuid()
Use kvm_check_cpuid() to validate whether userspace provides legal CPUID settings, and call it before KVM updates the CPUID. Signed-off-by: Xiaoyao Li --- Is the check of the virtual address width really necessary? KVM doesn't check other bits at all. I guess the policy is that KVM allows illegal CPUID settings as long as they don't break the host kernel; userspace takes the consequences itself if it sets bogus CPUID settings that break its guest. But why is vaddr_bits special? It seems illegal vaddr_bits won't break the host kernel. --- arch/x86/kvm/cpuid.c | 54 arch/x86/kvm/cpuid.h | 2 +- 2 files changed, 35 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0e3a23c2ea55..a825878b7f84 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -54,7 +54,26 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted) #define F feature_bit -int kvm_update_cpuid(struct kvm_vcpu *vcpu) +static int kvm_check_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + /* +* The existing code assumes virtual address is 48-bit or 57-bit in the +* canonical address checks; exit if it is ever changed. +*/ + best = kvm_find_cpuid_entry(vcpu, 0x8008, 0); + if (best) { + int vaddr_bits = (best->eax & 0xff00) >> 8; + + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) + return -EINVAL; + } + + return 0; +} + +void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; struct kvm_lapic *apic = vcpu->arch.apic; @@ -96,18 +115,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_has(best, X86_FEATURE_XSAVEC))) best->ebx = xstate_required_size(vcpu->arch.xcr0, true); - /* -* The existing code assumes virtual address is 48-bit or 57-bit in the -* canonical address checks; exit if it is ever changed. 
-*/ - best = kvm_find_cpuid_entry(vcpu, 0x8008, 0); - if (best) { - int vaddr_bits = (best->eax & 0xff00) >> 8; - - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) - return -EINVAL; - } - best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0); if (kvm_hlt_in_guest(vcpu->kvm) && best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) @@ -127,7 +134,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_mmu_reset_context(vcpu); kvm_pmu_refresh(vcpu); - return 0; } static int is_efer_nx(void) @@ -202,12 +208,16 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_entries[i].padding[2] = 0; } vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); kvfree(cpuid_entries); out: @@ -228,11 +238,15 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, cpuid->nent * sizeof(struct kvm_cpuid_entry2))) goto out; vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 05434cd9342f..f136de1debad 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -9,7 +9,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); -int kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_cpuid(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, -- 2.18.4
[PATCH v3 2/8] KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent
As handling of bits outside leaf 1 has been added over time, kvm_update_cpuid() should not return directly when leaf 1 is absent, but should go on updating the other CPUID leaves. Keep the update of apic->lapic_timer.timer_mode_mask in a separate wrapper, to minimize code churn, since it will be moved out of this function in a future patch. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1d13bad42bf9..0e3a23c2ea55 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -60,18 +60,17 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); - if (!best) - return 0; - - /* Update OSXSAVE bit */ - if (boot_cpu_has(X86_FEATURE_XSAVE) && best->function == 0x1) - cpuid_entry_change(best, X86_FEATURE_OSXSAVE, + if (best) { + /* Update OSXSAVE bit */ + if (boot_cpu_has(X86_FEATURE_XSAVE)) + cpuid_entry_change(best, X86_FEATURE_OSXSAVE, kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)); - cpuid_entry_change(best, X86_FEATURE_APIC, + cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); + } - if (apic) { + if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) apic->lapic_timer.timer_mode_mask = 3 << 17; else -- 2.18.4
[PATCH v3 5/8] KVM: X86: Rename cpuid_update() to update_vcpu_model()
The name of callback cpuid_update() is misleading that it's not about updating CPUID settings of vcpu but updating the configurations of vcpu based on the CPUIDs. So rename it to update_vcpu_model(). Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 4 ++-- arch/x86/kvm/svm/svm.c | 4 ++-- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 ++-- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 97cb005c7aa7..c35d14b257c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1051,7 +1051,7 @@ struct kvm_x86_ops { void (*hardware_unsetup)(void); bool (*cpu_has_accelerated_tpr)(void); bool (*has_emulated_msr)(u32 index); - void (*cpuid_update)(struct kvm_vcpu *vcpu); + void (*update_vcpu_model)(struct kvm_vcpu *vcpu); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 001f5a94880e..d2f93823f9fd 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -224,7 +224,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +254,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 74096aa72ad9..01f359e590d5 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3550,7 +3550,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) return 0; } -static void svm_cpuid_update(struct kvm_vcpu *vcpu) +static void svm_update_vcpu_model(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ 
-4050,7 +4050,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .get_exit_info = svm_get_exit_info, - .cpuid_update = svm_cpuid_update, + .update_vcpu_model = svm_update_vcpu_model, .has_wbinvd_exit = svm_has_wbinvd_exit, diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index b627c5f36b9e..85080a5b8d3c 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6354,7 +6354,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) /* * secondary cpu-based controls. Do not include those that -* depend on CPUID bits, they are added later by vmx_cpuid_update. +* depend on CPUID bits, they are added later by vmx_update_vcpu_model. */ if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8187ca152ad2..4673c84b54ac 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7257,7 +7257,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } -static void vmx_cpuid_update(struct kvm_vcpu *vcpu) +static void vmx_update_vcpu_model(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7915,7 +7915,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .get_exit_info = vmx_get_exit_info, - .cpuid_update = vmx_cpuid_update, + .update_vcpu_model = vmx_update_vcpu_model, .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, -- 2.18.4
[PATCH v3 1/8] KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails
The current implementation keeps the userspace-provided CPUID configuration and cpuid->nent even if kvm_update_cpuid() fails. Reset vcpu->arch.cpuid_nent to 0 in the failure case as a simple fix. Besides, update the doc to explicitly state that if a SET_CPUID* ioctl fails, KVM gives no guarantee that the previous valid CPUID configuration is kept. Signed-off-by: Xiaoyao Li --- Documentation/virt/kvm/api.rst | 4 arch/x86/kvm/cpuid.c | 4 2 files changed, 8 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1cfe79b932d6..3ca809a1a44f 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -669,6 +669,10 @@ MSRs that have been set successfully. Defines the vcpu responses to the cpuid instruction. Applications should use the KVM_SET_CPUID2 ioctl if available. +Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID +configuration (if there is) is not corrupted. Userspace can get a copy of valid +CPUID configuration through KVM_GET_CPUID2 in case. + :: struct kvm_cpuid_entry { diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 8a294f9747aa..1d13bad42bf9 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -207,6 +207,8 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; kvfree(cpuid_entries); out: @@ -230,6 +232,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; out: return r; } -- 2.18.4
[PATCH v3 0/8] Refactor handling flow of KVM_SET_CPUID*
This series is the extended version of https://lkml.kernel.org/r/20200528151927.14346-1-xiaoyao...@intel.com The first two patches are bug fixes, and the others aim to refactor the flow of SET_CPUID* as: 1. cpuid check: check if userspace provides legal CPUID settings; 2. cpuid update: update some special CPUID bits based on current vcpu state, e.g., OSXSAVE, OSPKE, ...; 3. update vcpu model: update vcpu model (settings) based on the final CPUID settings. v3: - Add a note in KVM api doc to state the previous CPUID configuration is not reliable if current KVM_SET_CPUID* fails [Jim] - Adjust Patch 2 to reduce code churn [Sean] - Commit message refine to add more justification [Sean] - Add a new patch (7) v2: https://lkml.kernel.org/r/20200623115816.24132-1-xiaoyao...@intel.com - rebase to kvm/queue: a037ff353ba6 ("Merge branch 'kvm-master' into HEAD") - change the name of kvm_update_state_based_on_cpuid() to kvm_update_vcpu_model() [Sean] - Add patch 5 to rename kvm_x86_ops.cpuid_update() to kvm_x86_ops.update_vcpu_model() v1: https://lkml.kernel.org/r/20200529085545.29242-1-xiaoyao...@intel.com Xiaoyao Li (8): KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent KVM: X86: Introduce kvm_check_cpuid() KVM: X86: Split kvm_update_cpuid() KVM: X86: Rename cpuid_update() to update_vcpu_model() KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model() KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version() KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model() Documentation/virt/kvm/api.rst | 4 ++ arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 107 arch/x86/kvm/cpuid.h| 3 +- arch/x86/kvm/lapic.c| 4 +- arch/x86/kvm/svm/svm.c | 4 +- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/x86.c | 1 + 9 files changed, 81 insertions(+), 50 deletions(-) -- 2.18.4
Re: [PATCH v12 07/11] KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emulation
On 7/8/2020 4:21 AM, Sean Christopherson wrote: On Sat, Jun 13, 2020 at 05:42:50PM +0800, Xu, Like wrote: On 2020/6/13 17:14, Xiaoyao Li wrote: On 6/13/2020 4:09 PM, Like Xu wrote: [...] @@ -237,6 +238,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; msr_info->data = vcpu->arch.perf_capabilities; return 0; + case MSR_IA32_DEBUGCTLMSR: + msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); Can we put the emulation of MSR_IA32_DEBUGCTLMSR in vmx_{get,set}_msr()? AFAIK, MSR_IA32_DEBUGCTLMSR is not a pure PMU-related MSR: there is bit 2 to enable #DB for bus lock. We already have a "case MSR_IA32_DEBUGCTLMSR" handler in vmx_set_msr() and you may apply your bus lock changes in that handler. Hrm, but that'd be a weird dependency as vmx_set_msr() would need to check for #DB bus lock support but not actually write GUEST_IA32_DEBUGCTL, or we'd end up writing it twice when both bus lock and LBR are supported. Yeah. That's what I was concerned about as well. I don't see anything in the series that takes action on writes to MSR_IA32_DEBUGCTLMSR beyond updating the VMCS, i.e. AFAICT there isn't any reason to call into the PMU; VMX can simply query vmx_get_perf_capabilities() to check if it's legal to enable DEBUGCTLMSR_LBR_MASK. A question for both LBR and bus lock: would it make sense to cache the guest's value in vcpu_vmx so that querying the guest value doesn't require a VMREAD? I don't have a good feel for how frequently it would be accessed. Caching the guest's value is OK, even though the #DB bus lock bit wouldn't be toggled frequently in a normal OS.
Re: [PATCH v2 7/7] KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model()
On 7/3/2020 3:00 AM, Sean Christopherson wrote: On Tue, Jun 23, 2020 at 07:58:16PM +0800, Xiaoyao Li wrote: Obviously, kvm_apic_set_version() fits well in kvm_update_vcpu_model(). Same as the last patch, it would be nice to explicitly document that there are no dependencies between kvm_apic_set_version() and kvm_update_cpuid(). Sure, will do. Thanks! Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 5decc2dd5448..3428f4d84b42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -129,6 +129,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; + + kvm_apic_set_version(vcpu); } best = kvm_find_cpuid_entry(vcpu, 0xD, 0); @@ -226,7 +228,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -255,7 +256,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.2
Re: [PATCH v2 6/7] KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model()
On 7/3/2020 2:59 AM, Sean Christopherson wrote: On Tue, Jun 23, 2020 at 07:58:15PM +0800, Xiaoyao Li wrote: kvm_x86_ops.update_vcpu_model() is used to update vmx/svm vcpu settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, i.e., kvm_update_cpuid(). Move it into kvm_update_vcpu_model(). The changelog needs to provide an in-depth analysis of VMX and SVM to prove that there are no existing dependencies in the ordering. I've done the analysis a few times over the past few years for a similar change I carried in my SGX code, but dropped that code a while back and haven't done the analysis since. Anyways, it should be documented. No problem. Will add the analysis in the next version. Signed-off-by: Xiaoyao Li --- --- arch/x86/kvm/cpuid.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index d2f93823f9fd..5decc2dd5448 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -121,6 +121,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; + kvm_x86_ops.update_vcpu_model(vcpu); + best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; @@ -136,6 +138,7 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) vcpu->arch.guest_supported_xcr0 = (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + Spurious whitespace. /* Note, maxphyaddr must be updated before tdp_level. 
*/ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu); @@ -224,7 +227,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +256,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.2
Re: [PATCH v2 2/7] KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent
On 7/3/2020 3:02 AM, Sean Christopherson wrote: On Thu, Jul 02, 2020 at 11:54:03AM -0700, Sean Christopherson wrote: On Tue, Jun 23, 2020 at 07:58:11PM +0800, Xiaoyao Li wrote: As handling of bits other than leaf 1 was added over time, kvm_update_cpuid() should not return directly if leaf 1 is absent, but should go on updating other CPUID leaves. Signed-off-by: Xiaoyao Li This should probably be marked for stable. --- arch/x86/kvm/cpuid.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1d13bad42bf9..0164dac95ef5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -60,22 +60,21 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); - if (!best) - return 0; Rather than wrap the existing code, what about throwing it in a separate helper? That generates an easier-to-read diff and also has the nice property of getting 'apic' out of the common code. Hrm, that'd be overkill once the apic code is moved in a few patches. What if you keep the cpuid updates wrapped (as in this patch), but then do if (best && apic) { } for the apic path? That'll minimize churn for code that is disappearing, e.g. will make future git archaeologists happy :-). Sure. I'll do it in the next version. 
- - /* Update OSXSAVE bit */ - if (boot_cpu_has(X86_FEATURE_XSAVE) && best->function == 0x1) - cpuid_entry_change(best, X86_FEATURE_OSXSAVE, + if (best) { + /* Update OSXSAVE bit */ + if (boot_cpu_has(X86_FEATURE_XSAVE)) + cpuid_entry_change(best, X86_FEATURE_OSXSAVE, kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)); - cpuid_entry_change(best, X86_FEATURE_APIC, + cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); - if (apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; + if (apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } } best = kvm_find_cpuid_entry(vcpu, 7, 0); -- 2.18.2
Re: [PATCH v13 03/11] KVM: VMX: Set guest CET MSRs per KVM and host configuration
On 7/1/2020 4:04 PM, Yang Weijiang wrote: CET MSRs are passed through to the guest directly to enhance performance. CET runtime control settings are stored in MSR_IA32_{U,S}_CET, Shadow Stack Pointers (SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP, and the SSP table base address is stored in MSR_IA32_INT_SSP_TAB; these MSRs are defined in the kernel and re-used here. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user-mode protection; their contents are switched between threads during scheduling, so it makes sense to pass them through so that the guest kernel can use xsaves/xrstors to operate on them efficiently. The other MSRs are used for non-user-mode protection. See the SDM for detailed info. The difference between the CET VMCS fields and the CET MSRs is that the former are used during VM-entry/VM-exit, whereas the latter are used for CET state storage between task/thread scheduling. Co-developed-by: Zhang Yi Z Signed-off-by: Zhang Yi Z Signed-off-by: Yang Weijiang --- arch/x86/kvm/vmx/vmx.c | 46 ++ arch/x86/kvm/x86.c | 3 +++ 2 files changed, 49 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d52d470e36b1..97e766875a7e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3020,6 +3020,13 @@ void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long cr3) vmcs_writel(GUEST_CR3, guest_cr3); } +static bool is_cet_state_supported(struct kvm_vcpu *vcpu, u32 xss_states) +{ + return ((supported_xss & xss_states) && + (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) || + guest_cpuid_has(vcpu, X86_FEATURE_IBT))); +} + int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7098,6 +7105,42 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + bool incpt; + + incpt = !is_cet_state_supported(vcpu, 
XFEATURE_MASK_CET_USER); + /* +* U_CET is required for USER CET, and U_CET, PL3_SSP are bound as +* one component and controlled by IA32_XSS[bit 11]. +*/ + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW, + incpt); + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW, + incpt); + + incpt = !is_cet_state_supported(vcpu, XFEATURE_MASK_CET_KERNEL); + /* +* S_CET is required for KERNEL CET, and PL0_SSP ... PL2_SSP are +* bound as one component and controlled by IA32_XSS[bit 12]. +*/ + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW, + incpt); + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW, + incpt); + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW, + incpt); + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW, + incpt); + + incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); + /* SSP_TAB is only available for KERNEL SHSTK. */ + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, + incpt); +} + static void vmx_cpuid_update(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7136,6 +7179,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu) vmx_set_guest_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE); } } + + if (supported_xss & (XFEATURE_MASK_CET_KERNEL | XFEATURE_MASK_CET_USER)) + vmx_update_intercept_for_cet_msr(vcpu); } static __init void vmx_set_cpu_caps(void) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c5835f9cb9ad..6390b62c12ed 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -186,6 +186,9 @@ static struct kvm_shared_msrs __percpu *shared_msrs; | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ | XFEATURE_MASK_PKRU) +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER | \ +XFEATURE_MASK_CET_KERNEL) + This definition needs to be moved to Patch 5? u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer);
Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
On 7/1/2020 10:49 PM, Vitaly Kuznetsov wrote: Xiaoyao Li writes: On 7/1/2020 8:44 PM, Vitaly Kuznetsov wrote: Xiaoyao Li writes: On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote: Chenyi Qiang writes: [...] static const int kvm_vmx_max_exit_handlers = @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (unlikely(vmx->exit_reason.failed_vmentry)) return EXIT_FASTPATH_NONE; + /* +* check the exit_reason to see if a bus lock +* happened in the guest. +*/ + if (vmx->exit_reason.bus_lock_detected) + handle_bus_lock(vcpu); In case the ultimate goal is to have an exit to userspace on bus lock, I don't think we will need an exit to userspace on bus lock. See below. The two ways to reach handle_bus_lock() are very different: in case we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by returning 0, but what are we going to do in case of exit_reason.bus_lock_detected? The 'higher priority VM exit' may require an exit to userspace too. So what's the plan? Maybe we can ignore the case when we're exiting to userspace for some other reason as this is slow already, and force the exit otherwise? And should we actually introduce KVM_EXIT_BUS_LOCK and a capability to enable it here? Introducing KVM_EXIT_BUS_LOCK may help nothing. Whether it is EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has already happened. Exiting to userspace cannot prevent the bus lock, so all userspace can do is record and count, as this patch does in vcpu->stat.bus_locks. Exiting to userspace would allow implementing custom 'throttling' policies to mitigate the 'noisy neighbour' problem. The simplest would be to just inject some sleep time. So you want an exit to userspace for every bus lock and leave it all to userspace. Yes, it's doable. In some cases we may not even want to have a VM exit: think e.g. the real-time/partitioning case where, even on a bus lock, we may not want to add additional latency just to count such events.
For the real-time case, bus locks need to be avoided altogether, since a bus lock takes many cycles and prevents others from accessing memory. If there is no bus lock, then there is no bus lock VM exit to worry about. If there is a bus lock, the latency requirement may not be met because of it. I'd suggest we make the new capability tri-state: - disabled (no vmexit, default) - stats only (what this patch does) - userspace exit But maybe this is overkill, I'd like to hear what others think. Yeah. Others' thoughts are very welcome. Besides, for how to throttle, KVM may have to take kernel policy into account, since in the spec there is another feature for bare metal to raise a #DB for a bus lock. The native kernel likely implements some policy to restrict the rate at which bus locks can happen, so qemu threads will have to follow that as well. As you said, the exit_reason.bus_lock_detected case is the tricky one. We cannot similarly extend vcpu->run->exit_reason; that breaks the ABI. Maybe we can extend vcpu->run->flags to indicate a detected bus lock for the other exit reasons? This is likely the easiest solution.
Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
On 7/1/2020 8:44 PM, Vitaly Kuznetsov wrote: Xiaoyao Li writes: On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote: Chenyi Qiang writes: [...] static const int kvm_vmx_max_exit_handlers = @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (unlikely(vmx->exit_reason.failed_vmentry)) return EXIT_FASTPATH_NONE; + /* +* check the exit_reason to see if a bus lock +* happened in the guest. +*/ + if (vmx->exit_reason.bus_lock_detected) + handle_bus_lock(vcpu); In case the ultimate goal is to have an exit to userspace on bus lock, I don't think we will need an exit to userspace on bus lock. See below. The two ways to reach handle_bus_lock() are very different: in case we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by returning 0, but what are we going to do in case of exit_reason.bus_lock_detected? The 'higher priority VM exit' may require an exit to userspace too. So what's the plan? Maybe we can ignore the case when we're exiting to userspace for some other reason as this is slow already, and force the exit otherwise? And should we actually introduce KVM_EXIT_BUS_LOCK and a capability to enable it here? Introducing KVM_EXIT_BUS_LOCK may help nothing. Whether it is EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has already happened. Exiting to userspace cannot prevent the bus lock, so all userspace can do is record and count, as this patch does in vcpu->stat.bus_locks. Exiting to userspace would allow implementing custom 'throttling' policies to mitigate the 'noisy neighbour' problem. The simplest would be to just inject some sleep time. So you want an exit to userspace for every bus lock and leave it all to userspace. Yes, it's doable. As you said, the exit_reason.bus_lock_detected case is the tricky one. We cannot similarly extend vcpu->run->exit_reason; that breaks the ABI. Maybe we can extend vcpu->run->flags to indicate a detected bus lock for the other exit reasons?
Re: [RFC 2/2] KVM: VMX: Enable bus lock VM exit
On 7/1/2020 5:04 PM, Vitaly Kuznetsov wrote: Chenyi Qiang writes: [...] static const int kvm_vmx_max_exit_handlers = @@ -6830,6 +6838,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (unlikely(vmx->exit_reason.failed_vmentry)) return EXIT_FASTPATH_NONE; + /* +* check the exit_reason to see if a bus lock +* happened in the guest. +*/ + if (vmx->exit_reason.bus_lock_detected) + handle_bus_lock(vcpu); In case the ultimate goal is to have an exit to userspace on bus lock, I don't think we will need an exit to userspace on bus lock. See below. The two ways to reach handle_bus_lock() are very different: in case we're handling EXIT_REASON_BUS_LOCK we can easily drop to userspace by returning 0, but what are we going to do in case of exit_reason.bus_lock_detected? The 'higher priority VM exit' may require an exit to userspace too. So what's the plan? Maybe we can ignore the case when we're exiting to userspace for some other reason as this is slow already, and force the exit otherwise? And should we actually introduce KVM_EXIT_BUS_LOCK and a capability to enable it here? Introducing KVM_EXIT_BUS_LOCK may help nothing. Whether it is EXIT_REASON_BUS_LOCK or exit_reason.bus_lock_detected, the bus lock has already happened. Exiting to userspace cannot prevent the bus lock, so all userspace can do is record and count, as this patch does in vcpu->stat.bus_locks.
Re: [PATCH v9 0/8] KVM: Add virtualization support of split lock detection
Ping for comments. On 5/9/2020 7:05 PM, Xiaoyao Li wrote: This series aims to add the virtualization of split lock detection in KVM. Due to the fact that split lock detection is tightly coupled with the CPU model, and the CPU model is configurable by the host VMM, we elect to use a paravirt method to expose and enumerate it for the guest. Changes in v9 - rebase to v5.7-rc4 - Add one patch to rename TIF_SLD to TIF_SLD_DISABLED; - Add one patch to remove the bogus case in handle_guest_split_lock(); - Introduce flag X86_FEATURE_SPLIT_LOCK_DETECT_FATAL and thus drop sld_state; - Use X86_FEATURE_SPLIT_LOCK_DETECT and X86_FEATURE_SPLIT_LOCK_DETECT_FATAL to determine the SLD state of the host; - Introduce split_lock_virt_switch() and two wrappers for KVM instead of sld_update_to(); - Use paravirt to expose and enumerate split lock detection for the guest; - Split lock detection can be exposed to the guest when the host is sld_fatal, even if SMT is available on the host. Changes in v8: https://lkml.kernel.org/r/20200414063129.133630-1-xiaoyao...@intel.com - rebase to v5.7-rc1. - basic enabling of split lock detection already merged. - When the host is sld_warn and nosmt, load the guest's SLD bit when in KVM context, i.e., between vmx_prepare_switch_to_guest() and vmx_prepare_switch_to_host(); there KVM uses the guest's SLD setting. Changes in v7: https://lkml.kernel.org/r/20200325030924.132881-1-xiaoyao...@intel.com - only pick patch 1 and patch 2, and hold all the rest. - Update the SLD bit on each processor based on sld_state. Changes in v6: https://lkml.kernel.org/r/20200324151859.31068-1-xiaoyao...@intel.com - Drop the sld_not_exist flag and use X86_FEATURE_SPLIT_LOCK_DETECT to check whether split lock detection needs to be initialized. [tglx] - Use tglx's method to verify the existence of split lock detection. - Small optimization of sld_update_msr() so that the default value of msr_test_ctrl_cache has the split_lock_detect bit cleared. - Drop patch 3 in v5 that introduced the kvm_only option. [tglx] - Rebase patch4-8 to kvm/queue. 
- use the new kvm-cpu-cap to expose X86_FEATURE_CORE_CAPABILITIES in Patch 6. Changes in v5: https://lkml.kernel.org/r/20200315050517.127446-1-xiaoyao...@intel.com - Use the X86_FEATURE_SPLIT_LOCK_DETECT flag in KVM to ensure split lock detection is really supported. - Add and export SLD-related helper functions in the KVM patches that use them. Xiaoyao Li (8): x86/split_lock: Rename TIF_SLD to TIF_SLD_DISABLED x86/split_lock: Remove bogus case in handle_guest_split_lock() x86/split_lock: Introduce flag X86_FEATURE_SLD_FATAL and drop sld_state x86/split_lock: Introduce split_lock_virt_switch() and two wrappers x86/kvm: Introduce paravirt split lock detection enumeration KVM: VMX: Enable MSR TEST_CTRL for guest KVM: VMX: virtualize split lock detection x86/split_lock: Enable split lock detection initialization when running as a guest on KVM Documentation/virt/kvm/cpuid.rst | 29 +++ arch/x86/include/asm/cpu.h | 35 ++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/thread_info.h | 6 +-- arch/x86/include/uapi/asm/kvm_para.h | 8 ++-- arch/x86/kernel/cpu/intel.c | 59 --- arch/x86/kernel/kvm.c | 3 ++ arch/x86/kernel/process.c | 2 +- arch/x86/kvm/cpuid.c | 6 +++ arch/x86/kvm/vmx/vmx.c | 72 +--- arch/x86/kvm/vmx/vmx.h | 3 ++ arch/x86/kvm/x86.c | 6 ++- arch/x86/kvm/x86.h | 7 +++ 13 files changed, 196 insertions(+), 41 deletions(-)
Re: [PATCH v2 1/7] KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID fails
On 6/24/2020 2:20 AM, Jim Mattson wrote: On Tue, Jun 23, 2020 at 4:58 AM Xiaoyao Li wrote: It needs to invalidate CPUID configruations if usersapce provides Nits: configurations, userspace oh, I'll fix it. illegal input. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 8a294f9747aa..1d13bad42bf9 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -207,6 +207,8 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; kvfree(cpuid_entries); out: @@ -230,6 +232,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; out: return r; } -- 2.18.2 What if vcpu->arch.cpuid_nent was greater than 0 before the ioctl in question? Nice catch! If considering it, then we have to restore the old CPUID configuration. So how about making it simpler to just add one line of comment in API doc: If KVM_SET_CPUID{2} fails, the old valid configuration is cleared as a side effect.
[PATCH v2 2/7] KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent
As handling of bits other than leaf 1 was added over time, kvm_update_cpuid() should not return directly if leaf 1 is absent, but should go on updating other CPUID leaves. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1d13bad42bf9..0164dac95ef5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -60,22 +60,21 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); - if (!best) - return 0; - - /* Update OSXSAVE bit */ - if (boot_cpu_has(X86_FEATURE_XSAVE) && best->function == 0x1) - cpuid_entry_change(best, X86_FEATURE_OSXSAVE, + if (best) { + /* Update OSXSAVE bit */ + if (boot_cpu_has(X86_FEATURE_XSAVE)) + cpuid_entry_change(best, X86_FEATURE_OSXSAVE, kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)); - cpuid_entry_change(best, X86_FEATURE_APIC, + cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); - if (apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; + if (apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } } best = kvm_find_cpuid_entry(vcpu, 7, 0); -- 2.18.2
[PATCH v2 4/7] KVM: X86: Split kvm_update_cpuid()
Split the part of updating vcpu model out of kvm_update_cpuid(), and put it into a new kvm_update_vcpu_model(). So it's more clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_vcpu_model() is to update vcpu model (settings) based on the updated CPUID settings. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 38 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 67e5c68fdb45..001f5a94880e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -76,7 +76,6 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -87,13 +86,6 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); - - if (apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - } } best = kvm_find_cpuid_entry(vcpu, 7, 0); @@ -102,13 +94,8 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -127,6 +114,27 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } +} + +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best = 
kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -218,6 +226,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvfree(cpuid_entries); out: @@ -247,6 +256,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..45e3643e2fba 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -10,6 +10,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_vcpu_model(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 00c88c2f34e4..4dee28bbc087 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8149,6 +8149,7 @@ static void enter_smm(struct kvm_vcpu *vcpu) #endif kvm_update_cpuid(vcpu); + kvm_update_vcpu_model(vcpu); kvm_mmu_reset_context(vcpu); } -- 2.18.2
[PATCH v2 3/7] KVM: X86: Introduce kvm_check_cpuid()
Use kvm_check_cpuid() to validate whether userspace provides legal CPUID settings, and call it before KVM updates the CPUID. Signed-off-by: Xiaoyao Li --- Is the check of virtual address width really necessary? KVM doesn't check other bits at all. I guess the policy is that KVM allows illegal CPUID settings as long as they don't break the host kernel. Userspace takes the consequences itself if it sets bogus CPUID settings that break its guest. But why is vaddr_bits special? It seems illegal vaddr_bits won't break the host kernel. --- arch/x86/kvm/cpuid.c | 54 arch/x86/kvm/cpuid.h | 2 +- 2 files changed, 35 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0164dac95ef5..67e5c68fdb45 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -54,7 +54,26 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted) #define F feature_bit -int kvm_update_cpuid(struct kvm_vcpu *vcpu) +static int kvm_check_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + /* +* The existing code assumes virtual address is 48-bit or 57-bit in the +* canonical address checks; exit if it is ever changed. +*/ + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + if (best) { + int vaddr_bits = (best->eax & 0xff00) >> 8; + + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) + return -EINVAL; + } + + return 0; +} + +void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; struct kvm_lapic *apic = vcpu->arch.apic; @@ -96,18 +115,6 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_has(best, X86_FEATURE_XSAVEC))) best->ebx = xstate_required_size(vcpu->arch.xcr0, true); - /* -* The existing code assumes virtual address is 48-bit or 57-bit in the -* canonical address checks; exit if it is ever changed. 
-*/ - best = kvm_find_cpuid_entry(vcpu, 0x8008, 0); - if (best) { - int vaddr_bits = (best->eax & 0xff00) >> 8; - - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) - return -EINVAL; - } - best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0); if (kvm_hlt_in_guest(vcpu->kvm) && best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) @@ -127,7 +134,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_mmu_reset_context(vcpu); kvm_pmu_refresh(vcpu); - return 0; } static int is_efer_nx(void) @@ -202,12 +208,16 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_entries[i].padding[2] = 0; } vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); kvfree(cpuid_entries); out: @@ -228,11 +238,15 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, cpuid->nent * sizeof(struct kvm_cpuid_entry2))) goto out; vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 05434cd9342f..f136de1debad 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -9,7 +9,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); -int kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_cpuid(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, -- 2.18.2
[PATCH v2 5/7] KVM: X86: Rename cpuid_update() to update_vcpu_model()
The name of the callback cpuid_update() is misleading: it is not about updating the CPUID settings of the vcpu, but about updating the configurations of the vcpu based on the CPUID settings. So rename it to update_vcpu_model(). Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c | 4 ++-- arch/x86/kvm/svm/svm.c | 4 ++-- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 ++-- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f852ee350beb..e8ae89eef199 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1083,7 +1083,7 @@ struct kvm_x86_ops { void (*hardware_unsetup)(void); bool (*cpu_has_accelerated_tpr)(void); bool (*has_emulated_msr)(u32 index); - void (*cpuid_update)(struct kvm_vcpu *vcpu); + void (*update_vcpu_model)(struct kvm_vcpu *vcpu); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 001f5a94880e..d2f93823f9fd 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -224,7 +224,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +254,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); + kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index c0da4dd78ac5..480d0354910a 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3551,7 +3551,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) return 0; } -static void svm_cpuid_update(struct kvm_vcpu *vcpu) +static void svm_update_vcpu_model(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ 
-4051,7 +4051,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .get_exit_info = svm_get_exit_info, - .cpuid_update = svm_cpuid_update, + .update_vcpu_model = svm_update_vcpu_model, .has_wbinvd_exit = svm_has_wbinvd_exit, diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index d1af20b050a8..86ba7cd49c97 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6322,7 +6322,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) /* * secondary cpu-based controls. Do not include those that -* depend on CPUID bits, they are added later by vmx_cpuid_update. +* depend on CPUID bits, they are added later by vmx_update_vcpu_model. */ if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b1a23ad986ff..147c696d6445 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7251,7 +7251,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } -static void vmx_cpuid_update(struct kvm_vcpu *vcpu) +static void vmx_update_vcpu_model(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7945,7 +7945,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .get_exit_info = vmx_get_exit_info, - .cpuid_update = vmx_cpuid_update, + .update_vcpu_model = vmx_update_vcpu_model, .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, -- 2.18.2
[PATCH v2 1/7] KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID fails
Invalidate the CPUID configuration if userspace provides illegal input. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 8a294f9747aa..1d13bad42bf9 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -207,6 +207,8 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; kvfree(cpuid_entries); out: @@ -230,6 +232,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; out: return r; } -- 2.18.2
[PATCH v2 0/7] Refactor handling flow of SET_CPUID*
This series is the extended version of https://lkml.kernel.org/r/20200528151927.14346-1-xiaoyao...@intel.com The first two patches are bug fixes, and the others aim to refactor the flow of SET_CPUID* as: 1. cpuid check: check if userspace provides legal CPUID settings; 2. cpuid update: Update some special CPUID bits based on current vcpu state, e.g., OSXSAVE, OSPKE, ... 3. update vcpu model: Update vcpu model (settings) based on the final CPUID settings. v2: - rebase to kvm/queue: a037ff353ba6 ("Merge branch 'kvm-master' into HEAD") - change the name of kvm_update_state_based_on_cpuid() to kvm_update_vcpu_model() [Sean] - Add patch 5 to rename kvm_x86_ops.cpuid_update() to kvm_x86_ops.update_vcpu_model() v1: https://lkml.kernel.org/r/20200529085545.29242-1-xiaoyao...@intel.com Xiaoyao Li (7): KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID fails KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent KVM: X86: Introduce kvm_check_cpuid() KVM: X86: Split kvm_update_cpuid() KVM: X86: Rename cpuid_update() to update_vcpu_model() KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model() KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model() arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/cpuid.c| 108 arch/x86/kvm/cpuid.h| 3 +- arch/x86/kvm/svm/svm.c | 4 +- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/x86.c | 1 + 7 files changed, 77 insertions(+), 47 deletions(-) -- 2.18.2
[PATCH v2 7/7] KVM: X86: Move kvm_apic_set_version() to kvm_update_vcpu_model()
Obviously, kvm_apic_set_version() fits well in kvm_update_vcpu_model(). Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 5decc2dd5448..3428f4d84b42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -129,6 +129,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; + + kvm_apic_set_version(vcpu); } best = kvm_find_cpuid_entry(vcpu, 0xD, 0); @@ -226,7 +228,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -255,7 +256,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.2
[PATCH v2 6/7] KVM: X86: Move kvm_x86_ops.update_vcpu_model() into kvm_update_vcpu_model()
kvm_x86_ops.update_vcpu_model() is used to update vmx/svm vcpu settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, i.e., after kvm_update_cpuid(). Move it into kvm_update_vcpu_model(). Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index d2f93823f9fd..5decc2dd5448 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -121,6 +121,8 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; + kvm_x86_ops.update_vcpu_model(vcpu); + best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) @@ -136,6 +138,7 @@ void kvm_update_vcpu_model(struct kvm_vcpu *vcpu) vcpu->arch.guest_supported_xcr0 = (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu); @@ -224,7 +227,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); @@ -254,7 +256,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.update_vcpu_model(vcpu); kvm_update_cpuid(vcpu); kvm_update_vcpu_model(vcpu); out: -- 2.18.2
Re: [PATCH] KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL
On 6/23/2020 8:51 AM, Sean Christopherson wrote: Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible. As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads at the detriment to workloads running in the host or on other VMs. Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation. Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR. Do you mean the extra handling of the atomic list facility in KVM? Or do you just mean that VM-exit/VM-entry MSR-load/save in VMX hardware is more expensive than direct RDMSR/WRMSR instructions?
[PATCH] KVM: X86: Fix MSR range of APIC registers in X2APIC mode
Only the MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated to accessing APIC registers in x2APIC mode. Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic") Signed-off-by: Xiaoyao Li --- arch/x86/kvm/x86.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 00c88c2f34e4..29d9b078ce69 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2856,7 +2856,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return kvm_mtrr_set_msr(vcpu, msr, data); case MSR_IA32_APICBASE: return kvm_set_apic_base(vcpu, msr_info); - case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: return kvm_x2apic_msr_write(vcpu, msr, data); case MSR_IA32_TSCDEADLINE: kvm_set_lapic_tscdeadline_msr(vcpu, data); @@ -3196,7 +3196,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_APICBASE: msr_info->data = kvm_get_apic_base(vcpu); break; - case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data); case MSR_IA32_TSCDEADLINE: msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu); -- 2.18.2
Re: [PATCH v12 07/11] KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emualtion
On 6/13/2020 4:09 PM, Like Xu wrote: When the LBR feature is reported by the vmx_get_perf_capabilities(), the LBR fields in the [vmx|vcpu]_supported debugctl should be unmasked. The debugctl msr is handled separately in vmx/svm and they're not completely identical, hence remove the common msr handling code. Signed-off-by: Like Xu --- arch/x86/kvm/vmx/capabilities.h | 12 arch/x86/kvm/vmx/pmu_intel.c| 19 +++ arch/x86/kvm/x86.c | 13 - 3 files changed, 31 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index b633a90320ee..f6fcfabb1026 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -21,6 +21,8 @@ extern int __read_mostly pt_mode; #define PMU_CAP_FW_WRITES (1ULL << 13) #define PMU_CAP_LBR_FMT 0x3f +#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) + struct nested_vmx_msrs { /* * We only store the "true" versions of the VMX capability MSRs. We @@ -387,4 +389,14 @@ static inline u64 vmx_get_perf_capabilities(void) return perf_cap; } +static inline u64 vmx_get_supported_debugctl(void) +{ + u64 val = 0; + + if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) + val |= DEBUGCTLMSR_LBR_MASK; + + return val; +} + #endif /* __KVM_X86_VMX_CAPS_H */ diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index a953c7d633f6..d92e95b64c74 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -187,6 +187,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) case MSR_CORE_PERF_GLOBAL_OVF_CTRL: ret = pmu->version > 1; break; + case MSR_IA32_DEBUGCTLMSR: case MSR_IA32_PERF_CAPABILITIES: ret = 1; break; @@ -237,6 +238,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; msr_info->data = vcpu->arch.perf_capabilities; return 0; + case MSR_IA32_DEBUGCTLMSR: + msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); Can we put the emulation of MSR_IA32_DEBUGCTLMSR in 
vmx_{get,set}_msr(). AFAIK, MSR_IA32_DEBUGCTLMSR is not a pure PMU-related MSR; e.g., bit 2 enables #DB for bus lock. + return 0; default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) || (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) { @@ -282,6 +286,16 @@ static inline bool lbr_is_compatible(struct kvm_vcpu *vcpu) return true; } +static inline u64 vcpu_get_supported_debugctl(struct kvm_vcpu *vcpu) +{ + u64 debugctlmsr = vmx_get_supported_debugctl(); + + if (!lbr_is_enabled(vcpu)) + debugctlmsr &= ~DEBUGCTLMSR_LBR_MASK; + + return debugctlmsr; +} + static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -336,6 +350,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } vcpu->arch.perf_capabilities = data; return 0; + case MSR_IA32_DEBUGCTLMSR: + if (data & ~vcpu_get_supported_debugctl(vcpu)) + return 1; + vmcs_write64(GUEST_IA32_DEBUGCTL, data); + return 0; default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) || (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 00c88c2f34e4..56f275eb4554 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2840,18 +2840,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; } break; - case MSR_IA32_DEBUGCTLMSR: - if (!data) { - /* We support the non-activated case already */ - break; - } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { So after this patch, a guest trying to set the DEBUGCTLMSR_BTF bit will get a #GP instead of being ignored with a log printed in the kernel. This code was introduced ~12 years ago in commit b5e2fec0ebc3 ("KVM: Ignore DEBUGCTL MSRs with no effect"), just to make Netware happy. Maybe I'm overthinking something that old. 
- /* Values other than LBR and BTF are vendor-specific, - thus reserved and should throw a #GP */ - return 1; - } - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", - __func__, data); - break; case 0x200 ... 0x2ff: return kvm_mtrr_set_msr(
Re: [PATCH] x86/split_lock: Sanitize userspace and guest error output
On 6/6/2020 12:42 AM, Prarit Bhargava wrote: On 6/5/20 11:29 AM, Xiaoyao Li wrote: On 6/5/2020 7:44 PM, Prarit Bhargava wrote: There are two problems with kernel messages in fatal mode that were found during testing of guests and userspace programs. The first is that no kernel message is output when the split lock detector is triggered with a userspace program. As a result the userspace process dies from receiving SIGBUS with no indication to the user of what caused the process to die. The second problem is that only the first triggering guest causes a kernel message to be output because the message is output with pr_warn_once(). This also results in a loss of information to the user. While fixing these I noticed that the same message was being output three times so I'm cleaning that up too. Fix fatal mode output, and use consistent messages for fatal and warn modes for both userspace and guests. Signed-off-by: Prarit Bhargava Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Cc: Tony Luck Cc: "Peter Zijlstra (Intel)" Cc: Sean Christopherson Cc: Rahul Tanwar Cc: Xiaoyao Li Cc: Ricardo Neri Cc: Dave Hansen --- arch/x86/kernel/cpu/intel.c | 24 ++-- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 166d7c355896..463022aa9b7a 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1074,10 +1074,14 @@ static void split_lock_init(void) split_lock_verify_msr(sld_state != sld_off); } -static void split_lock_warn(unsigned long ip) +static bool split_lock_warn(unsigned long ip, int fatal) { - pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n", - current->comm, current->pid, ip); + pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n", + current->comm, current->pid, + sld_state == sld_fatal ? 
"fatal " : "", ip); + + if (sld_state == sld_fatal || fatal) + return false; /* * Disable the split lock detection for this task so it can make @@ -1086,18 +1090,13 @@ static void split_lock_warn(unsigned long ip) */ sld_update_msr(false); set_tsk_thread_flag(current, TIF_SLD); + return true; } bool handle_guest_split_lock(unsigned long ip) { - if (sld_state == sld_warn) { - split_lock_warn(ip); + if (split_lock_warn(ip, 0)) return true; - } - - pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n", - current->comm, current->pid, - sld_state == sld_fatal ? "fatal" : "bogus", ip); current->thread.error_code = 0; current->thread.trap_nr = X86_TRAP_AC; @@ -1108,10 +1107,7 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock); bool handle_user_split_lock(struct pt_regs *regs, long error_code) { - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal) - return false; - split_lock_warn(regs->ip); - return true; + return split_lock_warn(regs->ip, regs->flags & X86_EFLAGS_AC); It's incorrect. You change the behavior so that it prints the split-lock warning even when CPL 3 alignment check is turned on. Do you want the message to be displayed in the fatal case of CPL 3 alignment check? No. It should never be displayed if the #AC happens in CPL 3 and X86_EFLAGS_AC is set. In this case, an unaligned access triggers #AC regardless of a #LOCK prefix. What's more, even if there is a #LOCK prefix, we still cannot tell the cause because we don't know the priority of the legacy alignment-check #AC versus the split-lock #AC. If you do want a message, we can only say "unaligned access at address xxx".
Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
On 6/6/2020 3:26 AM, Sean Christopherson wrote: Choo! Choo! All aboard the Split Lock Express, with direct service to Wreckage! Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible SLD-enabled CPU model to avoid writing MSR_TEST_CTRL. MSR_TEST_CTRL exists, and is writable, on many generations of CPUs. Writing the MSR, even with '0', can result in bizarre, undocumented behavior. This fixes a crash on Haswell when resuming from suspend with a live KVM guest. Because APs use the standard SMP boot flow for resume, they will go through split_lock_init() and the subsequent RDMSR/WRMSR sequence, which runs even when sld_state==sld_off to ensure SLD is disabled. On Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will succeed and _may_ take the SMT _sibling_ out of VMX root mode. When KVM has an active guest, KVM performs VMXON as part of CPU onlining (see kvm_starting_cpu()). Because SMP boot is serialized, the resulting flow is effectively: on_each_ap_cpu() { WRMSR(MSR_TEST_CTRL, 0) VMXON } As a result, the WRMSR can disable VMX on a different CPU that has already done VMXON. This ultimately results in a #UD on VMPTRLD when KVM regains control and attempt run its vCPUs. The above voodoo was confirmed by reworking KVM's VMXON flow to write MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above. Further verification of the insanity was done by redoing VMXON on all APs after the initial WRMSR->VMXON sequence. The additional VMXON, which should VM-Fail, occasionally succeeded, and also eliminated the unexpected #UD on VMPTRLD. The damage done by writing MSR_TEST_CTRL doesn't appear to be limited to VMX, e.g. after suspend with an active KVM guest, subsequent reboots almost always hang (even when fudging VMXON), a #UD on a random Jcc was observed, suspend/resume stability is qualitatively poor, and so on and so forth. 
I'm wondering whether all those side effects of MSR_TEST_CTRL also exist on CPUs that have the SLD feature; have you ever tested on an SLD-capable CPU?
Re: [PATCH] x86/split_lock: Sanitize userspace and guest error output
On 6/5/2020 7:44 PM, Prarit Bhargava wrote: There are two problems with kernel messages in fatal mode that were found during testing of guests and userspace programs. The first is that no kernel message is output when the split lock detector is triggered with a userspace program. As a result the userspace process dies from receiving SIGBUS with no indication to the user of what caused the process to die. The second problem is that only the first triggering guest causes a kernel message to be output because the message is output with pr_warn_once(). This also results in a loss of information to the user. While fixing these I noticed that the same message was being output three times so I'm cleaning that up too. Fix fatal mode output, and use consistent messages for fatal and warn modes for both userspace and guests. Signed-off-by: Prarit Bhargava Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Cc: Tony Luck Cc: "Peter Zijlstra (Intel)" Cc: Sean Christopherson Cc: Rahul Tanwar Cc: Xiaoyao Li Cc: Ricardo Neri Cc: Dave Hansen --- arch/x86/kernel/cpu/intel.c | 24 ++-- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 166d7c355896..463022aa9b7a 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1074,10 +1074,14 @@ static void split_lock_init(void) split_lock_verify_msr(sld_state != sld_off); } -static void split_lock_warn(unsigned long ip) +static bool split_lock_warn(unsigned long ip, int fatal) { - pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n", - current->comm, current->pid, ip); + pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n", + current->comm, current->pid, + sld_state == sld_fatal ? 
"fatal " : "", ip); + + if (sld_state == sld_fatal || fatal) + return false; /* * Disable the split lock detection for this task so it can make @@ -1086,18 +1090,13 @@ static void split_lock_warn(unsigned long ip) */ sld_update_msr(false); set_tsk_thread_flag(current, TIF_SLD); + return true; } bool handle_guest_split_lock(unsigned long ip) { - if (sld_state == sld_warn) { - split_lock_warn(ip); + if (split_lock_warn(ip, 0)) return true; - } - - pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n", -current->comm, current->pid, -sld_state == sld_fatal ? "fatal" : "bogus", ip); current->thread.error_code = 0; current->thread.trap_nr = X86_TRAP_AC; @@ -1108,10 +1107,7 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock); bool handle_user_split_lock(struct pt_regs *regs, long error_code) { - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal) - return false; - split_lock_warn(regs->ip); - return true; + return split_lock_warn(regs->ip, regs->flags & X86_EFLAGS_AC); It's incorrect. You change the behavior so that it prints the split-lock warning even when CPL 3 alignment check is turned on. } /*
Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers
On 6/5/2020 9:44 AM, Li RongQing wrote: Guest kernel reports a fixed cpu frequency in /proc/cpuinfo, this is confused to user when turbo is enable, and aperf/mperf can be used to show current cpu frequency after 7d5905dc14a "(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)" so guest should support aperf/mperf capability This patch implements aperf/mperf by three mode: none, software emulation, and pass-through None: default mode, guest does not support aperf/mperf Software emulation: the period of aperf/mperf in guest mode are accumulated as emulated value Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no over-commit. And a per-VM capability is added to configure aperfmperf mode Signed-off-by: Li RongQing Signed-off-by: Chai Wen Signed-off-by: Jia Lina --- diff v5: return error if guest is configured with mperf/aperf, but host cpu has not diff v4: fix maybe-uninitialized warning diff v3: fix interception of MSR_IA32_MPERF/APERF in svm diff v2: support aperfmperf pass though move common codes to kvm_get_msr_common diff v1: 1. support AMD, but not test 2. support per-vm capability to enable Documentation/virt/kvm/api.rst | 10 ++ arch/x86/include/asm/kvm_host.h | 11 +++ arch/x86/kvm/cpuid.c| 15 ++- arch/x86/kvm/svm/svm.c | 8 arch/x86/kvm/vmx/vmx.c | 6 ++ arch/x86/kvm/x86.c | 42 + arch/x86/kvm/x86.h | 15 +++ include/uapi/linux/kvm.h| 1 + 8 files changed, 107 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index d871dacb984e..f854f4da6fd8 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6126,3 +6126,13 @@ KVM can therefore start protected VMs. This capability governs the KVM_S390_PV_COMMAND ioctl and the KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected guests when the state change is invalid. 
+ +8.23 KVM_CAP_APERFMPERF + + +:Architectures: x86 +:Parameters: args[0] is aperfmperf mode; + 0 for not support, 1 for software emulation, 2 for pass-through +:Returns: 0 on success; -1 on error + +This capability indicates that KVM supports APERF and MPERF MSR registers diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index fd78bd44b2d6..14643f8af9c4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -824,6 +824,9 @@ struct kvm_vcpu_arch { /* AMD MSRC001_0015 Hardware Configuration */ u64 msr_hwcr; + + u64 v_mperf; + u64 v_aperf; }; struct kvm_lpage_info { @@ -889,6 +892,12 @@ enum kvm_irqchip_mode { KVM_IRQCHIP_SPLIT,/* created with KVM_CAP_SPLIT_IRQCHIP */ }; +enum kvm_aperfmperf_mode { + KVM_APERFMPERF_NONE, + KVM_APERFMPERF_SOFT, /* software emulate aperfmperf */ + KVM_APERFMPERF_PT,/* pass-through aperfmperf to guest */ +}; + #define APICV_INHIBIT_REASON_DISABLE0 #define APICV_INHIBIT_REASON_HYPERV 1 #define APICV_INHIBIT_REASON_NESTED 2 @@ -986,6 +995,8 @@ struct kvm_arch { struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; + + enum kvm_aperfmperf_mode aperfmperf_mode; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..80f18b29a845 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -122,6 +122,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) MSR_IA32_MISC_ENABLE_MWAIT); } + best = kvm_find_cpuid_entry(vcpu, 6, 0); + if (best) { + if (guest_has_aperfmperf(vcpu->kvm)) { + if (!boot_cpu_has(X86_FEATURE_APERFMPERF)) + return -EINVAL; kvm_vm_ioctl_enable_cap() ensures that guest_has_aperfmperf() always aligns with boot_cpu_has(X86_FEATURE_APERFMPERF). So above is unnecessary. 
+ best->ecx |= 1; + } else { + best->ecx &= ~1; + } } you could do bool guest_cpuid_aperfmperf = false; if (best) guest_cpuid_aperfmperf = !!(best->ecx & BIT(0)); if (guest_cpuid_aperfmperf != guest_has_aperfmperf(vcpu->kvm)) return -EINVAL; In fact, I think we can do nothing here. Leave it as what userspace wants, just like how KVM treats other CPUID bits. Paolo, What's your opinion? /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu); [...] @@ -4930,6 +4939,11 @@ int kvm_vm_ioctl_enable_
[PATCH v2] KVM: x86: Assign correct value to array.maxnent
Delay the assignment of array.maxnent so that it uses the correct value when cpuid->nent > KVM_MAX_CPUID_ENTRIES. Fixes: e53c95e8d41e ("KVM: x86: Encapsulate CPUID entries and metadata in struct") Signed-off-by: Xiaoyao Li --- v2: - remove the "const" from maxnent to fix a build error. --- arch/x86/kvm/cpuid.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 253b8e875ccd..3d88ddf781d0 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -426,7 +426,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cpu_caps); struct kvm_cpuid_array { struct kvm_cpuid_entry2 *entries; - const int maxnent; + int maxnent; int nent; }; @@ -870,7 +870,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_array array = { .nent = 0, - .maxnent = cpuid->nent, }; int r, i; @@ -887,6 +886,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, if (!array.entries) return -ENOMEM; + array.maxnent = cpuid->nent; + for (i = 0; i < ARRAY_SIZE(funcs); i++) { r = get_cpuid_func(&array, funcs[i], type); if (r) -- 2.18.2
Re: [PATCH] KVM: x86: Assign correct value to array.maxnent
On 6/4/2020 10:43 AM, Xiaoyao Li wrote: Delay the assignment of array.maxnent to use correct value for the case cpuid->nent > KVM_MAX_CPUID_ENTRIES. Fixes: e53c95e8d41e ("KVM: x86: Encapsulate CPUID entries and metadata in struct") Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 253b8e875ccd..befff01d100c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -870,7 +870,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_array array = { .nent = 0, - .maxnent = cpuid->nent, }; int r, i; @@ -887,6 +886,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, if (!array.entries) return -ENOMEM; + array.maxnent = cpuid->nent; I missed the fact that maxnent is const; v2 is coming. + for (i = 0; i < ARRAY_SIZE(funcs); i++) { r = get_cpuid_func(&array, funcs[i], type); if (r)
[PATCH] KVM: x86: Assign correct value to array.maxnent
Delay the assignment of array.maxnent so that it uses the correct value when cpuid->nent > KVM_MAX_CPUID_ENTRIES. Fixes: e53c95e8d41e ("KVM: x86: Encapsulate CPUID entries and metadata in struct") Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 253b8e875ccd..befff01d100c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -870,7 +870,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_array array = { .nent = 0, - .maxnent = cpuid->nent, }; int r, i; @@ -887,6 +886,8 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, if (!array.entries) return -ENOMEM; + array.maxnent = cpuid->nent; + for (i = 0; i < ARRAY_SIZE(funcs); i++) { r = get_cpuid_func(&array, funcs[i], type); if (r) -- 2.18.2
Re: [PATCH 4/6] KVM: X86: Split kvm_update_cpuid()
On 6/3/2020 9:10 AM, Sean Christopherson wrote: On Fri, May 29, 2020 at 04:55:43PM +0800, Xiaoyao Li wrote: Split the part of updating KVM states from kvm_update_cpuid(), and put it into a new kvm_update_state_based_on_cpuid(). So it's clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_state_based_on_cpuid() is to update KVM states based on the updated CPUID settings. What about kvm_update_vcpu_model()? "state" isn't necessarily correct either. yeah, it's better.
Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
On 5/31/2020 10:08 AM, Li,Rongqing wrote: -----Original Message----- From: Xiaoyao Li [mailto:xiaoyao...@intel.com] Sent: May 30, 2020 18:40 To: Li,Rongqing ; linux-kernel@vger.kernel.org; k...@vger.kernel.org; x...@kernel.org; h...@zytor.com; b...@alien8.de; mi...@redhat.com; t...@linutronix.de; jmatt...@google.com; wanpen...@tencent.com; vkuzn...@redhat.com; sean.j.christopher...@intel.com; pbonz...@redhat.com; wei.hua...@amd.com Subject: Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers On 5/30/2020 12:35 PM, Li RongQing wrote: Guest kernel reports a fixed cpu frequency in /proc/cpuinfo, this is confused to user when turbo is enable, and aperf/mperf can be used to show current cpu frequency after 7d5905dc14a "(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)" so guest should support aperf/mperf capability This patch implements aperf/mperf by three mode: none, software emulation, and pass-through None: default mode, guest does not support aperf/mperf Software emulation: the period of aperf/mperf in guest mode are accumulated as emulated value Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no over-commit. And a per-VM capability is added to configure aperfmperf mode [...] diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..c960dda4251b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) MSR_IA32_MISC_ENABLE_MWAIT); } + best = kvm_find_cpuid_entry(vcpu, 6, 0); + if (best) { + if (guest_has_aperfmperf(vcpu->kvm) && + boot_cpu_has(X86_FEATURE_APERFMPERF)) + best->ecx |= 1; + else + best->ecx &= ~1; + } In my understanding, KVM allows userspace to set a CPUID feature bit for the guest even if hardware doesn't support the feature. So what makes X86_FEATURE_APERFMPERF different here? Is there any concern I'm missing?
-Xiaoyao Whether APERF/MPERF is software-emulated or passed through depends on the host CPU's APERF/MPERF feature. Software emulation: the APERF/MPERF deltas accumulated in guest mode are reported as the emulated values. I know you want to ensure the correctness of the APERF/MPERF exposure. But there are many features other than APERF/MPERF whose supported settings KVM reports through KVM_GET_SUPPORTED_CPUID without checking or forcing the correctness of userspace input, i.e., KVM allows userspace to set bogus CPUID settings as long as they don't break KVM (the host kernel). Indeed, bogus CPUID settings more than likely break the guest, but that's not KVM's fault; KVM just does what userspace wants. IMO, if we really want to ensure the correctness of userspace-provided CPUID settings, we need to return an error to userspace instead of fixing them silently. - Xiaoyao
Re: [PATCH][v5] KVM: X86: support APERF/MPERF registers
On 5/30/2020 12:35 PM, Li RongQing wrote: The guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which is confusing to users when turbo is enabled; APERF/MPERF can be used to show the current CPU frequency after commit 7d5905dc14a ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"), so the guest should support the APERF/MPERF capability. This patch implements APERF/MPERF in three modes: none, software emulation, and pass-through. None: default mode, the guest does not support APERF/MPERF. Software emulation: the APERF/MPERF deltas accumulated in guest mode are reported as the emulated values. Pass-through: only suitable for KVM_HINTS_REALTIME, because that hint guarantees a 1:1 vCPU:pCPU binding with no over-commit. A per-VM capability is added to configure the aperfmperf mode. [...] diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..c960dda4251b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -122,6 +122,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) MSR_IA32_MISC_ENABLE_MWAIT); } + best = kvm_find_cpuid_entry(vcpu, 6, 0); + if (best) { + if (guest_has_aperfmperf(vcpu->kvm) && + boot_cpu_has(X86_FEATURE_APERFMPERF)) + best->ecx |= 1; + else + best->ecx &= ~1; + } In my understanding, KVM allows userspace to set a CPUID feature bit for the guest even if hardware doesn't support the feature. So what makes X86_FEATURE_APERFMPERF different here? Is there any concern I missed? -Xiaoyao
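The hunk above toggles CPUID.06H:ECX[0] based on both the per-VM capability and the host feature. A minimal userspace sketch of that bit-twiddling (a toy model, not kernel code; the struct and flag parameters are stand-ins for `struct kvm_cpuid_entry2` and the capability checks):

```c
#include <stdint.h>

/* Toy stand-in for struct kvm_cpuid_entry2 (illustration only). */
struct cpuid_entry {
    uint32_t eax, ebx, ecx, edx;
};

/*
 * CPUID.06H:ECX[0] enumerates APERF/MPERF. Mirroring the hunk above:
 * the bit is set only when both the per-VM capability and the host
 * feature are present, and cleared otherwise.
 */
void update_aperfmperf_bit(struct cpuid_entry *leaf6,
                           int guest_has_aperfmperf,
                           int host_has_aperfmperf)
{
    if (guest_has_aperfmperf && host_has_aperfmperf)
        leaf6->ecx |= 1u;   /* expose APERF/MPERF to the guest */
    else
        leaf6->ecx &= ~1u;  /* hide it */
}
```

This also illustrates the point of the discussion: the hunk silently fixes up the userspace-provided value instead of rejecting it.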
[PATCH 5/6] KVM: X86: Move kvm_x86_ops.cpuid_update() into kvm_update_state_based_on_cpuid()
kvm_x86_ops.cpuid_update() is used to update vmx/svm settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, i.e., kvm_update_cpuid(), not in the middle stage. Put it in kvm_update_state_based_on_cpuid() to make it clear that it's to update vmx/svm specific states based on CPUID. Signed-off-by: Xiaoyao Li --- Should we rename kvm_x86_ops.cpuid_update to something like kvm_x86_ops.update_state_based_on_cpuid? cpuid_update is really confusing especially when kvm_x86_ops.update_cpuid() is needed someday. --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index a4a2072f5253..5d4da8970940 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -136,6 +136,8 @@ void kvm_update_state_based_on_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.guest_supported_xcr0 = (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + kvm_x86_ops.cpuid_update(vcpu); + /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu); @@ -227,7 +229,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); kvm_update_state_based_on_cpuid(vcpu); @@ -257,7 +258,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, } kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); kvm_update_state_based_on_cpuid(vcpu); out: -- 2.18.2
[PATCH 2/6] KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent
As handling of bits in leaves other than leaf 1 has been added over time, kvm_update_cpuid() should not return directly if leaf 1 is absent, but should go on updating the other CPUID leaves. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 2f1a9650b7f2..795bbaf37110 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -60,22 +60,21 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); - if (!best) - return 0; - - /* Update OSXSAVE bit */ - if (boot_cpu_has(X86_FEATURE_XSAVE) && best->function == 0x1) - cpuid_entry_change(best, X86_FEATURE_OSXSAVE, + if (best) { + /* Update OSXSAVE bit */ + if (boot_cpu_has(X86_FEATURE_XSAVE)) + cpuid_entry_change(best, X86_FEATURE_OSXSAVE, kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)); - cpuid_entry_change(best, X86_FEATURE_APIC, + cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); - if (apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; + if (apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } } best = kvm_find_cpuid_entry(vcpu, 7, 0); -- 2.18.2
[PATCH 0/6] Refactor handling flow of SET_CPUID*
This series is the extended version of https://lkml.kernel.org/r/20200528151927.14346-1-xiaoyao...@intel.com The first two patches are bug fixes, and the others aim to refactor the flow of SET_CPUID* as: 1. cpuid check: check if userspace provides legal CPUID settings; 2. cpuid update: update some special CPUID bits based on current vcpu state, e.g., OSXSAVE, OSPKE, ... 3. update KVM state: update KVM states based on the final CPUID settings. Xiaoyao Li (6): KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID fails KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent KVM: X86: Introduce kvm_check_cpuid() KVM: X86: Split kvm_update_cpuid() KVM: X86: Move kvm_x86_ops.cpuid_update() into kvm_update_state_based_on_cpuid() KVM: X86: Move kvm_apic_set_version() to kvm_update_state_based_on_cpuid() arch/x86/kvm/cpuid.c | 107 +++ arch/x86/kvm/cpuid.h | 3 +- arch/x86/kvm/x86.c | 1 + 3 files changed, 70 insertions(+), 41 deletions(-) -- 2.18.2
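The three-step flow in the cover letter can be sketched in plain C. This is a toy model, not kernel code: the struct, stub bodies, and error code are hypothetical stand-ins; only the check → update-CPUID → update-state ordering (and the invalidate-on-failure behavior from patch 1) mirrors the series.

```c
/* Hypothetical stand-in for the vcpu state touched by SET_CPUID*. */
struct toy_vcpu {
    int cpuid_nent;
    int vaddr_bits;     /* stand-in for the CPUID 0x80000008 data */
};

static int toy_check_cpuid(struct toy_vcpu *v)
{
    /* Step 1: reject illegal userspace input before touching state. */
    if (v->vaddr_bits != 0 && v->vaddr_bits != 48 && v->vaddr_bits != 57)
        return -22;     /* -EINVAL */
    return 0;
}

/* Step 2: fix up special CPUID bits (OSXSAVE, OSPKE, ...). */
static void toy_update_cpuid(struct toy_vcpu *v) { (void)v; }

/* Step 3: derive KVM state from the final CPUID settings. */
static void toy_update_state_based_on_cpuid(struct toy_vcpu *v) { (void)v; }

int toy_set_cpuid(struct toy_vcpu *v, int nent)
{
    int r;

    v->cpuid_nent = nent;
    r = toy_check_cpuid(v);
    if (r) {
        v->cpuid_nent = 0;  /* patch 1: invalidate CPUID on failure */
        return r;
    }
    toy_update_cpuid(v);
    toy_update_state_based_on_cpuid(v);
    return 0;
}
```

The key design point is that the check runs first and leaves no half-applied configuration behind on failure.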
[PATCH 4/6] KVM: X86: Split kvm_update_cpuid()
Split the part of updating KVM states from kvm_update_cpuid(), and put it into a new kvm_update_state_based_on_cpuid(). So it's clear that kvm_update_cpuid() is to update guest CPUID settings, while kvm_update_state_based_on_cpuid() is to update KVM states based on the updated CPUID settings. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 38 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c8cb373056f1..a4a2072f5253 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -76,7 +76,6 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu) void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; - struct kvm_lapic *apic = vcpu->arch.apic; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (best) { @@ -87,13 +86,6 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_change(best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); - - if (apic) { - if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) - apic->lapic_timer.timer_mode_mask = 3 << 17; - else - apic->lapic_timer.timer_mode_mask = 1 << 17; - } } best = kvm_find_cpuid_entry(vcpu, 7, 0); @@ -102,13 +94,8 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_read_cr4_bits(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) { - vcpu->arch.guest_supported_xcr0 = 0; - } else { - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); - } best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || @@ -127,6 +114,27 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } +} + +void kvm_update_state_based_on_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + struct kvm_cpuid_entry2 *best; + + best = 
kvm_find_cpuid_entry(vcpu, 1, 0); + if (best && apic) { + if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) + apic->lapic_timer.timer_mode_mask = 3 << 17; + else + apic->lapic_timer.timer_mode_mask = 1 << 17; + } + + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); + if (!best) + vcpu->arch.guest_supported_xcr0 = 0; + else + vcpu->arch.guest_supported_xcr0 = + (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* Note, maxphyaddr must be updated before tdp_level. */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -221,6 +229,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_state_based_on_cpuid(vcpu); out: vfree(cpuid_entries); @@ -250,6 +259,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); kvm_update_cpuid(vcpu); + kvm_update_state_based_on_cpuid(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f136de1debad..0ccf9ec4bf55 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -10,6 +10,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_state_based_on_cpuid(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dc993b79ae2f..fcb85432d30b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8100,6 +8100,7 @@ static void enter_smm(struct kvm_vcpu *vcpu) #endif kvm_update_cpuid(vcpu); + kvm_update_state_based_on_cpuid(vcpu); kvm_mmu_reset_context(vcpu); } -- 2.18.2
[PATCH 1/6] KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID fails
The CPUID configuration needs to be invalidated if userspace provides illegal input. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..2f1a9650b7f2 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -210,6 +210,8 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; out: vfree(cpuid_entries); @@ -233,6 +235,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + vcpu->arch.cpuid_nent = 0; out: return r; } -- 2.18.2
[PATCH 6/6] KVM: X86: Move kvm_apic_set_version() to kvm_update_state_based_on_cpuid()
Obviously, kvm_apic_set_version() fits well in kvm_update_state_based_on_cpuid(). Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 5d4da8970940..eb60098aca29 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -127,6 +127,8 @@ void kvm_update_state_based_on_cpuid(struct kvm_vcpu *vcpu) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; + + kvm_apic_set_version(vcpu); } best = kvm_find_cpuid_entry(vcpu, 0xD, 0); @@ -228,7 +230,6 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, } cpuid_fix_nx_cap(vcpu); - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_state_based_on_cpuid(vcpu); @@ -257,7 +258,6 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; } - kvm_apic_set_version(vcpu); kvm_update_cpuid(vcpu); kvm_update_state_based_on_cpuid(vcpu); out: -- 2.18.2
[PATCH 3/6] KVM: X86: Introduce kvm_check_cpuid()
Use kvm_check_cpuid() to validate if userspace provides legal cpuid settings and call it before KVM updates CPUID. Signed-off-by: Xiaoyao Li --- Is the check of virtual address width really necessary? KVM doesn't check other bits at all. I guess the policy is that KVM allows illegal CPUID settings as long as they don't break the host kernel. Userspace takes the consequences itself if it sets bogus CPUID settings that break its guest. But why is vaddr_bits special? It seems illegal vaddr_bits won't break the host kernel. --- arch/x86/kvm/cpuid.c | 54 arch/x86/kvm/cpuid.h | 2 +- 2 files changed, 35 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 795bbaf37110..c8cb373056f1 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -54,7 +54,26 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted) #define F feature_bit -int kvm_update_cpuid(struct kvm_vcpu *vcpu) +static int kvm_check_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + /* +* The existing code assumes virtual address is 48-bit or 57-bit in the +* canonical address checks; exit if it is ever changed. +*/ + best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); + if (best) { + int vaddr_bits = (best->eax & 0xff00) >> 8; + + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) + return -EINVAL; + } + + return 0; +} + +void kvm_update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; struct kvm_lapic *apic = vcpu->arch.apic; @@ -96,18 +115,6 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu) cpuid_entry_has(best, X86_FEATURE_XSAVEC))) best->ebx = xstate_required_size(vcpu->arch.xcr0, true); - /* -* The existing code assumes virtual address is 48-bit or 57-bit in the -* canonical address checks; exit if it is ever changed. 
-*/ - best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0); - if (best) { - int vaddr_bits = (best->eax & 0xff00) >> 8; - - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) - return -EINVAL; - } - best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0); if (kvm_hlt_in_guest(vcpu->kvm) && best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) @@ -127,7 +134,6 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) kvm_mmu_reset_context(vcpu); kvm_pmu_refresh(vcpu); - return 0; } static int is_efer_nx(void) @@ -205,12 +211,16 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_entries[i].padding[2] = 0; } vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); out: vfree(cpuid_entries); @@ -231,11 +241,15 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, cpuid->nent * sizeof(struct kvm_cpuid_entry2))) goto out; vcpu->arch.cpuid_nent = cpuid->nent; + r = kvm_check_cpuid(vcpu); + if (r) { + vcpu->arch.cpuid_nent = 0; + goto out; + } + kvm_apic_set_version(vcpu); kvm_x86_ops.cpuid_update(vcpu); - r = kvm_update_cpuid(vcpu); - if (r) - vcpu->arch.cpuid_nent = 0; + kvm_update_cpuid(vcpu); out: return r; } diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 05434cd9342f..f136de1debad 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -9,7 +9,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); -int kvm_update_cpuid(struct kvm_vcpu *vcpu); +void kvm_update_cpuid(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, -- 2.18.2
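The vaddr_bits check that this patch moves into kvm_check_cpuid() can be exercised standalone. A userspace-style sketch (toy code, not the kernel function itself): CPUID.80000008H:EAX[15:8] reports the virtual address width, and the canonical-address code only copes with 0 (leaf absent or zeroed), 48, or 57 bits.

```c
/*
 * Sketch of the vaddr_bits sanity check: extract bits 15:8 of
 * CPUID.80000008H:EAX and accept only 0, 48, or 57. Returns 0 on
 * success or -22 (-EINVAL) on bogus input, like the kernel check.
 */
int check_vaddr_bits(unsigned int leaf_80000008_eax)
{
    int vaddr_bits = (leaf_80000008_eax & 0xff00) >> 8;

    if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0)
        return -22;     /* -EINVAL */
    return 0;
}
```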
Re: [PATCH] KVM: X86: Call kvm_x86_ops.cpuid_update() after CPUIDs fully updated
On 5/29/2020 12:15 AM, Paolo Bonzini wrote: On 28/05/20 17:40, Xiaoyao Li wrote: kvm_x86_ops.cpuid_update() is used to update vmx/svm settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, not in the middle stage. Signed-off-by: Xiaoyao Li Are you seeing anything bad happening from this? Not yet. IMO changing the order is more reasonable and less confusing. Indeed, I just could not decide whether to include it in 5.7 or not. Maybe for 5.8. I have a new idea to refactor a bit more: I find that kvm_update_cpuid() does three things: - update CPUID; - update vcpu state, e.g., apic->lapic_timer.timer_mode_mask, guest_supported_xcr0, maxphyaddr, etc.; - CPUID checks, for vaddr_bits. I'm going to split it and make the order: 1. kvm_check_cpuid(), returning an error on invalid values; 2. kvm_update_cpuid(); 3. kvm_update_state_based_on_cpuid(), with kvm_x86_ops.cpuid_update() called inside it. If that sounds OK, I'll do it tomorrow. -Xiaoyao
Re: [PATCH] KVM: X86: Call kvm_x86_ops.cpuid_update() after CPUIDs fully updated
On 5/28/2020 11:22 PM, Paolo Bonzini wrote: On 28/05/20 17:19, Xiaoyao Li wrote: kvm_x86_ops.cpuid_update() is used to update vmx/svm settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, not in the middle stage. Signed-off-by: Xiaoyao Li Are you seeing anything bad happening from this? Not yet. IMO changing the order is more reasonable and less confusing. Paolo --- arch/x86/kvm/cpuid.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..753739bc1bf0 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -208,8 +208,11 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_nent = cpuid->nent; cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + goto out; + + kvm_x86_ops.cpuid_update(vcpu); out: vfree(cpuid_entries); @@ -231,8 +234,11 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; vcpu->arch.cpuid_nent = cpuid->nent; kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + goto out; + + kvm_x86_ops.cpuid_update(vcpu); out: return r; }
[PATCH] KVM: X86: Call kvm_x86_ops.cpuid_update() after CPUIDs fully updated
kvm_x86_ops.cpuid_update() is used to update vmx/svm settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are fully updated, not in the middle stage. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index cd708b0b460a..753739bc1bf0 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -208,8 +208,11 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, vcpu->arch.cpuid_nent = cpuid->nent; cpuid_fix_nx_cap(vcpu); kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + goto out; + + kvm_x86_ops.cpuid_update(vcpu); out: vfree(cpuid_entries); @@ -231,8 +234,11 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, goto out; vcpu->arch.cpuid_nent = cpuid->nent; kvm_apic_set_version(vcpu); - kvm_x86_ops.cpuid_update(vcpu); r = kvm_update_cpuid(vcpu); + if (r) + goto out; + + kvm_x86_ops.cpuid_update(vcpu); out: return r; } -- 2.18.2
Re: [PATCH v9 0/8] KVM: Add virtualization support of split lock detection
Hi Thomas, On 5/18/2020 9:27 AM, Xiaoyao Li wrote: On 5/9/2020 7:05 PM, Xiaoyao Li wrote: This series aims to add the virtualization of split lock detection in KVM. Due to the fact that split lock detection is tightly coupled with CPU model and CPU model is configurable by host VMM, we elect to use paravirt method to expose and enumerate it for guest. Thomas and Paolo, Do you have time to have a look at this version? Does this series have any chance to meet 5.8? If not, do you plan to take a look at it after merge window? Thanks, -Xiaoyao
Re: [PATCH 2/2] kvm/x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally
On 5/21/2020 12:56 AM, Maxim Levitsky wrote: On Wed, 2020-05-20 at 18:33 +0200, Vitaly Kuznetsov wrote: Maxim Levitsky writes: This msr is only available when the host supports WAITPKG feature. This breaks a nested guest, if the L1 hypervisor is set to ignore unknown msrs, because the only other safety check that the kernel does is that it attempts to read the msr and rejects it if it gets an exception. Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL Signed-off-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fe3a24fd6b263..9c507b32b1b77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void) if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >= min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) continue; + break; + case MSR_IA32_UMWAIT_CONTROL: + if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG)) + continue; I'm probably missing something but (if I understand correctly) the only effect of dropping MSR_IA32_UMWAIT_CONTROL from msrs_to_save would be that KVM userspace won't see it in e.g. KVM_GET_MSR_INDEX_LIST. But why is this causing an issue? I see both vmx_get_msr()/vmx_set_msr() have 'host_initiated' check: case MSR_IA32_UMWAIT_CONTROL: if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx)) return 1; Here it fails like that: 1. KVM_GET_MSR_INDEX_LIST returns this msrs, and qemu notes that it is supported in 'has_msr_umwait' global var In general, KVM_GET_MSR_INDEX_LIST won't return MSR_IA32_UMWAIT_CONTROL if KVM cannot read this MSR, see kvm_init_msr_list(). You hit issue because you used "ignore_msrs".
Re: [PATCH 2/2] kvm/x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally
On 5/21/2020 1:28 PM, Tao Xu wrote: On 5/21/2020 12:33 PM, Xiaoyao Li wrote: On 5/21/2020 5:05 AM, Paolo Bonzini wrote: On 20/05/20 18:07, Maxim Levitsky wrote: This msr is only available when the host supports WAITPKG feature. This breaks a nested guest, if the L1 hypervisor is set to ignore unknown msrs, because the only other safety check that the kernel does is that it attempts to read the msr and rejects it if it gets an exception. Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL Signed-off-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fe3a24fd6b263..9c507b32b1b77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void) if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >= min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) continue; + break; + case MSR_IA32_UMWAIT_CONTROL: + if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG)) + continue; default: break; } The patch is correct, and matches what is done for the other entries of msrs_to_save_all. However, while looking at it I noticed that X86_FEATURE_WAITPKG is actually never added, and that is because it was also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86: Add support for user wait instructions", 2019-09-24), which was before the kvm_cpu_cap mechanism was added. So while at it you should also fix that. The right way to do that is to add a if (vmx_waitpkg_supported()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); + Tao I remember there is certainly some reason why we don't expose WAITPKG to guest by default. Tao, please help clarify it. Thanks, -Xiaoyao Because in VM, umwait and tpause can put a (psysical) CPU into a power saving state. So from host view, this cpu will be 100% usage by VM. Although umwait and tpause just cause short wait(maybe 100 microseconds), we still want to unconditionally expose WAITPKG in VM. 
I guess you typed "unconditionally" by mistake and actually meant "conditionally"?
Re: [PATCH 2/2] kvm/x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally
On 5/21/2020 5:05 AM, Paolo Bonzini wrote: On 20/05/20 18:07, Maxim Levitsky wrote: This msr is only available when the host supports WAITPKG feature. This breaks a nested guest, if the L1 hypervisor is set to ignore unknown msrs, because the only other safety check that the kernel does is that it attempts to read the msr and rejects it if it gets an exception. Fixes: 6e3ba4abce KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL Signed-off-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fe3a24fd6b263..9c507b32b1b77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5314,6 +5314,10 @@ static void kvm_init_msr_list(void) if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >= min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) continue; + break; + case MSR_IA32_UMWAIT_CONTROL: + if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG)) + continue; default: break; } The patch is correct, and matches what is done for the other entries of msrs_to_save_all. However, while looking at it I noticed that X86_FEATURE_WAITPKG is actually never added, and that is because it was also not added to the supported CPUID in commit e69e72faa3a0 ("KVM: x86: Add support for user wait instructions", 2019-09-24), which was before the kvm_cpu_cap mechanism was added. So while at it you should also fix that. The right way to do that is to add a if (vmx_waitpkg_supported()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); + Tao I remember there is certainly some reason why we don't expose WAITPKG to guest by default. Tao, please help clarify it. Thanks, -Xiaoyao in vmx_set_cpu_caps. Thanks, Paolo
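The fix discussed in this thread filters MSR_IA32_UMWAIT_CONTROL out of the reported MSR list when the host lacks WAITPKG. A toy model of that filtering (not kernel code; the host_has_waitpkg flag stands in for kvm_cpu_cap_has(X86_FEATURE_WAITPKG)):

```c
#include <stdint.h>
#include <stddef.h>

#define MSR_IA32_UMWAIT_CONTROL 0xe1u   /* architectural MSR index */

/*
 * Toy model of kvm_init_msr_list(): an MSR guarded by a CPU feature is
 * dropped from the list when the feature is absent, so
 * KVM_GET_MSR_INDEX_LIST never advertises an MSR that save/restore
 * would fail on. Returns the number of MSR indices kept in 'out'.
 */
size_t filter_msr_list(const uint32_t *in, size_t n, uint32_t *out,
                       int host_has_waitpkg)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i] == MSR_IA32_UMWAIT_CONTROL && !host_has_waitpkg)
            continue;   /* host lacks WAITPKG: hide the MSR */
        out[kept++] = in[i];
    }
    return kept;
}
```

This is why the bug only bit with "ignore_msrs": the read-probe fallback stopped rejecting the MSR, while the list still advertised it.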
Re: [PATCH] kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
On 5/18/2020 8:31 PM, Paolo Bonzini wrote: On 18/05/20 06:52, Xiaoyao Li wrote: On 5/6/2020 5:44 PM, Paolo Bonzini wrote: Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. A common question about whether a feature can be exposed to guest: Given a feature, there is a CPUID bit to enumerate it, and a CR4 bit to turn it on/off. Whether the feature can be exposed to guest only depends on host CR4 setting? I.e., if CPUID bit is not cleared in cpu_data in host but host kernel doesn't set the corresponding CR4 bit to turn it on, we cannot expose the feature to guest. right? It depends. The most obvious case is that the host kernel doesn't use CR4.PSE but we even use 4MB pages to emulate paging disabled mode when the processor doesn't support unrestricted guests. Basically, the question is whether we are able to save/restore any processor state attached to the CR4 bit on vmexit/vmentry. In this case there is no PKRU field in the VMCS and the RDPKRU/WRPKRU instructions require CR4.PKE=1; therefore, we cannot let the guest enable CR4.PKE unless it's also enabled on the host. aha! That's reason! Thanks for the clarification.
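Paolo's criterion — a CR4 bit is reserved for the guest unless KVM can save/restore the state behind it — can be sketched as a tiny reserved-bits helper. This is an illustrative simplification, not KVM's `__cr4_reserved_bits`: for PKE there is no PKRU field in the VMCS and RDPKRU/WRPKRU require CR4.PKE=1, so the host CR4 value, not raw CPUID data, decides; LA57 shows the ordinary CPUID-driven case for contrast.

```c
#include <stdint.h>

#define X86_CR4_LA57 (1ull << 12)
#define X86_CR4_PKE  (1ull << 22)

/*
 * Sketch: compute guest-reserved CR4 bits. PKE is reserved whenever
 * the HOST runs with CR4.PKE=0 (PKRU cannot be context-switched);
 * LA57 is reserved simply when the guest CPUID lacks the feature.
 */
uint64_t cr4_guest_reserved_bits(int host_cr4_pke, int guest_cpuid_la57)
{
    uint64_t reserved = 0;

    if (!host_cr4_pke)
        reserved |= X86_CR4_PKE;
    if (!guest_cpuid_la57)
        reserved |= X86_CR4_LA57;
    return reserved;
}
```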
Re: [PATCH] kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
On 5/6/2020 5:44 PM, Paolo Bonzini wrote: Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. A common question about whether a feature can be exposed to guest: Given a feature, there is a CPUID bit to enumerate it, and a CR4 bit to turn it on/off. Whether the feature can be exposed to guest only depends on host CR4 setting? I.e., if CPUID bit is not cleared in cpu_data in host but host kernel doesn't set the corresponding CR4 bit to turn it on, we cannot expose the feature to guest. right?
Re: [PATCH v9 0/8] KVM: Add virtualization support of split lock detection
On 5/9/2020 7:05 PM, Xiaoyao Li wrote: This series aims to add the virtualization of split lock detection in KVM. Due to the fact that split lock detection is tightly coupled with CPU model and CPU model is configurable by host VMM, we elect to use paravirt method to expose and enumerate it for guest. Thomas and Paolo, Do you have time to have a look at this version?
Re: [PATCH v9 8/8] x86/split_lock: Enable split lock detection initialization when running as a guest on KVM
On 5/10/2020 1:15 PM, Andy Lutomirski wrote: On Fri, May 8, 2020 at 8:04 PM Xiaoyao Li wrote: When running as guest, enumerating feature split lock detection through CPU model is not easy since CPU model is configurable by host VMM. If running upon KVM, it can be enumerated through KVM_FEATURE_SPLIT_LOCK_DETECT, This needs crystal clear documentation. What, exactly, is the host telling the guest if it sets this flag? and if KVM_HINTS_SLD_FATAL is set, it needs to be set to sld_fatal mode. This needs much better docs. Do you mean: "If KVM_HINTS_SLD_FATAL is set, then the guest will get #AC if it does a split-lock regardless of what is written to MSR_TEST_CTRL?" Hi Andy, KVM_FEATURE_SPLIT_LOCK_DETECT, KVM_HINTS_SLD_FATAL and their docs are introduced in Patch 5. Do I still need to explain them in detail in this patch?
[PATCH v9 3/8] x86/split_lock: Introduce flag X86_FEATURE_SLD_FATAL and drop sld_state
Introduce a synthetic feature flag X86_FEATURE_SLD_FATAL, which means the kernel is in sld_fatal mode if it is set. sld_state is no longer needed, since the state of SLD can be inferred from X86_FEATURE_SPLIT_LOCK_DETECT and X86_FEATURE_SLD_FATAL. Suggested-by: Sean Christopherson Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/intel.c | 16 ++-- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index db189945e9b0..260adfc6c61a 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -286,6 +286,7 @@ #define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */ #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */ #define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */ +#define X86_FEATURE_SLD_FATAL (11*32+ 7) /* split lock detection in fatal mode */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 4602dac14dcb..93b8ccf2fa11 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -40,12 +40,6 @@ enum split_lock_detect_state { sld_fatal, }; -/* - * Default to sld_off because most systems do not support split lock detection - * split_lock_setup() will switch this to sld_warn on systems that support - * split lock detect, unless there is a command line override. 
- */ -static enum split_lock_detect_state sld_state __ro_after_init = sld_off; static u64 msr_test_ctrl_cache __ro_after_init; /* @@ -1043,8 +1037,9 @@ static void __init split_lock_setup(void) return; } - sld_state = state; setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT); + if (state == sld_fatal) + setup_force_cpu_cap(X86_FEATURE_SLD_FATAL); } /* @@ -1064,7 +1059,7 @@ static void sld_update_msr(bool on) static void split_lock_init(void) { - split_lock_verify_msr(sld_state != sld_off); + split_lock_verify_msr(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)); } static void split_lock_warn(unsigned long ip) @@ -1083,7 +1078,7 @@ static void split_lock_warn(unsigned long ip) bool handle_guest_split_lock(unsigned long ip) { - if (sld_state == sld_warn) { + if (!boot_cpu_has(X86_FEATURE_SLD_FATAL)) { split_lock_warn(ip); return true; } @@ -1100,7 +1095,8 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock); bool handle_user_split_lock(struct pt_regs *regs, long error_code) { - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal) + if ((regs->flags & X86_EFLAGS_AC) || + boot_cpu_has(X86_FEATURE_SLD_FATAL)) return false; split_lock_warn(regs->ip); return true; -- 2.18.2
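The patch drops the sld_state variable in favor of two feature flags. A small sketch of how the state is recovered from the flags, matching the handle_guest_split_lock()/handle_user_split_lock() logic in the diff above (toy code; the int flags stand in for boot_cpu_has() checks):

```c
/*
 * Infer the split-lock-detect mode from the two synthetic feature
 * flags: no SPLIT_LOCK_DETECT -> off; SPLIT_LOCK_DETECT without
 * SLD_FATAL -> warn; both -> fatal.
 */
enum sld_state { sld_off, sld_warn, sld_fatal };

enum sld_state infer_sld_state(int has_split_lock_detect, int has_sld_fatal)
{
    if (!has_split_lock_detect)
        return sld_off;     /* feature absent or disabled */
    return has_sld_fatal ? sld_fatal : sld_warn;
}
```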
[PATCH v9 7/8] KVM: VMX: virtualize split lock detection
The TEST_CTRL MSR has per-core scope, i.e., sibling threads in the same physical core share the same MSR. This imposes additional constraints when exposing it to a guest. 1) When the host SLD state is sld_off (no X86_FEATURE_SPLIT_LOCK_DETECT), split lock detection is unsupported/disabled and cannot be exposed to the guest. 2) When the host SLD state is sld_warn (has X86_FEATURE_SPLIT_LOCK_DETECT but no X86_FEATURE_SLD_FATAL), split lock detection can be exposed to the guest only when SMT is disabled (nosmt), due to the per-core scope. In this case, the guest's setting can be propagated into the real hardware MSR. Further, to avoid the potential overhead of toggling MSR_TEST_CTRL.SLD on every VM-enter and VM-exit, KVM loads and keeps the guest's SLD setting while in the vcpu run loop with guest state loaded, i.e., between vmx_prepare_switch_to_guest() and vmx_prepare_switch_to_host(). 3) When the host SLD state is sld_fatal (has X86_FEATURE_SLD_FATAL), split lock detection can be exposed to the guest regardless of SMT, but KVM_HINTS_SLD_FATAL needs to be set. In this case, the guest can still set and clear the MSR_TEST_CTRL.SLD bit, but the bit value is never propagated to the real MSR; KVM always keeps the SLD bit turned on for the guest vcpu. The reason for not forcing the guest's MSR_TEST_CTRL.SLD bit to 1 is that the guest needs to set the bit itself to tell KVM it is SLD-aware. 
Signed-off-by: Xiaoyao Li --- arch/x86/kvm/cpuid.c | 6 arch/x86/kvm/vmx/vmx.c | 68 -- arch/x86/kvm/vmx/vmx.h | 3 ++ arch/x86/kvm/x86.c | 6 +++- arch/x86/kvm/x86.h | 7 + 5 files changed, 80 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 901cd1fdecd9..7d9f2daddaf3 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -717,9 +717,15 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); + if (kvm_split_lock_detect_supported()) + entry->eax |= (1 << KVM_FEATURE_SPLIT_LOCK_DETECT); + entry->ebx = 0; entry->ecx = 0; entry->edx = 0; + + if (boot_cpu_has(X86_FEATURE_SLD_FATAL)) + entry->edx |= (1 << KVM_HINTS_SLD_FATAL); break; case 0x8000: entry->eax = min(entry->eax, 0x801f); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index dbec38ad5035..1cc386c5801d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1120,6 +1120,29 @@ void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel, } } +static inline u64 vmx_msr_test_ctrl_valid_bits(struct vcpu_vmx *vmx) +{ + u64 valid_bits = 0; + + if (vmx->guest_has_sld) + valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT; + + return valid_bits; +} + +static inline bool guest_sld_on(struct vcpu_vmx *vmx) +{ + return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT; +} + +static inline void vmx_update_guest_sld(struct vcpu_vmx *vmx) +{ + preempt_disable(); + if (vmx->guest_state_loaded) + split_lock_set_guest(guest_sld_on(vmx)); + preempt_enable(); +} + void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -1188,6 +1211,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) #endif vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base); + + if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld) + vmx->host_sld_on = split_lock_set_guest(guest_sld_on(vmx)); + 
vmx->guest_state_loaded = true; } @@ -1226,6 +1253,10 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx) wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base); #endif load_fixmap_gdt(raw_smp_processor_id()); + + if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld) + split_lock_restore_host(vmx->host_sld_on); + vmx->guest_state_loaded = false; vmx->guest_msrs_ready = false; } @@ -1790,7 +1821,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) switch (msr_info->index) { case MSR_TEST_CTRL: - msr_info->data = 0; + msr_info->data = vmx->msr_test_ctrl; break; #ifdef CONFIG_X86_64 case MSR_FS_BASE: @@ -1946,9 +1977,12 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) switch (msr_index) { case MSR_TEST_CTRL: - if (data) + if (data & ~vmx_msr_test_ctrl_valid_bits(vmx)) return 1;
[PATCH v9 8/8] x86/split_lock: Enable split lock detection initialization when running as a guest on KVM
When running as a guest, enumerating split lock detection through the CPU model is not reliable since the CPU model is configurable by the host VMM. When running on KVM, the feature can instead be enumerated through KVM_FEATURE_SPLIT_LOCK_DETECT, and if KVM_HINTS_SLD_FATAL is set, SLD must be set to sld_fatal mode. Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/cpu.h | 2 ++ arch/x86/kernel/cpu/intel.c | 12 ++-- arch/x86/kernel/kvm.c | 3 +++ 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index a57f00f1d5b5..5d5b488b4b45 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -42,12 +42,14 @@ unsigned int x86_model(unsigned int sig); unsigned int x86_stepping(unsigned int sig); #ifdef CONFIG_CPU_SUP_INTEL extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c); +extern void __init split_lock_setup(bool fatal); extern void switch_to_sld(unsigned long tifn); extern bool handle_user_split_lock(struct pt_regs *regs, long error_code); extern bool handle_guest_split_lock(unsigned long ip); extern bool split_lock_virt_switch(bool on); #else static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {} +static inline void __init split_lock_setup(bool fatal) {} static inline void switch_to_sld(unsigned long tifn) {} static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code) { diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 1e2a74e8c592..02e24134b9b5 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -996,12 +996,18 @@ static bool split_lock_verify_msr(bool on) return ctrl == tmp; } -static void __init split_lock_setup(void) +void __init split_lock_setup(bool fatal) { enum split_lock_detect_state state = sld_warn; char arg[20]; int i, ret; + if (fatal) { + state = sld_fatal; + pr_info("forced on, sending SIGBUS on user-space split_locks\n"); + goto set_cap; + } + if (!split_lock_verify_msr(false)) { 
pr_info("MSR access failed: Disabled\n"); return; @@ -1037,6 +1043,7 @@ static void __init split_lock_setup(void) return; } +set_cap: setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT); if (state == sld_fatal) setup_force_cpu_cap(X86_FEATURE_SLD_FATAL); @@ -1161,6 +1168,7 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) const struct x86_cpu_id *m; u64 ia32_core_caps; + /* Note, paravirt support can enable SLD, e.g., see kvm_guest_init(). */ if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) return; @@ -1182,5 +1190,5 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) return; } - split_lock_setup(); + split_lock_setup(false); } diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 6efe0410fb72..489ea89e2e8e 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -670,6 +670,9 @@ static void __init kvm_guest_init(void) * overcommitted. */ hardlockup_detector_disable(); + + if (kvm_para_has_feature(KVM_FEATURE_SPLIT_LOCK_DETECT)) + split_lock_setup(kvm_para_has_hint(KVM_HINTS_SLD_FATAL)); } static noinline uint32_t __kvm_cpuid_base(void) -- 2.18.2