RE: KVM: VMX: disable SMEP feature when guest is in non-paging mode
> -----Original Message-----
> From: Gleb Natapov [mailto:g...@redhat.com]
> Sent: Sunday, February 03, 2013 9:57 PM
> To: Xu, Dongxiao
> Cc: kvm@vger.kernel.org
> Subject: Re: KVM: VMX: disable SMEP feature when guest is in non-paging mode
>
> On Fri, Feb 01, 2013 at 08:30:07AM -, Xu wrote:
> > SMEP is disabled if CPU is in non-paging mode in hardware. However KVM
> > always uses paging mode to emulate guest non-paging mode with HAP.
>
> Not always, only if unrestricted mode is disabled, since vm86 mode, which
> is used otherwise, requires paging.
>
> > To emulate this behavior, SMEP needs to be manually disabled when the
> > guest switches to non-paging mode. We met an issue where an SMP Linux
> > guest with a recent kernel (with SMEP support, for example 3.5.3) would
> > crash with a triple fault if unrestricted_guest=0 is set. This is
> > because KVM uses an identity-mapping page table to emulate non-paging
> > mode, and that page table is set up with the USER flag. If SMEP is
> > still enabled in this case, the guest hits an unhandlable page fault
> > and then crashes.
> >
> > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> > Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
>
> Reviewed-by: Gleb Natapov g...@redhat.com
>
> But HAP is Xen terminology AFAIK. KVM speak is tdp (two dimensional
> paging). It would be nice to change it in the commit message and the
> comment before committing.

Thanks for reminding me of this. I've sent out the v2 patch to address this issue.

Thanks,
Dongxiao

> > ---
> >  arch/x86/kvm/vmx.c | 8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 9120ae1..e82f20d 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  	if (!is_paging(vcpu)) {
> >  		hw_cr4 &= ~X86_CR4_PAE;
> >  		hw_cr4 |= X86_CR4_PSE;
> > +		/*
> > +		 * SMEP is disabled if CPU is in non-paging mode in
> > +		 * hardware. However KVM always uses paging mode to
> > +		 * emulate guest non-paging mode with HAP.
> > +		 * To emulate this behavior, SMEP needs to be manually
> > +		 * disabled when guest switches to non-paging mode.
> > +		 */
> > +		hw_cr4 &= ~X86_CR4_SMEP;
> >  	} else if (!(cr4 & X86_CR4_PAE)) {
> >  		hw_cr4 &= ~X86_CR4_PAE;
> >  	}

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
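The CR4 fixup discussed above can be modeled in user space. This is a minimal sketch, not the real KVM code: the bit values match the architectural CR4 layout, but `fixup_hw_cr4` is a hypothetical stand-in for the logic inside `vmx_set_cr4()`.

```c
#include <assert.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PSE  (1UL << 4)
#define X86_CR4_PAE  (1UL << 5)
#define X86_CR4_SMEP (1UL << 20)

/* Model of the hw_cr4 fixup: when the guest is in non-paging mode, KVM
 * still runs the hardware with paging enabled, so PSE is forced on, PAE
 * is cleared, and (after this patch) SMEP is cleared too, because real
 * hardware ignores SMEP when paging is off. */
static unsigned long fixup_hw_cr4(unsigned long cr4, int guest_paging)
{
	unsigned long hw_cr4 = cr4;

	if (!guest_paging) {
		hw_cr4 &= ~X86_CR4_PAE;
		hw_cr4 |= X86_CR4_PSE;
		hw_cr4 &= ~X86_CR4_SMEP;	/* the fix under discussion */
	} else if (!(cr4 & X86_CR4_PAE)) {
		hw_cr4 &= ~X86_CR4_PAE;
	}
	return hw_cr4;
}
```

The key property is that a guest value with SMEP and PAE set is stripped down to just PSE while the guest is in non-paging mode, and passed through unchanged once paging is on.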
RE: [PATCH v2 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
> -----Original Message-----
> From: Gleb Natapov [mailto:g...@redhat.com]
> Sent: Wednesday, November 28, 2012 7:28 PM
> To: Marcelo Tosatti
> Cc: Xu, Dongxiao; kvm@vger.kernel.org
> Subject: Re: [PATCH v2 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
>
> On Tue, Nov 27, 2012 at 10:29:08PM -0200, Marcelo Tosatti wrote:
> > On Thu, Nov 22, 2012 at 12:51:59PM +0800, Dongxiao Xu wrote:
> > > The launch state is not a member in the VMCS area, so use a separate
> > > variable (list) to store it instead.
> > >
> > > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> >
> > 1. What is the problem with keeping launched state in the VMCS?
> >
> > Assuming there is a positive answer to the above:
> >
> > 2. Don't you have to change the VMCS ID?
> >
> > 3. Can't it be kept somewhere else other than a list? The current
> > scheme allows the guest to allocate unlimited amounts of host memory.
> >
> > 4. What is the state of migration / nested vmx again? If vmcs12 is
> > migrated, this means launched state is not migrated anymore.
> >
> > Patches 1-3 seem fine.
>
> According to Dongxiao they slow down the nested guest by 4%.

For that version, it would introduce a certain performance downgrade. But in my new patch, I have simplified the vmcs12_read() and vmcs12_write() functions, and there is no obvious performance downgrade anymore.

Thanks,
Dongxiao

> --
> 			Gleb.
RE: [PATCH v2 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
> -----Original Message-----
> From: Orit Wasserman [mailto:owass...@redhat.com]
> Sent: Wednesday, November 28, 2012 8:30 PM
> To: Marcelo Tosatti
> Cc: Xu, Dongxiao; kvm@vger.kernel.org; g...@redhat.com
> Subject: Re: [PATCH v2 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
>
> On 11/28/2012 02:29 AM, Marcelo Tosatti wrote:
> > On Thu, Nov 22, 2012 at 12:51:59PM +0800, Dongxiao Xu wrote:
> > > The launch state is not a member in the VMCS area, so use a separate
> > > variable (list) to store it instead.
> > >
> > > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> >
> > 1. What is the problem with keeping launched state in the VMCS?
> >
> > Assuming there is a positive answer to the above:
> >
> > 2. Don't you have to change the VMCS ID?
> >
> > 3. Can't it be kept somewhere else other than a list? The current
> > scheme allows the guest to allocate unlimited amounts of host memory.
>
> I agree with Marcelo: you have to limit the number of VMCSs in the list,
> otherwise it will be easy to attack a host with nested :)

Yes, that is a good point. I will add a limit on the number of VMCSs for the guest VMM.

Thanks,
Dongxiao

> > 4. What is the state of migration / nested vmx again? If vmcs12 is
> > migrated, this means launched state is not migrated anymore.
> >
> > Patches 1-3 seem fine.
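The bounded launch-state tracking discussed in this thread can be sketched as a user-space model. This is illustrative only: the cap value and the plain singly-linked list are assumptions, while the real patch uses the kernel's `struct list_head` and (at this point in the thread) had no limit yet.

```c
#include <assert.h>
#include <stdlib.h>

/* Cap on tracked vmcs12 entries so a nested guest cannot make the host
 * allocate unbounded memory. The value is illustrative. */
#define VMCS12_POOL_MAX 4

struct vmcs12_node {
	unsigned long pa;		/* guest-physical address of the vmcs12 */
	struct vmcs12_node *next;
};

struct vmcs12_pool {
	struct vmcs12_node *head;
	int count;
};

/* Returns 1 if this vmcs12 has been VMLAUNCHed and not yet VMCLEARed. */
static int vmcs12_launched(struct vmcs12_pool *p, unsigned long pa)
{
	struct vmcs12_node *n;

	for (n = p->head; n; n = n->next)
		if (n->pa == pa)
			return 1;
	return 0;
}

/* Mark a vmcs12 as launched; 0 on success, -1 if the pool is full or
 * allocation fails. */
static int set_vmcs12_launched(struct vmcs12_pool *p, unsigned long pa)
{
	struct vmcs12_node *n;

	if (vmcs12_launched(p, pa))
		return 0;
	if (p->count >= VMCS12_POOL_MAX)
		return -1;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->pa = pa;
	n->next = p->head;
	p->head = n;
	p->count++;
	return 0;
}

/* Emulates the effect of VMCLEAR on the tracked launch state. */
static void clear_vmcs12_launched(struct vmcs12_pool *p, unsigned long pa)
{
	struct vmcs12_node **link = &p->head;

	while (*link) {
		if ((*link)->pa == pa) {
			struct vmcs12_node *dead = *link;

			*link = dead->next;
			free(dead);
			p->count--;
			return;
		}
		link = &(*link)->next;
	}
}
```

Once the pool is full, further VMLAUNCH tracking fails instead of consuming more host memory, which addresses the attack Marcelo and Orit point out.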
RE: [PATCH 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
> -----Original Message-----
> From: Gleb Natapov [mailto:g...@redhat.com]
> Sent: Wednesday, November 21, 2012 10:15 PM
> To: Xu, Dongxiao
> Cc: kvm@vger.kernel.org; mtosa...@redhat.com
> Subject: Re: [PATCH 4/4] nested vmx: use a list to store the launched vmcs12 for L1 VMM
>
> On Wed, Nov 21, 2012 at 05:04:37PM +0800, Dongxiao Xu wrote:
> > The launch state is not a member in the VMCS area, so use a separate
> > variable (list) to store it instead.
>
> Why? The guest shouldn't be aware of the format of the VMCS area.

Yes, I agree. The guest VMM / L1 VMM shouldn't be aware of the VMCS format. However the root VMM / L0 VMM needs to track the launch state of each vmcs12 in order to correctly emulate the VMLAUNCH and VMRESUME instructions. Originally we stored the launch state in the VMCS area, but in fact there is no launch state field in the VMCS. This patch moves it out and uses a separate list to store it.

Thanks,
Dongxiao

> > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> > ---
> >  arch/x86/kvm/vmx.c | 86 +++++++++++++++++++++++++++++++++++++++++---
> >  1 files changed, 81 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 6687fb6..d03ab4e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -177,8 +177,7 @@ struct __packed vmcs12 {
> >  	u32 revision_id;
> >  	u32 abort;
> >
> > -	u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
> > -	u32 padding[7]; /* room for future expansion */
> > +	u32 padding[8]; /* room for future expansion */
> >
> >  	u64 io_bitmap_a;
> >  	u64 io_bitmap_b;
> > @@ -339,6 +338,11 @@ struct vmcs02_list {
> >  	struct loaded_vmcs vmcs02;
> >  };
> >
> > +struct vmcs12_list {
> > +	unsigned long vmcs12_pa;
> > +	struct list_head node;
> > +};
> > +
> >  /*
> >   * The nested_vmx structure is part of vcpu_vmx, and holds information we need
> >   * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
> > @@ -364,6 +368,8 @@ struct nested_vmx {
> >  	 * we must keep them pinned while L2 runs.
> >  	 */
> >  	struct page *apic_access_page;
> > +	/* vmcs12_pool contains the launched vmcs12. */
> > +	struct list_head vmcs12_pool;
> >  };
> >
> >  struct vcpu_vmx {
> > @@ -614,6 +620,58 @@ static void nested_release_page_clean(struct page *page)
> >  	kvm_release_page_clean(page);
> >  }
> >
> > +static int vmcs12_launched(struct list_head *vmcs12_pool,
> > +			   unsigned long vmcs12_pa)
> > +{
> > +	struct vmcs12_list *iter;
> > +	struct list_head *pos;
> > +	int launched = 0;
> > +
> > +	list_for_each(pos, vmcs12_pool) {
> > +		iter = list_entry(pos, struct vmcs12_list, node);
> > +		if (vmcs12_pa == iter->vmcs12_pa) {
> > +			launched = 1;
> > +			break;
> > +		}
> > +	}
> > +
> > +	return launched;
> > +}
> > +
> > +static int set_vmcs12_launched(struct list_head *vmcs12_pool,
> > +			       unsigned long vmcs12_pa)
> > +{
> > +	struct vmcs12_list *vmcs12;
> > +
> > +	if (vmcs12_launched(vmcs12_pool, vmcs12_pa))
> > +		return 0;
> > +
> > +	vmcs12 = kzalloc(sizeof(struct vmcs12_list), GFP_KERNEL);
> > +	if (!vmcs12)
> > +		return -ENOMEM;
> > +
> > +	vmcs12->vmcs12_pa = vmcs12_pa;
> > +	list_add(&vmcs12->node, vmcs12_pool);
> > +
> > +	return 0;
> > +}
> > +
> > +static void clear_vmcs12_launched(struct list_head *vmcs12_pool,
> > +				  unsigned long vmcs12_pa)
> > +{
> > +	struct vmcs12_list *iter;
> > +	struct list_head *pos;
> > +
> > +	list_for_each(pos, vmcs12_pool) {
> > +		iter = list_entry(pos, struct vmcs12_list, node);
> > +		if (vmcs12_pa == iter->vmcs12_pa) {
> > +			list_del(&iter->node);
> > +			kfree(iter);
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> >  static u64 construct_eptp(unsigned long root_hpa);
> >  static void kvm_cpu_vmxon(u64 addr);
> >  static void kvm_cpu_vmxoff(void);
> > @@ -5111,6 +5169,18 @@ static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
> >  }
> >
> >  /*
> > + * Free the vmcs12 list.
> > + */
> > +static void nested_free_vmcs12_list(struct vcpu_vmx *vmx)
> > +{
> > +	struct vmcs12_list *item, *n;
> > +	list_for_each_entry_safe(item, n, &vmx->nested.vmcs12_pool, node) {
> > +		list_del(&item->node);
> > +		kfree(item);
> > +	}
> > +}
> > +
> > +/*
> >   * Emulate the VMXON instruction.
> >   * Currently, we just remember that VMX is active, and do not save or even
> >   * inspect the argument to VMXON (the so-called VMXON pointer) because we
> > @@ -5207,6 +5277,7 @@ static void free_nested(struct vcpu_vmx *vmx)
> >  	}
> >
> >  	nested_free_all_saved_vmcss(vmx);
> > +	nested_free_vmcs12_list(vmx);
> >  }
> >
> >  /* Emulate the VMXOFF instruction */
> > @@ -5359,7 +5430,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
> >  		return 1;
> >  	}
> >  	vmcs12 = kmap(page);
> > -	vmcs12->launch_state = 0;
> > +	clear_vmcs12_launched(&vmx->nested.vmcs12_pool, __pa(vmcs12));
> >  	kunmap(page);
> >  	nested_release_page(page);
> >
> > @@ -6467,6 +6538,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm
RE: [PATCH 1/4] nested vmx: clean up for vmcs12 read and write
> -----Original Message-----
> From: Gleb Natapov [mailto:g...@redhat.com]
> Sent: Wednesday, November 21, 2012 9:27 PM
> To: Xu, Dongxiao
> Cc: kvm@vger.kernel.org; mtosa...@redhat.com
> Subject: Re: [PATCH 1/4] nested vmx: clean up for vmcs12 read and write
>
> On Wed, Nov 21, 2012 at 05:04:34PM +0800, Dongxiao Xu wrote:
> > Abstract vmcs12_read and vmcs12_write functions to do the vmcs12
> > read/write operations.
> >
> > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> > ---
> >  arch/x86/kvm/vmx.c | 86 +++++++++++++++++++++++++---------------------
> >  1 files changed, 45 insertions(+), 41 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index f858159..d8670e4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -5407,32 +5407,67 @@ static inline int vmcs_field_readonly(unsigned long field)
> >   * some of the bits we return here (e.g., on 32-bit guests, only 32 bits of
> >   * 64-bit fields are to be returned).
> >   */
> > -static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu,
> > -				   unsigned long field, u64 *ret)
> > +static inline u64 vmcs12_read(struct kvm_vcpu *vcpu, unsigned long field)
> >  {
> >  	short offset = vmcs_field_to_offset(field);
> >  	char *p;
> >
> > -	if (offset < 0)
> > +	if (offset < 0) {
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> >  		return 0;
> > +	}
> >
> >  	p = ((char *)(get_vmcs12(vcpu))) + offset;
> >
> >  	switch (vmcs_field_type(field)) {
> >  	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> > -		*ret = *((natural_width *)p);
> > +		return *((natural_width *)p);
> > +	case VMCS_FIELD_TYPE_U16:
> > +		return *((u16 *)p);
> > +	case VMCS_FIELD_TYPE_U32:
> > +		return *((u32 *)p);
> > +	case VMCS_FIELD_TYPE_U64:
> > +		return *((u64 *)p);
> > +	default:
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0; /* can never happen. */
> > +	}
> > +}
> > +
> > +static inline int vmcs12_write(struct kvm_vcpu *vcpu,
> > +			       unsigned long field,
> > +			       u64 value)
> > +{
> > +	short offset = vmcs_field_to_offset(field);
> > +	char *p;
> > +
> > +	if (offset < 0) {
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0;
> > +	}
> > +
> > +	p = ((char *)(get_vmcs12(vcpu))) + offset;
> > +
> > +	switch (vmcs_field_type(field)) {
> > +	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> > +		*(natural_width *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U16:
> > -		*ret = *((u16 *)p);
> > +		*(u16 *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U32:
> > -		*ret = *((u32 *)p);
> > +		*(u32 *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U64:
> > -		*ret = *((u64 *)p);
> > +		*(u64 *)p = value;
> >  		return 1;
> >  	default:
> > -		return 0; /* can never happen. */
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0;
> >  	}
> >  }
> >
> > @@ -5466,11 +5501,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
> >  	/* Decode instruction info and find the field to read */
> >  	field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
> >  	/* Read the field, zero-extended to a u64 field_value */
> > -	if (!vmcs12_read_any(vcpu, field, &field_value)) {
> > -		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > -		skip_emulated_instruction(vcpu);
> > -		return 1;
> > -	}
> > +	field_value = vmcs12_read(vcpu, field);
>
> You do not handle failure here and always write back field_value even if
> vmcs12_read() failed. Actually, now it is impossible to detect a failure.
> The call to nested_vmx_failValid() in vmcs12_read() will be overwritten by
> the call to nested_vmx_succeed() at the end of handle_vmread(), and
> skip_emulated_instruction() will be called twice.

Thanks Gleb and Orit for raising this issue.

What about moving the offset check outside the vmcs12_read() and vmcs12_write() functions, and putting it directly in handle_vmread() and handle_vmwrite()? I think we only need to do the offset error check in handle_vmread() and handle_vmwrite(), since they are the ones that emulate the correct behavior for the guest VMM. For example, if the guest VMM reads a field that is not valid, or writes a field that is read-only, then in the emulation code handle_vmread() and handle_vmwrite() we need to raise an error to the guest VMM. For the other callers of vmcs12_read() and vmcs12_write() in the KVM hypervisor (see PATCH 3/4), the caller needs to ensure the field is valid; otherwise it is a bug in the KVM hypervisor itself.

Does that make sense? If so, I will revise this patch accordingly.

Thanks,
Dongxiao
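The split Dongxiao proposes can be sketched with a small user-space model. Everything here is illustrative: the field table, `vmcs12_field_offset`, and the simplified `handle_vmread` stand in for the real KVM routines, and the return convention (-1 for the VMfailValid path) is an assumption.

```c
#include <assert.h>

/* A toy vmcs12 with NFIELDS same-width fields; the real VMCS has typed
 * fields and an encoding-to-offset table. */
#define NFIELDS 4
static unsigned long vmcs12_store[NFIELDS];

/* Maps a field encoding to an offset; negative means unsupported. */
static int vmcs12_field_offset(unsigned long field)
{
	return field < NFIELDS ? (int)field : -1;
}

/* Raw accessors: assume the offset was already validated by the caller.
 * Internal KVM users (see PATCH 3/4) must only pass valid offsets. */
static unsigned long vmcs12_read(int offset)
{
	return vmcs12_store[offset];
}

static void vmcs12_write(int offset, unsigned long value)
{
	vmcs12_store[offset] = value;
}

/* Emulation entry point: only here is the offset checked and an error
 * (VMfailValid in the real code) reported back to the L1 guest. */
static int handle_vmread(unsigned long field, unsigned long *value)
{
	int offset = vmcs12_field_offset(field);

	if (offset < 0)
		return -1;	/* would call nested_vmx_failValid() once */
	*value = vmcs12_read(offset);
	return 0;		/* would call nested_vmx_succeed() */
}
```

With this structure the fail/succeed reporting happens exactly once, in the emulation handler, which avoids the double `skip_emulated_instruction()` Gleb points out.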
RE: [PATCH 1/4] nested vmx: clean up for vmcs12 read and write
> -----Original Message-----
> From: Gleb Natapov [mailto:g...@redhat.com]
> Sent: Wednesday, November 21, 2012 9:04 PM
> To: Xu, Dongxiao
> Cc: kvm@vger.kernel.org; mtosa...@redhat.com
> Subject: Re: [PATCH 1/4] nested vmx: clean up for vmcs12 read and write
>
> On Wed, Nov 21, 2012 at 05:04:34PM +0800, Dongxiao Xu wrote:
> > Abstract vmcs12_read and vmcs12_write functions to do the vmcs12
> > read/write operations.
> >
> > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> > ---
> >  arch/x86/kvm/vmx.c | 86 +++++++++++++++++++++++++---------------------
> >  1 files changed, 45 insertions(+), 41 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index f858159..d8670e4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -5407,32 +5407,67 @@ static inline int vmcs_field_readonly(unsigned long field)
> >   * some of the bits we return here (e.g., on 32-bit guests, only 32 bits of
> >   * 64-bit fields are to be returned).
> >   */
> > -static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu,
> > -				   unsigned long field, u64 *ret)
> > +static inline u64 vmcs12_read(struct kvm_vcpu *vcpu, unsigned long field)
> >  {
> >  	short offset = vmcs_field_to_offset(field);
> >  	char *p;
> >
> > -	if (offset < 0)
> > +	if (offset < 0) {
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> >  		return 0;
> > +	}
> >
> >  	p = ((char *)(get_vmcs12(vcpu))) + offset;
> >
> >  	switch (vmcs_field_type(field)) {
> >  	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> > -		*ret = *((natural_width *)p);
> > +		return *((natural_width *)p);
> > +	case VMCS_FIELD_TYPE_U16:
> > +		return *((u16 *)p);
> > +	case VMCS_FIELD_TYPE_U32:
> > +		return *((u32 *)p);
> > +	case VMCS_FIELD_TYPE_U64:
> > +		return *((u64 *)p);
> > +	default:
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0; /* can never happen. */
> > +	}
> > +}
> > +
> > +static inline int vmcs12_write(struct kvm_vcpu *vcpu,
> > +			       unsigned long field,
> > +			       u64 value)
> > +{
> > +	short offset = vmcs_field_to_offset(field);
> > +	char *p;
> > +
> > +	if (offset < 0) {
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0;
> > +	}
> > +
>
> Shouldn't the vmcs_field_readonly() check be in vmcs12_write() instead of
> in the caller?

Well, you can see from the later patch that we construct vmcs12 through the vmcs12_write() function. Some fields, like the exit reason and the exit qualification, need to be written into the vmcs12 area by KVM itself. Therefore we cannot put the read-only check into the vmcs12_write() function.

Thanks,
Dongxiao

> > +	p = ((char *)(get_vmcs12(vcpu))) + offset;
> > +
> > +	switch (vmcs_field_type(field)) {
> > +	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> > +		*(natural_width *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U16:
> > -		*ret = *((u16 *)p);
> > +		*(u16 *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U32:
> > -		*ret = *((u32 *)p);
> > +		*(u32 *)p = value;
> >  		return 1;
> >  	case VMCS_FIELD_TYPE_U64:
> > -		*ret = *((u64 *)p);
> > +		*(u64 *)p = value;
> >  		return 1;
> >  	default:
> > -		return 0; /* can never happen. */
> > +		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > +		skip_emulated_instruction(vcpu);
> > +		return 0;
> >  	}
> >  }
> >
> > @@ -5466,11 +5501,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
> >  	/* Decode instruction info and find the field to read */
> >  	field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
> >  	/* Read the field, zero-extended to a u64 field_value */
> > -	if (!vmcs12_read_any(vcpu, field, &field_value)) {
> > -		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> > -		skip_emulated_instruction(vcpu);
> > -		return 1;
> > -	}
> > +	field_value = vmcs12_read(vcpu, field);
> >  	/*
> >  	 * Now copy part of this value to register or memory, as requested.
> >  	 * Note that the number of bits actually copied is 32 or 64 depending
> > @@ -5500,8 +5531,6 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
> >  	gva_t gva;
> >  	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> >  	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> > -	char *p;
> > -	short offset;
> >  	/* The value to write might be 32 or 64 bits, depending on L1's long
> >  	 * mode, and eventually we need to write that into a field of several
> >  	 * possible lengths. The code below first zero-extends the value to 64
> > @@ -5537,33 +5566,8 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
> >  		skip_emulated_instruction(vcpu);
> >  		return 1
RE: [PATCH 4/4 v4] KVM: VMX: VMXON/VMXOFF usage changes.
Marcelo Tosatti wrote:
> On Tue, May 11, 2010 at 06:29:48PM +0800, Xu, Dongxiao wrote:
> > From: Dongxiao Xu dongxiao...@intel.com
> >
> > SDM suggests VMXON should be called before VMPTRLD, and VMXOFF should
> > be called after doing VMCLEAR. Therefore in the vmm coexistence case,
> > we should first call VMXON before any VMCS operation, and then call
> > VMXOFF after the operation is done.
> >
> > Signed-off-by: Dongxiao Xu dongxiao...@intel.com
> > ---
> >  arch/x86/kvm/vmx.c | 38 +++++++++++++++++++++++++++++---------
> >  1 files changed, 31 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c536b9d..dbd47a7 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -168,6 +168,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
> >  static int init_rmode(struct kvm *kvm);
> >  static u64 construct_eptp(unsigned long root_hpa);
> > +static void kvm_cpu_vmxon(u64 addr);
> > +static void kvm_cpu_vmxoff(void);
> >
> >  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> >  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> > @@ -786,8 +788,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	u64 tsc_this, delta, new_offset;
> > +	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
> >
> > -	if (vmm_exclusive && vcpu->cpu != cpu)
> > +	if (!vmm_exclusive)
> > +		kvm_cpu_vmxon(phys_addr);
> > +	else if (vcpu->cpu != cpu)
> >  		vcpu_clear(vmx);
> >
> >  	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
> > @@ -833,8 +838,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> >  {
> >  	__vmx_load_host_state(to_vmx(vcpu));
> > -	if (!vmm_exclusive)
> > +	if (!vmm_exclusive) {
> >  		__vcpu_clear(to_vmx(vcpu));
> > +		kvm_cpu_vmxoff();
> > +	}
> >  }
> >
> >  static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
> > @@ -1257,9 +1264,11 @@ static int hardware_enable(void *garbage)
> >  	       FEATURE_CONTROL_LOCKED |
> >  	       FEATURE_CONTROL_VMXON_ENABLED);
> >  	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
> > -	kvm_cpu_vmxon(phys_addr);
> > -	ept_sync_global();
> > +	if (vmm_exclusive) {
> > +		kvm_cpu_vmxon(phys_addr);
> > +		ept_sync_global();
> > +	}
> >
> >  	return 0;
>
> The documentation recommends usage of INVEPT all-context after execution
> of VMXON and prior to execution of VMXOFF. Is it not necessary?

After applying the patch, when a vCPU is scheduled in on a CPU, it will call tlb_flush() to invalidate the EPT and VPID caches/TLBs for that vCPU. Therefore correctness for KVM is guaranteed.

Thanks,
Dongxiao
RE: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.
Avi Kivity wrote:
> On 05/07/2010 05:43 AM, Xu, Dongxiao wrote:
> > From: Dongxiao Xu dongxiao...@intel.com
> >
> > SDM suggests VMXON should be called before VMPTRLD, and VMXOFF should
> > be called after doing VMCLEAR. Therefore in the vmm coexistence case,
> > we should first call VMXON before any VMCS operation, and then call
> > VMXOFF after the operation is done.
> >
> > @@ -4118,7 +4143,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  	struct kvm_cpuid_entry2 *best;
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	u32 exec_control;
> > +	int cpu;
> >
> > +	cpu = get_cpu();
> > +	vmx_vcpu_load(&vmx->vcpu, cpu);
> >  	vmx->rdtscp_enabled = false;
> >  	if (vmx_rdtscp_supported()) {
> >  		exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > @@ -4133,6 +4161,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  		}
> >  	}
> > +	vmx_vcpu_put(&vmx->vcpu);
> > +	put_cpu();
> >  }
> >
> >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
>
> I'm not sure why this is needed. vmx_cpuid_update() is called from a vcpu
> ioctl which should have called vcpu_load() before. Apart from that,
> everything looks good for merging.

vcpu_load() and vcpu_put() are not called in that ioctl. I will add them and send out another version of the patchset.

Thanks,
Dongxiao
RE: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.
Avi Kivity wrote:
> On 05/11/2010 12:38 PM, Xu, Dongxiao wrote:
> > > I'm not sure why this is needed. vmx_cpuid_update() is called from a
> > > vcpu ioctl which should have called vcpu_load() before. Apart from
> > > that, everything looks good for merging.
> >
> > vcpu_load() and vcpu_put() are not called in that ioctl. I will add
> > them and send out another version of the patchset.
>
> That's a serious bug even before this patchset, since vmx_cpuid_update()
> accesses the vmcs. Please send the fix as a separate patch (which will
> be backported to -stable).

That's OK.

Thanks,
Dongxiao
[PATCH] KVM: x86: Call vcpu_load and vcpu_put in cpuid_update.
From: Dongxiao Xu dongxiao...@intel.com

cpuid_update may operate on the VMCS, so vcpu_load() and vcpu_put()
should be called around it to ensure correctness.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/x86.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..08edfc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1773,6 +1773,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 	if (copy_from_user(cpuid_entries, entries,
 			   cpuid->nent * sizeof(struct kvm_cpuid_entry)))
 		goto out_free;
+	vcpu_load(vcpu);
 	for (i = 0; i < cpuid->nent; i++) {
 		vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function;
 		vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
@@ -1790,6 +1791,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 	r = 0;
 	kvm_apic_set_version(vcpu);
 	kvm_x86_ops->cpuid_update(vcpu);
+	vcpu_put(vcpu);

 out_free:
 	vfree(cpuid_entries);
@@ -1810,9 +1812,11 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 	if (copy_from_user(&vcpu->arch.cpuid_entries, entries,
 			   cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
 		goto out;
+	vcpu_load(vcpu);
 	vcpu->arch.cpuid_nent = cpuid->nent;
 	kvm_apic_set_version(vcpu);
 	kvm_x86_ops->cpuid_update(vcpu);
+	vcpu_put(vcpu);
 	return 0;

 out:
--
1.6.3
[PATCH 0/4 v4] KVM: VMX: Support hosted VMM coexistence.
Hi all,

This is hosted VMM coexistence support v4.

Main changes from v3:
Remove the change to vmx_cpuid_update(), since vcpu_load() and vcpu_put()
will be added around cpuid_update by another patch.

Main changes from v2:
1) Change vmm_coexistence to vmm_exclusive.
2) Some code structure changes. Split the original 3 patches into 4.
3) Address some comments from Avi.

Main changes from v1:
1) Add a module option vmm_coexistence to decide whether to enable this
feature. Currently it is off by default.
2) Each time a KVM vcpu is scheduled in, it will invalidate the EPT and
VPID TLBs to avoid conflicts between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is picked up from Intel SDM 3B chapter 27.3,
MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by
maintaining the state of an active VMCS (one for which VMPTRLD has been
executed) on the processor. Before relinquishing control to other system
software that may, without informing the VMM, remove power from the
processor (e.g., for transitions to S3 or S4) or leave VMX operation, a
VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached
by the processor are flushed to memory and that no other software can
corrupt the current VMM's VMCS data. It is also recommended that the VMM
execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM
coexistence, this patch set modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF
usages. VMCLEAR will be called when a VCPU is scheduled out of a physical
CPU, while VMPTRLD is called when a VCPU is scheduled in on a physical
CPU. This approach also eliminates the IPI mechanism needed for the
original VMCLEAR. As suggested by the SDM, VMXOFF will be called after
VMCLEAR, and VMXON will be called before VMPTRLD.

With this patchset, KVM and VMware Workstation 7 can launch separate
guests, and they work well with each other.
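The schedule-in/schedule-out ordering the cover letter describes can be sketched as a tiny state machine. This is a user-space model under stated assumptions: the two flags and the function names are illustrative stand-ins for `kvm_cpu_vmxon()`, `vmcs_load()`, `__vcpu_clear()`, and `kvm_cpu_vmxoff()` in the non-exclusive (vmm_exclusive=0) mode.

```c
#include <assert.h>

/* Per-CPU VMX state in the coexistence model. */
struct cpu_state {
	int vmx_on;		/* VMXON has been executed */
	int vmcs_loaded;	/* VMPTRLD executed and not yet VMCLEARed */
};

/* Schedule-in: VMXON must precede VMPTRLD. */
static void vcpu_schedule_in(struct cpu_state *c)
{
	assert(!c->vmx_on);	/* another VMM may have owned VMX until now */
	c->vmx_on = 1;		/* kvm_cpu_vmxon() */
	c->vmcs_loaded = 1;	/* vmcs_load() -> VMPTRLD */
}

/* Schedule-out: VMCLEAR must precede VMXOFF, per the SDM note. */
static void vcpu_schedule_out(struct cpu_state *c)
{
	assert(c->vmx_on && c->vmcs_loaded);
	c->vmcs_loaded = 0;	/* __vcpu_clear() -> VMCLEAR */
	c->vmx_on = 0;		/* kvm_cpu_vmxoff(), only after VMCLEAR */
}
```

Between a schedule-out and the next schedule-in, the CPU is outside VMX operation, which is what lets another hosted VMM use it safely.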
[PATCH 1/4 v4] KVM: VMX: Define new functions to wrapper direct call of asm code.
From: Dongxiao Xu dongxiao...@intel.com

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct calls into asm
code. Also move the VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++---------------
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..e77da89 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,6 +445,19 @@ static void vmcs_clear(struct vmcs *vmcs)
 		       vmcs, phys_addr);
 }

+static void vmcs_load(struct vmcs *vmcs)
+{
+	u64 phys_addr = __pa(vmcs);
+	u8 error;
+
+	asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
+			: "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+			: "cc", "memory");
+	if (error)
+		printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+		       vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
 	struct vcpu_vmx *vmx = arg;
@@ -769,7 +782,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 phys_addr = __pa(vmx->vmcs);
 	u64 tsc_this, delta, new_offset;

 	if (vcpu->cpu != cpu) {
@@ -783,15 +795,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-		u8 error;
-
 		per_cpu(current_vmcs, cpu) = vmx->vmcs;
-		asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
-			      : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
-			      : "cc");
-		if (error)
-			printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-			       vmx->vmcs, phys_addr);
+		vmcs_load(vmx->vmcs);
 	}

 	if (vcpu->cpu != cpu) {
@@ -1220,6 +1225,13 @@ static __init int vmx_disabled_by_bios(void)
 	/* locked but not enabled */
 }

+static void kvm_cpu_vmxon(u64 addr)
+{
+	asm volatile (ASM_VMX_VMXON_RAX
+			: : "a"(&addr), "m"(addr)
+			: "memory", "cc");
+}
+
 static int hardware_enable(void *garbage)
 {
 	int cpu = raw_smp_processor_id();
@@ -1240,9 +1252,7 @@ static int hardware_enable(void *garbage)
 	       FEATURE_CONTROL_LOCKED |
 	       FEATURE_CONTROL_VMXON_ENABLED);
 	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-	asm volatile (ASM_VMX_VMXON_RAX
-		      : : "a"(&phys_addr), "m"(phys_addr)
-		      : "memory", "cc");
+	kvm_cpu_vmxon(phys_addr);

 	ept_sync_global();
@@ -1266,13 +1276,13 @@ static void vmclear_local_vcpus(void)
 static void kvm_cpu_vmxoff(void)
 {
 	asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
-	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static void hardware_disable(void *garbage)
 {
 	vmclear_local_vcpus();
 	kvm_cpu_vmxoff();
+	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
--
1.6.3
[PATCH 3/4 v4] KVM: VMX: VMCLEAR/VMPTRLD usage changes.
From: Dongxiao Xu dongxiao...@intel.com

Originally VMCLEAR/VMPTRLD were called on vcpu migration. To support
hosted VMM coexistence, VMCLEAR is executed on vcpu schedule-out, and
VMPTRLD is executed on vcpu schedule-in. This also eliminates the IPI
when doing VMCLEAR.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49b0850..c536b9d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -62,6 +62,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);

+static int __read_mostly vmm_exclusive = 1;
+module_param(vmm_exclusive, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST				\
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK						\
@@ -784,7 +787,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;

-	if (vcpu->cpu != cpu)
+	if (vmm_exclusive && vcpu->cpu != cpu)
 		vcpu_clear(vmx);

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -830,6 +833,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	__vmx_load_host_state(to_vmx(vcpu));
+	if (!vmm_exclusive)
+		__vcpu_clear(to_vmx(vcpu));
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
--
1.6.3
[PATCH 4/4 v4] KVM: VMX: VMXON/VMXOFF usage changes.
From: Dongxiao Xu dongxiao...@intel.com

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF should be
called after doing VMCLEAR. Therefore in the vmm coexistence case, we
should first call VMXON before any VMCS operation, and then call VMXOFF
after the operation is done.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 38 +++++++++++++++++++++++++++++---------
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c536b9d..dbd47a7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -168,6 +168,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);

 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -786,8 +788,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;
+	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));

-	if (vmm_exclusive && vcpu->cpu != cpu)
+	if (!vmm_exclusive)
+		kvm_cpu_vmxon(phys_addr);
+	else if (vcpu->cpu != cpu)
 		vcpu_clear(vmx);

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -833,8 +838,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	__vmx_load_host_state(to_vmx(vcpu));
-	if (!vmm_exclusive)
+	if (!vmm_exclusive) {
 		__vcpu_clear(to_vmx(vcpu));
+		kvm_cpu_vmxoff();
+	}
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1257,9 +1264,11 @@ static int hardware_enable(void *garbage)
 	       FEATURE_CONTROL_LOCKED |
 	       FEATURE_CONTROL_VMXON_ENABLED);
 	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-	kvm_cpu_vmxon(phys_addr);
-	ept_sync_global();
+	if (vmm_exclusive) {
+		kvm_cpu_vmxon(phys_addr);
+		ept_sync_global();
+	}

 	return 0;
 }
@@ -1285,8 +1294,10 @@ static void kvm_cpu_vmxoff(void)

 static void hardware_disable(void *garbage)
 {
-	vmclear_local_vcpus();
-	kvm_cpu_vmxoff();
+	if (vmm_exclusive) {
+		vmclear_local_vcpus();
+		kvm_cpu_vmxoff();
+	}
 	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

@@ -3949,6 +3960,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 }

+static inline void vmcs_init(struct vmcs *vmcs)
+{
+	u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id()));
+
+	if (!vmm_exclusive)
+		kvm_cpu_vmxon(phys_addr);
+
+	vmcs_clear(vmcs);
+
+	if (!vmm_exclusive)
+		kvm_cpu_vmxoff();
+}
+
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
 	int err;
@@ -3974,7 +3998,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx->vmcs)
 		goto free_msrs;

-	vmcs_clear(vmx->vmcs);
+	vmcs_init(vmx->vmcs);

 	cpu = get_cpu();
 	vmx_vcpu_load(&vmx->vcpu, cpu);
--
1.6.3
[PATCH 2/4 v4] KVM: VMX: Some minor changes to code structure.
From: Dongxiao Xu dongxiao...@intel.com

Do some preparations for VMM coexistence support.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 16
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e77da89..49b0850 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -784,15 +784,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;

-	if (vcpu->cpu != cpu) {
+	if (vcpu->cpu != cpu)
 		vcpu_clear(vmx);
-		kvm_migrate_timers(vcpu);
-		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
-		local_irq_disable();
-		list_add(&vmx->local_vcpus_link,
-			 &per_cpu(vcpus_on_cpu, cpu));
-		local_irq_enable();
-	}

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->vmcs;
@@ -803,6 +796,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		struct desc_ptr dt;
 		unsigned long sysenter_esp;

+		kvm_migrate_timers(vcpu);
+		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
+		local_irq_disable();
+		list_add(&vmx->local_vcpus_link,
+			 &per_cpu(vcpus_on_cpu, cpu));
+		local_irq_enable();
+
 		vcpu->cpu = cpu;
 		/*
 		 * Linux uses per-cpu TSS and GDT, so set these when switching
--
1.6.3
RE: [PATCH 0/4 v3] KVM: VMX: Support hosted VMM coexistence.
Hi Avi,

Do you have any other comments on this version of the patch set?

Thanks,
Dongxiao

Xu, Dongxiao wrote:

Hi all,

This is hosted VMM coexistence support v3.

Main changes from v2:
1) Change vmm_coexistence to vmm_exclusive.
2) Some code structure changes. Split the original 3 patches into 4.
3) Address some comments from Avi.

Main changes from v1:
1) Add a module option vmm_coexistence to decide whether to enable this feature. Currently it is off by default.
2) Each time a KVM vcpu is scheduled in, it invalidates the EPT and VPID TLBs to avoid conflicts between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from Intel SDM 3B, chapter 27.3, MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM coexistence, this patch set modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF usage. VMCLEAR is called when a VCPU is scheduled out of a physical CPU, while VMPTRLD is called when a VCPU is scheduled in. This approach also eliminates the IPI mechanism used for the original VMCLEAR. As suggested by the SDM, VMXOFF is called after VMCLEAR, and VMXON is called before VMPTRLD.

With this patch set, KVM and VMware Workstation 7 can launch separate guests and they work well with each other.
[PATCH 0/3 v2] KVM: VMX: Support hosted VMM coexistence.
Main changes from v1:
1) Add a module option vmm_coexistence to decide whether to enable this feature. Currently it is off by default.
2) Each time a KVM vcpu is scheduled in, it invalidates the EPT and VPID TLBs to avoid conflicts between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from Intel SDM 3B, chapter 27.3, MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM coexistence, this patch set modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF usage. VMCLEAR is called when a VCPU is scheduled out of a physical CPU, while VMPTRLD is called when a VCPU is scheduled in. This approach also eliminates the IPI mechanism used for the original VMCLEAR. As suggested by the SDM, VMXOFF is called after VMCLEAR, and VMXON is called before VMPTRLD.

With this patch set, KVM and VMware Workstation 7 can launch separate guests and they work well with each other.
RE: [PATCH 2/3 v2] KVM: VMX: VMCLEAR/VMPTRLD usage changes.
Avi Kivity wrote:

On 05/06/2010 11:45 AM, Xu, Dongxiao wrote:

From: Dongxiao Xu dongxiao...@intel.com

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To support hosted VMM coexistence, VMCLEAR is executed on vcpu schedule-out, and VMPTRLD is executed on vcpu schedule-in. This also eliminates the IPI when doing VMCLEAR.

+static int __read_mostly vmm_coexistence;
+module_param(vmm_coexistence, bool, S_IRUGO);

I think we can call it 'exclusive', defaulting to true.

I will change it in the next version.

+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -222,6 +225,7 @@ static u64 host_efer;

 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu);

 /*
  * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
  * away by decrementing the array size.
@@ -784,25 +788,31 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;

-	if (vcpu->cpu != cpu) {
-		vcpu_clear(vmx);
-		kvm_migrate_timers(vcpu);
+	if (vmm_coexistence) {
+		vmcs_load(vmx->vmcs);
 		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
-		local_irq_disable();
-		list_add(&vmx->local_vcpus_link,
-			 &per_cpu(vcpus_on_cpu, cpu));
-		local_irq_enable();
-	}
+	} else {
+		if (vcpu->cpu != cpu) {
+			vcpu_clear(vmx);
+			set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
+			local_irq_disable();
+			list_add(&vmx->local_vcpus_link,
+				 &per_cpu(vcpus_on_cpu, cpu));
+			local_irq_enable();
+		}

-	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-		per_cpu(current_vmcs, cpu) = vmx->vmcs;
-		vmcs_load(vmx->vmcs);
+		if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
+			per_cpu(current_vmcs, cpu) = vmx->vmcs;
+			vmcs_load(vmx->vmcs);
+		}
 	}

Please keep the exclusive-use code as it is, and just add the !exclusive cases. For example, the current_vmcs test will always fail since current_vmcs will be set to NULL on vmx_vcpu_put, so we can have just one vmcs_load().

Thanks for the suggestion.

@@ -829,7 +839,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
 	__vmx_load_host_state(to_vmx(vcpu));
+	if (vmm_coexistence) {
+		vmcs_clear(vmx->vmcs);
+		rdtscll(vmx->vcpu.arch.host_tsc);
+		vmx->launched = 0;

This code is also duplicated. Please refactor to avoid duplication.

Thanks.

+	}
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1280,7 +1297,8 @@ static void kvm_cpu_vmxoff(void)

 static void hardware_disable(void *garbage)
 {
-	vmclear_local_vcpus();
+	if (!vmm_coexistence)
+		vmclear_local_vcpus();
 	kvm_cpu_vmxoff();
 	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
@@ -3927,7 +3945,8 @@ static void vmx_free_vmcs(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

 	if (vmx->vmcs) {
-		vcpu_clear(vmx);
+		if (!vmm_coexistence)
+			vcpu_clear(vmx);

What's wrong with doing this unconditionally?

The change should have been put in PATCH 3/3. The check is there to avoid calling vcpu_clear() after kvm_cpu_vmxoff(). However, it will not appear in the next version, since I will set vcpu->cpu = -1 in vmx_vcpu_put() in the !vmm_exclusive case, so that vcpu_clear() is not actually executed.

 		free_vmcs(vmx->vmcs);
 		vmx->vmcs = NULL;
 	}
@@ -3969,13 +3988,20 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx->vmcs)
 		goto free_msrs;

-	vmcs_clear(vmx->vmcs);
-
-	cpu = get_cpu();
-	vmx_vcpu_load(&vmx->vcpu, cpu);
-	err = vmx_vcpu_setup(vmx);
-	vmx_vcpu_put(&vmx->vcpu);
-	put_cpu();
+	if (vmm_coexistence) {
+		preempt_disable();
+		vmcs_load(vmx->vmcs);
+		err = vmx_vcpu_setup(vmx);
+		vmcs_clear(vmx->vmcs);
+		preempt_enable();

Why won't the normal sequence work?

Yes, you are right.

+	} else {
+		vmcs_clear(vmx->vmcs);
+		cpu = get_cpu();
+		vmx_vcpu_load(&vmx->vcpu, cpu);
+		err = vmx_vcpu_setup(vmx);
+		vmx_vcpu_put(&vmx->vcpu);
+		put_cpu();
+	}
 	if (err)
 		goto free_vmcs;
 	if (vm_need_virtualize_apic_accesses(kvm))
@@ -4116,6 +4142,8 @@ static void vmx_cpuid_update
[PATCH 1/4 v3] KVM: VMX: Define new functions to wrap direct calls of asm code.
From: Dongxiao Xu dongxiao...@intel.com

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct calls of asm code. Also
move the VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..e77da89 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,6 +445,19 @@ static void vmcs_clear(struct vmcs *vmcs)
 		       vmcs, phys_addr);
 }

+static void vmcs_load(struct vmcs *vmcs)
+{
+	u64 phys_addr = __pa(vmcs);
+	u8 error;
+
+	asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
+			: "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+			: "cc", "memory");
+	if (error)
+		printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+		       vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
 	struct vcpu_vmx *vmx = arg;
@@ -769,7 +782,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 phys_addr = __pa(vmx->vmcs);
 	u64 tsc_this, delta, new_offset;

 	if (vcpu->cpu != cpu) {
@@ -783,15 +795,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-		u8 error;
-
 		per_cpu(current_vmcs, cpu) = vmx->vmcs;
-		asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
-			      : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
-			      : "cc");
-		if (error)
-			printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-			       vmx->vmcs, phys_addr);
+		vmcs_load(vmx->vmcs);
 	}

 	if (vcpu->cpu != cpu) {
@@ -1220,6 +1225,13 @@ static __init int vmx_disabled_by_bios(void)
 	/* locked but not enabled */
 }

+static void kvm_cpu_vmxon(u64 addr)
+{
+	asm volatile (ASM_VMX_VMXON_RAX
+			: : "a"(&addr), "m"(addr)
+			: "memory", "cc");
+}
+
 static int hardware_enable(void *garbage)
 {
 	int cpu = raw_smp_processor_id();
@@ -1240,9 +1252,7 @@ static int hardware_enable(void *garbage)
 		       FEATURE_CONTROL_LOCKED |
 		       FEATURE_CONTROL_VMXON_ENABLED);
 	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-	asm volatile (ASM_VMX_VMXON_RAX
-		      : : "a"(&phys_addr), "m"(phys_addr)
-		      : "memory", "cc");
+	kvm_cpu_vmxon(phys_addr);

 	ept_sync_global();
@@ -1266,13 +1276,13 @@ static void vmclear_local_vcpus(void)

 static void kvm_cpu_vmxoff(void)
 {
 	asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
-	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static void hardware_disable(void *garbage)
 {
 	vmclear_local_vcpus();
 	kvm_cpu_vmxoff();
+	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
--
1.6.3
[PATCH 2/4 v3] KVM: VMX: Some minor changes to code structure.
From: Dongxiao Xu dongxiao...@intel.com

Do some preparations for VMM coexistence support.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 16
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e77da89..49b0850 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -784,15 +784,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;

-	if (vcpu->cpu != cpu) {
+	if (vcpu->cpu != cpu)
 		vcpu_clear(vmx);
-		kvm_migrate_timers(vcpu);
-		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
-		local_irq_disable();
-		list_add(&vmx->local_vcpus_link,
-			 &per_cpu(vcpus_on_cpu, cpu));
-		local_irq_enable();
-	}

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->vmcs;
@@ -803,6 +796,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		struct desc_ptr dt;
 		unsigned long sysenter_esp;

+		kvm_migrate_timers(vcpu);
+		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
+		local_irq_disable();
+		list_add(&vmx->local_vcpus_link,
+			 &per_cpu(vcpus_on_cpu, cpu));
+		local_irq_enable();
+
 		vcpu->cpu = cpu;
 		/*
 		 * Linux uses per-cpu TSS and GDT, so set these when switching
--
1.6.3
[PATCH 0/4 v3] KVM: VMX: Support hosted VMM coexistence.
Hi all,

This is hosted VMM coexistence support v3.

Main changes from v2:
1) Change vmm_coexistence to vmm_exclusive.
2) Some code structure changes. Split the original 3 patches into 4.
3) Address some comments from Avi.

Main changes from v1:
1) Add a module option vmm_coexistence to decide whether to enable this feature. Currently it is off by default.
2) Each time a KVM vcpu is scheduled in, it invalidates the EPT and VPID TLBs to avoid conflicts between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from Intel SDM 3B, chapter 27.3, MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM coexistence, this patch set modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF usage. VMCLEAR is called when a VCPU is scheduled out of a physical CPU, while VMPTRLD is called when a VCPU is scheduled in. This approach also eliminates the IPI mechanism used for the original VMCLEAR. As suggested by the SDM, VMXOFF is called after VMCLEAR, and VMXON is called before VMPTRLD.

With this patch set, KVM and VMware Workstation 7 can launch separate guests and they work well with each other.
[PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.
From: Dongxiao Xu dongxiao...@intel.com

The SDM suggests that VMXON should be called before VMPTRLD, and that VMXOFF
should be called after doing VMCLEAR. Therefore, in the VMM-coexistence case,
we should first call VMXON before any VMCS operation, and then call VMXOFF
after the operation is done.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 45 ++---
 1 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c536b9d..59d7443 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -168,6 +168,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);

 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -786,8 +788,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;
+	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));

-	if (vmm_exclusive && vcpu->cpu != cpu)
+	if (!vmm_exclusive)
+		kvm_cpu_vmxon(phys_addr);
+	else if (vcpu->cpu != cpu)
 		vcpu_clear(vmx);

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -833,8 +838,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	__vmx_load_host_state(to_vmx(vcpu));
-	if (!vmm_exclusive)
+	if (!vmm_exclusive) {
 		__vcpu_clear(to_vmx(vcpu));
+		kvm_cpu_vmxoff();
+	}
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1257,9 +1264,11 @@ static int hardware_enable(void *garbage)
 		       FEATURE_CONTROL_LOCKED |
 		       FEATURE_CONTROL_VMXON_ENABLED);
 	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-	kvm_cpu_vmxon(phys_addr);
-
-	ept_sync_global();
+	if (vmm_exclusive) {
+		kvm_cpu_vmxon(phys_addr);
+		ept_sync_global();
+	}

 	return 0;
 }
@@ -1285,8 +1294,10 @@ static void kvm_cpu_vmxoff(void)

 static void hardware_disable(void *garbage)
 {
-	vmclear_local_vcpus();
-	kvm_cpu_vmxoff();
+	if (vmm_exclusive) {
+		vmclear_local_vcpus();
+		kvm_cpu_vmxoff();
+	}
 	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
@@ -3949,6 +3960,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 }

+static inline void vmcs_init(struct vmcs *vmcs)
+{
+	u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id()));
+
+	if (!vmm_exclusive)
+		kvm_cpu_vmxon(phys_addr);
+
+	vmcs_clear(vmcs);
+
+	if (!vmm_exclusive)
+		kvm_cpu_vmxoff();
+}
+
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
 	int err;
@@ -3974,13 +3998,14 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx->vmcs)
 		goto free_msrs;

-	vmcs_clear(vmx->vmcs);
+	vmcs_init(vmx->vmcs);

 	cpu = get_cpu();
 	vmx_vcpu_load(&vmx->vcpu, cpu);
 	err = vmx_vcpu_setup(vmx);
 	vmx_vcpu_put(&vmx->vcpu);
 	put_cpu();
+
 	if (err)
 		goto free_vmcs;
 	if (vm_need_virtualize_apic_accesses(kvm))
@@ -4118,7 +4143,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	struct kvm_cpuid_entry2 *best;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	int cpu;

+	cpu = get_cpu();
+	vmx_vcpu_load(&vmx->vcpu, cpu);
 	vmx->rdtscp_enabled = false;
 	if (vmx_rdtscp_supported()) {
 		exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
@@ -4133,6 +4161,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 			}
 		}
 	}
+	vmx_vcpu_put(&vmx->vcpu);
+	put_cpu();
+
 }

 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
1.6.3
[PATCH 3/4 v3] KVM: VMX: VMCLEAR/VMPTRLD usage changes.
From: Dongxiao Xu dongxiao...@intel.com

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To support hosted VMM
coexistence, VMCLEAR is executed on vcpu schedule-out, and VMPTRLD is executed
on vcpu schedule-in. This also eliminates the IPI when doing VMCLEAR.

vmm_exclusive is introduced as a module parameter to indicate whether the VMM
coexistence feature is enabled. Currently the feature is disabled by default.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49b0850..c536b9d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -62,6 +62,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);

+static int __read_mostly vmm_exclusive = 1;
+module_param(vmm_exclusive, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -784,7 +787,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;

-	if (vcpu->cpu != cpu)
+	if (vmm_exclusive && vcpu->cpu != cpu)
 		vcpu_clear(vmx);

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -830,6 +833,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	__vmx_load_host_state(to_vmx(vcpu));
+	if (!vmm_exclusive)
+		__vcpu_clear(to_vmx(vcpu));
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
--
1.6.3
RE: [PATCH 0/3] KVM: VMX: Support hosted VMM coexistence.
Avi Kivity wrote:

On 03/23/2010 06:01 AM, Xu, Dongxiao wrote:

Avi Kivity wrote:

On 03/18/2010 11:49 AM, Xu, Dongxiao wrote:

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from Intel SDM 3B, chapter 27.3, MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF usage. VMCLEAR will be called when a VCPU is scheduled out of a physical CPU, while VMPTRLD is called when a VCPU is scheduled in. This approach also eliminates the IPI mechanism for the original VMCLEAR. As suggested by the SDM, VMXOFF will be called after VMCLEAR, and VMXON will be called before VMPTRLD.

My worry is that newer processors will cache more and more VMCS contents on-chip, so the VMCLEAR/VMXOFF will cause a greater loss with newer processors.

Based on our internal testing, we saw less than 1% performance difference even on such processors.

Did you measure workloads that exit to userspace very often? Also, what about future processors? My understanding is that the manual recommends keeping things cached; the description above is for sleep states.

I measured the performance by running kernel builds in a guest. I launched 6 guests; 5 of them and the host run while(1) loops, and the remaining guest does a kernel build. The CPU overcommitment is 7:1, and the vcpu schedule frequency is about 15k/sec. I tested this with the new Intel processors on hand, and the performance difference is small.

With this patchset, KVM and VMware Workstation 7 could launch separate guests and they can work well with each other. Besides, I measured the performance for this patch; there is no visible performance loss according to the test results.

Is that the only motivation? It seems like an odd use case. If there were no performance impact (current or future), I wouldn't mind, but the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to indicate that we want to keep a VMCS loaded on the processor as much as possible.

I just used KVM and VMware Workstation 7 for testing this patchset. Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF, we could make hosted VMMs work separately and not impact each other.

What I am questioning is whether a significant number of users want to run kvm in parallel with another hypervisor.

At least this approach gives users the option to run VMMs in parallel without significant performance loss. Consider this scenario: a server has already deployed VMware software, but some new customers want to use KVM; this patch could help them meet their requirements. I have tested this case: if a KVM guest is already running and the user launches a VMware guest, the KVM guest dies while the VMware guest still runs well. This patchset solves that problem.

Thanks!
Dongxiao
RE: [PATCH 0/3] KVM: VMX: Support hosted VMM coexistence.
Avi Kivity wrote:

On 03/18/2010 11:49 AM, Xu, Dongxiao wrote:

VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from Intel SDM 3B, chapter 27.3, MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by maintaining the state of an active VMCS (one for which VMPTRLD has been executed) on the processor. Before relinquishing control to other system software that may, without informing the VMM, remove power from the processor (e.g., for transitions to S3 or S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached by the processor are flushed to memory and that no other software can corrupt the current VMM's VMCS data. It is also recommended that the VMM execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF usage. VMCLEAR will be called when a VCPU is scheduled out of a physical CPU, while VMPTRLD is called when a VCPU is scheduled in. This approach also eliminates the IPI mechanism for the original VMCLEAR. As suggested by the SDM, VMXOFF will be called after VMCLEAR, and VMXON will be called before VMPTRLD.

My worry is that newer processors will cache more and more VMCS contents on-chip, so the VMCLEAR/VMXOFF will cause a greater loss with newer processors.

Based on our internal testing, we saw less than 1% performance difference even on such processors.

With this patchset, KVM and VMware Workstation 7 could launch separate guests and they can work well with each other. Besides, I measured the performance for this patch; there is no visible performance loss according to the test results.

Is that the only motivation? It seems like an odd use case. If there were no performance impact (current or future), I wouldn't mind, but the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to indicate that we want to keep a VMCS loaded as much as possible on the processor.

I just used KVM and VMware Workstation 7 for testing this patchset. Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF, we could make hosted VMMs work separately and not impact each other.

Thanks!
Dongxiao
[PATCH 1/3] KVM: VMX code cleanup and preparation.
From: Dongxiao Xu dongxiao...@intel.com

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..10e2c3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -442,6 +442,19 @@ static void vmcs_clear(struct vmcs *vmcs)
 		       vmcs, phys_addr);
 }

+static void vmcs_load(struct vmcs *vmcs)
+{
+	u64 phys_addr = __pa(vmcs);
+	u8 error;
+
+	asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
+			: "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+			: "cc", "memory");
+	if (error)
+		printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+		       vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
 	struct vcpu_vmx *vmx = arg;
@@ -766,7 +779,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 phys_addr = __pa(vmx->vmcs);
 	u64 tsc_this, delta, new_offset;

 	if (vcpu->cpu != cpu) {
@@ -780,15 +792,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}

 	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-		u8 error;
-
 		per_cpu(current_vmcs, cpu) = vmx->vmcs;
-		asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
-			      : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
-			      : "cc");
-		if (error)
-			printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-			       vmx->vmcs, phys_addr);
+		vmcs_load(vmx->vmcs);
 	}

 	if (vcpu->cpu != cpu) {
@@ -1211,6 +1216,13 @@ static __init int vmx_disabled_by_bios(void)
 	/* locked but not enabled */
 }

+static void kvm_cpu_vmxon(u64 addr)
+{
+	asm volatile (ASM_VMX_VMXON_RAX
+			: : "a"(&addr), "m"(addr)
+			: "memory", "cc");
+}
+
 static int hardware_enable(void *garbage)
 {
 	int cpu = raw_smp_processor_id();
@@ -1231,9 +1243,7 @@ static int hardware_enable(void *garbage)
 		       FEATURE_CONTROL_LOCKED |
 		       FEATURE_CONTROL_VMXON_ENABLED);
 	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-	asm volatile (ASM_VMX_VMXON_RAX
-		      : : "a"(&phys_addr), "m"(phys_addr)
-		      : "memory", "cc");
+	kvm_cpu_vmxon(phys_addr);

 	ept_sync_global();
@@ -1257,13 +1267,13 @@ static void vmclear_local_vcpus(void)

 static void kvm_cpu_vmxoff(void)
 {
 	asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
-	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static void hardware_disable(void *garbage)
 {
 	vmclear_local_vcpus();
 	kvm_cpu_vmxoff();
+	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }

 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
--
1.6.3
[PATCH 3/3] KVM: VMX: VMXON/VMXOFF usage changes.
From: Dongxiao Xu dongxiao...@intel.com

The Intel SDM also suggests that VMXOFF should be called after doing VMCLEAR,
and that VMXON should be called before VMPTRLD.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c | 21 -
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aaf5b52..dd3a7fa 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -161,8 +161,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)

 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);

 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
+static DEFINE_PER_CPU(bool, vmx_mode);

 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
@@ -755,7 +758,9 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tsc_this, delta, new_offset;
+	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));

+	kvm_cpu_vmxon(phys_addr);
 	vmcs_load(vmx->vmcs);

 	if (vcpu->cpu != cpu) {
@@ -796,6 +801,7 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 	vmcs_clear(vmx->vmcs);
 	rdtscll(vmx->vcpu.arch.host_tsc);
 	vmx->launched = 0;
+	kvm_cpu_vmxoff();
 }

 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1187,9 +1193,11 @@ static __init int vmx_disabled_by_bios(void)

 static void kvm_cpu_vmxon(u64 addr)
 {
+	int cpu = raw_smp_processor_id();
 	asm volatile (ASM_VMX_VMXON_RAX
			: : "a"(&addr), "m"(addr)
			: "memory", "cc");
+	per_cpu(vmx_mode, cpu) = 1;
 }

 static int hardware_enable(void *garbage)
@@ -1223,12 +1231,16 @@ static int hardware_enable(void *garbage)
  */
 static void kvm_cpu_vmxoff(void)
 {
+	int cpu = raw_smp_processor_id();
 	asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
+	per_cpu(vmx_mode, cpu) = 0;
 }

 static void hardware_disable(void *garbage)
 {
-	kvm_cpu_vmxoff();
+	int cpu = raw_smp_processor_id();
+	if (per_cpu(vmx_mode, cpu))
+		kvm_cpu_vmxoff();
 	write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
@@ -3930,6 +3942,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
 	int err;
 	struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	u64 phys_addr = __pa(per_cpu(vmxarea, smp_processor_id()));

 	if (!vmx)
 		return ERR_PTR(-ENOMEM);
@@ -3951,9 +3964,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto free_msrs;

 	preempt_disable();
+	kvm_cpu_vmxon(phys_addr);
 	vmcs_load(vmx->vmcs);
 	err = vmx_vcpu_setup(vmx);
 	vmcs_clear(vmx->vmcs);
+	kvm_cpu_vmxoff();
 	preempt_enable();

 	if (err)
@@ -4091,10 +4106,13 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int cpu = raw_smp_processor_id();
+	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 	u32 exec_control;

 	vmx->rdtscp_enabled = false;
 	if (vmx_rdtscp_supported()) {
+		kvm_cpu_vmxon(phys_addr);
 		vmcs_load(vmx->vmcs);
 		exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
 		if (exec_control & SECONDARY_EXEC_RDTSCP) {
@@ -4108,6 +4126,7 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 			}
 		}
 		vmcs_clear(vmx->vmcs);
+		kvm_cpu_vmxoff();
 	}
 }
--
1.6.3
[PATCH 0/3] KVM: VMX: Support hosted VMM coexistence.
VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from the Intel SDM 3B, chapter 27.3, MANAGING
VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation by
maintaining the state of an active VMCS (one for which VMPTRLD has been
executed) on the processor. Before relinquishing control to other system
software that may, without informing the VMM, remove power from the
processor (e.g., for transitions to S3 or S4) or leave VMX operation, a
VMM must VMCLEAR all active VMCSs. This ensures that all VMCS data cached
by the processor are flushed to memory and that no other software can
corrupt the current VMM's VMCS data. It is also recommended that the VMM
execute VMXOFF after such executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted VMM
coexistence, this patchset modifies the VMCLEAR/VMPTRLD and VMXON/VMXOFF
usage. VMCLEAR is now called when a VCPU is scheduled out of a physical
CPU, and VMPTRLD when a VCPU is scheduled in. This approach also
eliminates the IPI mechanism needed by the original VMCLEAR. As suggested
by the SDM, VMXOFF is called after VMCLEAR, and VMXON is called before
VMPTRLD.

With this patchset, KVM and VMware Workstation 7 can launch separate
guests and they work well with each other.

I also measured the performance of this patch; there is no visible
performance loss according to the test results. The following performance
results were obtained on a host with 8 cores.

1. vConsolidate benchmarks on KVM

Test Round          WebBench   SPECjbb    SysBench  LoadSim  GEOMEAN
1       W/O patch   2,614.72   28,053.09  1,108.41  16.30    1,072.95
        W/  patch   2,691.55   28,145.71  1,128.41  16.47    1,089.28
2       W/O patch   2,642.39   28,104.79  1,096.99  17.79    1,097.19
        W/  patch   2,699.25   28,092.62  1,116.10  15.54    1,070.98
3       W/O patch   2,571.58   28,131.17  1,108.43  16.39    1,070.70
        W/  patch   2,627.89   28,090.19  1,110.94  17.00    1,086.57
Average W/O patch   2,609.56   28,096.35  1,104.61  16.83    1,080.28
        W/  patch   2,672.90   28,109.51  1,118.48  16.34    1,082.28

2. CPU overcommitment tests for KVM

A) Run 8 while(1) loops in the host, pinned to the 8 cores.
B) Launch 6 guests, each with 8 VCPUs; pin each VCPU to one core.
C) Among the 6 guests, 5 of them run 8*while(1).
D) The remaining guest does a kernel build ("make -j9") on a ramdisk.

In this case, the overcommitment ratio for each core is 7:1. The VCPU
schedule frequency across all cores is about 15k/sec in total. I record
the kernel build time. When computing the average, the first round is
treated as invalid and is not counted into the final result.

Kernel Build Time (seconds)
Round    w/o patch   w/ patch
1        541         501
2        488         490
3        488         492
4        492         493
5        489         491
6        494         487
7        497         494
8        492         492
9        493         496
10       492         495
11       490         496
12       489         494
13       489         490
14       490         491
15       494         497
16       495         496
17       496         496
18       493         492
19       493         500
20       490         499
Average  491.79      493.74
RE: Handle INIT before SIPI.
Hi Gleb,

It seems that this patch is x86 specific, and it breaks the IA-64
compilation. Do you have any good idea how to solve this? Thanks!

Best Regards,
-- Dongxiao

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gleb Natapov
Sent: October 16, 2008 23:21
To: [EMAIL PROTECTED]
Cc: kvm@vger.kernel.org
Subject: Handle INIT before SIPI.

Handle INIT before SIPI. If a SIPI was issued before the CPU handled the
INIT signal, the CPU was not reset.

Signed-off-by: Gleb Natapov [EMAIL PROTECTED]

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 577f2b2..dae05b0 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -349,13 +349,20 @@ static void update_regs_for_sipi(CPUState *env)
 {
     kvm_arch_update_regs_for_sipi(env);
     vcpu_info[env->cpu_index].sipi_needed = 0;
-    vcpu_info[env->cpu_index].init = 0;
 }
 
 static void update_regs_for_init(CPUState *env)
 {
+    SegmentCache cs = env->segs[R_CS];
+
     cpu_reset(env);
+
+    /* restore SIPI vector */
+    if (vcpu_info[env->cpu_index].sipi_needed)
+        env->segs[R_CS] = cs;
+
     kvm_arch_load_regs(env);
+    vcpu_info[env->cpu_index].init = 0;
 }
 
 static void setup_kernel_sigmask(CPUState *env)
@@ -411,10 +418,12 @@ static int kvm_main_loop_cpu(CPUState *env)
         kvm_main_loop_wait(env, 1000);
         if (env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI))
             env->halted = 0;
-        if (!kvm_irqchip_in_kernel(kvm_context) && info->sipi_needed)
-            update_regs_for_sipi(env);
-        if (!kvm_irqchip_in_kernel(kvm_context) && info->init)
-            update_regs_for_init(env);
+        if (!kvm_irqchip_in_kernel(kvm_context)) {
+            if (info->init)
+                update_regs_for_init(env);
+            if (info->sipi_needed)
+                update_regs_for_sipi(env);
+        }
         if (!env->halted && !info->init)
             kvm_cpu_exec(env);
         env->interrupt_request &= ~CPU_INTERRUPT_EXIT;

--
			Gleb.