RE: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
Avi Kivity wrote: On 05/06/2010 07:23 AM, Cui, Dexuan wrote:

+ goto err;
+ vcpu->arch.guest_xstate_mask = new_bv;
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);

Can't we run with the host xcr0? Isn't it guaranteed to be a superset of the guest xcr0?

I agree host xcr0 is a superset of guest xcr0. In the case where guest xcr0 has fewer bits set than host xcr0, I suppose running with guest xcr0 would be a bit faster.

However, switching xcr0 may be slow. That's our experience with msrs. Can you measure its latency?

We can measure that. However, I think changing xcr0 to guest xcr0 in handle_xsetbv() is necessary -- otherwise, inside the guest xgetbv() would return host xcr0 rather than guest xcr0, which is obviously incorrect. Once handle_xsetbv() changes xcr0 to the guest's value, the xsetbv() in kvm_fx_restore_host() is also necessary, and so is the xsetbv() in kvm_fx_restore_guest(). So it looks like the guest can't run with the host xcr0.

Can also be avoided if guest and host xcr0 match.

If you think simplifying the code by removing the field guest_xstate_mask is more important, we can do it.

Well, we need guest_xstate_mask, it's the guest's xcr0, no?

I misread it. Yes, we need guest_xstate_mask.

btw, it needs save/restore for live migration, as well as save/restore for the new fpu state.

Yes. This part is missing. Sheng and I are also doing this -- it may be a little troublesome as the number of XSTATEs can grow as time goes on. We'll have to handle the compatibility issue.

+ skip_emulated_instruction(vcpu);
+ return 1;
+err:
+ kvm_inject_gp(vcpu, 0);

Need to #UD in some circumstances.

I don't think so. When the CPU doesn't support XSAVE, or the guest doesn't set guest CR4.OSXSAVE, the guest's attempt to execute xsetbv would get a #UD immediately and no VM exit would occur.

Ok.
@@ -62,6 +64,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE \
 | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
 | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
+ | (cpu_has_xsave ? X86_CR4_OSXSAVE : 0) \
 | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))

It also depends on the guest cpuid value. Please add it outside the macro.

If cpu_has_xsave is true, we always present xsave to the guest via cpuid, so I think the code is correct here.

We don't pass all host cpuid values to the guest. We expose them to userspace via KVM_GET_SUPPORTED_CPUID, and then userspace decides what cpuid to present to the guest via KVM_SET_CPUID2. So the guest may run with fewer features than the host.

Sorry, I didn't notice this. Will look into this. I saw the 2 patches you sent. They (like the new APIs fpu_alloc/fpu_free) will simplify the implementation of kvm xsave support. Thanks a lot!

Thanks. To simplify things, please separate host xsave support (switching to the fpu api) and guest xsave support (cpuid, xsetbv, new ioctls) into separate patches. In fact, guest xsave support is probably best split into patches as well.

Ok. Will try to do these.

Thanks,
-- Dexuan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
On 05/06/2010 05:20 PM, Cui, Dexuan wrote:

However, switching xcr0 may be slow. That's our experience with msrs. Can you measure its latency?

We can measure that. However, I think changing xcr0 to guest xcr0 in handle_xsetbv() is necessary -- otherwise, inside the guest xgetbv() would return host xcr0 rather than guest xcr0, which is obviously incorrect. Once handle_xsetbv() changes xcr0 to the guest's value, the xsetbv() in kvm_fx_restore_host() is also necessary, and so is the xsetbv() in kvm_fx_restore_guest(). So it looks like the guest can't run with the host xcr0.

Right. Moreover, xsave would write into memory the guest doesn't expect.

btw, it needs save/restore for live migration, as well as save/restore for the new fpu state.

Yes. This part is missing. Sheng and I are also doing this -- it may be a little troublesome as the number of XSTATEs can grow as time goes on. We'll have to handle the compatibility issue.

Reserve tons of space in the ioctl - and we can use the same format as xsave.

All those control registers are annoying, we have cr1 and cr5-cr7 free, cr9-cr15 on x86_64, infinite msr space, and now we have XCRs. Great. Looking forward to YCR0.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
On 05/06/2010 10:45 PM, Avi Kivity wrote:

All those control registers are annoying, we have cr1 and cr5-cr7 free, cr9-cr15 on x86_64, infinite msr space, and now we have XCRs. Great. Looking forward to YCR0.

I think I see the reason - xgetbv is unprivileged, so applications can see what the OS supports instead of looking at cpuid and hoping.
RE: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
Avi Kivity wrote: On 04/29/2010 08:22 AM, Dexuan Cui wrote:

When the host enables XSAVE/XRSTOR, the patch exposes the XSAVE/XRSTOR related CPUID leaves to the guest by fixing up kvm_emulate_cpuid(), and the patch allows the guest to set CR4.OSXSAVE to enable XSAVE. The patch adds per-vcpu host/guest xstate image/mask and enhances the current FXSAVE/FXRSTOR with the new XSAVE/XRSTOR on the host xstate (FPU/SSE/YMM) switch.

5 files changed, 242 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..60be1a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -303,6 +303,11 @@ struct kvm_vcpu_arch {
 struct i387_fxsave_struct host_fx_image;
 struct i387_fxsave_struct guest_fx_image;
+ struct xsave_struct *host_xstate_image;
+ struct xsave_struct *guest_xstate_image;
+ uint64_t host_xstate_mask;

Does host_xstate_mask need to be per-vcpu, or can it be global?

(I'm sorry for my late reply as I was on a long leave...) Yes, I think host_xstate_mask can be global, because in Linux xsave_cntxt_init() and xsave_init() always enable all the XSTATEs on all CPUs (currently XCNTXT_MASK has 3 bits). In the host, only the kernel can execute xsetbv() and applications can't change the XCR. I think this usage of Linux won't change in future.

+ uint64_t guest_xstate_mask;

Can be called xcr0, like other shadow registers.

OK.

+ gva_t mmio_fault_cr2;
 struct kvm_pio_request pio;
 void *pio_data;

@@ -3258,6 +3262,25 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
 return 1;
}

+static int handle_xsetbv(struct kvm_vcpu *vcpu)
+{
+ u64 new_bv = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX)) |
+ kvm_register_read(vcpu, VCPU_REGS_RAX);

Missing shift?

Oh, my carelessness! Thanks for pointing it out.

Probably worthwhile to create a helper for reading/writing edx:eax into a u64.

OK.

+ u64 host_bv = vcpu->arch.host_xstate_mask;

What about ecx?

Sorry, I missed ecx. Currently only ecx==0 is defined.
When a guest tries a non-zero value of ecx, a #GP should be injected into the guest.

+
+ if (((new_bv ^ host_bv) & ~host_bv)

Isn't (new_bv & ~host_bv) equivalent? (The guest cannot exceed the host...)

OK. Will use the simpler one. :-)

 || !(new_bv & 1))

Symbolic value or comment.

OK. The 1 here should be XSTATE_FP.

+ goto err;
+ if ((host_bv & XSTATE_YMM & new_bv) && !(new_bv & XSTATE_SSE))

host_bv unneeded, I think.

Agree.

+ goto err;
+ vcpu->arch.guest_xstate_mask = new_bv;
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);

Can't we run with the host xcr0? Isn't it guaranteed to be a superset of the guest xcr0?

I agree host xcr0 is a superset of guest xcr0. In the case where guest xcr0 has fewer bits set than host xcr0, I suppose running with guest xcr0 would be a bit faster. If you think simplifying the code by removing the field guest_xstate_mask is more important, we can do it.

+ skip_emulated_instruction(vcpu);
+ return 1;
+err:
+ kvm_inject_gp(vcpu, 0);

Need to #UD in some circumstances.

I don't think so. When the CPU doesn't support XSAVE, or the guest doesn't set guest CR4.OSXSAVE, the guest's attempt to execute xsetbv would get a #UD immediately and no VM exit would occur. BTW: I missed injecting #GP in the cases ECX != 0 or CPL != 0. Will add them.
+ return 1;
+}
+
 static int handle_apic_access(struct kvm_vcpu *vcpu)
{
 unsigned long exit_qualification;

@@ -3556,6 +3579,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
 [EXIT_REASON_APIC_ACCESS] = handle_apic_access,
 [EXIT_REASON_WBINVD] = handle_wbinvd,
+ [EXIT_REASON_XSETBV] = handle_xsetbv,
 [EXIT_REASON_TASK_SWITCH] = handle_task_switch,
 [EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
 [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..2af3fbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -52,6 +52,8 @@
 #include <asm/desc.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>

#define MAX_IO_MSRS 256

#define CR0_RESERVED_BITS \
@@ -62,6 +64,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE \
 | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
 | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
+ | (cpu_has_xsave ? X86_CR4_OSXSAVE : 0) \
 | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))

It also depends on the guest cpuid value. Please add it outside the macro.

If cpu_has_xsave is true, we always present xsave to the guest via cpuid, so I think the code is correct here.
Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
On 04/29/2010 08:22 AM, Dexuan Cui wrote:

When the host enables XSAVE/XRSTOR, the patch exposes the XSAVE/XRSTOR related CPUID leaves to the guest by fixing up kvm_emulate_cpuid(), and the patch allows the guest to set CR4.OSXSAVE to enable XSAVE. The patch adds per-vcpu host/guest xstate image/mask and enhances the current FXSAVE/FXRSTOR with the new XSAVE/XRSTOR on the host xstate (FPU/SSE/YMM) switch.

5 files changed, 242 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..60be1a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -303,6 +303,11 @@ struct kvm_vcpu_arch {
 struct i387_fxsave_struct host_fx_image;
 struct i387_fxsave_struct guest_fx_image;
+ struct xsave_struct *host_xstate_image;
+ struct xsave_struct *guest_xstate_image;
+ uint64_t host_xstate_mask;

Does host_xstate_mask need to be per-vcpu, or can it be global?

+ uint64_t guest_xstate_mask;

Can be called xcr0, like other shadow registers.

+ gva_t mmio_fault_cr2;
 struct kvm_pio_request pio;
 void *pio_data;

@@ -3258,6 +3262,25 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
 return 1;
}

+static int handle_xsetbv(struct kvm_vcpu *vcpu)
+{
+ u64 new_bv = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX)) |
+ kvm_register_read(vcpu, VCPU_REGS_RAX);

Missing shift? Probably worthwhile to create a helper for reading/writing edx:eax into a u64.

+ u64 host_bv = vcpu->arch.host_xstate_mask;

What about ecx?

+
+ if (((new_bv ^ host_bv) & ~host_bv)

Isn't (new_bv & ~host_bv) equivalent? (The guest cannot exceed the host...)

 || !(new_bv & 1))

Symbolic value or comment.

+ goto err;
+ if ((host_bv & XSTATE_YMM & new_bv) && !(new_bv & XSTATE_SSE))

host_bv unneeded, I think.

+ goto err;
+ vcpu->arch.guest_xstate_mask = new_bv;
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);

Can't we run with the host xcr0? Isn't it guaranteed to be a superset of the guest xcr0?
+ skip_emulated_instruction(vcpu);
+ return 1;
+err:
+ kvm_inject_gp(vcpu, 0);

Need to #UD in some circumstances.

+ return 1;
+}
+
 static int handle_apic_access(struct kvm_vcpu *vcpu)
{
 unsigned long exit_qualification;

@@ -3556,6 +3579,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
 [EXIT_REASON_APIC_ACCESS] = handle_apic_access,
 [EXIT_REASON_WBINVD] = handle_wbinvd,
+ [EXIT_REASON_XSETBV] = handle_xsetbv,
 [EXIT_REASON_TASK_SWITCH] = handle_task_switch,
 [EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
 [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..2af3fbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -52,6 +52,8 @@
 #include <asm/desc.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>

#define MAX_IO_MSRS 256

#define CR0_RESERVED_BITS \
@@ -62,6 +64,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE \
 | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
 | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
+ | (cpu_has_xsave ? X86_CR4_OSXSAVE : 0) \
 | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))

It also depends on the guest cpuid value. Please add it outside the macro, it's confusing to read something that looks like a constant but isn't.
int kvm_emulate_halt(struct kvm_vcpu *vcpu)

@@ -4307,6 +4346,65 @@ not_found:
 return 36;
}

+#define bitmaskof(idx) (1U << ((idx) & 31))
+static void kvm_emulate_cpuid_fixup(struct kvm_vcpu *vcpu, u32 func, u32 idx)
+{
+ u32 eax, ebx, ecx, edx;
+
+ if (func != 0 && func != 1 && func != 0xd)
+ return;
+
+ eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
+ ebx = kvm_register_read(vcpu, VCPU_REGS_RBX);
+ ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
+ edx = kvm_register_read(vcpu, VCPU_REGS_RDX);
+
+ switch (func) {
+ case 0:
+ /* fixup the Maximum Input Value */
+ if (cpu_has_xsave && eax < 0xd)
+ eax = 0xd;
+ break;
+ case 1:
+ ecx &= ~(bitmaskof(X86_FEATURE_XSAVE) |
+ bitmaskof(X86_FEATURE_OSXSAVE));
+ if (!cpu_has_xsave)
+ break;
+ ecx |= bitmaskof(X86_FEATURE_XSAVE);
+ if (kvm_read_cr4(vcpu) & X86_CR4_OSXSAVE)
+ ecx |=
[PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest
When the host enables XSAVE/XRSTOR, the patch exposes the XSAVE/XRSTOR related CPUID leaves to the guest by fixing up kvm_emulate_cpuid(), and the patch allows the guest to set CR4.OSXSAVE to enable XSAVE. The patch adds per-vcpu host/guest xstate image/mask and enhances the current FXSAVE/FXRSTOR with the new XSAVE/XRSTOR on the host xstate (FPU/SSE/YMM) switch.

Signed-off-by: Dexuan Cui dexuan@intel.com
---
 arch/x86/include/asm/kvm_host.h | 15 +--
 arch/x86/include/asm/vmx.h | 1 +
 arch/x86/include/asm/xsave.h | 3 +
 arch/x86/kvm/vmx.c | 24 +
 arch/x86/kvm/x86.c | 217 +--
 5 files changed, 242 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..60be1a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -303,6 +303,11 @@ struct kvm_vcpu_arch {
 struct i387_fxsave_struct host_fx_image;
 struct i387_fxsave_struct guest_fx_image;
+ struct xsave_struct *host_xstate_image;
+ struct xsave_struct *guest_xstate_image;
+ uint64_t host_xstate_mask;
+ uint64_t guest_xstate_mask;
+
 gva_t mmio_fault_cr2;
 struct kvm_pio_request pio;
 void *pio_data;
@@ -718,16 +723,6 @@ static inline unsigned long read_msr(unsigned long msr)
}
#endif

-static inline void kvm_fx_save(struct i387_fxsave_struct *image)
-{
- asm("fxsave (%0)" : : "r" (image));
-}
-
-static inline void kvm_fx_restore(struct i387_fxsave_struct *image)
-{
- asm("fxrstor (%0)" : : "r" (image));
-}
-
 static inline void kvm_fx_finit(void)
{
 asm("finit");
}

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..842286b 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -260,6 +260,7 @@ enum vmcs_field {
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49
#define EXIT_REASON_WBINVD 54
+#define EXIT_REASON_XSETBV 55

/*
 * Interruption-information format

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index ddc04cc..ada81a2 100644
---
 a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -13,6 +13,9 @@
#define FXSAVE_SIZE 512

+#define XSTATE_YMM_SIZE 256
+#define XSTATE_YMM_OFFSET (512 + 64)
+
/*
 * These are the features that the OS can handle currently.
 */

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..a72d024 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -35,6 +35,8 @@
#include <asm/vmx.h>
#include <asm/virtext.h>
#include <asm/mce.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>

#include "trace.h"

@@ -2517,6 +2519,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
 if (enable_ept)
 vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
+ if (cpu_has_xsave)
+ vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_OSXSAVE;
 vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);

 tsc_base = vmx->vcpu.kvm->arch.vm_init_tsc;

@@ -3258,6 +3262,25 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
 return 1;
}

+static int handle_xsetbv(struct kvm_vcpu *vcpu)
+{
+ u64 new_bv = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX)) |
+ kvm_register_read(vcpu, VCPU_REGS_RAX);
+ u64 host_bv = vcpu->arch.host_xstate_mask;
+
+ if (((new_bv ^ host_bv) & ~host_bv) || !(new_bv & 1))
+ goto err;
+ if ((host_bv & XSTATE_YMM & new_bv) && !(new_bv & XSTATE_SSE))
+ goto err;
+ vcpu->arch.guest_xstate_mask = new_bv;
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);
+ skip_emulated_instruction(vcpu);
+ return 1;
+err:
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+}
+
 static int handle_apic_access(struct kvm_vcpu *vcpu)
{
 unsigned long exit_qualification;

@@ -3556,6 +3579,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
 [EXIT_REASON_APIC_ACCESS] = handle_apic_access,
 [EXIT_REASON_WBINVD] = handle_wbinvd,
+ [EXIT_REASON_XSETBV] = handle_xsetbv,
 [EXIT_REASON_TASK_SWITCH] = handle_task_switch,
 [EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
 [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..2af3fbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -52,6 +52,8 @@
 #include <asm/desc.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>

#define MAX_IO_MSRS 256

#define CR0_RESERVED_BITS \
@@ -62,6 +64,7 @@