Re: How to do fast accesses to LAPIC TPR under kvm?
On 10/24/2012 11:19 AM, Stefan Fritsch wrote:
>> With the decode table fix I think it should work.
>
> It needs some more changes. The patch below did the trick for me. It is
> against 3.5, because I didn't want to build a whole new kernel (my test
> machine is a dead slow AMD E-350).
>
> The patch is definitely incomplete. It now allows the lock prefix for
> all mov operations on the cr1-7, which should not be the case. Apart
> from that, do the changes look reasonable?
>
> I have not checked that this is the minimal patch that works. But the
> LockReg bit was definitely necessary, that was the final piece to make
> it work.
>
> Cheers,
> Stefan
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 4837375..c7f0ec7 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -128,6 +128,7 @@
>  #define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
>  #define No64        (1<<28)
>  #define PageTable   (1 << 29)   /* instruction used to write page table */
> +#define LockReg     (1<<30) /* lock prefix is allowed for the instruction even for reg destination */
>  /* Source 2 operand type */
>  #define Src2Shift   (30)

LockReg conflicts with Src2Shift.

>  #define Src2None    (OpNone << Src2Shift)
> @@ -420,6 +421,7 @@ static int emulator_check_intercept(struct x86_emulate_ctxt *ctxt,
>  	struct x86_instruction_info info = {
>  		.intercept  = intercept,
>  		.rep_prefix = ctxt->rep_prefix,
> +		.lock_prefix = ctxt->lock_prefix,
>  		.modrm_mod  = ctxt->modrm_mod,
>  		.modrm_reg  = ctxt->modrm_reg,
>  		.modrm_rm   = ctxt->modrm_rm,
> @@ -2874,7 +2876,10 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
>  static int em_cr_write(struct x86_emulate_ctxt *ctxt)
>  {
> -	if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> +	int cr = ctxt->modrm_reg;

Blank line here.

> +	if (ctxt->lock_prefix && cr == 0)
> +		cr = 8;

But maybe this is better dealt with during general decode, and
ctxt->modrm_reg adjusted instead. This removes the code triplication.

Please also #UD if modrm_reg != 0, and if the feature is not exposed to
the guest via cpuid.

Please regenerate against kvm.git next, there have been changes to
emulate.c.

--
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
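[Editor's sketch] The review comments in the message above (do the cr0-to-cr8 remap once at decode time, and raise #UD when modrm_reg != 0 or when the feature is not exposed to the guest) could look roughly like the following. This is only an illustration under stated assumptions: `struct decoded_insn`, `effective_cr()` and `CR_INVALID` are hypothetical names, not the kernel's `struct x86_emulate_ctxt` or any function in emulate.c.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the emulator's decode state. */
struct decoded_insn {
    bool lock_prefix;            /* 0xF0 prefix was seen */
    unsigned modrm_reg;          /* ModRM reg field: control register number */
    bool guest_has_alt_mov_cr8;  /* assumption: feature exposed to guest via cpuid */
};

#define CR_INVALID (-1)          /* caller would inject #UD */

/*
 * Return the control register a MOV to/from CRn really addresses,
 * or CR_INVALID if the encoding should raise #UD.
 */
static int effective_cr(const struct decoded_insn *insn)
{
    if (!insn->lock_prefix)
        return (int)insn->modrm_reg;

    /* LOCK is only meaningful as "lock mov cr0" == "mov cr8". */
    if (insn->modrm_reg != 0 || !insn->guest_has_alt_mov_cr8)
        return CR_INVALID;

    return 8;
}
```

Doing the remap in one helper means the read path, the write path and the intercept check would all see the same register number, which is the duplication the review asks to avoid.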
Re: How to do fast accesses to LAPIC TPR under kvm?
On 10/20/2012 12:39 AM, Stefan Fritsch wrote:
> On Thursday 18 October 2012, Avi Kivity wrote:
>> On 10/18/2012 11:35 AM, Gleb Natapov wrote:
>>> You misunderstood the description. V_INTR_MASKING=1 means that CR8
>>> writes are not propagated to real HW APIC.
>>>
>>> But KVM does not trap access to CR8 unconditionally. It enables CR8
>>> intercept only when there is pending interrupt in IRR that cannot be
>>> immediately delivered due to current TPR value. This should
>>> eliminate 99% of CR8 intercepts.
>>
>> Right. You will need to expose the alternate encoding of cr8 (IIRC
>> lock mov reg, cr0) on AMD via cpuid, but otherwise it should just
>> work. Be aware that this will break cross-vendor migration.
>
> I get an exception and I am not sure why:
>
> kvm_entry: vcpu 0
> kvm_exit: reason write_cr8 rip 0xd0203788 info 0 0
> kvm_emulate_insn: 0:d0203788: f0 0f 22 c0 (prot32)
> kvm_inj_exception: #UD (0x0)
>
> This is qemu-kvm 1.1.2 on Linux 3.2. When I look at
> arch/x86/kvm/emulate.c (both the current and the v3.2 version), I
> don't see any special case handling for lock mov reg, cr0 to mean
> mov reg, cr8.

emulate.c will #UD if the Lock flag is missing in the instruction
decode table.

> Before I spend lots of time on debugging my code, can you verify if
> the alternate encoding of cr8 is actually supported in kvm or if it is
> maybe missing? Thanks in advance.

With the decode table fix I think it should work.

--
error compiling committee.c: too many arguments to function
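[Editor's sketch] To make the trace above concrete: the bytes `f0 0f 22 c0` are a LOCK prefix (F0), the two-byte opcode for MOV to a control register (0F 22), and a ModRM byte (C0) with mod=3, reg=0 (cr0), rm=0 (eax). The small decoder below shows that breakdown, followed by a model of the lock-prefix rule as the thread describes it. All names here (`struct mov_cr_write`, `decode_mov_to_cr`, `lock_prefix_allowed`, the `F_*` flags) are hypothetical and only illustrate the discussion; they are not the kernel's decode table or flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mov_cr_write {
    bool lock;      /* F0 prefix present */
    unsigned mod;   /* ModRM bits 7:6 */
    unsigned cr;    /* ModRM bits 5:3 (reg): control register number */
    unsigned gpr;   /* ModRM bits 2:0 (rm): source general register */
};

/* Decode "[F0] 0F 22 /r"; returns false for any other byte sequence. */
static bool decode_mov_to_cr(const uint8_t *p, size_t len,
                             struct mov_cr_write *out)
{
    size_t i = 0;

    out->lock = false;
    if (i < len && p[i] == 0xF0) {
        out->lock = true;
        i++;
    }
    if (len - i < 3 || p[i] != 0x0F || p[i + 1] != 0x22)
        return false;

    uint8_t modrm = p[i + 2];
    out->mod = modrm >> 6;
    out->cr  = (modrm >> 3) & 7;
    out->gpr = modrm & 7;
    return true;
}

/* Hypothetical per-opcode flags modelling the rule discussed in the thread. */
#define F_LOCK     (1u << 0)  /* lock prefix allowed with a memory destination */
#define F_LOCK_REG (1u << 1)  /* lock prefix allowed even with a register destination */

/* false means the emulator would inject #UD for a lock-prefixed instruction. */
static bool lock_prefix_allowed(unsigned flags, bool dst_is_memory)
{
    if (flags & F_LOCK_REG)
        return true;
    return (flags & F_LOCK) && dst_is_memory;
}
```

In this model, a MOV-to-CR entry with neither flag set is rejected as soon as the F0 prefix is present, which matches the `#UD` in the trace.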
Re: How to do fast accesses to LAPIC TPR under kvm?
On Thursday 18 October 2012, Avi Kivity wrote:
> On 10/18/2012 11:35 AM, Gleb Natapov wrote:
>> You misunderstood the description. V_INTR_MASKING=1 means that CR8
>> writes are not propagated to real HW APIC.
>>
>> But KVM does not trap access to CR8 unconditionally. It enables CR8
>> intercept only when there is pending interrupt in IRR that cannot be
>> immediately delivered due to current TPR value. This should eliminate
>> 99% of CR8 intercepts.
>
> Right. You will need to expose the alternate encoding of cr8 (IIRC
> lock mov reg, cr0) on AMD via cpuid, but otherwise it should just
> work. Be aware that this will break cross-vendor migration.

I get an exception and I am not sure why:

kvm_entry: vcpu 0
kvm_exit: reason write_cr8 rip 0xd0203788 info 0 0
kvm_emulate_insn: 0:d0203788: f0 0f 22 c0 (prot32)
kvm_inj_exception: #UD (0x0)

This is qemu-kvm 1.1.2 on Linux 3.2. When I look at
arch/x86/kvm/emulate.c (both the current and the v3.2 version), I don't
see any special case handling for lock mov reg, cr0 to mean
mov reg, cr8.

Before I spend lots of time on debugging my code, can you verify if the
alternate encoding of cr8 is actually supported in kvm or if it is maybe
missing? Thanks in advance.

Cheers,
Stefan
Re: How to do fast accesses to LAPIC TPR under kvm?
On 2012-10-17 21:24, Stefan Fritsch wrote:
> Hi,
>
> OpenBSD/i386 seems to be one of the few operating systems that still
> uses the LAPIC task priority register for interrupt handling. On AMD

Yeah, only very special OSes do this... ;)

> CPUs and on older Intel CPUs without the flexpriority feature, this
> causes a huge performance impact on kvm. I have seen slowdown by a
> factor of 10.
>
> Is there a way to use the TPR under kvm without the slowdown?
>
> There are some MSRs inherited from Hyper-V, but using these does not
> make that much difference. I think this is because they still cause a
> vmexit for every TPR access. I expect the same is true for x2apic
> emulation, isn't it?

Didn't study the HyperV interface yet, but the trick is indeed to avoid
as many vmexits as possible, specifically when lowering the TPR value
has no effect as no interrupts are pending.

> There is also the kvmvapic, but kvm does not expose a sane interface
> to it and only uses it for Windows XP specific binary patching.

The kvmvapic is not a classic paravirtual interface in that it does not
really require guest OS awareness. But it requires the guest to accept
being patched. That's the case for certain Windows versions. Also, the
option ROMs, including our kvmvapic ROM, have to be mapped at fixed,
accessible addresses to allow jumping to it from a patched TPR
instruction. Therefore, we limited the patching to known OS versions,
to avoid messing around with other, untested OSes.

However, it may be possible to accept OpenBSD as well by adjusting the
tests in kvmvapic and possibly adjusting some other details.

> Another possibility is TPR access via CR8 on AMD, but the missing
> cr8_legacy CPUID bit and this discussion [1] make me believe that this
> is not supported under kvm, at least in 32bit mode. Could this be
> easily fixed? If yes, would it solve the performance problems, i.e.
> offer performance comparable to Intel's flexpriority feature?

Everything that unconditionally traps, and so do CR8 accesses, does not
help.

> OpenBSD seems to be reluctant to stop using the TPR. In fact, in a
> recent discussion, there has been a suggestion that OpenBSD should
> switch to using TPR also on OpenBSD/amd64 to solve some problems with
> boot interrupts. How do you expect this would affect performance under
> kvm (if using CR8)? Or do you have any other suggestions?
>
> One could also modify kvm to expose a real interface to kvmvapic, e.g.
> allow the guest OS to provide the virtual address of the option rom
> and the offset of the CPU number in the %fs segment, instead of using
> hard coded values for Windows XP.

Of course, though all we need is a stable address in fact. See
vapic_write() for the existing PV interface (between option ROM and
hypervisor so far). We can extend it as long as it is compatible with
the existing one.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
Re: How to do fast accesses to LAPIC TPR under kvm?
Hi Jan,

On Thu, 18 Oct 2012, Jan Kiszka wrote:
>> There is also the kvmvapic, but kvm does not expose a sane interface
>> to it and only uses it for Windows XP specific binary patching.
>
> The kvmvapic is not a classic paravirtual interface in that it does
> not really require guest OS awareness. But it requires the guest to
> accept being patched. That's the case for certain Windows versions.
> Also, the option ROMs, including our kvmvapic ROM, have to be mapped
> at fixed, accessible addresses to allow jumping to it from a patched
> TPR instruction. Therefore, we limited the patching to known OS
> versions, to avoid messing around with other, untested OSes.
>
> However, it may be possible to accept OpenBSD as well by adjusting
> the tests in kvmvapic and possibly adjusting some other details.

The problem I see here is that OpenBSD is, unlike Windows XP, still a
moving target and details like the offsets in the cpu info struct may
change in the future. So some interface where the guest OS provides the
necessary details may be nicer.

>> Another possibility is TPR access via CR8 on AMD, but the missing
>> cr8_legacy CPUID bit and this discussion [1] make me believe that
>> this is not supported under kvm, at least in 32bit mode. Could this
>> be easily fixed? If yes, would it solve the performance problems,
>> i.e. offer performance comparable to Intel's flexpriority feature?
>
> Everything that unconditionally traps, and so do CR8 accesses, does
> not help.

I was hoping that CR8 access would not trap unconditionally. The AMD
Programmer's Manual Vol. 2, section 15.21.2 seems to imply that there is
a mode where this is not the case:

<quote>
SVM provides a virtual TPR register, V_TPR, for use by the guest; its
value is loaded from the VMCB by VMRUN and written back to the VMCB by
#VMEXIT. The APIC's TPR always controls the task priority for physical
interrupts, and the V_TPR always controls virtual interrupts.

While running a guest with V_INTR_MASKING cleared to 0:
* Writes to CR8 affect both the APIC's TPR and the V_TPR register
* Reads from CR8 operate as they would without SVM

While running a guest with V_INTR_MASKING set to 1:
* Writes to CR8 affect only the V_TPR register
* Reads from CR8 return V_TPR.
</quote>

Is V_INTR_MASKING == 1 not used in kvm? Is it not usable at all for some
reason? Or have I misunderstood the description?

It would be more likely that other hypervisors add support for CR8
access than that they add kvmvapic compatibility. Therefore such a
solution would seem preferable, if it is possible.

Cheers,
Stefan
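[Editor's sketch] The two V_INTR_MASKING cases quoted from the AMD manual in the message above can be written down as a tiny state model. This only restates the quoted rules; `struct svm_tpr_model` and both functions are hypothetical names, and the model says nothing about whether an access is intercepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two TPR copies described in the quoted manual text. */
struct svm_tpr_model {
    bool v_intr_masking;  /* the V_INTR_MASKING control bit */
    uint8_t apic_tpr;     /* physical APIC task priority, as a CR8 value (0..15) */
    uint8_t v_tpr;        /* virtual TPR kept in the VMCB */
};

/* Guest executes "mov reg, cr8". */
static void guest_write_cr8(struct svm_tpr_model *m, uint8_t val)
{
    val &= 0xF;                  /* CR8 holds a 4-bit priority class */
    m->v_tpr = val;              /* V_TPR is written in both modes */
    if (!m->v_intr_masking)
        m->apic_tpr = val;       /* masking off: the physical TPR changes too */
}

/* Guest executes "mov cr8, reg". */
static uint8_t guest_read_cr8(const struct svm_tpr_model *m)
{
    /* Masking on: reads return V_TPR; otherwise they behave as without SVM. */
    return m->v_intr_masking ? m->v_tpr : m->apic_tpr;
}
```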
Re: How to do fast accesses to LAPIC TPR under kvm?
On Thu, Oct 18, 2012 at 09:43:46AM +0200, Stefan Fritsch wrote:
>> Everything that unconditionally traps, and so do CR8 accesses, does
>> not help.
>
> I was hoping that CR8 access would not trap unconditionally. The AMD
> Programmer's Manual Vol. 2, section 15.21.2 seems to imply that there
> is a mode where this is not the case:
>
> <quote>
> SVM provides a virtual TPR register, V_TPR, for use by the guest; its
> value is loaded from the VMCB by VMRUN and written back to the VMCB by
> #VMEXIT. The APIC's TPR always controls the task priority for physical
> interrupts, and the V_TPR always controls virtual interrupts.
>
> While running a guest with V_INTR_MASKING cleared to 0:
> * Writes to CR8 affect both the APIC's TPR and the V_TPR register
> * Reads from CR8 operate as they would without SVM
>
> While running a guest with V_INTR_MASKING set to 1:
> * Writes to CR8 affect only the V_TPR register
> * Reads from CR8 return V_TPR.
> </quote>
>
> Is V_INTR_MASKING == 1 not used in kvm? Is it not usable at all for
> some reason? Or have I misunderstood the description?

You misunderstood the description. V_INTR_MASKING=1 means that CR8
writes are not propagated to real HW APIC.

But KVM does not trap access to CR8 unconditionally. It enables CR8
intercept only when there is pending interrupt in IRR that cannot be
immediately delivered due to current TPR value. This should eliminate
99% of CR8 intercepts.

--
			Gleb.
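[Editor's sketch] The intercept policy described in the message above (trap CR8 writes only while a pending interrupt is held back by the current TPR) can be sketched as a single predicate. This is a simplified illustration, not KVM's code: the function and constant names are hypothetical, and it ignores in-service interrupts, which also influence the processor priority on real hardware.

```c
#include <stdbool.h>

#define NO_PENDING_IRQ (-1)

/*
 * tpr:                current CR8 value, i.e. priority class 0..15
 * highest_irr_vector: highest vector pending in the IRR, or NO_PENDING_IRQ
 *
 * Returns true when a CR8 write needs to be intercepted, because lowering
 * the TPR could make the pending interrupt deliverable.
 */
static bool need_cr8_write_intercept(unsigned tpr, int highest_irr_vector)
{
    if (highest_irr_vector == NO_PENDING_IRQ)
        return false;           /* nothing is waiting: let CR8 writes run untrapped */

    /* An interrupt's priority class is the upper four bits of its vector. */
    unsigned irq_class = (unsigned)highest_irr_vector >> 4;

    /* The interrupt is blocked while its class does not exceed the TPR. */
    return irq_class <= tpr;
}
```

For example, with TPR 5 a pending vector 0x41 (class 4) is blocked, so the next CR8 write has to be seen by the hypervisor; a pending vector 0x61 (class 6) is deliverable right away, so no intercept is needed.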
Re: How to do fast accesses to LAPIC TPR under kvm?
On 10/18/2012 11:35 AM, Gleb Natapov wrote:
> You misunderstood the description. V_INTR_MASKING=1 means that CR8
> writes are not propagated to real HW APIC.
>
> But KVM does not trap access to CR8 unconditionally. It enables CR8
> intercept only when there is pending interrupt in IRR that cannot be
> immediately delivered due to current TPR value. This should eliminate
> 99% of CR8 intercepts.

Right. You will need to expose the alternate encoding of cr8 (IIRC lock
mov reg, cr0) on AMD via cpuid, but otherwise it should just work. Be
aware that this will break cross-vendor migration.

--
error compiling committee.c: too many arguments to function
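[Editor's sketch] For readers who want to test for the alternate CR8 encoding from a guest: on AMD processors the feature is reported as AltMovCr8 in CPUID leaf 0x80000001, ECX bit 4 (the bit the thread refers to as cr8_legacy). A minimal check on an already-fetched ECX value might look like this; the macro and function names are the editor's, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_FEATURES_LEAF 0x80000001u  /* extended feature leaf */
#define ALT_MOV_CR8_ECX_BIT     4            /* AltMovCr8: "lock mov cr0" means "mov cr8" */

/* ecx: the ECX value returned by CPUID leaf CPUID_EXT_FEATURES_LEAF. */
static bool ecx_has_alt_mov_cr8(uint32_t ecx)
{
    return (ecx >> ALT_MOV_CR8_ECX_BIT) & 1u;
}
```

With GCC or Clang on x86, the ECX value can be obtained with `__get_cpuid()` from `<cpuid.h>`; it is kept out of the sketch so that the bit test stays portable. A guest should only use the lock-prefixed encoding when this bit is set.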
Re: How to do fast accesses to LAPIC TPR under kvm?
On Thu, 18 Oct 2012, Avi Kivity wrote:
> On 10/18/2012 11:35 AM, Gleb Natapov wrote:
>> You misunderstood the description. V_INTR_MASKING=1 means that CR8
>> writes are not propagated to real HW APIC.
>>
>> But KVM does not trap access to CR8 unconditionally. It enables CR8
>> intercept only when there is pending interrupt in IRR that cannot be
>> immediately delivered due to current TPR value. This should eliminate
>> 99% of CR8 intercepts.
>
> Right. You will need to expose the alternate encoding of cr8 (IIRC
> lock mov reg, cr0) on AMD via cpuid, but otherwise it should just
> work. Be aware that this will break cross-vendor migration.

Thanks for the clarifications. I will try that.
How to do fast accesses to LAPIC TPR under kvm?
Hi,

OpenBSD/i386 seems to be one of the few operating systems that still
uses the LAPIC task priority register for interrupt handling. On AMD
CPUs and on older Intel CPUs without the flexpriority feature, this
causes a huge performance impact on kvm. I have seen slowdown by a
factor of 10.

Is there a way to use the TPR under kvm without the slowdown?

There are some MSRs inherited from Hyper-V, but using these does not
make that much difference. I think this is because they still cause a
vmexit for every TPR access. I expect the same is true for x2apic
emulation, isn't it?

There is also the kvmvapic, but kvm does not expose a sane interface to
it and only uses it for Windows XP specific binary patching.

Another possibility is TPR access via CR8 on AMD, but the missing
cr8_legacy CPUID bit and this discussion [1] make me believe that this
is not supported under kvm, at least in 32bit mode. Could this be easily
fixed? If yes, would it solve the performance problems, i.e. offer
performance comparable to Intel's flexpriority feature?

OpenBSD seems to be reluctant to stop using the TPR. In fact, in a
recent discussion, there has been a suggestion that OpenBSD should
switch to using TPR also on OpenBSD/amd64 to solve some problems with
boot interrupts. How do you expect this would affect performance under
kvm (if using CR8)? Or do you have any other suggestions?

One could also modify kvm to expose a real interface to kvmvapic, e.g.
allow the guest OS to provide the virtual address of the option rom and
the offset of the CPU number in the %fs segment, instead of using hard
coded values for Windows XP.

Cheers,
Stefan

[1] http://www.mail-archive.com/kvm@vger.kernel.org/msg30627.html