Re: [PATCH 5/6] KVM: PPC: Book3S HV: Access XICS in BE
On 17.06.14 03:02, Paul Mackerras wrote: On Wed, Jun 11, 2014 at 12:33:50PM +0200, Alexander Graf wrote: On the exit path from the guest we check what type of interrupt we received, if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. ... @@ -2241,7 +2253,8 @@ kvmppc_read_intr: 42: /* It's not an IPI and it's for the host, stash it in the PACA * before exit, it will be picked up by the host ICP driver */ - stw r0, HSTATE_SAVED_XIRR(r13) + li r4, HSTATE_SAVED_XIRR + STWX_BE r0, r13, r4 This is a paca field, not something mandated by PAPR or shared with the guest, so why do we need to keep it BE? If you do make it BE, don't you also need to fix kvmppc_get_xics_latch()? Yikes. Yes. Thanks a lot for the catch! Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the logical partition id.

With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload.

Guest - old invalidation condition
real 3.89
user 3.87
sys 0.01

Guest - enhanced invalidation condition
real 3.75
user 3.73
sys 0.01

Host
real 3.70
user 1.85
sys 0.00

The memory stress application accesses 4KB pages backed by 75% of available TLB0 entries:

char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));

int main()
{
	char bar;
	int i, j;

	for (i = 0; i < ITERATIONS; i++)
		for (j = 0; j < ENTRIES; j++)
			bar = foo[j][0];

	return 0;
}

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Cc: Scott Wood scottw...@freescale.com
---
v2:
 - improve patch name and description
 - add performance results

 arch/powerpc/kvm/e500mc.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 17e4562..d3b814b0 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -111,10 +111,12 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr)
 }

 static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
+static DEFINE_PER_CPU(int, last_lpid_on_cpu);

 static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
+	bool update_last = false, inval_tlb = false;

 	kvmppc_booke_vcpu_load(vcpu, cpu);
@@ -140,12 +142,24 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
 	mtspr(SPRN_GDEAR, vcpu->arch.shared->dar);
 	mtspr(SPRN_GESR, vcpu->arch.shared->esr);

-	if (vcpu->arch.oldpir != mfspr(SPRN_PIR) ||
-	    __get_cpu_var(last_vcpu_on_cpu) != vcpu) {
-		kvmppc_e500_tlbil_all(vcpu_e500);
+	if (vcpu->arch.oldpir != mfspr(SPRN_PIR)) {
+		/* stale tlb entries */
+		inval_tlb = update_last = true;
+	} else if (__get_cpu_var(last_vcpu_on_cpu) != vcpu) {
+		update_last = true;
+		/* polluted tlb entries */
+		inval_tlb = __get_cpu_var(last_lpid_on_cpu) ==
+			    vcpu->kvm->arch.lpid;
+	}
+
+	if (update_last) {
 		__get_cpu_var(last_vcpu_on_cpu) = vcpu;
+		__get_cpu_var(last_lpid_on_cpu) = vcpu->kvm->arch.lpid;
 	}

+	if (inval_tlb)
+		kvmppc_e500_tlbil_all(vcpu_e500);
+
 	kvmppc_load_guest_fp(vcpu);
 }
--
1.7.11.7
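The stale-vs-polluted decision in the patch can be modeled in plain C. This is an illustrative sketch, not kernel code: the per-cpu variables and the vcpu structure are simplified stand-ins, and the invalidation itself is replaced by a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's per-cpu bookkeeping. */
struct vcpu {
	int pir_at_last_run;	/* physical cpu the vcpu last ran on */
	int lpid;		/* logical partition id of its guest */
};

static struct vcpu *last_vcpu_on_cpu;
static int last_lpid_on_cpu = -1;

/* Returns true when the TLB must be invalidated on vcpu load:
 * - stale entries: the vcpu last ran on a different physical cpu, or
 * - polluted entries: a different vcpu of the SAME partition (lpid)
 *   ran on this cpu in between. A different-lpid guest does not
 *   pollute, because its entries are tagged with its own lpid. */
static bool tlb_inval_on_load(struct vcpu *vcpu, int this_pir)
{
	bool inval_tlb = false, update_last = false;

	if (vcpu->pir_at_last_run != this_pir) {
		/* stale tlb entries */
		inval_tlb = update_last = true;
	} else if (last_vcpu_on_cpu != vcpu) {
		update_last = true;
		/* polluted tlb entries */
		inval_tlb = (last_lpid_on_cpu == vcpu->lpid);
	}

	if (update_last) {
		last_vcpu_on_cpu = vcpu;
		last_lpid_on_cpu = vcpu->lpid;
	}
	return inval_tlb;
}
```

Note how a guest from a different partition running in-between no longer forces an invalidation, which is exactly where the 4% win comes from.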
Re: [PATCH] KVM: PPC: Book3s PR: Disable AIL mode with OPAL
On 12.06.14 05:56, Paul Mackerras wrote: On Tue, Jun 10, 2014 at 07:23:00PM +0200, Alexander Graf wrote: When we're using PR KVM we must not allow the CPU to take interrupts in virtual mode, as the SLB does not contain host kernel mappings when running inside the guest context. To make sure we get good performance for non-KVM tasks but still properly functioning PR KVM, let's just disable AIL whenever a vcpu is scheduled in. This patch fixes running PR KVM on POWER8 bare metal for me. We already handle this for the situation where we're running under a hypervisor with the calls to pSeries_disable_reloc_on_exc() and pSeries_enable_reloc_on_exc() in kvmppc_core_init_vm_pr() and kvmppc_core_destroy_vm_pr() respectively. The obvious approach to fixing this problem would be to generalize those calls, perhaps via a ppc_md callback, to work on the powernv platform too. If you don't want to do that, for instance because those calls are defined to operate across the whole machine rather than a single CPU thread, and you prefer to affect just the one thread we're running on, then I think you need to explain that in the commit message.

That's what I did at first, yes. Unfortunately the pSeries call is system global, while we need to do the AIL switching per cpu on bare metal. Once you start considering CPU hotplug and how that affects the secondary bringup paths, you start wondering whether it's worth the hassle :). This way you can have a single guest running which only slows down (disables AIL) on its own vcpu, while all the others still enjoy the benefits of AIL.

Alex
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those go to the host (for potential emulation by the HV). Ah, good point. I guess we need different one for PR and HV then to ensure compatibility with older ISAs on PR. Well, we also need to be careful with what happens if a PR guest puts that instruction in, do that stop its HV guest/host ? What if it's done in userspace ? Do that stop the kernel ? :-) Maddy, I haven't checked, does your patch ensure that we only ever stop if the instruction is at a recorded bkpt address ? It still means that a userspace process can practically DOS its kernel by issuing a lot of these causing a crapload of exits. Cheers, Ben. Alex I'm trying to see if I can get the architect to set one in stone in a future proof way. Cheers, Ben.
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On 17.06.14 11:32, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those go to the host (for potential emulation by the HV). Ah, good point. I guess we need different one for PR and HV then to ensure compatibility with older ISAs on PR. Well, we also need to be careful with what happens if a PR guest puts that instruction in, do that stop its HV guest/host ? What if it's done in userspace ? Do that stop the kernel ? :-) The way SW breakpointing is handled is that when we see one, it gets deflected into user space. User space then has an array of breakpoints it configured itself. If the breakpoint is part of that list, it consumes it. If not, it injects a debug interrupt (program in this case) into the guest. That way we can overlay that one instruction with as many layers as we like :). We only get a performance hit on execution of that instruction. Maddy, I haven't checked, does your patch ensure that we only ever stop if the instruction is at a recorded bkpt address ? It still means that a userspace process can practically DOS its kernel by issuing a lot of these causing a crapload of exits. Only user space knows about its breakpoint addresses, so we have to deflect. However since time still ticks on, we only increase jitter of the guest. The process would still get scheduled away after the same amount of real time, no? Alex
[PATCH v2] KVM: PPC: Book3s PR: Disable AIL mode with OPAL
When we're using PR KVM we must not allow the CPU to take interrupts in virtual mode, as the SLB does not contain host kernel mappings when running inside the guest context.

To make sure we get good performance for non-KVM tasks but still properly functioning PR KVM, let's just disable AIL whenever a vcpu is scheduled in.

This is fundamentally different from how we deal with AIL on pSeries type machines where we disable AIL for the whole machine as soon as a single KVM VM is up. The reason for that is easy - on pSeries we do not have control over per-cpu configuration of AIL. We also don't want to mess with CPU hotplug races and AIL configuration, so setting it per CPU is easier and more flexible.

This patch fixes running PR KVM on POWER8 bare metal for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
v1 -> v2:
 - Extend patch description

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3da412e..8ea7da4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -71,6 +71,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 	svcpu->in_use = 0;
 	svcpu_put(svcpu);
 #endif
+
+	/* Disable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+
 	vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
 	current->thread.kvm_shadow_vcpu = vcpu->arch.shadow_vcpu;
@@ -91,6 +97,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 	kvmppc_giveup_ext(vcpu, MSR_FP | MSR_VEC | MSR_VSX);
 	kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+
+	/* Enable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+
 	vcpu->cpu = -1;
 }
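The LPCR manipulation above is plain read-modify-write bit twiddling. A small host-side sketch of the two transitions, using mask values that mirror the kernel's LPCR definitions (the constants here are assumptions copied for illustration, not authoritative):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring the kernel's LPCR definitions: the AIL
 * field selects the Alternate Interrupt Location mode. */
#define LPCR_AIL   0x01800000UL	/* AIL field mask */
#define LPCR_AIL_0 0x00000000UL	/* interrupts taken in real mode */
#define LPCR_AIL_3 0x01800000UL	/* interrupts taken relocated */

/* What vcpu_load does on bare metal: clear the AIL field so
 * interrupts are taken in real mode while the PR guest runs. */
static uint64_t lpcr_disable_ail(uint64_t lpcr)
{
	return lpcr & ~LPCR_AIL;
}

/* What vcpu_put does: restore relocated interrupts for the host. */
static uint64_t lpcr_enable_ail(uint64_t lpcr)
{
	return (lpcr & ~LPCR_AIL) | LPCR_AIL_3;
}
```

Because the field is cleared and set around each vcpu_load/vcpu_put pair, only the thread running the PR guest loses AIL; all other threads keep the fast relocated interrupt path.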
Re: [PATCH v2] KVM: PPC: Book3s PR: Disable AIL mode with OPAL
On Tue, Jun 17, 2014 at 12:03:58PM +0200, Alexander Graf wrote: When we're using PR KVM we must not allow the CPU to take interrupts in virtual mode, as the SLB does not contain host kernel mappings when running inside the guest context. To make sure we get good performance for non-KVM tasks but still properly functioning PR KVM, let's just disable AIL whenever a vcpu is scheduled in. This is fundamentally different from how we deal with AIL on pSeries type machines where we disable AIL for the whole machine as soon as a single KVM VM is up. The reason for that is easy - on pSeries we do not have control over per-cpu configuration of AIL. We also don't want to mess with CPU hotplug races and AIL configuration, so setting it per CPU is easier and more flexible. This patch fixes running PR KVM on POWER8 bare metal for me. Signed-off-by: Alexander Graf ag...@suse.de Acked-by: Paul Mackerras pau...@samba.org
Re: [PATCH 5/6] KVM: PPC: Book3S HV: Access XICS in BE
On 17.06.14 10:37, Alexander Graf wrote: On 17.06.14 03:02, Paul Mackerras wrote: On Wed, Jun 11, 2014 at 12:33:50PM +0200, Alexander Graf wrote: On the exit path from the guest we check what type of interrupt we received, if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. ...

@@ -2241,7 +2253,8 @@ kvmppc_read_intr:
 42:	/* It's not an IPI and it's for the host, stash it in the PACA
 	 * before exit, it will be picked up by the host ICP driver
 	 */
-	stw	r0, HSTATE_SAVED_XIRR(r13)
+	li	r4, HSTATE_SAVED_XIRR
+	STWX_BE	r0, r13, r4

This is a paca field, not something mandated by PAPR or shared with the guest, so why do we need to keep it BE? If you do make it BE, don't you also need to fix kvmppc_get_xics_latch()? Yikes. Yes. Thanks a lot for the catch!

Eh, no. What we do is we read (good on BE, byte reversed) into r0. Then we swab32() from r0 to r3 on LE, mr from r0 to r3 on BE. r3 gets truncated along the way. The reason we maintain r0 as wrong-endian is that we write it back using the cache inhibited stwcix instruction:

	stwcix	r0, r6, r7	/* EOI it */

So during the lifetime of r0 as XIRR it's always byte-reversed on LE. That's why we store it using STWX_BE into hstate, because that's the time when we actually swab32() it for further interpretation. Alternatively I could clobber a different register and maintain the byte swapped variant in there if you prefer.

Alex
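The byte reversal Alex describes is a plain 32-bit byte swap. A minimal illustration in C, with the swap written out instead of using the kernel helper:

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of the kernel's swab32(): reverse the four bytes. */
static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* On LE, a load from the big-endian XIRR register arrives byte-reversed
 * in r0; swapping once recovers the BE value, and swapping twice is the
 * identity - which is why r0 can be written straight back with stwcix
 * while the swapped copy is what gets stashed for interpretation. */
```

The involution property (swap of a swap is the original) is the whole reason keeping r0 "wrong-endian" for its entire lifetime works out.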
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On Tuesday 17 June 2014 02:24 PM, Alexander Graf wrote: On 14.06.14 23:08, Madhavan Srinivasan wrote:

This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch mandates use of abs instruction (primary opcode 31 and extended opcode 360) as sw breakpoint instruction. Based on PowerISA v2.01, ABS instruction has been dropped from the architecture and treated as an illegal instruction.

Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s.c    |  3 ++-
 arch/powerpc/kvm/book3s_hv.c | 23 +++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg)
 {
-	return -EINVAL;
+	vcpu->guest_debug = dbg->control;
+	return 0;
 }

 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..688421d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -67,6 +67,14 @@
 /* Used as a null value for timebase values */
 #define TB_NIL	(~(u64)0)

+/*
+ * SW_BRK_DBG_INT is debug Instruction for supporting Software Breakpoint.
+ * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360.
+ * Based on PowerISA v2.01, ABS instruction has been dropped from the
+ * architecture and treated as an illegal instruction.
+ */
+#define SW_BRK_DBG_INT 0x7c0002d0

The instruction we use to trap needs to get exposed to user space via a ONE_REG property.

Yes. I got to know about that from Bharat (patchset "ppc debug: Add debug stub support"). I will change it.

Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR.

+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -721,12 +729,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
-	 * We just generate a program interrupt to the guest, since
-	 * we don't emulate any guest instructions at this stage.
+	 * To support software breakpoint, we check for the sw breakpoint
+	 * instruction to return back to host, if not we just generate a
+	 * program interrupt to the guest.
	 */
	case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-		kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-		r = RESUME_GUEST;
+		if (vcpu->arch.last_inst == SW_BRK_DBG_INT) {

Don't access last_inst directly. Instead use the provided helpers.

Ok. Will look and replace it.

+			run->exit_reason = KVM_EXIT_DEBUG;
+			run->debug.arch.address = vcpu->arch.pc;
+			r = RESUME_HOST;
+		} else {
+			kvmppc_core_queue_program(vcpu, 0x8);

magic numbers ^

I did not understand this?

+			r = RESUME_GUEST;
+		}
 		break;
	/*
	 * This occurs if the guest (kernel or userspace), does something that

Please enable PR KVM as well while you're at it.

My bad, I did not try the PR KVM. I will try it out.

Alex

Thanks for the review

Regards
Maddy
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On 17.06.14 13:07, Madhavan Srinivasan wrote: On Tuesday 17 June 2014 02:24 PM, Alexander Graf wrote: On 14.06.14 23:08, Madhavan Srinivasan wrote:

This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch mandates use of abs instruction (primary opcode 31 and extended opcode 360) as sw breakpoint instruction. Based on PowerISA v2.01, ABS instruction has been dropped from the architecture and treated as an illegal instruction.

Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s.c    |  3 ++-
 arch/powerpc/kvm/book3s_hv.c | 23 +++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg)
 {
-	return -EINVAL;
+	vcpu->guest_debug = dbg->control;
+	return 0;
 }

 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..688421d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -67,6 +67,14 @@
 /* Used as a null value for timebase values */
 #define TB_NIL	(~(u64)0)

+/*
+ * SW_BRK_DBG_INT is debug Instruction for supporting Software Breakpoint.
+ * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360.
+ * Based on PowerISA v2.01, ABS instruction has been dropped from the
+ * architecture and treated as an illegal instruction.
+ */
+#define SW_BRK_DBG_INT 0x7c0002d0

The instruction we use to trap needs to get exposed to user space via a ONE_REG property.

Yes. I got to know about that from Bharat (patchset "ppc debug: Add debug stub support"). I will change it.

Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR.

+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -721,12 +729,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
-	 * We just generate a program interrupt to the guest, since
-	 * we don't emulate any guest instructions at this stage.
+	 * To support software breakpoint, we check for the sw breakpoint
+	 * instruction to return back to host, if not we just generate a
+	 * program interrupt to the guest.
	 */
	case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-		kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-		r = RESUME_GUEST;
+		if (vcpu->arch.last_inst == SW_BRK_DBG_INT) {

Don't access last_inst directly. Instead use the provided helpers.

Ok. Will look and replace it.

+			run->exit_reason = KVM_EXIT_DEBUG;
+			run->debug.arch.address = vcpu->arch.pc;
+			r = RESUME_HOST;
+		} else {
+			kvmppc_core_queue_program(vcpu, 0x8);

magic numbers ^

I did not understand this?

You're replacing the readable SRR1_PROGILL with the unreadable 0x8. That's bad.

Alex
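The encoding in the patch is easy to sanity-check: for an X-form PowerPC instruction the primary opcode occupies the top 6 bits and the extended opcode occupies bits 21-30 (IBM numbering). A quick sketch with the field extraction written out:

```c
#include <assert.h>
#include <stdint.h>

#define SW_BRK_DBG_INT 0x7c0002d0u	/* "abs" encoding from the patch */

/* Primary opcode: bits 0-5 in IBM numbering, i.e. the top 6 bits. */
static unsigned primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* Extended opcode for X-form: bits 21-30, i.e. (insn >> 1) & 0x3ff. */
static unsigned extended_opcode(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}
```

Extracting both fields from 0x7c0002d0 yields primary opcode 31 and extended opcode 360, matching the comment in the patch.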
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On Tuesday 17 June 2014 03:02 PM, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those go to the host (for potential emulation by the HV). Ah, good point. I guess we need different one for PR and HV then to ensure compatibility with older ISAs on PR. Well, we also need to be careful with what happens if a PR guest puts that instruction in, do that stop its HV guest/host ?

Damn, my mail client is messed up. I did not see the mail till now. I haven't tried this in the case of a PR guest kernel. I will need to try this before commenting.

What if it's done in userspace ? Do that stop the kernel ? :-)

Basically the flow is that, when we see this instruction, we return to the host, and the host checks for the address in the SW breakpoint array; if it is not there, we return to the guest kernel.

Maddy, I haven't checked, does your patch ensure that we only ever stop if the instruction is at a recorded bkpt address ? It still means that a userspace process can practically DOS its kernel by issuing a lot of these causing a crapload of exits.

This is valid; userspace can create a mess, and we need to handle it, meaning in case we don't find a valid SW breakpoint for this address in the HOST, we need to route it to the guest and kill it at the app.

Regards
Maddy

Cheers, Ben. Alex I'm trying to see if I can get the architect to set one in stone in a future proof way. Cheers, Ben.
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On Tuesday 17 June 2014 03:13 PM, Alexander Graf wrote: On 17.06.14 11:32, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those go to the host (for potential emulation by the HV). Ah, good point. I guess we need different one for PR and HV then to ensure compatibility with older ISAs on PR. Well, we also need to be careful with what happens if a PR guest puts that instruction in, do that stop its HV guest/host ? What if it's done in userspace ? Do that stop the kernel ? :-) The way SW breakpointing is handled is that when we see one, it gets deflected into user space. User space then has an array of breakpoints it configured itself. If the breakpoint is part of that list, it consumes it. If not, it injects a debug interrupt (program in this case) into the guest. That way we can overlay that one instruction with as many layers as we like :). We only get a performance hit on execution of that instruction. Maddy, I haven't checked, does your patch ensure that we only ever stop if the instruction is at a recorded bkpt address ? It still means that a userspace process can practically DOS its kernel by issuing a lot of these causing a crapload of exits. Only user space knows about its breakpoint addresses, so we have to deflect. However since time still ticks on, we only increase jitter of the guest. The process would still get scheduled away after the same ^^^ Where is this taken care of? I am still trying to understand. Could you explain or point to the code? That would help. amount of real time, no? Alex Thanks for the review.
Regards Maddy
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On Tuesday 17 June 2014 04:38 PM, Alexander Graf wrote: On 17.06.14 13:07, Madhavan Srinivasan wrote: On Tuesday 17 June 2014 02:24 PM, Alexander Graf wrote: On 14.06.14 23:08, Madhavan Srinivasan wrote:

This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch mandates use of abs instruction (primary opcode 31 and extended opcode 360) as sw breakpoint instruction. Based on PowerISA v2.01, ABS instruction has been dropped from the architecture and treated as an illegal instruction.

Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s.c    |  3 ++-
 arch/powerpc/kvm/book3s_hv.c | 23 +++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg)
 {
-	return -EINVAL;
+	vcpu->guest_debug = dbg->control;
+	return 0;
 }

 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..688421d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -67,6 +67,14 @@
 /* Used as a null value for timebase values */
 #define TB_NIL	(~(u64)0)

+/*
+ * SW_BRK_DBG_INT is debug Instruction for supporting Software Breakpoint.
+ * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360.
+ * Based on PowerISA v2.01, ABS instruction has been dropped from the
+ * architecture and treated as an illegal instruction.
+ */
+#define SW_BRK_DBG_INT 0x7c0002d0

The instruction we use to trap needs to get exposed to user space via a ONE_REG property.

Yes. I got to know about that from Bharat (patchset "ppc debug: Add debug stub support"). I will change it.

Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR.

+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -721,12 +729,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
-	 * We just generate a program interrupt to the guest, since
-	 * we don't emulate any guest instructions at this stage.
+	 * To support software breakpoint, we check for the sw breakpoint
+	 * instruction to return back to host, if not we just generate a
+	 * program interrupt to the guest.
	 */
	case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-		kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-		r = RESUME_GUEST;
+		if (vcpu->arch.last_inst == SW_BRK_DBG_INT) {

Don't access last_inst directly. Instead use the provided helpers.

Ok. Will look and replace it.

+			run->exit_reason = KVM_EXIT_DEBUG;
+			run->debug.arch.address = vcpu->arch.pc;
+			r = RESUME_HOST;
+		} else {
+			kvmppc_core_queue_program(vcpu, 0x8);

magic numbers ^

I did not understand this?

You're replacing the readable SRR1_PROGILL with the unreadable 0x8. That's bad.

Oops, my bad. Will undo that. I guess I messed up when I was rebasing it.

Alex

Thanks for the review

Regards
Maddy
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On 17.06.14 13:20, Madhavan Srinivasan wrote: On Tuesday 17 June 2014 03:13 PM, Alexander Graf wrote: On 17.06.14 11:32, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 11:25 +0200, Alexander Graf wrote: On 17.06.14 11:22, Benjamin Herrenschmidt wrote: On Tue, 2014-06-17 at 10:54 +0200, Alexander Graf wrote: Also, why don't we use twi always or something else that actually is defined as illegal instruction? I would like to see this shared with book3s_32 PR. twi will be directed to the guest on HV no ? We want a real illegal because those go to the host (for potential emulation by the HV). Ah, good point. I guess we need different one for PR and HV then to ensure compatibility with older ISAs on PR. Well, we also need to be careful with what happens if a PR guest puts that instruction in, do that stop its HV guest/host ? What if it's done in userspace ? Do that stop the kernel ? :-) The way SW breakpointing is handled is that when we see one, it gets deflected into user space. User space then has an array of breakpoints it configured itself. If the breakpoint is part of that list, it consumes it. If not, it injects a debug interrupt (program in this case) into the guest. That way we can overlay that one instruction with as many layers as we like :). We only get a performance hit on execution of that instruction. Maddy, I haven't checked, does your patch ensure that we only ever stop if the instruction is at a recorded bkpt address ? It still means that a userspace process can practically DOS its kernel by issuing a lot of these causing a crapload of exits. Only user space knows about its breakpoint addresses, so we have to deflect. However since time still ticks on, we only increase jitter of the guest. The process would still get scheduled away after the same ^^^ Where is this taken care. I am still trying to understand. Kindly can you explain or point to the code. Will help. We tell the guest via VPA about its steal time which includes QEMU time. 
Alex
RE: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, June 17, 2014 12:09 PM To: Wood Scott-B07421 Cc: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 13.06.14 21:42, Scott Wood wrote: On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote: On 13.06.14 16:43, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, June 12, 2014 8:05 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 06/12/2014 04:00 PM, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too tight. The tlb entries of one vcpu are polluted when a different vcpu from the same partition runs in-between. Relax the current tlb invalidation condition taking into account the lpid. Can you quantify the performance improvement from this? We've had bugs in this area before, so let's make sure it's worth it before making this more complicated. Signed-off-by: Mihai Caraman mihai.caraman at freescale.com Your mailer is broken? :) This really should be an @. I think this should work. Scott, please ack. Alex, you were right. I screwed up the patch description by inverting relax and tight terms :) It should have been more like this: KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the lpid. Can't we give every vcpu its own lpid? Or don't we trap on global invalidates? 
That would significantly increase the odds of exhausting LPIDs, especially on large chips like t4240 with similarly large VMs. If we were to do that, the LPIDs would need to be dynamically assigned (like PIDs), and should probably be a separate numberspace per physical core. True, I didn't realize we only have so few of them. It would however save us from most flushing as long as we have spare LPIDs available :). Yes, we had this proposal on the table for the e6500 multithreaded core. This core lacks a tlb write-conditional instruction, so an OS needs to use locks to protect itself against concurrent tlb writes executed from sibling threads. When we expose hw threads as single-threaded vcpus (useful when the user opts not to pin vcpus), the guest can no longer protect itself optimally (it can protect tlb writes across all threads but this is not acceptable). So instead, we found a solution at hypervisor level by assigning different logical partition ids to a guest's vcpus running simultaneously on sibling hw threads. Currently in the FSL SDK we allocate two lpids to each guest. I am also a proponent for using the whole LPID space (63 values) per (multi-threaded) physical core, which will lead to fewer invalidates on vcpu schedule and will accommodate the solution described above. -Mike
Re: [PATCH 5/6] KVM: PPC: Book3S HV: Access XICS in BE
On Tue, Jun 17, 2014 at 12:22:32PM +0200, Alexander Graf wrote: Eh, no. What we do is we read (good on BE, byte reversed) into r0. Then we swab32() from r0 to r3 on LE, mr from r0 to r3 on BE. r3 gets truncated along the way. The reason we maintain r0 as wrong-endian is that we write it back using the cache inhibited stwcix instruction: stwcix r0, r6, r7 /* EOI it */ So during the lifetime of r0 as XIRR it's always byte-reversed on LE. That's why we store it using STWX_BE into hstate, because that's the time when we actually swab32() it for further interpretation. So the STWX_BE is more like a be32_to_cpu than a cpu_to_be32, which is what the name STWX_BE would suggest. Sounds like it at least deserves a comment, or (as you suggest) rearrange the register usage so a normal store works. Paul. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: Book3S HV: Access XICS in BE
On 17.06.14 14:13, Paul Mackerras wrote: On Tue, Jun 17, 2014 at 12:22:32PM +0200, Alexander Graf wrote: Eh, no. What we do is we read (good on BE, byte reversed) into r0. Then we swab32() from r0 to r3 on LE, mr from r0 to r3 on BE. r3 gets truncated along the way. The reason we maintain r0 as wrong-endian is that we write it back using the cache inhibited stwcix instruction: stwcix r0, r6, r7 /* EOI it */ So during the lifetime of r0 as XIRR it's always byte-reversed on LE. That's why we store it using STWX_BE into hstate, because that's the time when we actually swab32() it for further interpretation. So the STWX_BE is more like a be32_to_cpu than a cpu_to_be32, which is what the name STWX_BE would suggest. Sounds like it at least deserves a comment, or (as you suggest) rearrange the register usage so a normal store works. Yes, I have this now: From a94a66437ec714ec5650f6d8fec050a33e4477ca Mon Sep 17 00:00:00 2001 From: Alexander Graf ag...@suse.de Date: Wed, 11 Jun 2014 10:37:52 +0200 Subject: [PATCH] KVM: PPC: Book3S HV: Access XICS in BE On the exit path from the guest we check what type of interrupt we received, if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. Signed-off-by: Alexander Graf ag...@suse.de --- v1 -> v2: - Make code easier to follow diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 1a2f471..9829e18 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -36,6 +36,13 @@ #define NAPPING_CEDE 1 #define NAPPING_NOVCPU 2 +.macro bswap32 regd, regs +srwi \regd,\regs,24 +rlwimi \regd,\regs,24,16,23 +rlwimi \regd,\regs,8,8,15 +rlwimi \regd,\regs,24,0,7 +.endm + /* * Call kvmppc_hv_entry in real mode. * Must be called with interrupts hard-disabled.
@@ -2325,7 +2332,12 @@ kvmppc_read_intr: cmpdi r6, 0 beq- 1f lwzcix r0, r6, r7 -rlwinm. r3, r0, 0, 0xff +#ifdef __LITTLE_ENDIAN__ +bswap32 r4, r0 +#else +mr r4, r0 +#endif +rlwinm. r3, r4, 0, 0xff sync beq 1f /* if nothing pending in the ICP */ @@ -2360,7 +2372,7 @@ kvmppc_read_intr: 42: /* It's not an IPI and it's for the host, stash it in the PACA * before exit, it will be picked up by the host ICP driver */ -stw r0, HSTATE_SAVED_XIRR(r13) +stw r4, HSTATE_SAVED_XIRR(r13) li r3, 1 b 1b Alex
Re: [PATCH] powerpc/kvm: support to handle sw breakpoint
On 17.06.14 13:13, Madhavan Srinivasan wrote: On Tuesday 17 June 2014 04:38 PM, Alexander Graf wrote: On 17.06.14 13:07, Madhavan Srinivasan wrote: On Tuesday 17 June 2014 02:24 PM, Alexander Graf wrote: On 14.06.14 23:08, Madhavan Srinivasan wrote: This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch mandates use of abs instruction (primary opcode 31 and extended opcode 360) as sw breakpoint instruction. Based on PowerISA v2.01, ABS instruction has been dropped from the architecture and treated as an illegal instruction. Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s.c | 3 ++- arch/powerpc/kvm/book3s_hv.c | 23 +++ 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index c254c27..b40fe5d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { -return -EINVAL; +vcpu->guest_debug = dbg->control; +return 0; } void kvmppc_decrementer_func(unsigned long data) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 7a12edb..688421d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -67,6 +67,14 @@ /* Used as a null value for timebase values */ #define TB_NIL (~(u64)0) +/* + * SW_BRK_DBG_INT is the debug instruction for supporting software breakpoints. + * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360. + * Based on PowerISA v2.01, ABS instruction has been dropped from the architecture + * and treated as an illegal instruction.
+ */ +#define SW_BRK_DBG_INT 0x7c0002d0 The instruction we use to trap needs to get exposed to user space via a ONE_REG property. Yes. I got to know about that from Bharat (patchset ppc debug: Add debug stub support). I will change it. Also please make sure to pick an instruction that preferably looks identical regardless of guest endianness. Segher suggested 0x0000. Does that trap properly for you? Alex
[PATCH] KVM: PPC: Book3S PR: Handle hyp doorbell exits
If we're running PR KVM in HV mode, we may get hypervisor doorbell interrupts. Handle those the same way we treat normal doorbells. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_pr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 8ea7da4..3b82e86 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -988,6 +988,7 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOK3S_INTERRUPT_DECREMENTER: case BOOK3S_INTERRUPT_HV_DECREMENTER: case BOOK3S_INTERRUPT_DOORBELL: + case BOOK3S_INTERRUPT_H_DOORBELL: vcpu->stat.dec_exits++; r = RESUME_GUEST; break; -- 1.8.1.4
Re: [PATCH 3/4] KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
On 12.06.14 10:16, Anton Blanchard wrote: To establish addressability quickly, ABIv2 requires the target address of the function being called to be in r12. Signed-off-by: Anton Blanchard an...@samba.org Thanks, applied to kvm-ppc-queue. Alex --- Index: b/arch/powerpc/kvm/book3s_hv_rmhandlers.S === --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1920,8 +1920,8 @@ hcall_try_real_mode: lwax r3,r3,r4 cmpwi r3,0 beq guest_exit_cont - add r3,r3,r4 - mtctr r3 + add r12,r3,r4 + mtctr r12 mr r3,r9 /* get vcpu pointer */ ld r4,VCPU_GPR(R4)(r9) bctrl
Re: [PATCH 4/4] KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
On 12.06.14 10:16, Anton Blanchard wrote: Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are assembly functions that are exported to modules and also require a valid r2. As such we need to use _GLOBAL_TOC so we provide a global entry point that establishes the TOC (r2). Signed-off-by: Anton Blanchard an...@samba.org Thanks, applied to kvm-ppc-queue. I've not applied patches 1 and 2 for now, as they break BE module support. Alex
[PATCH] KVM: PPC: Book3S PR: Fix ABIv2 on LE
We switched to ABIv2 on Little Endian systems now, which gets rid of the dotted function names. Branch to the actual functions when we see such a system. Signed-off-by: Alexander Graf ag...@suse.de diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index e2c29e3..d044b8b 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -25,7 +25,11 @@ #include <asm/exception-64s.h> #if defined(CONFIG_PPC_BOOK3S_64) +#if defined(_CALL_ELF) && _CALL_ELF == 2 +#define FUNC(name) name +#else #define FUNC(name) GLUE(.,name) +#endif #define GET_SHADOW_VCPU(reg) addi reg, r13, PACA_SVCPU #elif defined(CONFIG_PPC_BOOK3S_32) diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 9eec675..cdcbb63 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -36,7 +36,11 @@ #if defined(CONFIG_PPC_BOOK3S_64) +#if defined(_CALL_ELF) && _CALL_ELF == 2 +#define FUNC(name) name +#else #define FUNC(name) GLUE(.,name) +#endif #elif defined(CONFIG_PPC_BOOK3S_32)
[PATCH] KVM: PPC: Book3S PR: Fix sparse endian checks
While running sparse with endian checks over the code base, it triggered at some places that were missing casts or had wrong types. Fix them up. Signed-off-by: Alexander Graf ag...@suse.de diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c index 52a63bf..f7c25c6 100644 --- a/arch/powerpc/kvm/book3s_pr_papr.c +++ b/arch/powerpc/kvm/book3s_pr_papr.c @@ -40,8 +40,9 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu) { long flags = kvmppc_get_gpr(vcpu, 4); long pte_index = kvmppc_get_gpr(vcpu, 5); - unsigned long pteg[2 * 8]; - unsigned long pteg_addr, i, *hpte; + __be64 pteg[2 * 8]; + __be64 *hpte; + unsigned long pteg_addr, i; long int ret; i = pte_index & 7; @@ -93,8 +94,8 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, pte_index); mutex_lock(&vcpu->kvm->arch.hpt_mutex); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); - pte[0] = be64_to_cpu(pte[0]); - pte[1] = be64_to_cpu(pte[1]); + pte[0] = be64_to_cpu((__force __be64)pte[0]); + pte[1] = be64_to_cpu((__force __be64)pte[1]); ret = H_NOT_FOUND; if ((pte[0] & HPTE_V_VALID) == 0 || @@ -171,8 +172,8 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, tsh & H_BULK_REMOVE_PTEX); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); - pte[0] = be64_to_cpu(pte[0]); - pte[1] = be64_to_cpu(pte[1]); + pte[0] = be64_to_cpu((__force __be64)pte[0]); + pte[1] = be64_to_cpu((__force __be64)pte[1]); /* tsl = AVPN */ flags = (tsh & H_BULK_REMOVE_FLAGS) >> 26; @@ -211,8 +212,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, pte_index); mutex_lock(&vcpu->kvm->arch.hpt_mutex); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); - pte[0] = be64_to_cpu(pte[0]); - pte[1] = be64_to_cpu(pte[1]); + pte[0] = be64_to_cpu((__force __be64)pte[0]); + pte[1] = be64_to_cpu((__force __be64)pte[1]); ret = H_NOT_FOUND; if ((pte[0] & HPTE_V_VALID) == 0 || @@ -231,8 +232,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu
*vcpu) rb = compute_tlbie_rb(v, r, pte_index); vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false); - pte[0] = cpu_to_be64(pte[0]); - pte[1] = cpu_to_be64(pte[1]); + pte[0] = (__force u64)cpu_to_be64(pte[0]); + pte[1] = (__force u64)cpu_to_be64(pte[1]); copy_to_user((void __user *)pteg, pte, sizeof(pte)); ret = H_SUCCESS;
[PATCH 0/7] KVM: PPC: Book3S HV: Enable on little endian hosts
So far we've been able to successfully run HV KVM on big endian hosts, but once you dive into little endian land things start to fall apart. This patch set enables HV KVM for little endian hosts. This should be the final piece left missing to get little endian systems fully on par with big endian ones in the KVM world - modulo bugs. For now guest threading support is still slightly flaky, but I'm sure that's only a minor breakage somewhere that we'll find soon. v1 -> v2: - fix typo in STWX_BE - Add __be hints - Fix H_REMOVE - Fix dtl_idx - Make XICS code easier to follow and use memory for bswap Alex Alexander Graf (7): PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts arch/powerpc/include/asm/asm-compat.h | 4 + arch/powerpc/include/asm/kvm_book3s.h | 4 +- arch/powerpc/include/asm/kvm_book3s_64.h | 15 +++- arch/powerpc/kvm/Kconfig | 1 - arch/powerpc/kvm/book3s_64_mmu_hv.c | 128 ++- arch/powerpc/kvm/book3s_hv.c | 22 ++--- arch/powerpc/kvm/book3s_hv_ras.c | 6 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 146 ++- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 60 - 9 files changed, 220 insertions(+), 166 deletions(-) -- 1.8.1.4
[PATCH 6/7] KVM: PPC: Book3S HV: Fix ABIv2 on LE
We use ABIv2 on Little Endian systems which gets rid of the dotted function names. Branch to the actual functions when we see such a system. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 1a71f60..1ff3ebd 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -36,6 +36,12 @@ #define NAPPING_CEDE 1 #define NAPPING_NOVCPU 2 +#if defined(_CALL_ELF) && _CALL_ELF == 2 +#define FUNC(name) name +#else +#define FUNC(name) GLUE(.,name) +#endif + /* * Call kvmppc_hv_entry in real mode. * Must be called with interrupts hard-disabled. @@ -668,9 +674,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM) mr r31, r4 addi r3, r31, VCPU_FPRS_TM - bl .load_fp_state + bl FUNC(load_fp_state) addi r3, r31, VCPU_VRS_TM - bl .load_vr_state + bl FUNC(load_vr_state) mr r4, r31 lwz r7, VCPU_VRSAVE_TM(r4) mtspr SPRN_VRSAVE, r7 @@ -1414,9 +1420,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM) /* Save FP/VSX.
*/ addi r3, r9, VCPU_FPRS_TM - bl .store_fp_state + bl FUNC(store_fp_state) addi r3, r9, VCPU_VRS_TM - bl .store_vr_state + bl FUNC(store_vr_state) mfspr r6, SPRN_VRSAVE stw r6, VCPU_VRSAVE_TM(r9) 1: @@ -2405,11 +2411,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) mtmsrd r8 isync addi r3,r3,VCPU_FPRS - bl .store_fp_state + bl FUNC(store_fp_state) #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION addi r3,r31,VCPU_VRS - bl .store_vr_state + bl FUNC(store_vr_state) END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif mfspr r6,SPRN_VRSAVE @@ -2441,11 +2447,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) mtmsrd r8 isync addi r3,r4,VCPU_FPRS - bl .load_fp_state + bl FUNC(load_fp_state) #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION addi r3,r31,VCPU_VRS - bl .load_vr_state + bl FUNC(load_vr_state) END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif lwz r7,VCPU_VRSAVE(r31) -- 1.8.1.4
[PATCH 4/7] KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
Some data structures are always stored in big endian. Among those are the LPPACA fields as well as the shadow slb. These structures might be shared with a hypervisor. So whenever we access those fields, make sure we do so in big endian byte order. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 7e7fa01..75c7e22 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -32,10 +32,6 @@ #define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM) -#ifdef __LITTLE_ENDIAN__ -#error Need to fix lppaca and SLB shadow accesses in little endian mode -#endif - /* Values in HSTATE_NAPPING(r13) */ #define NAPPING_CEDE 1 #define NAPPING_NOVCPU 2 @@ -595,9 +591,10 @@ kvmppc_got_guest: ld r3, VCPU_VPA(r4) cmpdi r3, 0 beq 25f - lwz r5, LPPACA_YIELDCOUNT(r3) + li r6, LPPACA_YIELDCOUNT + LWZX_BE r5, r3, r6 addi r5, r5, 1 - stw r5, LPPACA_YIELDCOUNT(r3) + STWX_BE r5, r3, r6 li r6, 1 stb r6, VCPU_VPA_DIRTY(r4) 25: @@ -1442,9 +1439,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM) ld r8, VCPU_VPA(r9) /* do they have a VPA? */ cmpdi r8, 0 beq 25f - lwz r3, LPPACA_YIELDCOUNT(r8) + li r4, LPPACA_YIELDCOUNT + LWZX_BE r3, r8, r4 addi r3, r3, 1 - stw r3, LPPACA_YIELDCOUNT(r8) + STWX_BE r3, r8, r4 li r3, 1 stb r3, VCPU_VPA_DIRTY(r9) 25: @@ -1757,8 +1755,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) 33: ld r8,PACA_SLBSHADOWPTR(r13) .rept SLB_NUM_BOLTED - ld r5,SLBSHADOW_SAVEAREA(r8) - ld r6,SLBSHADOW_SAVEAREA+8(r8) + li r3, SLBSHADOW_SAVEAREA + LDX_BE r5, r8, r3 + addi r3, r3, 8 + LDX_BE r6, r8, r3 andis. r7,r5,SLB_ESID_V@h beq 1f slbmte r6,r5 -- 1.8.1.4
[PATCH 1/7] PPC: Add asm helpers for BE 32bit load/store
From assembly code we might not only have to explicitly access 64bit values in BE, but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx in their respective byte-reverse or native form. Signed-off-by: Alexander Graf ag...@suse.de CC: Benjamin Herrenschmidt b...@kernel.crashing.org --- v1 -> v2: - fix typo in STWX_BE --- arch/powerpc/include/asm/asm-compat.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h index 4b237aa..21be8ae 100644 --- a/arch/powerpc/include/asm/asm-compat.h +++ b/arch/powerpc/include/asm/asm-compat.h @@ -34,10 +34,14 @@ #define PPC_MIN_STKFRM 112 #ifdef __BIG_ENDIAN__ +#define LWZX_BE stringify_in_c(lwzx) #define LDX_BE stringify_in_c(ldx) +#define STWX_BE stringify_in_c(stwx) #define STDX_BE stringify_in_c(stdx) #else +#define LWZX_BE stringify_in_c(lwbrx) #define LDX_BE stringify_in_c(ldbrx) +#define STWX_BE stringify_in_c(stwbrx) #define STDX_BE stringify_in_c(stdbrx) #endif -- 1.8.1.4
[PATCH 2/7] KVM: PPC: Book3S HV: Make HTAB code LE host aware
When running on an LE host all data structures are kept in little endian byte order. However, the HTAB still needs to be maintained in big endian. So every time we access any HTAB we need to make sure we do so in the right byte order. Fix up all accesses to manually byte swap. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - Add __be32 hints - Fix H_REMOVE --- arch/powerpc/include/asm/kvm_book3s.h| 4 +- arch/powerpc/include/asm/kvm_book3s_64.h | 15 +++- arch/powerpc/kvm/book3s_64_mmu_hv.c | 128 ++- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 146 ++- 4 files changed, 164 insertions(+), 129 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a20cc0b..028f4b1 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -161,9 +161,9 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing, bool *writable); extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, unsigned long *rmap, long pte_index, int realmode); -extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep, +extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep, unsigned long pte_index); -void kvmppc_clear_ref_hpte(struct kvm *kvm, unsigned long *hptep, +void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep, unsigned long pte_index); extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, unsigned long *nb_ret); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index c7871f3..e504f88 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -59,20 +59,29 @@ extern unsigned long kvm_rma_pages; /* These bits are reserved in the guest view of the HPTE */ #define HPTE_GR_RESERVED HPTE_GR_MODIFIED -static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits) +static inline long try_lock_hpte(__be64 *hpte, unsigned 
long bits) { unsigned long tmp, old; + __be64 be_lockbit, be_bits; + + /* + * We load/store in native endian, but the HTAB is in big endian. If + * we byte swap all data we apply on the PTE we're implicitly correct + * again. + */ + be_lockbit = cpu_to_be64(HPTE_V_HVLOCK); + be_bits = cpu_to_be64(bits); asm volatile(" ldarx %0,0,%2\n" " and. %1,%0,%3\n" " bne 2f\n" - " ori %0,%0,%4\n" + " or %0,%0,%4\n" " stdcx. %0,0,%2\n" " beq+ 2f\n" " mr %1,%3\n" "2: isync" : "=&r" (tmp), "=&r" (old) -: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK) +: "r" (hpte), "r" (be_bits), "r" (be_lockbit) : "cc", "memory"); return old == 0; } diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 8056107..2d154d9 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -450,7 +450,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned long slb_v; unsigned long pp, key; unsigned long v, gr; - unsigned long *hptep; + __be64 *hptep; int index; int virtmode = vcpu->arch.shregs.msr & (data ?
MSR_DR : MSR_IR); @@ -473,13 +473,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, preempt_enable(); return -ENOENT; } - hptep = (unsigned long *)(kvm-arch.hpt_virt + (index 4)); - v = hptep[0] ~HPTE_V_HVLOCK; + hptep = (__be64 *)(kvm-arch.hpt_virt + (index 4)); + v = be64_to_cpu(hptep[0]) ~HPTE_V_HVLOCK; gr = kvm-arch.revmap[index].guest_rpte; /* Unlock the HPTE */ asm volatile(lwsync : : : memory); - hptep[0] = v; + hptep[0] = cpu_to_be64(v); preempt_enable(); gpte-eaddr = eaddr; @@ -583,7 +583,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr) { struct kvm *kvm = vcpu-kvm; - unsigned long *hptep, hpte[3], r; + unsigned long hpte[3], r; + __be64 *hptep; unsigned long mmu_seq, psize, pte_size; unsigned long gpa_base, gfn_base; unsigned long gpa, gfn, hva, pfn; @@ -606,16 +607,16 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (ea != vcpu-arch.pgfault_addr) return RESUME_GUEST; index = vcpu-arch.pgfault_index; -
[PATCH 7/7] KVM: PPC: Book3S HV: Enable for little endian hosts
Now that we've fixed all the issues that HV KVM code had on little endian hosts, we can enable it in the kernel configuration for users to play with. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index d6a53b9..8aeeda1 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -75,7 +75,6 @@ config KVM_BOOK3S_64 config KVM_BOOK3S_64_HV tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host" depends on KVM_BOOK3S_64 - depends on !CPU_LITTLE_ENDIAN select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA -- 1.8.1.4
[PATCH 5/7] KVM: PPC: Book3S HV: Access XICS in BE
On the exit path from the guest we check what type of interrupt we received, if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. Signed-off-by: Alexander Graf ag...@suse.de --- v1 -> v2: - Make code easier to follow and use hardware for bswap --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 75c7e22..1a71f60 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2325,7 +2325,18 @@ kvmppc_read_intr: cmpdi r6, 0 beq- 1f lwzcix r0, r6, r7 - rlwinm. r3, r0, 0, 0xff + /* + * Save XIRR for later. Since we get it in reverse endian on LE + * systems, save it byte reversed and fetch it back in host endian. + */ + li r3, HSTATE_SAVED_XIRR + STWX_BE r0, r3, r13 +#ifdef __LITTLE_ENDIAN__ + lwz r3, HSTATE_SAVED_XIRR(r13) +#else + mr r3, r0 +#endif + rlwinm. r3, r3, 0, 0xff sync beq 1f /* if nothing pending in the ICP */ @@ -2357,10 +2368,9 @@ kvmppc_read_intr: li r3, -1 1: blr -42: /* It's not an IPI and it's for the host, stash it in the PACA -* before exit, it will be picked up by the host ICP driver +42: /* It's not an IPI and it's for the host. We saved a copy of XIRR in +* the PACA earlier, it will be picked up by the host ICP driver */ - stw r0, HSTATE_SAVED_XIRR(r13) li r3, 1 b 1b -- 1.8.1.4
RE: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, June 17, 2014 6:36 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On Tue, 2014-06-17 at 12:02 +0300, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. See https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118547.html Thanks. The original code needs just a simple adjustment to benefit from this optimization, please review v3. - Mike
Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 14:04 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, June 17, 2014 6:36 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On Tue, 2014-06-17 at 12:02 +0300, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. See https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118547.html Thanks. The original code needs just a simple adjustment to benefit from this optimization, please review v3. Where is v3? Or is it forthcoming? -Scott
[PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition keeping last_vcpu_on_cpu per logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. Guest - old invalidation condition real 3.89 user 3.87 sys 0.01 Guest - enhanced invalidation condition real 3.75 user 3.73 sys 0.01 Host real 3.70 user 1.85 sys 0.00 The memory stress application accesses 4KB pages backed by 75% of available TLB0 entries: char foo[ENTRIES][4096] __attribute__ ((aligned (4096))); int main() { char bar; int i, j; for (i = 0; i < ITERATIONS; i++) for (j = 0; j < ENTRIES; j++) bar = foo[j][0]; return 0; } Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com --- v3: - use existing logic while keeping last_vcpu_per_cpu per lpid v2: - improve patch name and description - add performance results arch/powerpc/kvm/e500mc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 17e4562..95e33e3 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr) { } -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu); +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu); static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) { @@ -141,9 +141,9 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) mtspr(SPRN_GESR, vcpu->arch.shared->esr); if (vcpu->arch.oldpir != mfspr(SPRN_PIR) || - __get_cpu_var(last_vcpu_on_cpu) != vcpu) { + __get_cpu_var(last_vcpu_on_cpu)[vcpu->kvm->arch.lpid] !=
vcpu) { kvmppc_e500_tlbil_all(vcpu_e500); - __get_cpu_var(last_vcpu_on_cpu) = vcpu; + __get_cpu_var(last_vcpu_on_cpu)[vcpu-kvm-arch.lpid] = vcpu; } kvmppc_load_guest_fp(vcpu); -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 22:09 +0300, Mihai Caraman wrote:
> On vcpu schedule, the condition checked for tlb pollution is too loose.
> The tlb entries of a vcpu become polluted (vs stale) only when a
> different vcpu within the same logical partition runs in-between.
> Optimize the tlb invalidation condition keeping last_vcpu_on_cpu per
> logical partition id.
> [...]
> -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
> +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu);

Hmm, I didn't know you could express types like that.  Is this special
syntax that only works for typeof?

No space after *

Name should be adjusted to match, something like last_vcpu_of_lpid
(with the _on_cpu being implied by the fact that it's PER_CPU).
-Scott
RE: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
> > -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
> > +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu);
>
> Hmm, I didn't know you could express types like that.  Is this special
> syntax that only works for typeof?

Yes, AFAIK.

> No space after *

Checkpatch complains about the missing space ;)

> Name should be adjusted to match, something like last_vcpu_of_lpid
> (with the _on_cpu being implied by the fact that it's PER_CPU).

I had considered the longer name, but it was not appealing; I will
change it to last_vcpu_of_lpid.

-Mike
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote:
> > > -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
> > > +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu);
> >
> > Hmm, I didn't know you could express types like that.  Is this special
> > syntax that only works for typeof?
>
> Yes, AFAIK.
>
> > No space after *
>
> Checkpatch complains about the missing space ;)

Checkpatch is wrong, which isn't surprising given that this is unusual
syntax.  We don't normally put a space after * when used to represent a
pointer.

-Scott
RE: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, June 17, 2014 10:48 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-d...@lists.ozlabs.org
> Subject: Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation
> condition on vcpu schedule
>
> On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote:
> > [...]
> > > No space after *
> >
> > Checkpatch complains about the missing space ;)
>
> Checkpatch is wrong, which isn't surprising given that this is unusual
> syntax.  We don't normally put a space after * when used to represent a
> pointer.

This is not something new. See [PATCH 04/10] percpu: cleanup percpu
array definitions:

https://lkml.org/lkml/2009/6/24/26

-Mike
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 15:02 -0500, Caraman Mihai Claudiu-B02008 wrote:
> > -----Original Message-----
> > From: Wood Scott-B07421
> > [...]
> >
> > Checkpatch is wrong, which isn't surprising given that this is unusual
> > syntax.  We don't normally put a space after * when used to represent a
> > pointer.
>
> This is not something new. See [PATCH 04/10] percpu: cleanup percpu
> array definitions:
>
> https://lkml.org/lkml/2009/6/24/26

I didn't say it was new, just unusual, and checkpatch doesn't recognize
it.  Checkpatch shouldn't be blindly and silently obeyed when it says
something strange.

-Scott
RE: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, June 17, 2014 11:05 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-d...@lists.ozlabs.org
> Subject: Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation
> condition on vcpu schedule
>
> [...]
>
> I didn't say it was new, just unusual, and checkpatch doesn't recognize
> it.  Checkpatch shouldn't be blindly and silently obeyed when it says
> something strange.

I agree with you about the syntax, and I know other cases where
checkpatch is a moron. For similar corner cases the checkpatch
maintainers did not want (or found it difficult) to make an exception.
I would also like to see Alex's opinion on this.

-Mike
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On 17.06.14 22:36, mihai.cara...@freescale.com wrote:
> [...]
>
> I agree with you about the syntax, and I know other cases where
> checkpatch is a moron. For similar corner cases the checkpatch
> maintainers did not want (or found it difficult) to make an exception.
> I would also like to see Alex's opinion on this.

I usually like to apply common sense :).
Alex