Re: [PATCH] KVM: PPC: Implement extension to report number of memslots
On Mon, 2015-10-26 at 16:15 +1100, Paul Mackerras wrote: > On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote: > > Yes, we'll likely need this soon! 32 slots are not enough... > > Would anyone object if I raised the limit for PPC to 512 slots? > Would that cause problems on embedded PPC, for instance? I can't think of any reason why it would cause problems. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits
On Thu, 2015-10-01 at 15:58 +0300, Laurentiu Tudor wrote: > Fix couple of cases where we shift left a 32-bit > value thus might get truncated results on 64-bit > targets. > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com> > Suggested-by: Scott Wood <scotttw...@freescale.com> > --- > arch/powerpc/kvm/e500_mmu_host.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Acked-by: Scott Wood <scottw...@freescale.com> -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception
On Wed, 2015-09-30 at 14:27 +0300, Laurentiu Tudor wrote: > On 09/30/2015 01:32 PM, Laurentiu Tudor wrote: > > On 09/25/2015 03:10 AM, Scott Wood wrote: > > > On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote: > > [snip] > > > > > b/arch/powerpc/kvm/e500_mmu_host.c > > > > index 12d5c67..99ad88a 100644 > > > > --- a/arch/powerpc/kvm/e500_mmu_host.c > > > > +++ b/arch/powerpc/kvm/e500_mmu_host.c > > > > @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct > > > > kvm_book3e_206_tlb_entry *stlbe, > > > > stlbe->mas2, stlbe->mas7_3); > > > > } > > > > > > > > +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV) > > > > +static int lrat_next(void) > > > > +{ > > > > > > Will anything break by removing the CONFIG_64BIT condition, even if we > > > don't > > > have a 32-bit target that uses this? > > > > Not completly certain but i remember getting compile or link errors > > on 32-bit e500mc or e500v2. I can recheck if you want. > > > > > > +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn) > > > > +{ > > > > + struct kvm_memory_slot *slot; > > > > + unsigned long pfn; > > > > + unsigned long hva; > > > > + struct vm_area_struct *vma; > > > > + unsigned long psize; > > > > + int tsize; > > > > + unsigned long tsize_pages; > > > > + > > > > + slot = gfn_to_memslot(vcpu->kvm, gfn); > > > > + if (!slot) { > > > > + pr_err_ratelimited("%s: couldn't find memslot for gfn > > > > %lx!\n", > > > > +__func__, (long)gfn); > > > > + return; > > > > + } > > > > + > > > > + hva = slot->userspace_addr; > > > > > > What if the faulting address is somewhere in the middle of the slot? > > > Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()? > > > In > > > fact there's probably a lot of logic that should be shared between > > > these two > > > functions. > > > > So if my understanding is correct most of the gfn -> pfn translation > > stuff done in kvmppc_e500_shadow_map() should also be present in here. > > If that's the case maybe i should first extract this code (which includes > > VM_PFNMAP handling) in a separate function and call it from both > > kvmppc_lrat_map() > > and kvmppc_e500_shadow_map(). > > > > Off-topic, but just noticed that kvmppc_e500_shadow_map() is marked as > inline. > Was that on purpose? Is inlining such a large function worth anything? I don't remember the intent. It was probably a lot smaller back then. That said, it's only used two places, with probably pretty good temporal separation between performance-intensive uses of one versus the other (so not a huge icache concern), and a pretty good portion of the function will be optimized out in the caller with tlbsel == 0. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register
On Fri, 2015-09-25 at 18:02 +0300, Laurentiu Tudor wrote: > Emulate TMCFG0 TMRN register exposing one HW thread per vcpu. > > Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com> > [laurentiu.tu...@freescale.com: rebased on latest kernel, use > define instead of hardcoded value, moved code in own function] > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com> > --- > v2: > - moved code in its own function > Needs this patch: https://patchwork.ozlabs.org/patch/521752/ > > arch/powerpc/include/asm/disassemble.h | 5 + > arch/powerpc/kvm/e500_emulate.c| 19 +++ > 2 files changed, 24 insertions(+) Acked-by: Scott Wood <scotttw...@freescale.com> -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e6500: support powers of 2K TLB1 sizes
On Fri, 2015-09-25 at 17:30 +0300, Laurentiu Tudor wrote: > On 09/24/2015 11:23 PM, Scott Wood wrote: > > On Thu, 2015-09-24 at 15:57 +0300, Laurentiu Tudor wrote: > > > Book-E MMUv2 present in e6500 cores supports > > > powers of 2K page sizes while older MMUv1 cores > > > support only powers of 4K page sizes, or in other > > > words the LSB of TSIZE on MMUv1 is always 0. > > > Thus, on MMUv2 we must not strip the LSB. > > > > We can get better TLB utilization by not stripping it, but why "must not" > > which makes it sound like a bugfix rather than an optimization? > > Not sure i get it. If a guests wants a 2K^^(2n+1) translation > size it will instead get a 2K^^2n size, no? Isn't this an issue? It will get two pages of 2k^^2n size (assuming both halves are accessed). It will work, it will just consume twice as many TLB1 entries. It's the same as if the host pages backing the region are smaller than the mapping that the guest tries to create. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e6500: support powers of 2K TLB1 sizes
On Thu, 2015-09-24 at 15:57 +0300, Laurentiu Tudor wrote: > Book-E MMUv2 present in e6500 cores supports > powers of 2K page sizes while older MMUv1 cores > support only powers of 4K page sizes, or in other > words the LSB of TSIZE on MMUv1 is always 0. > Thus, on MMUv2 we must not strip the LSB. We can get better TLB utilization by not stripping it, but why "must not" which makes it sound like a bugfix rather than an optimization? > Signed-off-by: Mihai Caraman> [laurentiu.tu...@freescale.com: addressed review > feedback, split in distinct patch] > Signed-off-by: Laurentiu Tudor > --- > arch/powerpc/kvm/e500_mmu_host.c | 28 +--- > 1 file changed, 21 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > b/arch/powerpc/kvm/e500_mmu_host.c > index 4d33e19..12d5c67 100644 > --- a/arch/powerpc/kvm/e500_mmu_host.c > +++ b/arch/powerpc/kvm/e500_mmu_host.c > @@ -371,6 +371,7 @@ static inline int kvmppc_e500_shadow_map(struct > kvmppc_vcpu_e500 *vcpu_e500, > > unsigned long start, end; > unsigned long slot_start, slot_end; > + int tsize_inc; > > pfnmap = 1; > > @@ -392,10 +393,20 @@ static inline int kvmppc_e500_shadow_map(struct > kvmppc_vcpu_e500 *vcpu_e500, > MAS1_TSIZE_SHIFT; > > /* > - * e500 doesn't implement the lowest tsize bit, > - * or 1K pages. > + * MMUv1 doesn't implement the lowest tsize bit, > + * or translations smaller than 4K. >*/ > - tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1); > + if (!has_feature(_e500->vcpu, VCPU_FTR_MMU_V2)) > + tsize &= ~1; > + tsize = max(BOOK3E_PAGESZ_4K, tsize); > + > + /* > + * Calculate TSIZE increment. MMUv2 supports > + * power of 2K translations while MMUv1 is limited > + * to power of 4K sizes. > + */ > + tsize_inc = has_feature(_e500->vcpu, > + VCPU_FTR_MMU_V2) ? 1 : 2; If you calculate tsize_inc first, then the previous if-statement can become "tsize &= ~(tsize_inc - 1);". > > /* >* Now find the largest tsize (up to what the guest > @@ -404,7 +415,8 @@ static inline int kvmppc_e500_shadow_map(struct > kvmppc_vcpu_e500 *vcpu_e500, >* aligned. >*/ > > - for (; tsize > BOOK3E_PAGESZ_4K; tsize -= 2) { > + for (; tsize > BOOK3E_PAGESZ_4K; > + tsize -= tsize_inc) { > unsigned long gfn_start, gfn_end; > tsize_pages = 1 << (tsize - 2); > > @@ -437,10 +449,12 @@ static inline int kvmppc_e500_shadow_map(struct > kvmppc_vcpu_e500 *vcpu_e500, > tsize = min(__ilog2(psize) - 10, tsize); > > /* > - * e500 doesn't implement the lowest tsize bit, > - * or 1K pages. > + * MMUv1 doesn't implement the lowest tsize bit, > + * or translations smaller than 4K. >*/ This comment makes it sound like MMUv2 might support translations smaller than 4K. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500: Emulate TMCFG0 TMRN register
On Thu, 2015-09-24 at 09:56 +0300, Laurentiu Tudor wrote: > Emulate TMCFG0 TMRN register exposing one HW thread per vcpu. > > Signed-off-by: Mihai Caraman> [laurentiu.tu...@freescale.com: rebased on latest kernel, > use define instead of hardcoded value] > Signed-off-by: Laurentiu Tudor > --- > Needs this patch: https://patchwork.ozlabs.org/patch/521752/ > > arch/powerpc/include/asm/disassemble.h | 5 + > arch/powerpc/kvm/e500_emulate.c| 11 +++ > 2 files changed, 16 insertions(+) KVM patches should be sent to k...@vger.kernel.org in addition to kvm- p...@vger.kernel.org. > @@ -165,6 +167,15 @@ int kvmppc_core_emulate_op_e500(struct kvm_run *run, > struct kvm_vcpu *vcpu, > emulated = kvmppc_e500_emul_tlbivax(vcpu, ea); > break; > > + case XOP_MFTMR: > + /* Expose one thread per vcpu */ > + if (get_tmrn(inst) == TMRN_TMCFG0) > + kvmppc_set_gpr(vcpu, rt, > +1 | (1 << > TMRN_TMCFG0_NATHRD_SHIFT)); > + else > + emulated = EMULATE_FAIL; > + break; Line length. Please move the implementation into its own function like all the others. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/e6500: add TMCFG0 register definition
On Wed, 2015-09-23 at 18:06 +0300, Laurentiu Tudor wrote: > The register is not currently used in the base kernel > but will be in a forthcoming kvm patch. > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com> > --- > arch/powerpc/include/asm/reg_booke.h | 6 ++ > 1 file changed, 6 insertions(+) Acked-by: Scott Wood <scottw...@freescale.com> -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result
On Thu, 2015-09-24 at 16:00 +0200, Andrzej Hajda wrote: > The function can return negative value. > > The problem has been detected using proposed semantic patch > scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1]. > > [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107 > > Signed-off-by: Andrzej Hajda <a.ha...@samsung.com> > --- > Hi, > > To avoid problems with too many mail recipients I have sent whole > patchset only to LKML. Anyway patches have no dependencies. > > Regards > Andrzej > --- > arch/powerpc/kvm/e500.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Acked-by: Scott Wood <scottw...@freescale.com> -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception
On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote: > diff --git a/arch/powerpc/kvm/bookehv_interrupts.S > b/arch/powerpc/kvm/bookehv_interrupts.S > index 81bd8a07..1e9fa2a 100644 > --- a/arch/powerpc/kvm/bookehv_interrupts.S > +++ b/arch/powerpc/kvm/bookehv_interrupts.S > @@ -62,6 +62,7 @@ > #define NEED_EMU 0x0001 /* emulation -- save nv regs */ > #define NEED_DEAR0x0002 /* save faulting DEAR */ > #define NEED_ESR 0x0004 /* save faulting ESR */ > +#define NEED_LPER0x0008 /* save faulting LPER */ > > /* > * On entry: > @@ -159,6 +160,12 @@ > PPC_STL r9, VCPU_FAULT_DEAR(r4) > .endif > > + /* Only supported on 64-bit cores for now */ > + .if \flags & NEED_LPER > + mfspr r7, SPRN_LPER > + std r7, VCPU_FAULT_LPER(r4) > + .endif What's the harm in using PPC_STL anyway? > /* > * For input register values, see > arch/powerpc/include/asm/kvm_booke_hv_asm.h > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > b/arch/powerpc/kvm/e500_mmu_host.c > index 12d5c67..99ad88a 100644 > --- a/arch/powerpc/kvm/e500_mmu_host.c > +++ b/arch/powerpc/kvm/e500_mmu_host.c > @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct > kvm_book3e_206_tlb_entry *stlbe, > stlbe->mas2, stlbe->mas7_3); > } > > +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV) > +static int lrat_next(void) > +{ Will anything break by removing the CONFIG_64BIT condition, even if we don't have a 32-bit target that uses this? > +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn) > +{ > + struct kvm_memory_slot *slot; > + unsigned long pfn; > + unsigned long hva; > + struct vm_area_struct *vma; > + unsigned long psize; > + int tsize; > + unsigned long tsize_pages; > + > + slot = gfn_to_memslot(vcpu->kvm, gfn); > + if (!slot) { > + pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n", > +__func__, (long)gfn); > + return; > + } > + > + hva = slot->userspace_addr; What if the faulting address is somewhere in the middle of the slot? Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()? In fact there's probably a lot of logic that should be shared between these two functions. > + down_read(>mm->mmap_sem); > + vma = find_vma(current->mm, hva); > + if (vma && (hva >= vma->vm_start)) { > + psize = vma_kernel_pagesize(vma); What if it's VM_PFNMAP? > + } else { > + pr_err_ratelimited("%s: couldn't find virtual memory address > for gfn > %lx!\n", > +__func__, (long)gfn); > + up_read(>mm->mmap_sem); > + return; > + } > + up_read(>mm->mmap_sem); > + > + pfn = gfn_to_pfn_memslot(slot, gfn); > + if (is_error_noslot_pfn(pfn)) { > + pr_err_ratelimited("%s: couldn't get real page for gfn %lx!\n", > +__func__, (long)gfn); > + return; > + } > + > + tsize = __ilog2(psize) - 10; > + tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT); 1UL << ... kvmppc_e500_shadow_map needs the same fix. > + gfn &= ~(tsize_pages - 1); > + pfn &= ~(tsize_pages - 1); > + > + write_host_lrate(tsize, gfn, pfn, vcpu->kvm->arch.lpid, true); > + > + kvm_release_pfn_clean(pfn); > +} > + > +void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu) > +{ > + uint32_t mas0, mas1 = 0; > + int esel; > + unsigned long flags; > + > + local_irq_save(flags); > + > + /* LRAT does not have a dedicated instruction for invalidation */ > + for (esel = 0; esel < get_paca()->tcd_ptr->lrat_max; esel++) { > + mas0 = MAS0_ATSEL | MAS0_ESEL(esel); > + mtspr(SPRN_MAS0, mas0); > + asm volatile("isync; tlbre" : : : "memory"); > + mas1 = mfspr(SPRN_MAS1) & ~MAS1_VALID; > + mtspr(SPRN_MAS1, mas1); > + asm volatile("isync; tlbwe" : : : "memory"); > + } > + /* Must clear mas8 for other host tlbwe's */ > + mtspr(SPRN_MAS8, 0); > + isync(); > + > + local_irq_restore(flags); > +} > +#endif /* CONFIG_64BIT && CONFIG_KVM_BOOKE_HV */ > + > /* > * Acquire a mas0 with victim hint, as if we just took a TLB miss. > * > diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c > index cda695d..5856f8f 100644 > --- a/arch/powerpc/kvm/e500mc.c > +++ b/arch/powerpc/kvm/e500mc.c > @@ -99,6 +99,10 @@ void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 > *vcpu_e500) > asm volatile("tlbilxlpid"); > mtspr(SPRN_MAS5, 0); > local_irq_restore(flags); > + > +#ifdef PPC64 > + kvmppc_lrat_invalidate(_e500->vcpu); > +#endif Don't you mean CONFIG_PPC64 (or CONFIG_64BIT to be consistent)? > } > > void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid) > diff --git a/arch/powerpc/mm/fsl_booke_mmu.c >
Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest
On Mon, 2015-09-14 at 16:17 +0300, Laurentiu Tudor wrote: > On 09/10/2015 02:01 AM, Scott Wood wrote: > > On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote: > > > This way we get rid of an entire file with mostly > > > duplicated code plus a Kconfig option that you always > > > had to take care to check it in order for kvm to work. > > > > > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com> > > > --- > > > arch/powerpc/platforms/85xx/Kconfig | 15 - > > > arch/powerpc/platforms/85xx/Makefile | 1 - > > > arch/powerpc/platforms/85xx/corenet_generic.c | 1 + > > > arch/powerpc/platforms/85xx/qemu_e500.c | 85 > > > > > > > > > qemu_e500 is not only for corenet chips. > > That's too bad. :-( > I remember discussions on dropping the e500v2 support at some point in time? > > > We can add it to the defconfig (in fact I've been meaning to do so). > > Or maybe just drop de KConfig option and > wrap the file in an #ifdef CONFIG_KVM or something along these lines? > > > > -static void __init qemu_e500_setup_arch(void) > > > -{ > > > - ppc_md.progress("qemu_e500_setup_arch()", 0); > > > - > > > - fsl_pci_assign_primary(); > > > - swiotlb_detect_4g(); > > > > Where is fsl_pci_assign_primary() in corenet_generic.c? > > This commit claims it's not needed and drops it: > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7d4d595dad30328bc6153e235d90f54c918fc127 That commit has nothing to do with QEMU. > > At one point this > > was needed for QEMU's PCI implementation -- have you tested QEMU PCI > > without > > it? > > Well, somehow i've (embarrassingly) messed up my initial tests. > I've retested after seeing your comment and indeed this breaks pci under > qemu. > Adding to the confusion, the commit above made me think that the removal > was safe. > Why pci qemu doesn't work without the call to fsl_pci_assign_primary() is > an interesting subject. IIRC it has to do with QEMU not liking a BAR set to zero. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest
On Mon, 2015-09-14 at 16:14 +0200, Alexander Graf wrote: > > Am 14.09.2015 um 15:17 schrieb Laurentiu Tudor <b10...@freescale.com>: > > > > > On 09/10/2015 02:01 AM, Scott Wood wrote: > > > > On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote: > > > > This way we get rid of an entire file with mostly > > > > duplicated code plus a Kconfig option that you always > > > > had to take care to check it in order for kvm to work. > > > > > > > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com> > > > > --- > > > > arch/powerpc/platforms/85xx/Kconfig | 15 - > > > > arch/powerpc/platforms/85xx/Makefile | 1 - > > > > arch/powerpc/platforms/85xx/corenet_generic.c | 1 + > > > > arch/powerpc/platforms/85xx/qemu_e500.c | 85 --- > > > > - > > > > > > > > > qemu_e500 is not only for corenet chips. > > > > That's too bad. :-( > > I remember discussions on dropping the e500v2 support at some point in > > time? > > > > > We can add it to the defconfig (in fact I've been meaning to do so). > > > > Or maybe just drop de KConfig option and > > wrap the file in an #ifdef CONFIG_KVM or something along these lines? > > CONFIG_KVM is for host support though. This is for the guest kernel. CONFIG_QEMU_E500 can also be used with TCG -- it's not KVM-specific. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest
On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote: > This way we get rid of an entire file with mostly > duplicated code plus a Kconfig option that you always > had to take care to check it in order for kvm to work. > > Signed-off-by: Laurentiu Tudor> --- > arch/powerpc/platforms/85xx/Kconfig | 15 - > arch/powerpc/platforms/85xx/Makefile | 1 - > arch/powerpc/platforms/85xx/corenet_generic.c | 1 + > arch/powerpc/platforms/85xx/qemu_e500.c | 85 qemu_e500 is not only for corenet chips. We can add it to the defconfig (in fact I've been meaning to do so). > -static void __init qemu_e500_setup_arch(void) > -{ > - ppc_md.progress("qemu_e500_setup_arch()", 0); > - > - fsl_pci_assign_primary(); > - swiotlb_detect_4g(); Where is fsl_pci_assign_primary() in corenet_generic.c? At one point this was needed for QEMU's PCI implementation -- have you tested QEMU PCI without it? -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc
[Added KVM lists and a couple relevant people] On Fri, 2015-07-31 at 14:25 +0530, Hemant Kumar wrote: On 07/30/2015 03:52 AM, Scott Wood wrote: On Wed, 2015-07-29 at 16:07 +0530, Hemant Kumar wrote: Hi Scott, On 07/17/2015 01:40 AM, Scott Wood wrote: On Thu, 2015-07-16 at 21:18 +0530, Hemant Kumar wrote: To analyze the exit events with perf, we need kvm_perf.h to be added in the arch/powerpc directory, where the kvm tracepoints needed to trace the KVM exit events are defined. This patch adds kvm_perf_book3s.h to indicate that the tracepoints are book3s specific. Generic kvm_perf.h then can just include kvm_perf_book3s.h. Signed-off-by: Hemant Kumar hem...@linux.vnet.ibm.com --- Changes: - Not exporting the exit reasons compared to previous patchset (suggested by Paul) arch/powerpc/include/uapi/asm/kvm_perf.h| 6 ++ arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 ++ 2 files changed, 20 insertions(+) create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h b/arch/powerpc/include/uapi/asm/kvm_perf.h new file mode 100644 index 000..5ed2ff3 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/kvm_perf.h @@ -0,0 +1,6 @@ +#ifndef _ASM_POWERPC_KVM_PERF_H +#define _ASM_POWERPC_KVM_PERF_H + +#include asm/kvm_perf_book3s.h + +#endif diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h new file mode 100644 index 000..8c8d8c2 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h @@ -0,0 +1,14 @@ +#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H +#define _ASM_POWERPC_KVM_PERF_BOOK3S_H + +#include asm/kvm.h + +#define DECODE_STR_LEN 20 + +#define VCPU_ID vcpu_id + +#define KVM_ENTRY_TRACE kvm_hv:kvm_guest_enter +#define KVM_EXIT_TRACE kvm_hv:kvm_guest_exit +#define KVM_EXIT_REASON trap + +#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */ Again, why is book3s stuff being presented via uapi as generic asm/kvm_perf.h with generic symbol names? -Scott Ok. We can change the KVM_ENTRY_TRACE macro to something like KVM_BOOK3S_ENTRY_TRACE and likewise for KVM_EXIT_TRACE and KVM_EXIT_REASON What about DECODE_STR_LEN and VCPU_ID? DECODE_STR_LEN can be common, we can give a big enough size to it, if we need to. And, VCPU_ID depends on the field in the tracepoint payload data which is specific to that tracepoint. This field is used to maintain the per vcpu record and this field gives us the vcpu id. So, yeah, I guess, since, I can't find any such field as vcpu_id in the kvm_exit tracepoint for book3e, we have to make this specific to book3s. Or maybe we could add kvm_guest_enter/kvm_guest_leave, with vcpu_id, to book3e... though the kvm-hv would be a problem for book3s-pr, if anyone cares about this feature there. I'm not sure why the strings are present both in the UAPI header, as well as in kvm_events_tp[] in kvm-stat.c. Where is this API documented? and then, to resolve the issue of generic macro names in the userspace side, we can handle it using __weak modifier. Does userspace get built differently for book3s versus book3e? For now it'd be fine for userspace to check for book3s and not use the feature if it's book3e. If and when book3e gains this feature, then userspace can be changed. Well, I couldn't find any way to build user space differently for book3s and book3e. How about keeping this as it is after modifying the tracepoint macro names to book3s specific in the uapi? And as and when booke decides to implement this feature, a runtime check for event availability can be added then, IMHO. What do you think? What does userspace use, at runtime, to determine if this feature is present and whether the book3s symbols should be used? Deferring the implementation of book3e support is fine, but from a uapi perspective it should be discoverable at runtime whether the feature exposed by asm/kvm_perf_book3s.h is available. Otherwise, if it is implemented (or even if it isn't), you have the potential for user confusion if an older perf tool is used. This sort of discovery is done all the time in the KVM APIs themselves. FWIW, on x86 cpu_isa_init() uses cpuid to select an exit table. Using PVR seems like it would be a bit cumbersome though compared to the kernel directly exporting which subarch it is (and it wouldn't help with HV versus PR). What would you suggest? Another option would be to explain this interface so that we can figure out if book3e would even
Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm
On Tue, 2015-07-07 at 16:05 +0800, wenwei tao wrote: Hi Scott I understand what you said. I will use the function 'is_vm_hugetlb_page()' to hide the bit combinations according to your comments in the next version of patch set. But for the situation like below, there isn't an obvious structure 'vma', using 'is_vm_hugetlb_page()' maybe costly or even not possible. void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned long vmflag) { ... if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1 || vmflag VM_HUGETLB) { local_flush_tlb(); goto flush_all; } ... } Add a function that operates on the flags directly, then. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm
On Fri, 2015-07-03 at 16:47 +0800, wenwei tao wrote: Hi Scott Thank you for your comments. Kernel already has that function: is_vm_hugetlb_page() , but the original code didn't use it, in order to keep the coding style of the original code, I didn't use it either. For the sentence like: vma-vm_flags VM_HUGETLB , hiding it behind 'is_vm_hugetlb_page()' is ok, but the sentence like: vma-vm_flags (VM_LOCKED|VM_HUGETLB|VM_PFNMAP) appears in the patch 2/6, is it better to hide the bit combinations behind the is_vm_hugetlb_page() ? In my patch I just replaced it with vma-vm_flags (VM_LOCKED|VM_PFNMAP) || (vma-vm_flags (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB. If you're going to do non-obvious things with the flags, it should be done in one place rather than throughout the code. Why would you do the above and not vma-vm_flags (VM_LOCKED | VM_PFNMAP) || is_vm_hugetlb_page(vma)? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm
On Wed, 2015-06-10 at 14:27 +0800, Wenwei Tao wrote: Hugetlb VMAs are not mergeable, that means a VMA couldn't have VM_HUGETLB and VM_MERGEABLE been set in the same time. So we use VM_HUGETLB to indicate new mergeable VMAs. Because of that a VMA which has VM_HUGETLB been set is a hugetlb VMA only if it doesn't have VM_MERGEABLE been set in the same time. Eww. If you must overload such bit combinations, please hide it behind a vm_is_hugetlb() function. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] KVM: PPC: fix suspicious use of conditional operator
On Fri, 2015-05-22 at 17:46 +0300, Laurentiu Tudor wrote: This was signaled by a static code analysis tool. Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com --- arch/powerpc/kvm/e500_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c index 50860e9..29911a0 100644 --- a/arch/powerpc/kvm/e500_mmu.c +++ b/arch/powerpc/kvm/e500_mmu.c @@ -377,7 +377,7 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea) | MAS0_NV(vcpu_e500-gtlb_nv[tlbsel]); vcpu-arch.shared-mas1 = (vcpu-arch.shared-mas6 MAS6_SPID0) - | (vcpu-arch.shared-mas6 (MAS6_SAS ? MAS1_TS : 0)) + | ((vcpu-arch.shared-mas6 MAS6_SAS) ? MAS1_TS : 0) | (vcpu-arch.shared-mas4 MAS4_TSIZED(~0)); vcpu-arch.shared-mas2 = MAS2_EPN; vcpu-arch.shared-mas2 |= vcpu-arch.shared-mas4 Reviewed-by: Scott Wood scottw...@freescale.com Please send as non-RFC. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL
On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote: If passed a larger page size lookup_linux_ptep() may fail, so add a check for that and bail out if that's the case. This was found with the help of a static code analysis tool. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com Cc: Scott Wood scottw...@freescale.com --- based on https://github.com/agraf/linux-2.6.git kvm-ppc-next arch/powerpc/kvm/e500_mmu_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On Wed, 2015-05-20 at 15:26 +1000, Sam Bobroff wrote: In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) It's nominally a 64-bit register, but the upper 32 bits are reserved in ISA 2.06. Do newer ISAs or certain implementations define things in the upper 32 bits, or is this just about the asm accesses being wrong on big-endian? Also, I see the same issue in the booke code (CCing Mihai). -Scott diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock
On Fri, 2014-09-12 at 09:12 -0500, Purcareata Bogdan-B43198 wrote: -Original Message- From: Wood Scott-B07421 Sent: Thursday, September 11, 2014 9:19 PM To: Purcareata Bogdan-B43198 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote: This patch enables running intensive I/O workloads, e.g. netperf, in a guest deployed on a RT host. No change for !RT kernels. The openpic spinlock becomes a sleeping mutex on a RT system. This no longer guarantees that EPR is atomic with exception delivery. The guest VCPU thread fails due to a BUG_ON(preemptible()) when running netperf. In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic context, convert the openpic lock to a raw_spinlock. A similar approach can be seen for x86 platforms in the following commit [1]. Here are some comparative cyclitest measurements run inside a high priority RT guest run on a RT host. The guest has 1 VCPU and the test has been run for 15 minutes. The guest runs ~750 hackbench processes as background stress. Does hackbench involve triggering interrupts that would go through the MPIC? You may want to try an I/O-heavy benchmark to stress the MPIC code (the more interrupt sources are active at once, the better). Before this patch, running netperf/iperf in the guest always resulted in hitting the afore-mentioned BUG_ON, when the host was RT. This is why I can't provide comparative cyclitest measurements before and after the patch, with heavy I/O stress. Since I had no problem running hackbench before, I'm assuming it doesn't involve interrupts passing through the MPIC. The measurements were posted just to show that the patch doesn't mess up anything somewhere else. I know you can't provide before/after, but it would be nice to see what the after numbers are with heavy MPIC activity. Also try a guest with many vcpus. AFAIK, without the MSI affinity patches [1], all vfio interrupts will go to core 0 in the guest. In this case, I guess there won't be contention induced latencies due to multiple VCPUs expecting to have their interrupts delivered. Am I getting it wrong? It's not about contention, but about loops in the MPIC code that iterate over the entire set of vcpus. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock
On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote: This patch enables running intensive I/O workloads, e.g. netperf, in a guest deployed on a RT host. No change for !RT kernels. The openpic spinlock becomes a sleeping mutex on a RT system. This no longer guarantees that EPR is atomic with exception delivery. The guest VCPU thread fails due to a BUG_ON(preemptible()) when running netperf. In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic context, convert the openpic lock to a raw_spinlock. A similar approach can be seen for x86 platforms in the following commit [1]. Here are some comparative cyclitest measurements run inside a high priority RT guest run on a RT host. The guest has 1 VCPU and the test has been run for 15 minutes. The guest runs ~750 hackbench processes as background stress. Does hackbench involve triggering interrupts that would go through the MPIC? You may want to try an I/O-heavy benchmark to stress the MPIC code (the more interrupt sources are active at once, the better). Also try a guest with many vcpus. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
On Wed, 2014-08-27 at 13:23 +0200, Alexander Graf wrote: On 13.08.14 11:09, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Scott, could you please recheck whether you're ok with it now? :) I'm OK with it. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote: SPE exception handlers are now defined for 32-bit e500mc cores even though SPE unit is not present and CONFIG_SPE is undefined. Restrict SPE exception handlers to e200/e500 cores adding CONFIG_SPE_POSSIBLE and consequently guard __stup_ivors and __setup_cpu functions. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com Cc: Alexander Graf ag...@suse.de --- v2: - use CONFIG_PPC_E500MC without CONFIG_E500 - use elif defined() arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++- arch/powerpc/kernel/cputable.c| 5 + arch/powerpc/kernel/head_fsl_booke.S | 18 +- arch/powerpc/platforms/Kconfig.cputype| 6 +- 4 files changed, 34 insertions(+), 7 deletions(-) Acked-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote: Book3E specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which unit to support exclusively. As Alexander Graf suggested, this will improve code readability especially in KVM. Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting c58ce397 and 6b310fc5 patches that added common defines. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com Cc: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/exceptions-64e.S | 4 ++-- arch/powerpc/kernel/head_fsl_booke.S | 8 2 files changed, 6 insertions(+), 6 deletions(-) Acked-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception
On Tue, 2014-08-12 at 02:36 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, August 12, 2014 5:30 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception On Wed, 2014-08-06 at 12:08 +0530, Bharat Bhushan wrote: @@ -1249,6 +1284,7 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu) setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func, (unsigned long)vcpu); + kvmppc_clear_dbsr(); return 0; This could use a comment for why we're doing this. Also, I'm a bit uneasy about clearing the whole DBSR here, where we haven't yet switched the debug registers to guest context. I think we wanted MRR to not cause debug event to guest, So should we only clear MRR ? It shouldn't actually matter except for deferred debug exceptions which are not actually useful (in fact e6500 removed support for them), Exactly, that's why I was clearing complete DBSR. Probably we can have a comment Do not let previously set debug events visible to guest. As deferred debug events are not supported, so it is ok to clear complete DBSR. This would be affecting host debugging of the host, not guest debugging of the guest. Still I don't think it's a huge deal, but clearing only MRR would be cleaner. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
On Wed, 2014-08-06 at 11:39 +0300, Mihai Caraman wrote: SPE exception handlers are now defined for 32-bit e500mc cores even though SPE unit is not present and CONFIG_SPE is undefined. Restrict SPE exception handlers to e200/e500 cores adding CONFIG_SPE_POSSIBLE and consequently guard __stup_ivors and __setup_cpu functions. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++- arch/powerpc/kernel/cputable.c| 5 + arch/powerpc/kernel/head_fsl_booke.S | 18 +- arch/powerpc/platforms/Kconfig.cputype| 6 +- 4 files changed, 34 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S index 4f1393d..44bb2c9 100644 --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S @@ -91,6 +91,7 @@ _GLOBAL(setup_altivec_idle) blr +#if defined(CONFIG_E500) defined(CONFIG_PPC_E500MC) When would you ever have CONFIG_PPC_E500MC without CONFIG_E500? diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index b497188..4f8930f 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -613,6 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfspr r10, SPRN_SPRG_RSCRATCH0 b InstructionStorage +/* Define SPE handlers for e200 and e500v2 */ #ifdef CONFIG_SPE /* SPE Unavailable */ START_EXCEPTION(SPEUnavailable) @@ -622,10 +623,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) b fast_exception_return 1: addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_EE_LITE(0x2010, KernelSPE) -#else +#elif CONFIG_SPE_POSSIBLE #elif defined(CONFIG_SPE_POSSIBLE) Likewise elsewhere -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote: @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct kvm_vcpu *vcpu) static int kvmppc_core_init_vm_e500mc(struct kvm *kvm) { - int lpid; + int i, lpid; - lpid = kvmppc_alloc_lpid(); - if (lpid 0) - return lpid; + /* The lpid pool supports only 2 entries now */ + if (threads_per_core 2) + return -ENOMEM; + + /* Each VM allocates one LPID per HW thread index */ + for (i = 0; i threads_per_core; i++) { + lpid = kvmppc_alloc_lpid(); + if (lpid 0) + return lpid; + + kvm-arch.lpid_pool[i] = lpid; + } Wouldn't it be simpler to halve the size of the lpid pool that the allocator sees, and just OR in the high bit based on the low bit of the cpu number? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core
On Tue, 2014-08-12 at 01:53 +0200, Alexander Graf wrote: Am 12.08.2014 um 01:36 schrieb Scott Wood scottw...@freescale.com: On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote: @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct kvm_vcpu *vcpu) static int kvmppc_core_init_vm_e500mc(struct kvm *kvm) { -int lpid; +int i, lpid; -lpid = kvmppc_alloc_lpid(); -if (lpid 0) -return lpid; +/* The lpid pool supports only 2 entries now */ +if (threads_per_core 2) +return -ENOMEM; + +/* Each VM allocates one LPID per HW thread index */ +for (i = 0; i threads_per_core; i++) { +lpid = kvmppc_alloc_lpid(); +if (lpid 0) +return lpid; + +kvm-arch.lpid_pool[i] = lpid; +} Wouldn't it be simpler to halve the size of the lpid pool that the allocator sees, and just OR in the high bit based on the low bit of the cpu number? Heh, I wrote the same and then removed the section from my reply again. It wouldn't really make that much of a difference if you think it through completely. But yes, it certainly would be quite a bit more natural. I'm ok either way. It's not a huge difference, but it would at least get rid of some of the ifdeffing in the headers. It'd also be nicer when debugging to have the LPIDs correlated. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception
On Wed, 2014-08-06 at 12:08 +0530, Bharat Bhushan wrote: @@ -1249,6 +1284,7 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu) setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func, (unsigned long)vcpu); + kvmppc_clear_dbsr(); return 0; This could use a comment for why we're doing this. Also, I'm a bit uneasy about clearing the whole DBSR here, where we haven't yet switched the debug registers to guest context. It shouldn't actually matter except for deferred debug exceptions which are not actually useful (in fact e6500 removed support for them), but still... -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and exception
On Mon, 2014-08-04 at 22:41 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, August 05, 2014 4:23 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and exception On Mon, 2014-08-04 at 13:32 +0530, Bharat Bhushan wrote: @@ -735,7 +745,27 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg); u32 dbsr = vcpu-arch.dbsr; - /* Clear guest dbsr (vcpu-arch.dbsr). + if (vcpu-guest_debug == 0) { + /* + * Debug resources belong to Guest. + * Imprecise debug event are not injected + */ + if (dbsr DBSR_IDE) + return RESUME_GUEST; This is incorrect. DBSR_IDE shouldn't *cause* an injection, but it shouldn't inhibit it either. Will this work ? If ((dbsr DBSR_IDE) !(dbsr ~DBSR_IDE)) Return RESUME_GUEST; I suppose it could, but it would be cleaner to just change dbsr to (dbsr ~DBSR_IDE) in the next if-statement (maybe factoring out each term of that if-statement to variables to make it more readable). @@ -828,6 +858,8 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu, case BOOKE_INTERRUPT_DEBUG: /* Save DBSR before preemption is enabled */ vcpu-arch.dbsr = mfspr(SPRN_DBSR); + /* MASK out DBSR_MRR */ + vcpu-arch.dbsr = ~DBSR_MRR; kvmppc_clear_dbsr(); break; } DBSR[MRR] can only be set once per host system reset. There's no need to filter it out here; just make sure the host clears it at some point before this point. Can you please suggest where ? somewhere in KVM initialization ? Sure, KVM init works given that there's no real reason for non-KVM code to care. The MRR value doesn't currently survive past kvmppc_clear_dbsr(), so this isn't helping to preserve it for the host's benefit... @@ -1858,6 +1890,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, if (!(dbg-control KVM_GUESTDBG_ENABLE)) { vcpu-arch.shadow_dbg_reg.dbcr0 = 0; + vcpu-arch.dbg_reg.dbcr0 = 0; Again, it's not clear why we need shadow debug registers here. Just in case we implement something that can't be implemented isn't a good reason to keep complexity around. One reason was that setting EDM in guest visible register, For this we need shadow_reg is used to save/restore state in h/w register (which does not have DBCR0_EDM) but debug_reg have DBCR0_EDM. If that's the only reason, then I'd get rid of the shadow and just OR in DCBR0_EDM when reading the register, if vcpu-guest_debug is nonzero. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
On Mon, 2014-08-04 at 13:22 +0530, Bharat Bhushan wrote: Dbsr is not visible to userspace and we do not think any need to expose this to userspace because: Userspace cannot inject debug interrupt to guest (as this does not know guest ability to handle debug interrupt), so userspace will always clear DBSR. Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG handling then clearing dbsr in kernel looks simple as this avoid doing SET_SREGS/set_one_reg() to clear DBSR Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 322da7d..5c2e26a 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -735,6 +735,17 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg); u32 dbsr = vcpu-arch.dbsr; + /* Clear guest dbsr (vcpu-arch.dbsr). + * dbsr is not visible to userspace and we do not think any + * need to expose this to userspace because: + * Userspace cannot inject debug interrupt to guest (as this does + * not know guest ability to handle debug interrupt), so userspace + * will always clear DBSR. + * Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG + * handling then clearing here looks simple as this + * avoid doing SET_SREGS/set_one_reg() to clear DBSR + */ + vcpu-arch.dbsr = 0; run-debug.arch.status = 0; run-debug.arch.address = vcpu-arch.pc; I think the changelog is adequate -- I don't think we need to be so verbose in the code itself. The question was just whether this was a userspace-visible change, and it isn't. FWIW, I think dbsr should be visible to userspace in general (regardless of whether it's cleared here), because all guest registers should be visible to userspace. I may be debugging a guest through means that don't require owning debug resources, such as stopping and inspecting a guest that has hung or crashed. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and exception
On Mon, 2014-08-04 at 13:32 +0530, Bharat Bhushan wrote: @@ -735,7 +745,27 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg); u32 dbsr = vcpu-arch.dbsr; - /* Clear guest dbsr (vcpu-arch.dbsr). + if (vcpu-guest_debug == 0) { + /* + * Debug resources belong to Guest. + * Imprecise debug event are not injected + */ + if (dbsr DBSR_IDE) + return RESUME_GUEST; This is incorrect. DBSR_IDE shouldn't *cause* an injection, but it shouldn't inhibit it either. @@ -828,6 +858,8 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu, case BOOKE_INTERRUPT_DEBUG: /* Save DBSR before preemption is enabled */ vcpu-arch.dbsr = mfspr(SPRN_DBSR); + /* MASK out DBSR_MRR */ + vcpu-arch.dbsr = ~DBSR_MRR; kvmppc_clear_dbsr(); break; } DBSR[MRR] can only be set once per host system reset. There's no need to filter it out here; just make sure the host clears it at some point before this point. The MRR value doesn't currently survive past kvmppc_clear_dbsr(), so this isn't helping to preserve it for the host's benefit... @@ -1858,6 +1890,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, if (!(dbg-control KVM_GUESTDBG_ENABLE)) { vcpu-arch.shadow_dbg_reg.dbcr0 = 0; + vcpu-arch.dbg_reg.dbcr0 = 0; Again, it's not clear why we need shadow debug registers here. Just in case we implement something that can't be implemented isn't a good reason to keep complexity around. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG
On Mon, 2014-08-04 at 22:33 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, August 05, 2014 4:17 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG FWIW, I think dbsr should be visible to userspace in general (regardless of whether it's cleared here), because all guest registers should be visible to userspace. I may be debugging a guest through means that don't require owning debug resources, such as stopping and inspecting a guest that has hung or crashed. Do you mean that we should also add a one-reg interface for DBSR ? Yes. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Fri, 2014-08-01 at 04:34 -0500, Bhushan Bharat-R65777 wrote: on dbsr write emulation, deque the debug interrupt even if DBSR_IDE is set. case SPRN_DBSR: vcpu-arch.dbsr = ~spr_val; if (!(vcpu-arch.dbsr ~DBSR_IDE)) kvmppc_core_dequeue_debug(vcpu); break; or vcpu-arch.dbsr = ~(spr_val | DBSR_IDE); if (!vcpu-arch.dbsr) kvmppc_core_dequeue_debug(vcpu); break; The first option. I see no reason to have KVM forcibly clear DBSR[IDE]. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Thu, 2014-07-31 at 01:15 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Thursday, July 31, 2014 8:18 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:58 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception Userspace might be interested in the raw value, With the current design, If userspace is interested then it will not get the DBSR. Oh, because DBSR isn't currently implemented in sregs or one reg? That is one reason. Another is that if we give dbsr visibility to userspace then userspace have to clear dbsr in handling KVM_EXIT_DEBUG. Right -- since I didn't realize DBSR wasn't already exposed, I thought userspace already had this responsibility. It looked like it was removing dbsr visibility and the requirement for userspace to clear dbsr. I guess the old way was that the value in vcpu-arch.dbsr didn't matter until the next debug exception, when it would be overwritten by the new SPRN_DBSR? But that means old dbsr will be visibility to userspace, which is even bad than not visible, no? Also this can lead to old dbsr visible to guest once userspace releases debug resources, but this can be solved by clearing dbsr in kvm_arch_vcpu_ioctl_set_guest_debug() - if (!(dbg-control KVM_GUESTDBG_ENABLE)) { }. I wasn't suggesting that you keep it that way, just clarifying my understanding of the current code. + case SPRN_DBCR2: + /* + * If userspace is debugging guest then guest + * can not access debug registers. + */ + if (vcpu-guest_debug) + break; + + debug_inst = true; + vcpu-arch.dbg_reg.dbcr2 = spr_val; + vcpu-arch.shadow_dbg_reg.dbcr2 = spr_val; break; In what circumstances can the architected and shadow registers differ? As of now they are same. But I think that if we want to implement other features like Freeze Timer (FT) then they can be different. I don't think we can possibly implement Freeze Timer. May be, but in my opinion we should keep this open. We're not talking about API here -- the implementation should be kept simple if there's no imminent need for shadow registers. I am not sure what we should in that case ? As we are currently emulating a subset of debug events (IAC, DAC, IC, BT and TIE --- DBCR0 emulation) then we should expose status of those events in guest dbsr and rest should be cleared ? I'm not saying they need to be exposed to the guest, but I don't see where you filter out bits like these. I am trying to get what all bits should be filtered out, all bits except IACx, DACx, IC, BT and TIE (same as event set filtering done when setting DBCR0) ? i.e IDE, UDE, MRR, IRPT, RET, CIRPT, CRET should be filtered out? Bits like IRPT and RET don't really matter, as you shouldn't see them happen. Likewise MRR if you're sure you've cleared it since boot. But IDE could be set any time an asynchronous exception happens. I don't think you should filter it out, but instead make sure that it doesn't cause an exception to be delivered. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:22 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: This is not used and even I do not remember why this was added in first place. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..a5ee42c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, kvm_guest_protect_msr(vcpu, MSR_DE, true); vcpu-guest_debug = dbg-control; vcpu-arch.shadow_dbg_reg.dbcr0 = 0; - /* Set DBCR0_EDM in guest visible DBCR0 register. */ - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM; if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this Virtualized Implementation Note: It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP is clear then clear DBCR0_EDM, right? We need to implement above mentioned this. We should probably clear EDM only when guest debug emulation is working and enabled (i.e. not until at least patch 6/6). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Wed, 2014-07-30 at 12:57 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Wednesday, July 30, 2014 11:18 PM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:22 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: This is not used and even I do not remember why this was added in first place. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..a5ee42c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, kvm_guest_protect_msr(vcpu, MSR_DE, true); vcpu-guest_debug = dbg-control; vcpu-arch.shadow_dbg_reg.dbcr0 = 0; - /* Set DBCR0_EDM in guest visible DBCR0 register. */ - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM; if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this Virtualized Implementation Note: It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP is clear then clear DBCR0_EDM, right? We need to implement above mentioned this. We should probably clear EDM only when guest debug emulation is working and enabled (i.e. not until at least patch 6/6). But if EDM is set then guest debug emulation will not start/allowed. I don't mean after the guest tries to write to the registers -- I mean after the code has been added to KVM to allow it to work. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:58 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception Userspace might be interested in the raw value, With the current design, If userspace is interested then it will not get the DBSR. Oh, because DBSR isn't currently implemented in sregs or one reg? But why userspace will be interested? Do you expose all of the hardware's debugging features in your high-level interface? plus it's a change from the current API semantics. Can you please let us know how ? It looked like it was removing dbsr visibility and the requirement for userspace to clear dbsr. I guess the old way was that the value in vcpu-arch.dbsr didn't matter until the next debug exception, when it would be overwritten by the new SPRN_DBSR? + case SPRN_DBCR2: + /* + * If userspace is debugging guest then guest + * can not access debug registers. + */ + if (vcpu-guest_debug) + break; + + debug_inst = true; + vcpu-arch.dbg_reg.dbcr2 = spr_val; + vcpu-arch.shadow_dbg_reg.dbcr2 = spr_val; break; In what circumstances can the architected and shadow registers differ? As of now they are same. But I think that if we want to implement other features like Freeze Timer (FT) then they can be different. I don't think we can possibly implement Freeze Timer. case SPRN_DBSR: + /* + * If userspace is debugging guest then guest + * can not access debug registers. + */ + if (vcpu-guest_debug) + break; + vcpu-arch.dbsr = ~spr_val; + if (vcpu-arch.dbsr == 0) + kvmppc_core_dequeue_debug(vcpu); break; Not all DBSR bits cause an exception, e.g. IDE and MRR. I am not sure what we should in that case ? As we are currently emulating a subset of debug events (IAC, DAC, IC, BT and TIE --- DBCR0 emulation) then we should expose status of those events in guest dbsr and rest should be cleared ? I'm not saying they need to be exposed to the guest, but I don't see where you filter out bits like these. @@ -273,6 +397,10 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) emulated = EMULATE_FAIL; } + if (debug_inst) { + switch_booke_debug_regs(vcpu-arch.shadow_dbg_reg); + current-thread.debug = vcpu-arch.shadow_dbg_reg; + } Could you explain what's going on with regard to copying the registers into current-thread.debug? Why is it done after loading the registers into the hardware (is there a race if we get preempted in the middle)? Yes, and this was something I was not clear when writing this code. Should we have preempt disable-enable around this. Can they be reordered instead? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: This is not used and even I do not remember why this was added in first place. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..a5ee42c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, kvm_guest_protect_msr(vcpu, MSR_DE, true); vcpu-guest_debug = dbg-control; vcpu-arch.shadow_dbg_reg.dbcr0 = 0; - /* Set DBCR0_EDM in guest visible DBCR0 register. */ - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM; if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this Virtualized Implementation Note: It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: When userspace (QEMU) is using the debug resource to debug guest then we want MSR_DE to be always set. This patch adds missing MSR_DE setting in rfci instruction. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_emulate.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..80c51a2 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) { vcpu-arch.pc = vcpu-arch.csrr0; - kvmppc_set_msr(vcpu, vcpu-arch.csrr1); + /* Force MSR_DE when guest does not own debug facilities */ + if (vcpu-guest_debug) + kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE); + else + kvmppc_set_msr(vcpu, vcpu-arch.csrr1); } int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is called by kvmppc_set_msr(). Plus, it should only be done for HV mode. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: When userspace is debugging guest then MSR_DE is always set and MSRP_DEP is set so that guest cannot change MSR_DE. Guest debug resources are not yet emulated, So there seems no reason we should stop guest controlling MSR_DE. Also a followup patch will enable debug emulation and that requires guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions arch/powerpc/include/asm/kvm_ppc.h | 3 + arch/powerpc/kvm/booke.c | 27 +++ arch/powerpc/kvm/booke_emulate.c | 157 + 3 files changed, 187 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e2fd5a1..f3f7611 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -173,6 +173,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq); extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq); +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu); +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu); + union kvmppc_one_reg { u32 wval; u64 dval; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index fadfe76..c2471ed 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -264,6 +264,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu) clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions); } +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) +{ + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); +} + +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) +{ + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions); +} + static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) { #ifdef CONFIG_KVM_BOOKE_HV @@ -783,6 +793,23 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg); u32 dbsr = vcpu-arch.dbsr; + if (vcpu-guest_debug == 0) { + /* Debug resources belong to Guest */ + if (dbsr (vcpu-arch.shared-msr MSR_DE)) + kvmppc_core_queue_debug(vcpu); Should also check for DBCR0[IDM]. + + /* Inject a program interrupt if trap debug is not allowed */ + if ((dbsr DBSR_TIE) !(vcpu-arch.shared-msr MSR_DE)) + kvmppc_core_queue_program(vcpu, ESR_PTR); + + return RESUME_GUEST; + } + + /* + * Prepare for userspace exit as debug resources + * are owned by userspace. + */ + vcpu-arch.dbsr = 0; run-debug.arch.status = 0; run-debug.arch.address = vcpu-arch.pc; Why are you clearing vcpu-arch.dbsr? Userspace might be interested in the raw value, plus it's a change from the current API semantics. Where is this run-debug.arch.foo stuff and KVM_EXIT_DEBUG documented? diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 2a20194..3d143fe 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -139,6 +139,7 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) { int emulated = EMULATE_DONE; + bool debug_inst = false; switch (sprn) { case SPRN_DEAR: @@ -153,14 +154,137 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) case SPRN_CSRR1: vcpu-arch.csrr1 = spr_val; break; + case SPRN_DSRR0: + vcpu-arch.dsrr0 = spr_val; + break; + case SPRN_DSRR1: + vcpu-arch.dsrr1 = spr_val; + break; + case SPRN_IAC1: + /* + * If userspace is debugging guest then
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Mon, 2014-07-28 at 03:54 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Saturday, July 26, 2014 3:11 AM To: Caraman Mihai Claudiu-B02008 Cc: Alexander Graf; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote: Scott, Alex's request to define SPE handlers only for e500v2 implies changes in 32-bit FSL kernel to have exclusive configurations for e200/e500v2 and e500mc/e5500. We would probably need something like this, what's your take on it? That is already a compile-time decision. Yes, but is not fully implemented. We might be missing some kconfig language to prohibit e500v2 boards from being enabled in an e500mc kernel, but if you try to do so it won't work (except for obsolete hacks like running QEMU's mpc8544ds platform with -cpu e500mc). diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index b497188..9d41015 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfspr r10, SPRN_SPRG_RSCRATCH0 b InstructionStorage +/* Define SPE handlers only for e500v2 and e200 */ +#ifndef CONFIG_PPC_E500MC #ifdef CONFIG_SPE /* SPE Unavailable */ START_EXCEPTION(SPEUnavailable) @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ unknown_exception, EXC_XFER_EE) #endif /* CONFIG_SPE */ +#endif This is redundant: config SPE bool SPE Support depends on E200 || (E500 !PPC_E500MC) default y I see what you mean, but it's not redundant. OK, I didn't realize there was an #else that wasn't included in the context. It would have been nice if the comment at the end were !CONFIG_SPE rather than CONFIG_SPE. Alex was looking to remove SPE handlers for e500mc+ and the proposal handled !SPE case. In the new light I find this variant more readable: +/* Define SPE handlers for e200 and e500v2 */ #ifdef CONFIG_SPE /* SPE Unavailable */ START_EXCEPTION(SPEUnavailable) @@ -622,11 +623,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) b fast_exception_return 1: addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_EE_LITE(0x2010, KernelSPE) -#else +#elif defined(CONFIG_E200) || \ + (defined(CONFIG_E500) !defined(CONFIG_PPC_E500MC)) EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ unknown_exception, EXC_XFER_EE) #endif /* CONFIG_SPE */ Yes, or better define a CONFIG_SPE_POSSIBLE so that the list only has to exist in one place, and the intent is clearer. diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index c1faade..3ab65c2 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = { #endif /* CONFIG_PPC32 */ #ifdef CONFIG_E500 #ifdef CONFIG_PPC32 +#ifndef CONFIG_PPC_E500MC { /* e500 */ .pvr_mask = 0x, .pvr_value = 0x8020, @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check = machine_check_e500, .platform = ppc8548, }, +#endif /* !CONFIG_PPC_E500MC */ { /* e500mc */ .pvr_mask = 0x, .pvr_value = 0x8023, This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but e500mc gets included in !PPC_E500MC? Right, my main purpose was to get rid of __setup_e500_ivors on PPC_E500MC which refers SPEUnavailable. I will add an #else to exclude e500mc. The generic E500 PPC default cpu advertises PPC_FEATURE_HAS_SPE_COMP. Do we want to excluded it if PPC_E500MC? Is this cpu useful without cpu_setup() and if so why don't we also have a default for 64-bit? I don't think that default will do anything useful. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote: Scott, Alex's request to define SPE handlers only for e500v2 implies changes in 32-bit FSL kernel to have exclusive configurations for e200/e500v2 and e500mc/e5500. We would probably need something like this, what's your take on it? That is already a compile-time decision. diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index b497188..9d41015 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfspr r10, SPRN_SPRG_RSCRATCH0 b InstructionStorage +/* Define SPE handlers only for e500v2 and e200 */ +#ifndef CONFIG_PPC_E500MC #ifdef CONFIG_SPE /* SPE Unavailable */ START_EXCEPTION(SPEUnavailable) @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ unknown_exception, EXC_XFER_EE) #endif /* CONFIG_SPE */ +#endif This is redundant: config SPE bool SPE Support depends on E200 || (E500 !PPC_E500MC) default y diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index c1faade..3ab65c2 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = { #endif /* CONFIG_PPC32 */ #ifdef CONFIG_E500 #ifdef CONFIG_PPC32 +#ifndef CONFIG_PPC_E500MC { /* e500 */ .pvr_mask = 0x, .pvr_value = 0x8020, @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check = machine_check_e500, .platform = ppc8548, }, +#endif /* !CONFIG_PPC_E500MC */ { /* e500mc */ .pvr_mask = 0x, .pvr_value = 0x8023, This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but e500mc gets included in !PPC_E500MC? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] powerpc/e6500: Work around erratum A-008139
Erratum A-008139 can cause duplicate TLB entries if an indirect entry is overwritten using tlbwe while the other thread is using it to do a lookup. Work around this by using tlbilx to invalidate prior to overwriting. To avoid the need to save another register to hold MAS1 during the workaround code, TID clearing has been moved from tlb_miss_kernel_e6500 until after the SMT section. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/mm/tlb_low_64e.S | 68 +++ 1 file changed, 56 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S index 57c4d66..89bf95b 100644 --- a/arch/powerpc/mm/tlb_low_64e.S +++ b/arch/powerpc/mm/tlb_low_64e.S @@ -299,7 +299,9 @@ itlb_miss_fault_bolted: * r10 = crap (free to use) */ tlb_miss_common_e6500: -BEGIN_FTR_SECTION + crmove cr2*4+2,cr0*4+2 /* cr2.eq != 0 if kernel address */ + +BEGIN_FTR_SECTION /* CPU_FTR_SMT */ /* * Search if we already have an indirect entry for that virtual * address, and if we do, bail out. @@ -324,17 +326,62 @@ BEGIN_FTR_SECTION b 1b .previous + /* +* Erratum A-008139 says that we can't use tlbwe to change +* an indirect entry in any way (including replacing or +* invalidating) if the other thread could be in the process +* of a lookup. The workaround is to invalidate the entry +* with tlbilx before overwriting. +*/ + + lbz r15,TCD_ESEL_NEXT(r11) + rlwinm r10,r15,16,0xff + orisr10,r10,MAS0_TLBSEL(1)@h + mtspr SPRN_MAS0,r10 + isync + tlbre mfspr r15,SPRN_MAS1 - mfspr r10,SPRN_MAS2 + andis. r15,r15,MAS1_VALID@h + beq 5f + +BEGIN_FTR_SECTION_NESTED(532) + mfspr r10,SPRN_MAS8 + rlwinm r10,r10,0,0x8fff /* tgs,tlpid - sgs,slpid */ + mtspr SPRN_MAS5,r10 +END_FTR_SECTION_NESTED(CPU_FTR_EMB_HV,CPU_FTR_EMB_HV,532) - tlbsx 0,r16 - mtspr SPRN_MAS2,r10 mfspr r10,SPRN_MAS1 - mtspr SPRN_MAS1,r15 + rlwinm r15,r10,0,0x3fff /* tid - spid */ + rlwimi r15,r10,20,0x0003 /* ind,ts - sind,sas */ + mfspr r10,SPRN_MAS6 + mtspr SPRN_MAS6,r15 + + mfspr r15,SPRN_MAS2 + isync + tlbilxva 0,r15 + isync + + mtspr SPRN_MAS6,r10 - andis. r10,r10,MAS1_VALID@h +5: +BEGIN_FTR_SECTION_NESTED(532) + li r10,0 + mtspr SPRN_MAS8,r10 + mtspr SPRN_MAS5,r10 +END_FTR_SECTION_NESTED(CPU_FTR_EMB_HV,CPU_FTR_EMB_HV,532) + + tlbsx 0,r16 + mfspr r10,SPRN_MAS1 + andis. r15,r10,MAS1_VALID@h bne tlb_miss_done_e6500 -END_FTR_SECTION_IFSET(CPU_FTR_SMT) +FTR_SECTION_ELSE + mfspr r10,SPRN_MAS1 +ALT_FTR_SECTION_END_IFSET(CPU_FTR_SMT) + + orisr10,r10,MAS1_VALID@h + beq cr2,4f + rlwinm r10,r10,0,16,1 /* Clear TID */ +4: mtspr SPRN_MAS1,r10 /* Now, we need to walk the page tables. First check if we are in * range. @@ -410,12 +457,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_SMT) rfi tlb_miss_kernel_e6500: - mfspr r10,SPRN_MAS1 ld r14,PACA_KERNELPGD(r13) - cmpldi cr0,r15,8 /* Check for vmalloc region */ - rlwinm r10,r10,0,16,1 /* Clear TID */ - mtspr SPRN_MAS1,r10 - beq+tlb_miss_common_e6500 + cmpldi cr1,r15,8 /* Check for vmalloc region */ + beq+cr1,tlb_miss_common_e6500 tlb_miss_fault_e6500: tlb_unlock_e6500 -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest
On Fri, 2014-07-18 at 02:33 +0200, Alexander Graf wrote: On 18.07.14 02:28, Scott Wood wrote: On Thu, 2014-07-17 at 18:29 +0200, Alexander Graf wrote: On 17.07.14 18:27, Alexander Graf wrote: On 17.07.14 18:24, bharat.bhus...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 17, 2014 9:41 PM To: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; Wood Scott-B07421; Yoder Stuart-B08248 Subject: Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest On 16.07.14 08:02, Bharat Bhushan wrote: SPRG3 is guest accessible and SPRG3 can be clobbered by host or another guest, So this need to be restored when loading guest state. SPRG3 is not guest writeable. We should be doing this so that guest reads of SPRG3 through the alternative read-only SPR work, not because SPRG3 can be clobbered by host or another guest. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_interrupts.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S index 2c6deb5ef..0d3403f 100644 --- a/arch/powerpc/kvm/booke_interrupts.S +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -459,6 +459,8 @@ lightweight_exit: * written directly to the shared area, so we * need to reload them here with the guest's values. */ +PPC_LD(r3, VCPU_SHARED_SPRG3, r5) +mtsprSPRN_SPRG3, r3 We also need to restore it when resuming the host, no? I do not think host expect some meaningful value when returning from guest, same true for SPRG4-7. So there seems no reason to save host values and restore them. Linux no longer uses SPRG4-7 for itself. That is not true of SPRG3, as Alex points out. Hmm - arch/powerpc/include/asm/reg.h says: * All 32-bit: * - SPRG3 current thread_info pointer *(virtual on BookE, physical on others) but I can indeed find no trace of usage anywhere. This at least needs to go into the patch description. Bah - it obviously is used. It's SPRN_SPRG_THREAD. And it's so incredibly important that I have no idea how we could possibly run without switching the host value back in very early. And even then our interrupt handlers wouldn't work anymore. This is more complicated :). To make this work we need to avoid SPRG3 as well, or at least avoid using it for something needed prior to DO_KVM. We also need to update the documentation in reg.h to reflect the fact that we don't use SPRG4-7 anymore on e500. I would personally prefer if we claim SPRG3R as unsupported on e500v2 until we find someone who actually uses it. There's a good chance we'd start jumping through a lot of hoops and reduce overall performance for no real-world gain today. The same problem applies to e500mc. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest
On Fri, 2014-07-18 at 02:37 +0200, Alexander Graf wrote: On 18.07.14 02:36, Scott Wood wrote: On Fri, 2014-07-18 at 02:33 +0200, Alexander Graf wrote: On 18.07.14 02:28, Scott Wood wrote: On Thu, 2014-07-17 at 18:29 +0200, Alexander Graf wrote: On 17.07.14 18:27, Alexander Graf wrote: On 17.07.14 18:24, bharat.bhus...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 17, 2014 9:41 PM To: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; Wood Scott-B07421; Yoder Stuart-B08248 Subject: Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest On 16.07.14 08:02, Bharat Bhushan wrote: SPRG3 is guest accessible and SPRG3 can be clobbered by host or another guest, So this need to be restored when loading guest state. SPRG3 is not guest writeable. We should be doing this so that guest reads of SPRG3 through the alternative read-only SPR work, not because SPRG3 can be clobbered by host or another guest. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_interrupts.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S index 2c6deb5ef..0d3403f 100644 --- a/arch/powerpc/kvm/booke_interrupts.S +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -459,6 +459,8 @@ lightweight_exit: * written directly to the shared area, so we * need to reload them here with the guest's values. */ +PPC_LD(r3, VCPU_SHARED_SPRG3, r5) +mtsprSPRN_SPRG3, r3 We also need to restore it when resuming the host, no? I do not think host expect some meaningful value when returning from guest, same true for SPRG4-7. So there seems no reason to save host values and restore them. Linux no longer uses SPRG4-7 for itself. That is not true of SPRG3, as Alex points out. Hmm - arch/powerpc/include/asm/reg.h says: * All 32-bit: * - SPRG3 current thread_info pointer *(virtual on BookE, physical on others) but I can indeed find no trace of usage anywhere. This at least needs to go into the patch description. Bah - it obviously is used. It's SPRN_SPRG_THREAD. And it's so incredibly important that I have no idea how we could possibly run without switching the host value back in very early. And even then our interrupt handlers wouldn't work anymore. This is more complicated :). To make this work we need to avoid SPRG3 as well, or at least avoid using it for something needed prior to DO_KVM. We also need to update the documentation in reg.h to reflect the fact that we don't use SPRG4-7 anymore on e500. I would personally prefer if we claim SPRG3R as unsupported on e500v2 until we find someone who actually uses it. There's a good chance we'd start jumping through a lot of hoops and reduce overall performance for no real-world gain today. The same problem applies to e500mc. There we have SPRN_GSPRG3, no? Oh, right. Since it's only a problem for PR-mode, it can be fixed without needing to avoid SPRG3 entirely, since PR-mode doesn't use DO_KVM. We'd only need to avoid using SPRG_THREAD in __KVM_HANDLER (i.e. revert commit ffe129ecd79779221fdb03305049ec8b5a8beb0f). And if we decide it's not worthwhile and don't revert that commit, we should at least remove the comment that Under KVM, the host SPRG1 is used to point to the current VCPU data structure... -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] kvm: ppc: bookehv: Added wrapper macros for shadow registers
On Thu, 2014-07-17 at 17:01 +0530, Bharat Bhushan wrote: There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on BOOKE-HV and these shadow registers are guest accessible. So these shadow registers needs to be updated on BOOKE-HV. This patch adds new macro for get/set helper of shadow register . Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - Fix compilation for book3s (separate macro etc) arch/powerpc/include/asm/kvm_ppc.h | 44 +++--- 1 file changed, 36 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index f3f7611..7646994 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -475,8 +475,20 @@ static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu) #endif } +#define SPRNG_WRAPPER_GET(reg, e500hv_spr) \ +static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu) \ +{\ + return mfspr(e500hv_spr); \ +}\ + +#define SPRNG_WRAPPER_SET(reg, e500hv_spr) \ +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val) \ +{\ + mtspr(e500hv_spr, val); \ +}\ Why e500hv rather than bookehv? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/4] KVM: PPC: Book3E: Handle LRAT error exception
On Fri, 2014-07-04 at 10:15 +0200, Alexander Graf wrote: On 03.07.14 16:45, Mihai Caraman wrote: diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index a192975..ab1077f 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1286,6 +1286,46 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, break; } +#ifdef CONFIG_KVM_BOOKE_HV + case BOOKE_INTERRUPT_LRAT_ERROR: + { + gfn_t gfn; + + /* +* Guest TLB management instructions (EPCR.DGTMI == 0) is not +* supported for now +*/ + if (!(vcpu-arch.fault_esr ESR_PT)) { + WARN(1, %s: Guest TLB management instructions not supported!\n, __func__); Wouldn't this allow a guest to flood the host's kernel log? It shouldn't be possible for this to happen, since the host will never set EPCR[DGTMI] -- but yes, it should be WARN_ONCE or ratelimited. +{ + int this, next; + + this = local_paca-tcd.lrat_next; + next = (this + 1) % local_paca-tcd.lrat_max; Can we assume that lrat_max is always a power of 2? IIRC modulo functions with variables can be quite expensive. So if we can instead do next = (this + 1) local_paca-tcd.lrat_mask; we should be faster and not rely on division helpers. Architecturally we can't assume that, though it's true on the only existing implementation. Why not do something similar to what is done for tlb1: unsigned int sesel = vcpu_e500-host_tlb1_nv++; if (unlikely(vcpu_e500-host_tlb1_nv = tlb1_max_shadow_size())) vcpu_e500-host_tlb1_nv = 0; ...and while we're at it, use local_paca-tcd for tlb1 as well (except on 32-bit). Also, please use get_paca() rather than local_paca so that the preemption-disabled check is retained. +void write_host_lrate(int tsize, gfn_t gfn, unsigned long pfn, uint32_t lpid, + int valid, int lrat_entry) +{ + struct kvm_book3e_206_tlb_entry stlbe; + int esel = lrat_entry; + unsigned long flags; + + stlbe.mas1 = (valid ? MAS1_VALID : 0) | MAS1_TSIZE(tsize); + stlbe.mas2 = ((u64)gfn PAGE_SHIFT); + stlbe.mas7_3 = ((u64)pfn PAGE_SHIFT); + stlbe.mas8 = MAS8_TGS | lpid; + + local_irq_save(flags); + /* book3e_tlb_lock(); */ Hm? Indeed. + + if (esel == -1) + esel = lrat_next(); + __write_host_tlbe(stlbe, MAS0_ATSEL | MAS0_ESEL(esel)); Where do you call this function with lrat_entry != -1? Why rename it to esel at function entry? + down_read(current-mm-mmap_sem); + vma = find_vma(current-mm, hva); + if (vma (hva = vma-vm_start)) { + psize = vma_kernel_pagesize(vma); + } else { + pr_err_ratelimited(%s: couldn't find virtual memory address for gfn %lx!\n, __func__, (long)gfn); While output strings should not be linewrapped, the arguments that come after the long string should be. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/4] KVM: PPC: e500: TLB emulation for IND entries
On Thu, 2014-07-03 at 17:45 +0300, Mihai Caraman wrote: Handle indirect entries (IND) in TLB emulation code. Translation size of IND entries differ from the size of referred Page Tables (Linux guests now use IND of 2MB for 4KB PTs) and this require careful tweak of the existing logic. TLB search emulation requires additional search in HW TLB0 (since these entries are directly added by HTW) and found entries shoud be presented to the guest with RPN changed from PFN to GFN. There might be more GFNs pointing to the same PFN so the only way to get the corresponding GFN is to search it in guest's PTE. If IND entry for the corresponding PT is not available just invalidate guest's ea and report a tlbsx miss. This patch only implements the invalidation and let a TODO note for searching HW TLB0. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/include/asm/mmu-book3e.h | 2 + arch/powerpc/kvm/e500.h | 81 --- arch/powerpc/kvm/e500_mmu.c | 78 +++-- arch/powerpc/kvm/e500_mmu_host.c | 31 -- arch/powerpc/kvm/e500mc.c | 53 +-- 5 files changed, 211 insertions(+), 34 deletions(-) Please look at erratum A-008139. A patch to work around it for the main Linux tablewalk code is forthcoming. You need to make sure to never overwrite an indirect entry with tlbwe -- always use tlbilx. @@ -286,6 +336,22 @@ void kvmppc_e500_tlbil_one(struct kvmppc_vcpu_e500 *vcpu_e500, void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 *vcpu_e500); #ifdef CONFIG_KVM_BOOKE_HV +void inval_tlb_on_host(struct kvm_vcpu *vcpu, int type, int pid); + +void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid, int sas, + int sind); +#else +/* TLB is fully virtualized */ +static inline void inval_tlb_on_host(struct kvm_vcpu *vcpu, + int type, int pid) +{} + +static inline void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid, + int sas, int sind) +{} +#endif Document exactly what these do, and explain why it's conceptually a separate API from kvmppc_e500_tlbil_all/one(), and why TLB is fully virtualized explains why we never need to do this on non-HV. @@ -290,21 +298,32 @@ static void tlbilx_all(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel, kvmppc_e500_gtlbe_invalidate(vcpu_e500, tlbsel, esel); } } + + /* Invalidate enties added by HTW */ + if (has_feature(vcpu_e500-vcpu, VCPU_FTR_MMU_V2) (!sind)) Unnecessary parens. @@ -368,6 +388,23 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea) } else { int victim; + if (has_feature(vcpu, VCPU_FTR_MMU_V2) + get_cur_sind(vcpu) != 1) { + /* Never align the continuation line of an if or loop with the indentation of the if/loop body. get_cur_sind(vcpu) == 0 would be more natural, and probably slightly faster. + * TLB0 entries are not cached in KVM being written + * directly by HTW. TLB0 entry found in HW TLB0 needs + * to be presented to the guest with RPN changed from + * PFN to GFN. There might be more GFNs pointing to the + * same PFN so the only way to get the corresponding GFN + * is to search it in guest's PTE. If IND entry for the + * corresponding PT is not available just invalidate + * guest's ea and report a tlbsx miss. + * + * TODO: search ea in HW TLB0 + */ + inval_ea_on_host(vcpu, ea, pid, as, 0); What if the guest is in a loop where it does an access followed by a tlbsx, looping back if the tlbsx reports no entry (e.g. an exception handler that wants to emulate retrying the faulting instruction on a tlbsx miss)? The architecture allows hypervisors to invalidate at arbitrary times, but we should avoid doing this in ways that can block forward progress. How likely is it really that we have multiple GFNs per PFN? How bad is it really if we return an arbitrary matching GFN in such a case? @@ -516,12 +527,17 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, tsize = (gtlbe-mas1 MAS1_TSIZE_MASK) MAS1_TSIZE_SHIFT; + if (ind) + tsize -= BOOK3E_PAGESZ_4K + 7; /* * e500 doesn't implement the lowest tsize bit, * or 1K pages. */ - tsize = max(BOOK3E_PAGESZ_4K, tsize ~1); + if
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:21 PM To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers On 30.06.14 17:34, Mihai Caraman wrote: Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec which share the same interrupt numbers. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v2: - remove outdated definitions arch/powerpc/include/asm/kvm_asm.h| 8 arch/powerpc/kvm/booke.c | 17 + arch/powerpc/kvm/booke.h | 4 ++-- arch/powerpc/kvm/booke_interrupts.S | 9 + arch/powerpc/kvm/bookehv_interrupts.S | 4 ++-- arch/powerpc/kvm/e500.c | 10 ++ arch/powerpc/kvm/e500_emulate.c | 10 ++ 7 files changed, 30 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 9601741..c94fd33 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -56,14 +56,6 @@ /* E500 */ #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33 -/* - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines - */ -#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \ - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST I think I'd prefer to keep them separate. What is the reason from changing your mind from ver 1? Do you want to have different defines with same values (we specifically mapped them to the hardware interrupt numbers). We already upstreamed the necessary changes in the kernel. Scott, please share your opinion here. I don't like hiding the fact that they're the same number, which could lead to wrong code in the absence of ifdefs that strictly mutually exclude SPE and Altivec code -- there was an instance of this with MSR_VEC versus MSR_SPE in a previous patchset. #define BOOKE_INTERRUPT_SPE_FP_ROUND 34 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35 #define BOOKE_INTERRUPT_DOORBELL 36 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..3c86d9b 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_ITLB_MISS: case BOOKE_IRQPRIO_SYSCALL: case BOOKE_IRQPRIO_FP_UNAVAIL: - case BOOKE_IRQPRIO_SPE_UNAVAIL: - case BOOKE_IRQPRIO_SPE_FP_DATA: + case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL: + case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST: #ifdef CONFIG_KVM_E500V2 case ...SPE: #else case ..ALTIVEC: #endif I don't think that's an improvement. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote: On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:21 PM To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers On 30.06.14 17:34, Mihai Caraman wrote: Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec which share the same interrupt numbers. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v2: - remove outdated definitions arch/powerpc/include/asm/kvm_asm.h| 8 arch/powerpc/kvm/booke.c | 17 + arch/powerpc/kvm/booke.h | 4 ++-- arch/powerpc/kvm/booke_interrupts.S | 9 + arch/powerpc/kvm/bookehv_interrupts.S | 4 ++-- arch/powerpc/kvm/e500.c | 10 ++ arch/powerpc/kvm/e500_emulate.c | 10 ++ 7 files changed, 30 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 9601741..c94fd33 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -56,14 +56,6 @@ /* E500 */ #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33 -/* - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines - */ -#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \ - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST I think I'd prefer to keep them separate. What is the reason from changing your mind from ver 1? Do you want to have different defines with same values (we specifically mapped them to the hardware interrupt numbers). We already upstreamed the necessary changes in the kernel. Scott, please share your opinion here. I don't like hiding the fact that they're the same number, which could lead to wrong code in the absence of ifdefs that strictly mutually exclude SPE and Altivec code -- there was an instance of this with MSR_VEC versus MSR_SPE in a previous patchset. That said, if you want to enforce that mutual exclusion in a way that is clear, I won't object too loudly -- but the code does look pretty similar between the two (as well as between the two IVORs). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Fri, 2014-07-04 at 00:35 +0200, Alexander Graf wrote: On 04.07.14 00:31, Scott Wood wrote: On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote: On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:21 PM To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers On 30.06.14 17:34, Mihai Caraman wrote: Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec which share the same interrupt numbers. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v2: - remove outdated definitions arch/powerpc/include/asm/kvm_asm.h| 8 arch/powerpc/kvm/booke.c | 17 + arch/powerpc/kvm/booke.h | 4 ++-- arch/powerpc/kvm/booke_interrupts.S | 9 + arch/powerpc/kvm/bookehv_interrupts.S | 4 ++-- arch/powerpc/kvm/e500.c | 10 ++ arch/powerpc/kvm/e500_emulate.c | 10 ++ 7 files changed, 30 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 9601741..c94fd33 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -56,14 +56,6 @@ /* E500 */ #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33 -/* - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines - */ -#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \ - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST I think I'd prefer to keep them separate. What is the reason from changing your mind from ver 1? Do you want to have different defines with same values (we specifically mapped them to the hardware interrupt numbers). We already upstreamed the necessary changes in the kernel. Scott, please share your opinion here. I don't like hiding the fact that they're the same number, which could lead to wrong code in the absence of ifdefs that strictly mutually exclude SPE and Altivec code -- there was an instance of this with MSR_VEC versus MSR_SPE in a previous patchset. That said, if you want to enforce that mutual exclusion in a way that is clear, I won't object too loudly -- but the code does look pretty similar between the two (as well as between the two IVORs). Yes, I want to make sure we have 2 separate code paths for SPE and Altivec. No code sharing at all unless it's very generically possible. Also, which code does look pretty similar? The fact that we deflect interrupts back into the guest? That's mostly boilerplate. There's also the injection of a program check (or exiting to userspace) when CONFIG_SPE/ALTIVEC is missing. Not a big deal, but maybe it could be factored into a helper function. I like minimizing boilerplate. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support
On Mon, 2014-06-30 at 18:34 +0300, Mihai Caraman wrote: Add KVM Book3E AltiVec support. KVM Book3E FPU support gracefully reuse host infrastructure so follow the same approach for AltiVec. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v2: - integrate Paul's FP/VMX/VSX changes arch/powerpc/kvm/booke.c | 67 ++-- 1 file changed, 65 insertions(+), 2 deletions(-) I had to apply the whole patchset to get proper context for reviewing this, and found some nits: case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST: if (kvmppc_supports_spe() || kvmppc_supports_altivec()) { kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST); r = RESUME_GUEST; } else { /* * These really should never happen without CONFIG_SPE, * as we should never enable the real MSR[SPE] in the * guest. */ Besides the comment not being updated for Altivec, it's not true on HV, where the guest can enable MSR[VEC] all by itself. For HV, the reason we shouldn't be able to get here is that we disable KVM on e6500 if CONFIG_ALTIVEC is not enabled, and no other HV core supports either SPE or Altivec. pr_crit(%s: unexpected SPE interrupt %u at %08lx\n, __func__, exit_nr, vcpu-arch.pc); Error string will say SPE regardless of what sort of chip you're on. Given that this is explicitly on the no support for Altivec or SPE path, SPE/Altivec phrasing seems appropriate. Of course we have bigger problems than that if we ever reach this code. :-) -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest
On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote: Am 30.06.2014 um 22:25 schrieb Scott Wood scottw...@freescale.com: On Sun, 2014-06-29 at 23:38 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, June 27, 2014 11:53 PM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote: -/* Force enable debug interrupts when user space wants to debug */ -if (vcpu-guest_debug) { +/* + * Force enable debug interrupts when user space wants to debug + * and there is no debug interrupt pending for guest to handle. + */ +if (vcpu-guest_debug !kvmppc_core_pending_debug(vcpu)) { Are you trying to allow the guest to be simultaneously debugged by itself and by host userspace? How does this work? Not actually, Currently we are not partitioning debug resources between host userspace and guest. In fact we do not emulate debug registers for guest. But we want host userspace to pass the interrupt to guest if it is not able to handle. I don't understand the logic here. A debug interrupt should be injected when the programming model in the guest says that a debug interrupt should happen. How can that occur currently? If the guest didn't set up the debug registers and QEMU still can't handle the debug interrupt, that's a bug in QEMU (or KVM, or the hardware...). Injecting the interrupt into the guest just adds another bug on top of that. I don't think QEMU should be aware of these limitations. OK, but we should at least have some idea of how the whole thing is supposed to work, in order to determine if this is the correct behavior for QEMU. I thought the model was that debug resources are either owned by QEMU or by the guest, and in the latter case, QEMU would never see the debug exception to begin with. one reg? We are using SREGS but if required we can use one_reg. I thought we were preferring one reg over sregs for new functionality. I'm personally torn on this one. The problem here is that the sregs fields and values are already reserved. For anything we don't have an API for yet, yes, one_reg only. IIUC we have the API here, but were lacking the implementation. From a QEMU perspective, are we going to want to update this at the same time as everything else, or would jumping through sregs hoops be a nuisance? Is the long term goal to have one reg support for everything? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest
On Tue, 2014-07-01 at 17:04 +0200, Alexander Graf wrote: On 01.07.14 16:58, Scott Wood wrote: On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote: I don't think QEMU should be aware of these limitations. OK, but we should at least have some idea of how the whole thing is supposed to work, in order to determine if this is the correct behavior for QEMU. I thought the model was that debug resources are either owned by QEMU or by the guest, and in the latter case, QEMU would never see the debug exception to begin with. That's bad for a number of reasons. For starters it's different from how x86 handles debug registers - and I hate to be different just for the sake of being different. How does it work on x86? So if we do want to declare that debug registers are owned by either QEMU or the guest, we should change the semantics for all architectures. If we want to say that ownership of the registers is shared, we need a plan for how that would actually work. The second problem I see is that we're disabling use cases for the sake of correctness. When I use gdbstub, I usually don't want anything in the way of debugging something - like if I want to debug the guest debugging itself for example. I don't have a problem with making it possible for userspace to forcibly claim ownership of the debug registers -- but I don't want to advertise that won't mess up the guest debugging that you're trying to debug, without some reason to believe that. So overall, I think the x86 approach is a nice middle ground between usability, performance and functionality. You mean don't document anything other than the mechanism to get/set the registers and hope everything works out? :-) one reg? We are using SREGS but if required we can use one_reg. I thought we were preferring one reg over sregs for new functionality. I'm personally torn on this one. The problem here is that the sregs fields and values are already reserved. For anything we don't have an API for yet, yes, one_reg only. IIUC we have the API here, but were lacking the implementation. From a QEMU perspective, are we going to want to update this at the same time as everything else, or would jumping through sregs hoops be a nuisance? Is the long term goal to have one reg support for everything? Because we need to maintain backwards compatibility, I don't think we'll actually have any benefit from one reg support for everything. We don't need to maintain backwards compatibility with interfaces that were never implemented, and even where we do need to maintain compatibility, that doesn't mean the caller needs to use the old, cumbersome interface (don't assume the caller is QEMU, or that all new features of QEMU need to support old kernels). I think we're in a path that is slow enough already to not worry about performance. It's not just about performance, but simplicity of use, and consistency of API. Oh, and it looks like there already exist one reg definitions and implementations for most of the debug registers. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest
On Tue, 2014-07-01 at 18:22 +0200, Alexander Graf wrote: On 01.07.14 17:35, Scott Wood wrote: On Tue, 2014-07-01 at 17:04 +0200, Alexander Graf wrote: On 01.07.14 16:58, Scott Wood wrote: On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote: I don't think QEMU should be aware of these limitations. OK, but we should at least have some idea of how the whole thing is supposed to work, in order to determine if this is the correct behavior for QEMU. I thought the model was that debug resources are either owned by QEMU or by the guest, and in the latter case, QEMU would never see the debug exception to begin with. That's bad for a number of reasons. For starters it's different from how x86 handles debug registers - and I hate to be different just for the sake of being different. How does it work on x86? It overwrites more-or-less random breakpoints with its own ones, but leaves the others intact ;). Are you talking about software breakpoints or management of hardware debug registers? So if we do want to declare that debug registers are owned by either QEMU or the guest, we should change the semantics for all architectures. If we want to say that ownership of the registers is shared, we need a plan for how that would actually work. I think you're overengineering here :). When do people actually use gdbstub? Usually when they want to debug a broken guest. We can either * overengineer heavily and reduce the number of registers available to the guest to always have spares * overengineer a bit and turn off guest debugging completely when we use gdbstub * just leave as much alive as we can, hoping that it helps with the debugging Option 3 is what x86 does - and I think it's a reasonable approach. This is not an interface that needs to be 100% consistent and bullet proof, it's a best effort to enable you to debug as much as possible. I'm not insisting on 100% -- just hoping for some explanation/discussion about how it's intended to work for the cases where it can. How will MSR[DE] and MSRP[DEP] be handled? How would I go about telling QEMU/KVM that I don't want this shared mode, because I don't want guest to interfere with the debugging I'm trying to do from QEMU? Will guest accesses to debug registers cause a userspace exit when guest_debug is enabled? I think we're in a path that is slow enough already to not worry about performance. It's not just about performance, but simplicity of use, and consistency of API. Oh, and it looks like there already exist one reg definitions and implementations for most of the debug registers. For BookE? Where? arch/powerpc/kvm/booke.c: KVM_REG_PPC_IACn, KVM_REG_PPC_DACn -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500: Fix default tlb for victim hint
On Mon, 2014-06-30 at 15:54 +0300, Mihai Caraman wrote: Tlb search operation used for victim hint relies on the default tlb set by the host. When hardware tablewalk support is enabled in the host, the default tlb is TLB1 which leads KVM to evict the bolted entry. Set and restore the default tlb when searching for victim hint. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/include/asm/mmu-book3e.h | 5 - arch/powerpc/kvm/e500_mmu_host.c | 4 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h index 901dac6..5dad378 100644 --- a/arch/powerpc/include/asm/mmu-book3e.h +++ b/arch/powerpc/include/asm/mmu-book3e.h @@ -40,7 +40,9 @@ /* MAS registers bit definitions */ -#define MAS0_TLBSEL(x) (((x) 28) 0x3000) +#define MAS0_TLBSEL_MASK0x3000 +#define MAS0_TLBSEL_SHIFT 28 +#define MAS0_TLBSEL(x) (((x) MAS0_TLBSEL_SHIFT) MAS0_TLBSEL_MASK) #define MAS0_ESEL_MASK 0x0FFF #define MAS0_ESEL_SHIFT 16 #define MAS0_ESEL(x) (((x) MAS0_ESEL_SHIFT) MAS0_ESEL_MASK) @@ -86,6 +88,7 @@ #define MAS3_SPSIZE 0x003e #define MAS3_SPSIZE_SHIFT1 +#define MAS4_TLBSEL_MASK MAS0_TLBSEL_MASK #define MAS4_TLBSELD(x) MAS0_TLBSEL(x) #define MAS4_INDD0x8000 /* Default IND */ #define MAS4_TSIZED(x) MAS1_TSIZE(x) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index dd2cc03..79677d7 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -107,11 +107,15 @@ static u32 get_host_mas0(unsigned long eaddr) { unsigned long flags; u32 mas0; + u32 mas4; local_irq_save(flags); mtspr(SPRN_MAS6, 0); + mas4 = mfspr(SPRN_MAS4); + mtspr(SPRN_MAS4, mas4 ~MAS4_TLBSEL_MASK); asm volatile(tlbsx 0, %0 : : b (eaddr ~CONFIG_PAGE_OFFSET)); mas0 = mfspr(SPRN_MAS0); + mtspr(SPRN_MAS4, mas4); local_irq_restore(flags); return mas0; Reviewed-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500: Emulate power management control SPR
On Mon, 2014-06-30 at 15:55 +0300, Mihai Caraman wrote: For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0) to enable idle power down for cores and devices by setting up the idle count period at boot time. With the host already controlling the power management configuration the guest could simply benefit from it, so emulate guest request as nop. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/kvm/e500_emulate.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 002d517..98a22e5 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -250,6 +250,10 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong spr_va spr_val); break; + case SPRN_PWRMGTCR0: + /* Guest relies on host power management configurations */ + break; + /* extra exceptions */ case SPRN_IVOR32: vcpu-arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val; @@ -355,6 +359,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_v *spr_val = 0; break; + case SPRN_PWRMGTCR0: + *spr_val = 0; + break; + case SPRN_MMUCFG: *spr_val = vcpu-arch.mmucfg; break; When reading, is it better to return zero, or the current host value, or the value last written by the guest (even though it wasn't written to hardware)? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest
On Sun, 2014-06-29 at 23:38 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, June 27, 2014 11:53 PM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote: - /* Force enable debug interrupts when user space wants to debug */ - if (vcpu-guest_debug) { + /* + * Force enable debug interrupts when user space wants to debug + * and there is no debug interrupt pending for guest to handle. + */ + if (vcpu-guest_debug !kvmppc_core_pending_debug(vcpu)) { Are you trying to allow the guest to be simultaneously debugged by itself and by host userspace? How does this work? Not actually, Currently we are not partitioning debug resources between host userspace and guest. In fact we do not emulate debug registers for guest. But we want host userspace to pass the interrupt to guest if it is not able to handle. I don't understand the logic here. A debug interrupt should be injected when the programming model in the guest says that a debug interrupt should happen. How can that occur currently? If the guest didn't set up the debug registers and QEMU still can't handle the debug interrupt, that's a bug in QEMU (or KVM, or the hardware...). Injecting the interrupt into the guest just adds another bug on top of that. #ifdef CONFIG_KVM_BOOKE_HV /* * Since there is no shadow MSR, sync MSR_DE into the guest @@ -264,6 +272,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu) clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions); } +static void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) { + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); } + +static void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) { + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions); } Is there currently no support for a guest debugging itself (i.e. guest_debug unset) on e500v2? Yes, It is not yet supported (IACx/DACx/DBCR/DBSR/DSRRx are not yet emulated). How is it useful to inject a debug exception into the guest, until these things are emulated? @@ static int set_sregs_base(struct kvm_vcpu *vcpu, if (sregs-u.e.update_special KVM_SREGS_E_UPDATE_TSR) kvmppc_set_tsr(vcpu, sregs-u.e.tsr); + if (sregs-u.e.update_special KVM_SREGS_E_UPDATE_DBSR) { + vcpu-arch.dbsr = sregs-u.e.dbsr; + if (vcpu-arch.dbsr) + kvmppc_core_queue_debug(vcpu); + else + kvmppc_core_dequeue_debug(vcpu); + } + return 0; } one reg? We are using SREGS but if required we can use one_reg. I thought we were preferring one reg over sregs for new functionality. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest
On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote: This patch allow userspace to inject debug interrupt to guest. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Could you describe how userspace plans to make use of this, and go into more detail about the changes you're making? --- arch/powerpc/kvm/booke.c | 31 +-- arch/powerpc/kvm/e500mc.c | 10 +- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index bb25937..63ac38c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -135,6 +135,11 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) #endif } +static int kvmppc_core_pending_debug(struct kvm_vcpu *vcpu) +{ + return test_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions); +} + static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { /* Synchronize guest's desire to get debug interrupts into shadow MSR */ @@ -143,8 +148,11 @@ static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) vcpu-arch.shadow_msr |= vcpu-arch.shared-msr MSR_DE; #endif - /* Force enable debug interrupts when user space wants to debug */ - if (vcpu-guest_debug) { + /* + * Force enable debug interrupts when user space wants to debug + * and there is no debug interrupt pending for guest to handle. + */ + if (vcpu-guest_debug !kvmppc_core_pending_debug(vcpu)) { Are you trying to allow the guest to be simultaneously debugged by itself and by host userspace? How does this work? #ifdef CONFIG_KVM_BOOKE_HV /* * Since there is no shadow MSR, sync MSR_DE into the guest @@ -264,6 +272,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu) clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions); } +static void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) +{ + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); +} + +static void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) +{ + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions); +} Is there currently no support for a guest debugging itself (i.e. guest_debug unset) on e500v2? static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) { #ifdef CONFIG_KVM_BOOKE_HV @@ -1332,6 +1350,7 @@ static void get_sregs_base(struct kvm_vcpu *vcpu, sregs-u.e.dec = kvmppc_get_dec(vcpu, tb); sregs-u.e.tb = tb; sregs-u.e.vrsave = vcpu-arch.vrsave; + sregs-u.e.dbsr = vcpu-arch.dbsr; } static int set_sregs_base(struct kvm_vcpu *vcpu, @@ -1356,6 +1375,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, if (sregs-u.e.update_special KVM_SREGS_E_UPDATE_TSR) kvmppc_set_tsr(vcpu, sregs-u.e.tsr); + if (sregs-u.e.update_special KVM_SREGS_E_UPDATE_DBSR) { + vcpu-arch.dbsr = sregs-u.e.dbsr; + if (vcpu-arch.dbsr) + kvmppc_core_queue_debug(vcpu); + else + kvmppc_core_dequeue_debug(vcpu); + } + return 0; } one reg? diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 17e4562..ea724f2 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -212,7 +212,7 @@ static int kvmppc_core_get_sregs_e500mc(struct kvm_vcpu *vcpu, struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu); sregs-u.e.features |= KVM_SREGS_E_ARCH206_MMU | KVM_SREGS_E_PM | -KVM_SREGS_E_PC; +KVM_SREGS_E_PC | KVM_SREGS_E_ED; sregs-u.e.impl_id = KVM_SREGS_E_IMPL_FSL; sregs-u.e.impl.fsl.features = 0; @@ -220,6 +220,9 @@ static int kvmppc_core_get_sregs_e500mc(struct kvm_vcpu *vcpu, sregs-u.e.impl.fsl.hid0 = vcpu_e500-hid0; sregs-u.e.impl.fsl.mcar = vcpu_e500-mcar; + sregs-u.e.dsrr0 = vcpu-arch.dsrr0; + sregs-u.e.dsrr1 = vcpu-arch.dsrr1; + kvmppc_get_sregs_e500_tlb(vcpu, sregs); sregs-u.e.ivor_high[3] = @@ -261,6 +264,11 @@ static int kvmppc_core_set_sregs_e500mc(struct kvm_vcpu *vcpu, sregs-u.e.ivor_high[5]; } + if (sregs-u.e.features KVM_SREGS_E_ED) { + vcpu-arch.dsrr0 = sregs-u.e.dsrr0; + vcpu-arch.dsrr1 = sregs-u.e.dsrr1; + } SPRG9? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code
On Sun, 2014-06-22 at 23:23 +0200, Alexander Graf wrote: Howdy, Ben reminded me a while back that we have a nasty race in our KVM PV code. We replace a few instructions with longer streams of instructions to check whether it's necessary to trap out from it (like mtmsr, no need to trap if we only disable interrupts). During those replacement chunks we must not get any interrupts, because they might overwrite scratch space that we already used to save otherwise clobbered register state into. So we have a thing called critical sections which allows us to atomically get in and out of interrupt disabled modes without touching MSR. When we are supposed to deliver an interrupt into the guest while we are in a critical section, we just don't inject the interrupt yet, but leave it be until the next trap. However, we never really know when the next trap would be. For all we know it could be never. At this point we created a race that is a potential source for interrupt loss or at least deferral. This patch set aims at solving the race. Instead of merely deferring an interrupt when we see such a situation, we go into a special instruction interpretation mode. In this mode, we interpret all PPC assembler instructions that happen until we are out of the critical section again, at which point we can now inject the interrupt. This bug only affects KVM implementations that make use of the magic page, so e500v2, book3s_32 and book3s_64 PR KVM. Would it be possible to single step through the critical section instead? Or set a high res timer to expire very quickly? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code
On Wed, 2014-06-25 at 00:41 +0200, Alexander Graf wrote: On 24.06.14 20:53, Scott Wood wrote: On Sun, 2014-06-22 at 23:23 +0200, Alexander Graf wrote: Howdy, Ben reminded me a while back that we have a nasty race in our KVM PV code. We replace a few instructions with longer streams of instructions to check whether it's necessary to trap out from it (like mtmsr, no need to trap if we only disable interrupts). During those replacement chunks we must not get any interrupts, because they might overwrite scratch space that we already used to save otherwise clobbered register state into. So we have a thing called critical sections which allows us to atomically get in and out of interrupt disabled modes without touching MSR. When we are supposed to deliver an interrupt into the guest while we are in a critical section, we just don't inject the interrupt yet, but leave it be until the next trap. However, we never really know when the next trap would be. For all we know it could be never. At this point we created a race that is a potential source for interrupt loss or at least deferral. This patch set aims at solving the race. Instead of merely deferring an interrupt when we see such a situation, we go into a special instruction interpretation mode. In this mode, we interpret all PPC assembler instructions that happen until we are out of the critical section again, at which point we can now inject the interrupt. This bug only affects KVM implementations that make use of the magic page, so e500v2, book3s_32 and book3s_64 PR KVM. Would it be possible to single step through the critical section instead? Or set a high res timer to expire very quickly? There are a few other alternatives to this implementation: 1) Unmap the magic page, emulate all memory access to it while in critical and irq pending 2) Trigger a timer that sends a request to the vcpu to wake it from potential sleep and inject the irq 3) Single step until we're beyond the critical section 4) Probably more that I can't think of right now :) Each has their good and bad sides. Unmapping the magic page adds complexity to the MMU mapping code, since we need to make sure we don't map it back in on demand and treat faults to it specially. The timer interrupt works, but I'm not fully convinced that it's a good idea for things like MC events which we also block during critical sections on e500v2. Are you concerned about the guest seeing machine checks that are (more) asynchronous with the error condition? e500v2 machine checks are always asynchronous. From the core manual: Machine check interrupts are typically caused by a hardware or memory subsystem failure or by an attempt to access an invalid address. They may be caused indirectly by execution of an instruction, but may not be recognized or reported until long after the processor has executed past the instruction that caused the machine check. As such, machine check interrupts are not thought of as synchronous or asynchronous nor as precise or imprecise. I don't think the lag would be a problem, and certainly it's better than the current situation. Single stepping is hard enough to get right on interaction between QEMU, KVM and the guest. I didn't really want to make that stuff any more complicated. I'm not sure that it would add much complexity. We'd just need to check whether any source other than the magic page turned wants DCBR0_IC on, to determine whether to exit to userspace or not. This approach is really just one out of many - and it's one that's nicely self-contained and shouldn't have any impact at all on implementations that don't care about it ;). Nicely self-contained is not a phrase I'd associate with 33 patches, including a bunch of new emulation that probably isn't getting great test coverage. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code
On Wed, 2014-06-25 at 01:40 +0200, Alexander Graf wrote: On 25.06.14 01:15, Scott Wood wrote: On Wed, 2014-06-25 at 00:41 +0200, Alexander Graf wrote: On 24.06.14 20:53, Scott Wood wrote: The timer interrupt works, but I'm not fully convinced that it's a good idea for things like MC events which we also block during critical sections on e500v2. Are you concerned about the guest seeing machine checks that are (more) asynchronous with the error condition? e500v2 machine checks are always asynchronous. From the core manual: Machine check interrupts are typically caused by a hardware or memory subsystem failure or by an attempt to access an invalid address. They may be caused indirectly by execution of an instruction, but may not be recognized or reported until long after the processor has executed past the instruction that caused the machine check. As such, machine check interrupts are not thought of as synchronous or asynchronous nor as precise or imprecise. I don't think the lag would be a problem, and certainly it's better than the current situation. So what value would you set the timer to? If the value is too small, we never finish the critical section. If it's too big, we add lots of jitter. Maybe something like 100us? Single stepping would be better, though. Single stepping is hard enough to get right on interaction between QEMU, KVM and the guest. I didn't really want to make that stuff any more complicated. I'm not sure that it would add much complexity. We'd just need to check whether any source other than the magic page turned wants DCBR0_IC on, to determine whether to exit to userspace or not. What if the guest is single stepping itself? How do we determine when to unset the bit again? When we get out of the critical section? How do we know what the value was before we set it? Keep track of each requester of single stepping separately, and only ever set the real bit by ORing them. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Wed, 2014-06-18 at 10:15 +0300, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition keeping last_vcpu per logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. Guest - old invalidation condition real 3.89 user 3.87 sys 0.01 Guest - enhanced invalidation condition real 3.75 user 3.73 sys 0.01 Host real 3.70 user 1.85 sys 0.00 The memory stress application accesses 4KB pages backed by 75% of available TLB0 entries: char foo[ENTRIES][4096] __attribute__ ((aligned (4096))); int main() { char bar; int i, j; for (i = 0; i ITERATIONS; i++) for (j = 0; j ENTRIES; j++) bar = foo[j][0]; return 0; } Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com --- v4: - rename last_vcpu_on_cpu to last_vcpu_of_lpid - use *[ syntax despite checkpatch error v3: - use existing logic while keeping last_cpu per lpid arch/powerpc/kvm/e500mc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 17e4562..690499d 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr) { } -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu); +static DEFINE_PER_CPU(struct kvm_vcpu *[KVMPPC_NR_LPIDS], last_vcpu_of_lpid); static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) { @@ -141,9 +141,9 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) mtspr(SPRN_GESR, vcpu-arch.shared-esr); if (vcpu-arch.oldpir != mfspr(SPRN_PIR) || - __get_cpu_var(last_vcpu_on_cpu) != vcpu) { + __get_cpu_var(last_vcpu_of_lpid)[vcpu-kvm-arch.lpid] != vcpu) { kvmppc_e500_tlbil_all(vcpu_e500); - __get_cpu_var(last_vcpu_on_cpu) = vcpu; + __get_cpu_var(last_vcpu_of_lpid)[vcpu-kvm-arch.lpid] = vcpu; } kvmppc_load_guest_fp(vcpu); Reviewed-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 14:04 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, June 17, 2014 6:36 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On Tue, 2014-06-17 at 12:02 +0300, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. See https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118547.html Thanks. The original code needs just a simple adjustment to benefit from this optimization, please review v3. Where is v3? Or is it forthcoming? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 22:09 +0300, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu become polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition keeping last_vcpu_on_cpu per logical partition id. With the new invalidation condition, a guest shows 4% performance improvement on P5020DS while running a memory stress application with the cpu oversubscribed, the other guest running a cpu intensive workload. Guest - old invalidation condition real 3.89 user 3.87 sys 0.01 Guest - enhanced invalidation condition real 3.75 user 3.73 sys 0.01 Host real 3.70 user 1.85 sys 0.00 The memory stress application accesses 4KB pages backed by 75% of available TLB0 entries: char foo[ENTRIES][4096] __attribute__ ((aligned (4096))); int main() { char bar; int i, j; for (i = 0; i ITERATIONS; i++) for (j = 0; j ENTRIES; j++) bar = foo[j][0]; return 0; } Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com --- v3: - use existing logic while keeping last_vcpu_per_cpu per lpid v2: - improve patch name and description - add performance results arch/powerpc/kvm/e500mc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 17e4562..95e33e3 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr) { } -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu); +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu); Hmm, I didn't know you could express types like that. Is this special syntax that only works for typeof? No space after * Name should be adjusted to match, something like last_vcpu_of_lpid (with the _on_cpu being implied by the fact that it's PER_CPU). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote: -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu); +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu); Hmm, I didn't know you could express types like that. Is this special syntax that only works for typeof? Yes, AFAIK. No space after * Checkpatch complains about the missing space ;) Checkpatch is wrong, which isn't surprising given that this is unusual syntax. We don't normally put a space after * when used to represent a pointer. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
On Tue, 2014-06-17 at 15:02 -0500, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, June 17, 2014 10:48 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote: -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu); +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu); Hmm, I didn't know you could express types like that. Is this special syntax that only works for typeof? Yes, AFAIK. No space after * Checkpatch complains about the missing space ;) Checkpatch is wrong, which isn't surprising given that this is unusual syntax. We don't normally put a space after * when used to represent a pointer. This is not something new. See [PATCH 04/10] percpu: cleanup percpu array definitions: https://lkml.org/lkml/2009/6/24/26 I didn't say it was new, just unusual, and checkpatch doesn't recognize it. Checkpatch shouldn't be blindly and silently obeyed when it says something strange. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote: On 13.06.14 16:43, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, June 12, 2014 8:05 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 06/12/2014 04:00 PM, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too tight. The tlb entries of one vcpu are polluted when a different vcpu from the same partition runs in-between. Relax the current tlb invalidation condition taking into account the lpid. Can you quantify the performance improvement from this? We've had bugs in this area before, so let's make sure it's worth it before making this more complicated. Signed-off-by: Mihai Caraman mihai.caraman at freescale.com Your mailer is broken? :) This really should be an @. I think this should work. Scott, please ack. Alex, you were right. I screwed up the patch description by inverting relax and tight terms :) It should have been more like this: KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the lpid. Can't we give every vcpu its own lpid? Or don't we trap on global invalidates? That would significantly increase the odds of exhausting LPIDs, especially on large chips like t4240 with similarly large VMs. If we were to do that, the LPIDs would need to be dynamically assigned (like PIDs), and should probably be a separate numberspace per physical core. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote: On 06/02/2014 05:50 PM, Mihai Caraman wrote: On book3e, KVM uses load external pid (lwepx) dedicated instruction to read guest last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI and LRAT), generated by loading a guest address, needs to be handled by KVM. These exceptions are generated in a substituted guest translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0). Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1), doing minimal checks on the fast path to avoid host performance degradation. lwepx exceptions originate from host state (MSR[GS] = 0) which implies additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obvious too intrusive for the host. Read guest last instruction from kvmppc_load_last_inst() by searching for the physical address and kmap it. This address the TODO for TLB eviction and execute-but-not-read entries, and allow us to get rid of lwepx until we are able to handle failures. A simple stress benchmark shows a 1% sys performance degradation compared with previous approach (lwepx without failure handling): time for i in `seq 1 1`; do /bin/echo /dev/null; done real0m 8.85s user0m 4.34s sys 0m 4.48s vs real0m 8.84s user0m 4.36s sys 0m 4.44s An alternative solution, to handle lwepx exceptions in KVM, is to temporary highjack the interrupt vector from host. Some cores share host IVOR registers between hardware threads, which is the case of FSL e6500, which impose additional synchronization logic for this solution to work. This optimized solution can be developed later on top of this patch. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v3: - reworked patch description - use unaltered kmap addr for kunmap - get last instruction before beeing preempted v2: - reworked patch description - used pr_* functions - addressed cosmetic feedback arch/powerpc/kvm/booke.c | 32 arch/powerpc/kvm/bookehv_interrupts.S | 37 -- arch/powerpc/kvm/e500_mmu_host.c | 93 +++ 3 files changed, 134 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 34a42b9..4ef52a8 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int r = RESUME_HOST; int s; int idx; + u32 last_inst = KVM_INST_FETCH_FAILED; + enum emulation_result emulated = EMULATE_DONE; /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); @@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); + /* +* get last instruction before beeing preempted +* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR ESR_DATA +*/ + if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE || + exit_nr == BOOKE_INTERRUPT_DTLB_MISS || + exit_nr == BOOKE_INTERRUPT_HV_PRIV) Please make this a switch() - that's easier to read. + emulated = kvmppc_get_last_inst(vcpu, false, last_inst); + local_irq_enable(); trace_kvm_exit(exit_nr, vcpu); @@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, run-exit_reason = KVM_EXIT_UNKNOWN; run-ready_for_interrupt_injection = 1; + switch (emulated) { + case EMULATE_AGAIN: + r = RESUME_GUEST; + goto out; + + case EMULATE_FAIL: + pr_debug(%s: emulation at %lx failed (%08x)\n, + __func__, vcpu-arch.pc, last_inst); + /* For debugging, encode the failing instruction and +* report it to userspace. */ + run-hw.hardware_exit_reason = ~0ULL 32; + run-hw.hardware_exit_reason |= last_inst; + kvmppc_core_queue_program(vcpu, ESR_PIL); + r = RESUME_HOST; + goto out; + + default: + break; + } I think you can just put this into a function. Scott, I think the patch overall looks quite good. Can you please check as well and if you agree give it your reviewed-by? Mike, when Scott gives you a reviewed-by, please include it for the next version. Alex + switch (exit_nr) { case BOOKE_INTERRUPT_MACHINE_CHECK: printk(MACHINE CHECK: %lx\n, mfspr(SPRN_MCSR)); @@ -1184,6 +1215,7 @@ int
Re: [PATCH v2 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Fri, 2014-05-02 at 13:10 +0200, Alexander Graf wrote: On 05/02/2014 12:12 PM, David Laight wrote: You also probably want the page mapped uncached - no point polluting the data cache. We can't do that without creating an architecturally illegal alias between cacheable and non-cacheable mappings. Do e500 chips have a shared I/D cache somewhere? Yes. Only L1 is separate. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Tue, 2014-04-01 at 07:47 +0200, Alexander Graf wrote: Am 01.04.2014 um 01:03 schrieb Scott Wood scottw...@freescale.com: On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote: On 03/26/2014 10:17 PM, Scott Wood wrote: On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote: +/* + * Another thread may rewrite the TLB entry in parallel, don't + * execute from the address if the execute permission is not set + */ What happens when another thread rewrites the TLB entry in parallel? Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? Are the contents of the MAS registers consistent at this point or inconsistent? If another thread rewrites the TLB entry, then we use the new TLB entry, just as if it had raced in hardware. This check ensures that we don't execute from the new TLB entry if it doesn't have execute permissions (just as hardware would). What happens if the new TLB entry is valid and executable, and the instruction pointed to is valid, but doesn't trap (and thus we don't have emulation for it)? There has to be a good way to detect such a race and deal with it, no? How would you detect it? We don't get any information from the trap about what physical address the instruction was fetched from, or what the instruction was. Ah, this is a guest race where the guest modifies exactly the TLB in question. I see. Why would this ever happen in practice (in a case where we're not executing malicious code)? I don't know. It probably wouldn't. But it seems wrong to not check the exec bit. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote: On 03/26/2014 10:17 PM, Scott Wood wrote: On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote: + /* + * Another thread may rewrite the TLB entry in parallel, don't + * execute from the address if the execute permission is not set + */ What happens when another thread rewrites the TLB entry in parallel? Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? Are the contents of the MAS registers consistent at this point or inconsistent? If another thread rewrites the TLB entry, then we use the new TLB entry, just as if it had raced in hardware. This check ensures that we don't execute from the new TLB entry if it doesn't have execute permissions (just as hardware would). What happens if the new TLB entry is valid and executable, and the instruction pointed to is valid, but doesn't trap (and thus we don't have emulation for it)? There has to be a good way to detect such a race and deal with it, no? How would you detect it? We don't get any information from the trap about what physical address the instruction was fetched from, or what the instruction was. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Alow kvmppc_get_last_inst() to fail
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote: diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c index a59a25a..80c533e 100644 --- a/arch/powerpc/kvm/book3s_paired_singles.c +++ b/arch/powerpc/kvm/book3s_paired_singles.c @@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc, int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu) { - u32 inst = kvmppc_get_last_inst(vcpu); + u32 inst; enum emulation_result emulated = EMULATE_DONE; - - int ax_rd = inst_get_field(inst, 6, 10); - int ax_ra = inst_get_field(inst, 11, 15); - int ax_rb = inst_get_field(inst, 16, 20); - int ax_rc = inst_get_field(inst, 21, 25); - short full_d = inst_get_field(inst, 16, 31); - - u64 *fpr_d = vcpu-arch.fpr[ax_rd]; - u64 *fpr_a = vcpu-arch.fpr[ax_ra]; - u64 *fpr_b = vcpu-arch.fpr[ax_rb]; - u64 *fpr_c = vcpu-arch.fpr[ax_rc]; + int ax_rd, ax_ra, ax_rb, ax_rc; + short full_d; + u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c; + + kvmppc_get_last_inst(vcpu, inst); Should probably check for failure here and elsewhere -- even though it can't currently fail on book3s, the interface now allows it. diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 5b9e906..b0d884d 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr) static int kvmppc_read_inst(struct kvm_vcpu *vcpu) { ulong srr0 = kvmppc_get_pc(vcpu); - u32 last_inst = kvmppc_get_last_inst(vcpu); + u32 last_inst; int ret; + kvmppc_get_last_inst(vcpu, last_inst); ret = kvmppc_ld(vcpu, srr0, sizeof(u32), last_inst, false); This isn't new, but this function looks odd to me -- calling kvmppc_get_last_inst() but ignoring last_inst, then calling kvmppc_ld() and ignoring anything but failure. last_inst itself is never read. And no comments to explain the weirdness. :-) I get that kvmppc_get_last_inst() is probably being used for the side effect of filling in vcpu-arch.last_inst, but why store the return value without using it? Why pass the address of it to kvmppc_ld(), which seems to be used only as an indirect way of determining whether kvmppc_get_last_inst() failed? And that whole mechanism becomes stranger once it becomes possible for kvmppc_get_last_inst() to directly return failure. diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index 09bfd9b..c7c60c2 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -90,6 +90,9 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu); void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu); +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, + ulong esr_flags); Whitespace + enum int_class { INT_CLASS_NONCRIT, INT_CLASS_CRIT, @@ -123,6 +126,8 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val); +extern int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) ; Whitespace diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index ecf2247..6025cb7 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -34,6 +34,7 @@ #include e500.h #include timing.h #include e500_mmu_host.h +#include booke.h #include trace_booke.h @@ -597,6 +598,10 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr, } } +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) { + return EMULATE_FAIL; +}; Brace placement /* MMU Notifiers */ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index 2f9a087..24a8e50 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -225,19 +225,26 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) * from opcode tables in the future. */ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) { - u32 inst = kvmppc_get_last_inst(vcpu); - int ra = get_ra(inst); - int rs = get_rs(inst); - int rt = get_rt(inst); - int sprn = get_sprn(inst); - enum emulation_result emulated = EMULATE_DONE; + u32 inst; + int ra, rs, rt, sprn; + enum emulation_result emulated; int advance = 1; /* this default type might be overwritten by subcategories */ kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS); + emulated = kvmppc_get_last_inst(vcpu, inst); + if (emulated != EMULATE_DONE) { +
Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote: Load external pid (lwepx) instruction faults (when called from KVM with guest context) needs to be handled by KVM. This implies additional code in DO_KVM macro to identify the source of the exception (which oiginate from KVM host rather than the guest). The hook requires to check the Exception Syndrome Register ESR[EPID] and External PID Load Context Register EPLC[EGS] for some exceptions (DTLB_MISS, DSI and LRAT). Doing this on Data TLB miss exception is obvious intrusive for the host. Get rid of lwepx and acquire last instuction in kvmppc_get_last_inst() by searching for the physical address and kmap it. This fixes an infinite loop caused by lwepx's data TLB miss handled in the host and the TODO for TLB eviction and execute-but-not-read entries. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- arch/powerpc/kvm/bookehv_interrupts.S | 37 +++-- arch/powerpc/kvm/e500_mmu_host.c | 93 + 2 files changed, 102 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 20c7a54..c50490c 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -119,38 +119,14 @@ 1: .if \flags NEED_EMU - /* - * This assumes you have external PID support. - * To support a bookehv CPU without external PID, you'll - * need to look up the TLB entry and create a temporary mapping. - * - * FIXME: we don't currently handle if the lwepx faults. PR-mode - * booke doesn't handle it either. Since Linux doesn't use - * broadcast tlbivax anymore, the only way this should happen is - * if the guest maps its memory execute-but-not-read, or if we - * somehow take a TLB miss in the middle of this entry code and - * evict the relevant entry. On e500mc, all kernel lowmem is - * bolted into TLB1 large page mappings, and we don't use - * broadcast invalidates, so we should not take a TLB miss here. - * - * Later we'll need to deal with faults here. Disallowing guest - * mappings that are execute-but-not-read could be an option on - * e500mc, but not on chips with an LRAT if it is used. - */ - - mfspr r3, SPRN_EPLC /* will already have correct ELPID and EGS */ PPC_STL r15, VCPU_GPR(R15)(r4) PPC_STL r16, VCPU_GPR(R16)(r4) PPC_STL r17, VCPU_GPR(R17)(r4) PPC_STL r18, VCPU_GPR(R18)(r4) PPC_STL r19, VCPU_GPR(R19)(r4) - mr r8, r3 PPC_STL r20, VCPU_GPR(R20)(r4) - rlwimi r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS PPC_STL r21, VCPU_GPR(R21)(r4) - rlwimi r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR PPC_STL r22, VCPU_GPR(R22)(r4) - rlwimi r8, r10, EPC_EPID_SHIFT, EPC_EPID PPC_STL r23, VCPU_GPR(R23)(r4) PPC_STL r24, VCPU_GPR(R24)(r4) PPC_STL r25, VCPU_GPR(R25)(r4) @@ -160,10 +136,15 @@ PPC_STL r29, VCPU_GPR(R29)(r4) PPC_STL r30, VCPU_GPR(R30)(r4) PPC_STL r31, VCPU_GPR(R31)(r4) - mtspr SPRN_EPLC, r8 - isync - lwepx r9, 0, r5 - mtspr SPRN_EPLC, r3 + + /* + * We don't use external PID support. lwepx faults would need to be + * handled by KVM and this implies aditional code in DO_KVM (for + * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which + * is too intrusive for the host. Get last instuction in + * kvmppc_get_last_inst(). + */ + li r9, KVM_INST_FETCH_FAILED stw r9, VCPU_LAST_INST(r4) .endif diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 6025cb7..1b4cb41 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr, } } +#ifdef CONFIG_KVM_BOOKE_HV +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) It'd be interesting to see what the performance impact of doing this on non-HV would be -- it would eliminate divergent code, eliminate the MSR_DS hack, and make exec-only mappings work. +{ + gva_t geaddr; + hpa_t addr; + hfn_t pfn; + hva_t eaddr; + u32 mas0, mas1, mas2, mas3; + u64 mas7_mas3; + struct page *page; + unsigned int addr_space, psize_shift; + bool pr; + unsigned long flags; + + /* Search TLB for guest pc to get the real address */ + geaddr = kvmppc_get_pc(vcpu); + addr_space = (vcpu-arch.shared-msr MSR_IS) MSR_IR_LG; + + local_irq_save(flags); + mtspr(SPRN_MAS6, (vcpu-arch.pid MAS6_SPID_SHIFT) | addr_space); + mtspr(SPRN_MAS5, MAS5_SGS | vcpu-kvm-arch.lpid); + isync(); + asm volatile(tlbsx 0, %[geaddr]\n : : [geaddr] r (geaddr)); We can probably get away without
Re: [PATCH 0/8] PPC Book 3S HV-mode KVM updates for 3.15
On Tue, 2014-03-25 at 10:47 +1100, Paul Mackerras wrote: This series of patches fixes some bugs in HV-mode KVM for PowerPC Book 3S and finishes off adding the support for POWER8. Patches 2 and 3 are the two patches from the series I posted in January that Alex Graf didn't apply at that stage. I have updated them according to his review comments. The last patch is also POWER8-related, adding code to save and restore more of the host state of the PMU. (We context-switch the PMU between host and guest since the guest can access the PMU directly.) The remaining patches fix bugs that have been found over the last few months of testing. This patch series is based on the merge of the queue branch of the kvm tree with the kvm-ppc-queue branch of Alex Graf's tree, though I expect they would apply cleanly against the kvm tree queue branch also. I would like these to go into 3.15. Scott, please ack. Paul. --- [PATCH 1/8] KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n [PATCH 2/8] KVM: PPC: Book3S HV: Add transactional memory support [PATCH 3/8] KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state [PATCH 4/8] KVM: PPC: Book3S: Trim top 4 bits of physical address in [PATCH 5/8] KVM: PPC: Book3S HV: Return ENODEV error rather than EIO [PATCH 6/8] KVM: PPC: Book3S HV: Don't use kvm_memslots() in real [PATCH 7/8] KVM: PPC: Book3S HV: Fix decrementer timeouts with [PATCH 8/8] KVM: PPC: Book3S HV: Save/restore host PMU registers that Acked-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Fixes for HV KVM on PPC for 3.14
On Thu, 2014-03-13 at 20:01 +1100, Paul Mackerras wrote: These two patches fix two things in hypervisor-mode KVM for the IBM POWER server processors. The first patch removes a hunk of extraneous code that got in as a result of a mistake I made in cleaning up after rebasing a patch. The second fixes a bug that causes host memory corruption. Both patches fix things that cause host crashes, so I'd like them in 3.14 if possible. The two patches only touch one file, arch/powerpc/kvm/book3s_hv_rmhandlers.S, so they can't possibly cause any problems for other architectures or other PPC platforms. The patches are against the master branch of the kvm tree but should apply equally on Linus' current master branch. Scott, please ack. Paolo, I meant to get these out earlier, but a personal emergency arose this week and delayed me. Acked-by: Scott Wood scottw...@freescale.com -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] powerpc/booke64: Use SPRG7 for VDSO
Previously SPRG3 was marked for use by both VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad (powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss crit int), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues: - It's incompatible with EXCEPTION_COMMON which restores r13 from the by-then-overwritten scratch (this cost me some debugging time). - It forces critical exceptions to be a special case handled differently from even machine check and debug level exceptions. - It didn't occur to me that it was possible to make this work at all (by doing a final ld r13, PACA_EXCRIT+EX_R13(r13)) until after I made (most of) this patch. :-) It might be worth investigating using a load rather than SPRG on return from all exceptions (except TLB misses where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with SPRG for all exceptions. Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for VDSO there. Signed-off-by: Scott Wood scottw...@freescale.com Cc: Mihai Caraman mihai.cara...@freescale.com Cc: Anton Blanchard an...@samba.org Cc: Paul Mackerras pau...@samba.org Cc: kvm-ppc@vger.kernel.org --- arch/powerpc/include/asm/exception-64e.h| 5 ++--- arch/powerpc/include/asm/kvm_booke_hv_asm.h | 9 + arch/powerpc/include/asm/paca.h | 2 +- arch/powerpc/include/asm/reg.h | 13 ++--- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/exceptions-64e.S| 19 ++- arch/powerpc/kernel/vdso.c | 8 arch/powerpc/kernel/vdso32/getcpu.S | 2 +- arch/powerpc/kernel/vdso64/getcpu.S | 2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++-- arch/powerpc/kvm/book3s_interrupts.S| 4 ++-- arch/powerpc/kvm/bookehv_interrupts.S | 10 ++ 12 files changed, 33 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index 51fa43e..e73452f 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -46,9 +46,8 @@ #define EX_CR (1 * 8) #define EX_R10 (2 * 8) #define EX_R11 (3 * 8) -#define EX_R13 (4 * 8) -#define EX_R14 (5 * 8) -#define EX_R15 (6 * 8) +#define EX_R14 (4 * 8) +#define EX_R15 (5 * 8) /* * The TLB miss exception uses different slots. diff --git a/arch/powerpc/include/asm/kvm_booke_hv_asm.h b/arch/powerpc/include/asm/kvm_booke_hv_asm.h index 3a79f53..c3e3fd5 100644 --- a/arch/powerpc/include/asm/kvm_booke_hv_asm.h +++ b/arch/powerpc/include/asm/kvm_booke_hv_asm.h @@ -36,20 +36,13 @@ * *(r8 + GPR11) = saved r11 * * 64-bit host - * Expected inputs (GEN/GDBELL/DBG/MC exception types): + * Expected inputs (GEN/GDBELL/DBG/CRIT/MC exception types): * r10 = saved CR * r13 = PACA_POINTER * *(r13 + PACA_EX##type + EX_R10) = saved r10 * *(r13 + PACA_EX##type + EX_R11) = saved r11 * SPRN_SPRG_##type##_SCRATCH = saved r13 * - * Expected inputs (CRIT exception type): - * r10 = saved CR - * r13 = PACA_POINTER - * *(r13 + PACA_EX##type + EX_R10) = saved r10 - * *(r13 + PACA_EX##type + EX_R11) = saved r11 - * *(r13 + PACA_EX##type + EX_R13) = saved r13 - * * Expected inputs (TLB exception type): * r10 = saved CR * r13 = PACA_POINTER diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 9c5dbc3..948f01a 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -146,7 +146,7 @@ struct paca_struct { u8 io_sync; /* writel() needs spin_unlock sync */ u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */ u8 nap_state_lost; /* NV GPR values lost in power7_idle */ - u64 sprg3; /* Saved user-visible sprg */ + u64 sprg_vdso; /* Saved user-visible sprg */ #ifdef CONFIG_PPC_TRANSACTIONAL_MEM u64 tm_scratch; /* TM scratch area for reclaim */ #endif diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 90c06ec..b9ac329 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -577,9 +577,13 @@ #define SPRN_SPRG3 0x113 /* Special Purpose Register General 3 */ #define SPRN_USPRG30x103 /* SPRG3 userspace read */ #define SPRN_SPRG4 0x114 /* Special Purpose Register General 4 */ +#define SPRN_USPRG40x104 /* SPRG4 userspace read */ #define SPRN_SPRG5 0x115 /* Special Purpose Register General 5
[PATCH 06/10] powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers
While bolted handlers (including e6500) do not need to deal with a TLB miss recursively causing another TLB miss, nested TLB misses can still happen with crit/mc/debug exceptions -- so we still need to honor SPRG_TLB_EXFRAME. We don't need to spend time modifying it in the TLB miss fastpath, though -- the special level exception will handle that. Signed-off-by: Scott Wood scottw...@freescale.com Cc: Mihai Caraman mihai.cara...@freescale.com Cc: kvm-ppc@vger.kernel.org --- arch/powerpc/include/asm/exception-64e.h| 10 --- arch/powerpc/include/asm/kvm_booke_hv_asm.h | 8 -- arch/powerpc/kvm/bookehv_interrupts.S | 14 +++-- arch/powerpc/mm/tlb_low_64e.S | 44 ++--- 4 files changed, 45 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index e73452f..a563d9af 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -172,16 +172,6 @@ exc_##label##_book3e: ld r9,EX_TLB_R9(r12); \ ld r8,EX_TLB_R8(r12); \ mtlrr16; -#define TLB_MISS_PROLOG_STATS_BOLTED \ - mflrr10;\ - std r8,PACA_EXTLB+EX_TLB_R8(r13); \ - std r9,PACA_EXTLB+EX_TLB_R9(r13); \ - std r10,PACA_EXTLB+EX_TLB_LR(r13); -#define TLB_MISS_RESTORE_STATS_BOLTED \ - ld r16,PACA_EXTLB+EX_TLB_LR(r13); \ - ld r9,PACA_EXTLB+EX_TLB_R9(r13); \ - ld r8,PACA_EXTLB+EX_TLB_R8(r13); \ - mtlrr16; #define TLB_MISS_STATS_D(name) \ addir9,r13,MMSTAT_DSTATS+name; \ bl .tlb_stat_inc; diff --git a/arch/powerpc/include/asm/kvm_booke_hv_asm.h b/arch/powerpc/include/asm/kvm_booke_hv_asm.h index c3e3fd5..e5f048b 100644 --- a/arch/powerpc/include/asm/kvm_booke_hv_asm.h +++ b/arch/powerpc/include/asm/kvm_booke_hv_asm.h @@ -45,10 +45,12 @@ * * Expected inputs (TLB exception type): * r10 = saved CR + * r12 = extlb pointer * r13 = PACA_POINTER - * *(r13 + PACA_EX##type + EX_TLB_R10) = saved r10 - * *(r13 + PACA_EX##type + EX_TLB_R11) = saved r11 - * SPRN_SPRG_GEN_SCRATCH = saved r13 + * *(r12 + EX_TLB_R10) = saved r10 + * *(r12 + EX_TLB_R11) = saved r11 + * *(r12 + EX_TLB_R13) = saved r13 + * SPRN_SPRG_GEN_SCRATCH = saved r12 * * Only the bolted version of TLB miss exception handlers is supported now. */ diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 99635a3..890e338 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -229,13 +229,20 @@ stw r10, VCPU_CR(r4) PPC_STL r11, VCPU_GPR(R4)(r4) PPC_STL r5, VCPU_GPR(R5)(r4) - mfspr r5, \scratch PPC_STL r6, VCPU_GPR(R6)(r4) PPC_STL r8, VCPU_GPR(R8)(r4) PPC_STL r9, VCPU_GPR(R9)(r4) - PPC_STL r5, VCPU_GPR(R13)(r4) + .if \type == EX_TLB + PPC_LL r5, EX_TLB_R13(r12) + PPC_LL r6, EX_TLB_R10(r12) + PPC_LL r8, EX_TLB_R11(r12) + mfspr r12, \scratch + .else + mfspr r5, \scratch PPC_LL r6, (\paca_ex + \ex_r10)(r13) PPC_LL r8, (\paca_ex + \ex_r11)(r13) + .endif + PPC_STL r5, VCPU_GPR(R13)(r4) PPC_STL r3, VCPU_GPR(R3)(r4) PPC_STL r7, VCPU_GPR(R7)(r4) PPC_STL r12, VCPU_GPR(R12)(r4) @@ -444,6 +451,9 @@ _GLOBAL(kvmppc_resume_host) PPC_STD(r8, VCPU_SHARED_SPRG6, r11) mfxer r3 PPC_STD(r9, VCPU_SHARED_SPRG7, r11) +#ifdef CONFIG_64BIT + mtspr SPRN_SPRG_VDSO_WRITE, r3 +#endif /* save guest MAS registers and restore host mas4 mas6 */ mfspr r5, SPRN_MAS0 diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S index 1e50249..356e8b4 100644 --- a/arch/powerpc/mm/tlb_low_64e.S +++ b/arch/powerpc/mm/tlb_low_64e.S @@ -39,37 +39,49 @@ ** **/ +/* + * Note that, unlike non-bolted handlers, TLB_EXFRAME is not + * modified by the TLB miss handlers themselves, since the TLB miss + * handler code will not itself cause a recursive TLB miss. + * + * TLB_EXFRAME will be modified when crit/mc/debug exceptions are + * entered/exited. + */ .macro tlb_prolog_bolted intnum addr - mtspr SPRN_SPRG_GEN_SCRATCH,r13 + mtspr SPRN_SPRG_GEN_SCRATCH,r12 + mfspr r12,SPRN_SPRG_TLB_EXFRAME
Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO
On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote: The KVM API documentation is not clear about the semantics of the data field on the mmio struct on the kvm_run struct. This has become problematic when supporting ARM guests on big-endian host systems with guests of both endianness types, because it is unclear how the data should be exported to user space. This should not break with existing implementations as all supported existing implementations of known user space applications (QEMU and kvmtools for virtio) only support default endianness of the architectures on the host side. Cc: Marc Zyngier marc.zyng...@arm.com Cc: Peter Maydell peter.mayd...@linaro.org Cc: Alexander Graf ag...@suse.de Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes [v1 - v2]: - s/host kernel should/host user space should/ Documentation/virtual/kvm/api.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 366bf4b..6dbd68c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. +The 'data' member byte order is host kernel native endianness, regardless of +the endianness of the guest, and represents the the value as it would go on the +bus in real hardware. The host user space should always be able to do: +type val = *((type *)mmio.data). Host userspace should be able to do that with what results? It would only produce a directly usable value if host endianness is the same as the emulated device's endianness. I'm not sure that host kernel native endianness is an accurate way of describing what currently happens. Regardless of host or guest endianness, the guest should be swapping the value as necessary to ensure that the value that goes on the (real or emulated) bus is the same. Plus, if it were really as it would go on the bus the value wouldn't necessarily be left justified within data[], depending on how the bus works. How about a wording like this: The 'data' member contains, in its first 'len' bytes, the value as it would appear if the guest had accessed memory rather than I/O. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO
On Sat, 2014-01-25 at 00:24 +, Peter Maydell wrote: On 24 January 2014 23:51, Scott Wood scottw...@freescale.com wrote: On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote: diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 366bf4b..6dbd68c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. +The 'data' member byte order is host kernel native endianness, regardless of +the endianness of the guest, and represents the the value as it would go on the +bus in real hardware. The host user space should always be able to do: +type val = *((type *)mmio.data). Host userspace should be able to do that with what results? It would only produce a directly usable value if host endianness is the same as the emulated device's endianness. With the result that it gets the value the CPU has sent out on the bus as the memory transaction. Doesn't that assume the host kernel endianness is the same as the bus (or rather, that the host CPU would not swap such an access before it hits the bus)? If you take the same hardware and boot a little endian host kernel one day, and a big endian host kernel the next, the bus doesn't change, and neither should the bytewise (assuming address invariance) contents of data[]. How data[] would look when read as a larger integer would of course change -- but that's due to how you're reading it. It's clear to say that a value in memory has been stored there in host endianness when the value is as you would want to see it in a CPU register, but it's less clear when you talk about it relative to values on a bus. It's harder to correlate that to something that is software visible. I don't think there's any actual technical difference between your wording and mine when each wording is properly interpreted, but I suspect my wording is less likely to be misinterpreted (I could be wrong). Obviously if what userspace is emulating is a bus which has a byteswapping bridge or if it's being helpful to device emulation by providing here's the value even though you think you're wired up backwards then it needs to byteswap. Whether the emulated bus has a byteswapping bridge doesn't sound like something that depends on the endianness that the host CPU is currently running in. How about a wording like this: The 'data' member contains, in its first 'len' bytes, the value as it would appear if the guest had accessed memory rather than I/O. I think this is confusing, because now userspace authors have to figure out how to get back to value X of size Y at address Z by interpreting this text... Can you write out the equivalent of Christoffer's text here's how you get the memory transaction value for what you want? Userspace swaps the value if and only if userspace's endianness differs from the endianness with which the device interprets the data (regardless of whether said interpretation is considered natural or swapped relative to the way the bus is documented). It's similar to how userspace would handle emulating DMA. KVM swaps the value if and only if the endianness of the guest access differs from that of the host, i.e. if it would have done swapping when emulating an ordinary memory access. (Also, value as it would appear to who?) As it would appear to anyone. It works because data[] actually is memory. Any difference in how data appears based on the reader's context would already be reflected when the reader performs the load. I think your wording implies that the order of bytes in data[] depend on the guest CPU usual byte order, ie the order which the CPU does not do a byte-lane-swap for (LE for ARM, BE for PPC), and it would mean it would come out differently from my/Alex/Christoffer's proposal if the host kernel was the opposite endianness from that usual order. It doesn't depend on usual anything. The only thing it implicitly says about guest byte order is that it's KVM's job to implement any swapping if the endianness of the guest access is different from the endianness of the host kernel access (whether it's due to the guest's mode, the way a page is mapped, the instruction used, etc). Finally, I think it's a bit confusing in that as if the guest had accessed memory is assigning implicit semantics to memory in the emulated system, when memory is actually kind of outside KVM's purview because it's not part of the CPU. That's sort of the point. It defines it in a way that is independent of the CPU, and thus independent of what endianness the CPU operates in. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More
[PATCH] kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
Commit 7940876e1330671708186ac3386aa521ffb5c182 (kvm: make local functions static) broke KVM PPC builds due to removing (rather than moving) the stub version of kvm_vcpu_eligible_for_directed_yield(). This patch reintroduces it. Signed-off-by: Scott Wood scottw...@freescale.com Cc: Stephen Hemminger step...@networkplumber.org Cc: Alexander Graf ag...@suse.de --- virt/kvm/kvm_main.c | 5 + 1 file changed, 5 insertions(+) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4733fa1..f896665 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1771,6 +1771,11 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) return eligible; } +#else +static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) +{ + return true; +} #endif void kvm_vcpu_on_spin(struct kvm_vcpu *me) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup
On Sun, 2013-12-29 at 16:43 +0100, Alexander Graf wrote: On 11.07.2013, at 00:47, Scott Wood scottw...@freescale.com wrote: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index ddfaf56..c13caa0 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -884,14 +884,11 @@ program_interrupt: * and if we really did time things so badly, then we just exit * again due to a host external interrupt. */ - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); - if (s = 0) { - local_irq_enable(); + if (s = 0) r = s; - } else { + else kvmppc_fix_ee_before_entry(); - } } Please add a comment here stating that interrupts are hard disabled at this point. OK. diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 4e05f8c..2f7a221 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) { int r = 1; - WARN_ON_ONCE(!irqs_disabled()); + WARN_ON(irqs_disabled()); + hard_irq_disable(); + while (true) { if (need_resched()) { local_irq_enable(); One thing I find reasonably tricky in this approach is that we run local_irq_enable, but hard_irq_disable. I did dig through the code and it should work just fine, but I think we should make sure to note that this is intended and doesn't just work by accident. Just add a comment above the first call to hard_irq_disable() that explains that local_irq_enable() will properly unroll hard_irq_disable. That way the next person reading this doesn't have to dig as deeply. There is no hard_irq_enable() -- only __hard_irq_enable() that doesn't update local_paca-irq_happened. This is normal use of the API. If there does need to be a comment, it should go in hw_irq.h. :-) #ifdef CONFIG_PPC64 - /* lazy EE magic */ - hard_irq_disable(); - if (lazy_irq_pending()) { - /* Got an interrupt in between, try again */ - local_irq_enable(); - local_irq_disable(); - kvm_guest_exit(); - continue; - } + WARN_ON(lazy_irq_pending()); #endif + /* Can't use irqs_disabled() because we want hard irq state */ + WARN_ON(mfmsr() MSR_EE); The reason for lazy EE is that mfmsr() is supposed to be slow. Not just mtmsr? And when I complained about wrtee not being all that slow on our hardware, Ben said it was also for perf coverage. :-) Couldn't we check for the internal hard disable flag instead? Just create a new helper in hw_irq.h that tells us local_paca-irq_happened PACA_IRQ_HARD_DIS on PPC64 and !(mfmsr() MSR_EE) on PPC32. We can then just use that helper here and not run expensive mfmsr() calls. I'm not sure whether it's actually measurable overhead in the context of the whole world switch, but why diverge from the rest of the system? If you think a new helper is overkill, I'd be fine with a simple #ifdef here just as well. I'd hope a mfmsr wouldn't be so slow on any hardware as to be a big deal here, though it'd also be nice if there were a Linux-wide way of specifying whether a particular WARN should be always checked, or only if the kernel is built with some debug option... The rest of the system is irqs_disabled() and there's already a comment explaining why we diverge from that. Maybe we should just remove that check; we'd still at least have the 64-bit check in kvmppc_fix_ee_before_entry. kvm_guest_enter(); - break; + return r; This must be 0 at this point, right? No, it'll be 1. Either way, I prefer to keep the code easily understandable. How about you keep the break and just add an if (r) around the local_irq_enable() below? Then add a comment stating that the return value tells us whether entry is ok to proceed and interrupts are disabled (0) or something didn't work out and interrupts are enabled (!0). How is it more understandable to overload what looks like a single code path with divergent cases that have little to nothing in common? Would it help to add a comment on the return-to-host code path indicating that it is only used for returning to the host? I'd be fine with replacing return r with return 1 (and getting rid of the initialization of r at the top of the function, unless GCC complains inappropriately). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/kvm/booke: Fix build break due to stack frame size warning
On Mon, 2013-11-25 at 04:26 -0600, Bharat Bhushan wrote: -Original Message- From: Wood Scott-B07421 Sent: Saturday, November 23, 2013 3:22 AM To: Alexander Graf Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421; Bhushan Bharat-R65777 Subject: [PATCH] powerpc/kvm/booke: Fix build break due to stack frame size warning Commit ce11e48b7fdd256ec68b932a89b397a790566031 (KVM: PPC: E500: Add userspace debug stub support) added struct thread_struct to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced struct debug_reg. Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs Signed-off-by: Scott Wood scottw...@freescale.com Cc: Bharat Bhushan r65...@freescale.com --- Build tested only. Bharat, please test. Tested with qemu debug stub; It works fine -Bharat Alex, are you going to take this through your tree? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On Tue, 2013-11-19 at 10:51 +0100, Alexander Graf wrote: Am 19.11.2013 um 00:49 schrieb Scott Wood scottw...@freescale.com: On Mon, 2013-11-18 at 16:09 -0500, Alexander Graf wrote: On 18.11.2013, at 03:34, “tiejun.chen” tiejun.c...@windriver.com wrote: On 10/23/2013 09:26 AM, Tiejun Chen wrote: We enter with interrupts disabled in hardware, but we need to call RECONCILE_IRQ_STATE anyway to ensure that the software state is kept in sync instead of calling hard_irq_disable() directly. Why didn't this happen before? What is this patch fixing? It's cleanup, not a fix. It makes things more consistent with other 64-bit kernel entry code. Could we please note this in the comit message so that whoever stumbles over the patch later knows that this is effectively a no-op (and just prepones the lazy sync)? Also, I'm still wary of lazy breakage in the pr code path, but I guess since it's completely untested today already it's ok to ignore. PR doesn't support 64-bit at all (e.g. it's all stw/lwz rather than some word-size abstraction). In the unlikely event that lazy EE is ever extended to 32-bit, then fixing up booke_interrupts.S would be just one part of the task of updating all the 32-bit asm (e.g. you'll find RECONCILE_IRQ_STATE in entry_64.S but not entry_32.S). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4 v3] kvm: powerpc: use cache attributes from linux pte
On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote: From: Bharat Bhushan bharat.bhus...@freescale.com v2-v3 - Returning pte pointer rather than pte as suggested by Scott - Also removed PAGE_SPLITTING as this need to be handled by caller v1-v2 - Removed _PAGE_BUSY loop as suggested by PaulS. - Added check for PAGE_SPLITTING kvm: powerpc: use cache attributes from linux pte - 1st Patch fixes a bug in booke (detail in patch) - 2nd patch is renaming the linux_pte_lookup_function() just for clarity. There is not functional change. - 3nd Patch adds a Linux pte lookup function. - 4th Patch uses the above defined function and setup TLB.wimg accordingly Bharat Bhushan (4): kvm: booke: clear host tlb reference flag on guest tlb invalidation kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update() kvm: powerpc: define a linux pte lookup function kvm: powerpc: use caching attributes as per linux pte arch/powerpc/include/asm/kvm_host.h |2 +- arch/powerpc/include/asm/pgtable.h | 21 + arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++-- arch/powerpc/kvm/booke.c|1 + arch/powerpc/kvm/e500.h |8 +++-- arch/powerpc/kvm/e500_mmu_host.c| 55 +++--- 6 files changed, 64 insertions(+), 31 deletions(-) Reviewed-by: Scott Wood scottw...@freescale.com with patch 4/4 replaced by v4. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On Mon, 2013-11-18 at 16:09 -0500, Alexander Graf wrote: On 18.11.2013, at 03:34, “tiejun.chen” tiejun.c...@windriver.com wrote: On 10/23/2013 09:26 AM, Tiejun Chen wrote: We enter with interrupts disabled in hardware, but we need to call RECONCILE_IRQ_STATE anyway to ensure that the software state is kept in sync instead of calling hard_irq_disable() directly. Why didn't this happen before? What is this patch fixing? It's cleanup, not a fix. It makes things more consistent with other 64-bit kernel entry code. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4 v3] kvm: powerpc: use caching attributes as per linux pte
On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote: @@ -440,9 +437,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, } if (likely(!pfnmap)) { - unsigned long tsize_pages = 1 (tsize + 10 - PAGE_SHIFT); + tsize_pages = 1 (tsize + 10 - PAGE_SHIFT); pfn = gfn_to_pfn_memslot(slot, gfn); - if (is_error_noslot_pfn(pfn)) { + if (is_error_noslot_pfn(pfn) printk_ratelimit()) { printk(KERN_ERR Couldn't get real page for gfn %lx!\n, (long)gfn); return -EINVAL; The return -EINVAL shouldn't be affected by printk_ratelimit(). @@ -453,7 +450,17 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, gvaddr = ~((tsize_pages PAGE_SHIFT) - 1); } - kvmppc_e500_ref_setup(ref, gtlbe, pfn); + + pgdir = vcpu_e500-vcpu.arch.pgdir; + ptep = lookup_linux_ptep(pgdir, hva, tsize_pages); + if (pte_present(*ptep)) + wimg = (*ptep PTE_WIMGE_SHIFT) MAS2_WIMGE_MASK; + else if (printk_ratelimit()) { + printk(KERN_ERR %s: pte not present: gfn %lx, pfn %lx\n, + __func__, (long)gfn, pfn); + return -EINVAL; + } Likewise. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL 34/51] powerpc: move debug registers in a structure
On Mon, 2013-11-04 at 07:56 +1100, Benjamin Herrenschmidt wrote: On Sun, 2013-11-03 at 16:30 +0200, Gleb Natapov wrote: On Thu, Oct 31, 2013 at 10:18:19PM +0100, Alexander Graf wrote: From: Bharat Bhushan r65...@freescale.com This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de It would be nice to have PPC maintainers (CCed) ACKs here. This patch also has merging conflicts with cbc9565e (should be easy to resolve) Is it easier if I put it in powerpc -next and resolve the conflicts ? I was ok with the patch but yes, I should probably have put it in a topic branch in the first place and merge it both in my tree and Alex. It's already in your -next via my tree (sha1 51ae8d4a2b9e4aa9a502061b9c39168e08829b94). I didn't realize Alex was planning on applying it. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v4][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On Mon, 2013-10-21 at 17:57 +0800, Tiejun Chen wrote: +#ifdef CONFIG_64BIT + /* + * We enter with interrupts disabled in hardware, but + * we need to call SOFT_DISABLE_INTS anyway to ensure + * that the software state is kept in sync. + */ + RECONCILE_IRQ_STATE(r3,r5) +#endif s/SOFT_DISABLE_INTS/RECONCILE_IRQ_STATE/ in the comment. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function
On Wed, 2013-10-09 at 03:48 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Wednesday, October 09, 2013 3:07 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; Yoder Stuart-B08248; k...@vger.kernel.org; kvm- p...@vger.kernel.org; pau...@samba.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function On Tue, 2013-10-08 at 11:33 +0530, Bharat Bhushan wrote: We need to search linux pte to get pte attributes for setting TLB in KVM. This patch defines a linux_pte_lookup() function for same. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/pgtable.h | 35 +++ 1 files changed, 35 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 7d6eacf..fd26c04 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -223,6 +223,41 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); + +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva, + unsigned long *pte_sizep) +{ + pte_t *ptep; + pte_t pte; + unsigned long ps = *pte_sizep; + unsigned int shift; + + ptep = find_linux_pte_or_hugepte(pgdir, hva, shift); + if (!ptep) + return __pte(0); + if (shift) + *pte_sizep = 1ul shift; + else + *pte_sizep = PAGE_SIZE; + + if (ps *pte_sizep) + return __pte(0); + + /* wait until _PAGE_BUSY is clear */ + while (1) { + pte = pte_val(*ptep); + if (unlikely(pte _PAGE_BUSY)) { + cpu_relax(); + continue; + } + } + + /* If pte is not present return None */ + if (unlikely(!(pte _PAGE_PRESENT))) + return __pte(0); + + return pte; +} Can lookup_linux_pte_and_update() call lookup_linux_pte()? What lookup_linux_pte_and_update() does:- - find_linux_pte_or_hugepte() - does size and some other trivial checks - Then atomically update the pte:- = while() = wait till _PAGE_BUSY is clear = atomically update the pte = if not updated then go back to while() above else break While what lookup_linux_pte() does:- - find_linux_pte_or_hugepte() - does size and some other trivial checks - wait till _PAGE_BUSY is clear - return pte I am finding it difficult to call lookup_linux_pte() from lookup_linux_pte_and_update(). You could factor out a common lookup_linux_ptep(). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function
On Tue, 2013-10-08 at 11:33 +0530, Bharat Bhushan wrote: We need to search linux pte to get pte attributes for setting TLB in KVM. This patch defines a linux_pte_lookup() function for same. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/pgtable.h | 35 +++ 1 files changed, 35 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 7d6eacf..fd26c04 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -223,6 +223,41 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); + +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva, + unsigned long *pte_sizep) +{ + pte_t *ptep; + pte_t pte; + unsigned long ps = *pte_sizep; + unsigned int shift; + + ptep = find_linux_pte_or_hugepte(pgdir, hva, shift); + if (!ptep) + return __pte(0); + if (shift) + *pte_sizep = 1ul shift; + else + *pte_sizep = PAGE_SIZE; + + if (ps *pte_sizep) + return __pte(0); + + /* wait until _PAGE_BUSY is clear */ + while (1) { + pte = pte_val(*ptep); + if (unlikely(pte _PAGE_BUSY)) { + cpu_relax(); + continue; + } + } + + /* If pte is not present return None */ + if (unlikely(!(pte _PAGE_PRESENT))) + return __pte(0); + + return pte; +} Can lookup_linux_pte_and_update() call lookup_linux_pte()? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()
On Mon, 2013-10-07 at 22:23 +0530, Bharat Bhushan wrote: kvm_hypercall() have nothing KVM specific, so renamed to epapr_hypercall(). Also this in moved to arch/powerpc/include/asm/epapr_hcalls.h Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - epapr_hypercall() is always defined and returns EV_UNIMPLEMENTED when CONFIG_KVM_GUEST or CONFIG_EPAPR_PARAVIRT not defined. arch/powerpc/include/asm/epapr_hcalls.h | 46 +++ arch/powerpc/include/asm/kvm_para.h | 23 --- arch/powerpc/kernel/kvm.c | 41 +-- 3 files changed, 54 insertions(+), 56 deletions(-) diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h index d3d6342..6b8e007 100644 --- a/arch/powerpc/include/asm/epapr_hcalls.h +++ b/arch/powerpc/include/asm/epapr_hcalls.h @@ -454,5 +454,51 @@ static inline unsigned int ev_idle(void) return r3; } + +#if defined(CONFIG_KVM_GUEST) || defined(CONFIG_EPAPR_PARAVIRT) CONFIG_KVM_GUEST implies CONFIG_EPAPR_PARAVIRT, so you only need to check for the latter. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup
On Fri, 2013-09-06 at 01:06 +0200, Alexander Graf wrote: On 06.09.2013, at 00:09, Scott Wood wrote: On Thu, 2013-07-11 at 01:09 +0200, Alexander Graf wrote: On 11.07.2013, at 01:08, Scott Wood wrote: On 07/10/2013 06:04:53 PM, Alexander Graf wrote: On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote: On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote: #ifdef CONFIG_PPC64 + /* + * To avoid races, the caller must have gone directly from having + * interrupts fully-enabled to hard-disabled. + */ + WARN_ON(local_paca-irq_happened != PACA_IRQ_HARD_DIS); WARN_ON(lazy_irq_pending()); ? Different semantics. What you propose will not catch irq_happened == 0 :-) Right, but we only ever reach here after hard_irq_disable() I think. And the WARN_ON helps us ensure that it stays that way. Heh - ok :). Works for me. What's the status on this patch? IIUC it was ok. Ben, could you please verify? ping -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()
On Wed, 2013-10-02 at 16:19 +0200, Alexander Graf wrote: On 23.09.2013, at 07:23, Bharat Bhushan wrote: static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2) @@ -65,7 +54,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2) unsigned long out[8]; unsigned long r; - r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr)); + r = epapr_hypercall(in, out, KVM_HCALL_TOKEN(nr)); Won't this break when CONFIG_EPAPR_PARAVIRT=n? We wouldn't have epapr_hcalls.S compiled into the code base then and the bl above would reference an unknown function. KVM_GUEST selects EPAPR_PARAVIRT. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()
On Wed, 2013-10-02 at 18:53 +0200, Alexander Graf wrote: On 02.10.2013, at 18:40, Scott Wood wrote: On Wed, 2013-10-02 at 16:19 +0200, Alexander Graf wrote: On 23.09.2013, at 07:23, Bharat Bhushan wrote: static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2) @@ -65,7 +54,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2) unsigned long out[8]; unsigned long r; - r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr)); + r = epapr_hypercall(in, out, KVM_HCALL_TOKEN(nr)); Won't this break when CONFIG_EPAPR_PARAVIRT=n? We wouldn't have epapr_hcalls.S compiled into the code base then and the bl above would reference an unknown function. KVM_GUEST selects EPAPR_PARAVIRT. But you can not select KVM_GUEST and still call these inline functions, no? No. Like kvm_arch_para_features(). Where does that get called without KVM_GUEST? How would that work currently, with the call to kvm_hypercall() in arch/powerpc/kernel/kvm.c (which calls epapr_hypercall, BTW)? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html