Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb
On 05/29/2018 07:39 AM, Nicholas Piggin wrote:
> On Tue, 29 May 2018 11:33:56 +1000 Michael Ellerman wrote:
>> "Aneesh Kumar K.V" writes:
>>
>>> We used the wrong page size (mmu_virtual_psize) when doing a tlbflush after
>>> a pte update. This patch updates the flush to use the hugetlb page size.
>>> The page size is derived from the hugetlb hstate.
>>
>> This sounds bad. Or is it not for some reason?
>
> It's not all that bad because the flush is mostly superfluous (one of
> my tlbie optimisation patches gets rid of it except for accelerators).
>
>> Either way a Fixes tag would be nice. Maybe:
>>
>> Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not
>> do full mm tlb flush")
>>
>> I think this is only a problem on Radix, but the change log doesn't say.
>
> huge_ptep_set_access_flags->ptep_set_access_flags->flush_tlb_page->
>
> void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
> {
> #ifdef CONFIG_HUGETLB_PAGE
> 	if (is_vm_hugetlb_page(vma))
> 		return radix__flush_hugetlb_page(vma, vmaddr);
> #endif
> 	radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
> }
>
> So I'm still not sure how the size is going wrong here. What am I missing?

My mistake, I didn't look at the radix expansion. I was looking at
huge_ptep_clear_flush() where we use flush_hugetlb_page().

-aneesh
Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb
On Tue, 29 May 2018 11:33:56 +1000 Michael Ellerman wrote:
> "Aneesh Kumar K.V" writes:
>
>> We used the wrong page size (mmu_virtual_psize) when doing a tlbflush after
>> a pte update. This patch updates the flush to use the hugetlb page size.
>> The page size is derived from the hugetlb hstate.
>
> This sounds bad. Or is it not for some reason?

It's not all that bad because the flush is mostly superfluous (one of my
tlbie optimisation patches gets rid of it except for accelerators).

> Either way a Fixes tag would be nice. Maybe:
>
> Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not
> do full mm tlb flush")
>
> I think this is only a problem on Radix, but the change log doesn't say.

huge_ptep_set_access_flags->ptep_set_access_flags->flush_tlb_page->

void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
{
#ifdef CONFIG_HUGETLB_PAGE
	if (is_vm_hugetlb_page(vma))
		return radix__flush_hugetlb_page(vma, vmaddr);
#endif
	radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
}

So I'm still not sure how the size is going wrong here. What am I missing?

Thanks,
Nick
Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode
Michael Ellerman writes:
> Vaibhav Jain writes:
>
>> From: Vaibhav Jain
>>
>> Currently we see a kernel-oops reported on Power-9 while attaching a
>> context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
>> to anything other than 'none'. The backtrace of the oops is of this
>> form:
>>
>> Unable to handle kernel paging request for data at address 0x0080
>> Faulting instruction address: 0xc0080bcf3b20
>> cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
>> pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
>> lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
>> sp: c0037f003a80
>> msr: 90009033
>> dar: 80
>> dsisr: 4000
>> current = 0xc0037f28
>> paca = 0xc003e600  softe: 3  irq_happened: 0x01
>> pid = 3529, comm = afp_no_int
>>
>> [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
>> [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl]
>> [c0037f003b90] c0080bcf944c cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
>> [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
>> [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
>> [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
>> [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
>> [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
>> [c0037f003e30] c000b320 system_call+0x58/0x6c
>> --- Exception: c01 (System Call) at 10053bb0
>         ^^^
> This tells patch/git-am to drop the rest of the change log, which is not
> what we want.
>
> I tend to indent stack traces etc with two spaces, which avoids the
> problem. Or in this case we can just drop the line as it's not really
> that informative.
>
> I've fixed it up.
>
> cheers

Sorry for missing that, and thanks for fixing it in the patch.

--
Vaibhav Jain
Linux Technology Center, IBM India Pvt. Ltd.
[PATCH] printk: make printk_safe_flush safe in NMI context
From: Hoeun Ryu

Make printk_safe_flush() safe in NMI context, and fold
printk_safe_flush_on_panic() into it. The prototype of printk_safe_flush()
is changed to "void printk_safe_flush(bool panic)".

nmi_trigger_cpumask_backtrace() can be called in NMI context. For example,
it is called from watchdog_overflow_callback() if the hardlockup backtrace
flag (sysctl_hardlockup_all_cpu_backtrace) is true, and
watchdog_overflow_callback() runs in NMI context on some architectures.

Calling printk_safe_flush() from nmi_trigger_cpumask_backtrace()
eventually tries to take logbuf_lock in vprintk_emit(), but logbuf_lock
may already be held by a preempted context (task or irq in this case) or
by another CPU, which can deadlock. By making printk_safe_flush() safe in
NMI context, the backtrace-triggering CPU simply skips the flush if the
lock is not available in NMI context.

The messages in the per-cpu nmi buffer of the backtrace-triggering CPU can
be lost if that CPU is in hard lockup (because irqs are disabled here) and
panic() is not called. In the normal case, the flush is simply delayed
until the next irq work runs.
Suggested-by: Sergey Senozhatsky
Signed-off-by: Hoeun Ryu
---
 arch/powerpc/kernel/traps.c    |  2 +-
 arch/powerpc/kernel/watchdog.c |  2 +-
 include/linux/printk.h         |  9 ++
 kernel/kexec_core.c            |  2 +-
 kernel/panic.c                 |  4 +--
 kernel/printk/printk_safe.c    | 62 +-
 lib/nmi_backtrace.c            |  2 +-
 7 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0904492..c50749c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -160,7 +160,7 @@ extern void panic_flush_kmsg_start(void)
 extern void panic_flush_kmsg_end(void)
 {
-	printk_safe_flush_on_panic();
+	printk_safe_flush(true);
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 6256dc3..3c9138b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -173,7 +173,7 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 	wd_smp_unlock(&flags);
-	printk_safe_flush();
+	printk_safe_flush(false);
 	/*
 	 * printk_safe_flush() seems to require another print
 	 * before anything actually goes out to console.
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 6d7e800..495fe26 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,8 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
 extern void printk_safe_init(void);
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
+extern void printk_safe_flush(bool panic);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -273,11 +272,7 @@ static inline void printk_safe_init(void)
 {
 }
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
+static inline void printk_safe_flush(bool panic)
 {
 }
 #endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..1b0876e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -961,7 +961,7 @@ void crash_kexec(struct pt_regs *regs)
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
 	if (old_cpu == PANIC_CPU_INVALID) {
 		/* This is the 1st CPU which comes here, so go ahead. */
-		printk_safe_flush_on_panic();
+		printk_safe_flush(true);
 		__crash_kexec(regs);
 		/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..2f2c86c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -193,7 +193,7 @@ void panic(const char *fmt, ...)
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!crash_kexec_post_notifiers) {
-		printk_safe_flush_on_panic();
+		printk_safe_flush(true);
 		__crash_kexec(NULL);
 		/*
@@ -218,7 +218,7 @@ void panic(const char *fmt, ...)
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 	/* Call flush even twice. It tries harder with a single online CPU */
-	printk_safe_flush_on_panic();
+	printk_safe_flush(true);
 	kmsg_dump(KMSG_DUMP_PANIC);
 	/*
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..35ea941 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -244,49 +244,49 @@ static void __printk_safe_flush(struct irq_work *work)
 }
 /**
- * printk_safe_flush - flush all per-cpu nmi buffers.
+ * printk_safe_flush - flush all per-cpu nmi buffers. it can be called even in NMI
+ *
Re: [PATCH] cpuidle/powernv : Add Description for cpuidle state
Abhishek Goel writes:
> @@ -215,7 +216,7 @@ static inline void add_powernv_state(int index, const char *name,
>  				     u64 psscr_val, u64 psscr_mask)
>  {
>  	strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
> -	strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
> +	strlcpy(powernv_states[index].desc, desc, CPUIDLE_DESC_LEN);

We should still fall back to using name in the event of desc being null,
as not all firmware will expose the descriptions.

> @@ -311,6 +313,11 @@ static int powernv_add_idle_states(void)
>  		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
>  		goto out;
>  	}
> +	if (of_property_read_string_array(power_mgt,
> +		"ibm,cpu-idle-state-descs", descs, dt_idle_states) < 0) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-descs in DT\n");
> +		goto out;
> +	}

I don't think pr_warn is appropriate here, as no currently released
firmware has that property. I think perhaps just silently continuing on
is okay, as we have to keep compatibility with that firmware.

> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -17,7 +17,7 @@
>
>  #define CPUIDLE_STATE_MAX	10
>  #define CPUIDLE_NAME_LEN	16
> -#define CPUIDLE_DESC_LEN	32
> +#define CPUIDLE_DESC_LEN	60

Do descriptions really get that long?

--
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb
"Aneesh Kumar K.V" writes: > We used wrong page size (mmu_virtual_psize) when doing a tlbflush after > pte update. This patch update the flush to use hugetlb page size. > The page size is derived from hugetlb hstate. This sounds bad. Or is it not for some reason? Either way a Fixes tag would be nice. Maybe: Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush") I think this is only a problem on Radix, but the change log doesn't say. cheers > Now that ptep_set_access_flags won't be called for hugetlb remove > the is_vm_hugetlb_page() check and add the assert of pte lock > unconditionally. > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/include/asm/hugetlb.h | 19 +++-- > arch/powerpc/mm/pgtable.c | 33 -- > 2 files changed, 34 insertions(+), 18 deletions(-) > > diff --git a/arch/powerpc/include/asm/hugetlb.h > b/arch/powerpc/include/asm/hugetlb.h > index 78540c074d70..b4404a6da74f 100644 > --- a/arch/powerpc/include/asm/hugetlb.h > +++ b/arch/powerpc/include/asm/hugetlb.h > @@ -166,22 +166,9 @@ static inline pte_t huge_pte_wrprotect(pte_t pte) > return pte_wrprotect(pte); > } > > -static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, > - unsigned long addr, pte_t *ptep, > - pte_t pte, int dirty) > -{ > -#ifdef HUGETLB_NEED_PRELOAD > - /* > - * The "return 1" forces a call of update_mmu_cache, which will write a > - * TLB entry. Without this, platforms that don't do a write of the TLB > - * entry in the TLB miss handler asm will fault ad infinitum. 
> - */ > - ptep_set_access_flags(vma, addr, ptep, pte, dirty); > - return 1; > -#else > - return ptep_set_access_flags(vma, addr, ptep, pte, dirty); > -#endif > -} > +extern int huge_ptep_set_access_flags(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep, > + pte_t pte, int dirty); > > static inline pte_t huge_ptep_get(pte_t *ptep) > { > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > index 9f361ae571e9..e70af9939379 100644 > --- a/arch/powerpc/mm/pgtable.c > +++ b/arch/powerpc/mm/pgtable.c > @@ -221,14 +221,43 @@ int ptep_set_access_flags(struct vm_area_struct *vma, > unsigned long address, > entry = set_access_flags_filter(entry, vma, dirty); > changed = !pte_same(*(ptep), entry); > if (changed) { > - if (!is_vm_hugetlb_page(vma)) > - assert_pte_locked(vma->vm_mm, address); > + assert_pte_locked(vma->vm_mm, address); > __ptep_set_access_flags(vma->vm_mm, ptep, entry, address); > flush_tlb_page(vma, address); > } > return changed; > } > > +#ifdef CONFIG_HUGETLB_PAGE > +extern int huge_ptep_set_access_flags(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep, > + pte_t pte, int dirty) > +{ > +#ifdef HUGETLB_NEED_PRELOAD > + /* > + * The "return 1" forces a call of update_mmu_cache, which will write a > + * TLB entry. Without this, platforms that don't do a write of the TLB > + * entry in the TLB miss handler asm will fault ad infinitum. > + */ > + ptep_set_access_flags(vma, addr, ptep, pte, dirty); > + return 1; > +#else > + int changed; > + > + pte = set_access_flags_filter(pte, vma, dirty); > + changed = !pte_same(*(ptep), pte); > + if (changed) { > +#ifdef CONFIG_DEBUG_VM > + assert_spin_locked(>vm_mm->page_table_lock); > +#endif > + __ptep_set_access_flags(vma->vm_mm, ptep, pte, addr); > + flush_hugetlb_page(vma, addr); > + } > + return changed; > +#endif > +} > +#endif /* CONFIG_HUGETLB_PAGE */ > + > #ifdef CONFIG_DEBUG_VM > void assert_pte_locked(struct mm_struct *mm, unsigned long addr) > { > -- > 2.17.0
Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode
Vaibhav Jain writes:
> From: Vaibhav Jain
>
> Currently we see a kernel-oops reported on Power-9 while attaching a
> context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
> to anything other than 'none'. The backtrace of the oops is of this
> form:
>
> Unable to handle kernel paging request for data at address 0x0080
> Faulting instruction address: 0xc0080bcf3b20
> cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
> pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
> lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
> sp: c0037f003a80
> msr: 90009033
> dar: 80
> dsisr: 4000
> current = 0xc0037f28
> paca = 0xc003e600  softe: 3  irq_happened: 0x01
> pid = 3529, comm = afp_no_int
>
> [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
> [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl]
> [c0037f003b90] c0080bcf944c cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
> [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
> [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
> [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
> [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
> [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
> [c0037f003e30] c000b320 system_call+0x58/0x6c
> --- Exception: c01 (System Call) at 10053bb0
        ^^^
This tells patch/git-am to drop the rest of the change log, which is not
what we want.

I tend to indent stack traces etc with two spaces, which avoids the
problem. Or in this case we can just drop the line as it's not really
that informative.

I've fixed it up.

cheers
Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices
On Tue, 2018-05-29 at 09:48 +1000, Benjamin Herrenschmidt wrote:
> > Well it's not supposed to be much slower for the static case.
> >
> > vhost has a cache so should be fine.
> >
> > A while ago Paolo implemented a translation cache which should be
> > perfect for this case - most of the code got merged but
> > never enabled because of stability issues.
> >
> > If all else fails, we could teach QEMU to handle the no-iommu case
> > as if VIRTIO_F_IOMMU_PLATFORM was off.
>
> Any serious reason why not just getting that 2-line patch allowing our
> arch code to force virtio to use the DMA API?
>
> It's not particularly invasive and solves our problem rather nicely
> without adding overhead or additional knowledge to qemu/libvirt/management
> tools etc... that they don't need.
>
> The guest knows it's going secure, so the guest arch code can do the
> right thing rather trivially.
>
> Long term we should probably make virtio always use the DMA API anyway,
> and interpose "1:1" dma_ops for the traditional virtio case; that would
> reduce code clutter significantly. In that case, it would become just a
> matter of having a platform hook to override the dma_ops used.

To elaborate a bit: what we are trying to solve here is entirely a guest
problem, and I don't think involving qemu in the solution is the right
thing to do.

The guest can only allow external parties (qemu, potentially PCI devices,
etc...) access to some restricted portions of memory (insecure memory).
Thus the guest needs to do some bounce-buffering/swiotlb-type tricks.

This is completely orthogonal to whether there is an actual iommu between
the guest and the device (or emulated device/virtio). This is why I think
the solution should reside in the guest kernel, by proper manipulation
(by the arch code) of the dma ops.

I don't think forcing the addition of an emulated iommu in the middle,
just to work around the fact that virtio "cheats" and doesn't use the DMA
API unless there is one, is the right "fix".

The right long-term fix is to always use the DMA API, reducing code paths
etc..., and just have a single point where virtio can "choose" alternate
DMA ops (via an arch hook to deal with our case).

In the meantime, having the hook we propose gets us going, but if you
agree with the approach, we should also work on the long-term approach.

Cheers,
Ben.
Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices
On Fri, 2018-05-25 at 20:45 +0300, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> > >
> > > I re-read that discussion and I'm still unclear on the
> > > original question, since I got several apparently
> > > conflicting answers.
> > >
> > > I asked:
> > >
> > > Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > > hypervisor side sufficient?
> >
> > I thought I had replied to this...
> >
> > There are a couple of reasons:
> >
> > - First, qemu doesn't know that the guest will switch to "secure mode"
> > in advance. There is no difference between a normal and a secure
> > partition until the partition does the magic UV call to "enter secure
> > mode", and qemu doesn't see any of it. So who can set the flag here?
>
> Not sure I understand. Just set the flag e.g. on the qemu command line.
> I might be wrong, but these secure mode things usually
> a. require hypervisor side tricks anyway

The way our secure mode architecture is designed, there doesn't need, at
this point, to be any knowledge at the qemu level whatsoever. Well, at
least until we do migration, but that's a different kettle of fish.

In any case, the guest starts normally (which means as a non-secure
guest, and thus expects normal virtio; our FW today doesn't handle
VIRTIO_F_IOMMU_PLATFORM, though granted, we can fix this), and later that
guest issues some special Ultravisor call that turns it into a secure
guest. There is some involvement of the hypervisor, but not qemu, at this
stage.

We would very much like to avoid that, as it would be a hassle for users
to have to use different libvirt options etc... because the guest might
turn itself into a secure VM.

> > - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> > vhost) go through the emulated MMIO for every access to the guest,
> > which adds additional overhead.
> >
> > Cheers,
> > Ben.

> Well it's not supposed to be much slower for the static case.
>
> vhost has a cache so should be fine.
>
> A while ago Paolo implemented a translation cache which should be
> perfect for this case - most of the code got merged but
> never enabled because of stability issues.
>
> If all else fails, we could teach QEMU to handle the no-iommu case
> as if VIRTIO_F_IOMMU_PLATFORM was off.

Any serious reason why not just getting that 2-line patch allowing our
arch code to force virtio to use the DMA API?

It's not particularly invasive and solves our problem rather nicely
without adding overhead or additional knowledge to qemu/libvirt/management
tools etc... that they don't need.

The guest knows it's going secure, so the guest arch code can do the
right thing rather trivially.

Long term we should probably make virtio always use the DMA API anyway,
and interpose "1:1" dma_ops for the traditional virtio case; that would
reduce code clutter significantly. In that case, it would become just a
matter of having a platform hook to override the dma_ops used.

Cheers,
Ben.

> > > > arch/powerpc/include/asm/dma-mapping.h |  6 ++
> > > > arch/powerpc/platforms/pseries/iommu.c | 11 +++
> > > > drivers/virtio/virtio_ring.c           | 10 ++
> > > > 3 files changed, 27 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> > > > index 8fa3945..056e578 100644
> > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> > > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > > >
> > > >  #endif /* __KERNEL__ */
> > > > +
> > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > +
> > > > +struct virtio_device;
> > > > +
> > > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > > >  #endif /* _ASM_DMA_MAPPING_H */
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > index 06f0296..a2ec15a 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -38,6 +38,7 @@
> > > >  #include
> > > >  #include
> > > >  #include
> > > > +#include
> > > >  #include
> > > >  #include
> > > >  #include
> > > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > > >  __setup("multitce=", disable_multitce);
> > > >
> > > >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > > +
> > > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > +{
> > > > +	/*
> > > > +	 * On protected guest platforms, force virtio core to use DMA
> > > > +	 * MAP API for all virtio devices. But there can also be some
> > > > +	 * exceptions for individual devices like virtio balloon.
> > > > +	 */
> > > > +
[PATCH v2] selftests/powerpc: Add perf breakpoint test
This tests perf hardware breakpoints (ie. PERF_TYPE_BREAKPOINT) on powerpc.

Signed-off-by: Michael Neuling
---
v2:
  - fix failure noticed by mpe
  - rewrite to explicitly test all options (rather than randomly)
---
 .../selftests/powerpc/ptrace/.gitignore         |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile   |   2 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c     | 195 ++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c

diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore b/tools/testing/selftests/powerpc/ptrace/.gitignore
index 9dcc16ea81..07ec449a27 100644
--- a/tools/testing/selftests/powerpc/ptrace/.gitignore
+++ b/tools/testing/selftests/powerpc/ptrace/.gitignore
@@ -9,3 +9,4 @@ ptrace-tm-vsx
 ptrace-tm-spd-vsx
 ptrace-tm-spr
 ptrace-hwbreak
+perf-hwbreak
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile
index 0e2f4601d1..aaf7044771 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 TEST_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
 	ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \
-	ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak
+	ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak perf-hwbreak

 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
new file mode 100644
index 00..60df0b5e62
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -0,0 +1,195 @@
+/*
+ * perf events self profiling example test case for hw breakpoints.
+ *
+ * This tests perf PERF_TYPE_BREAKPOINT parameters
+ *  1) tests all variants of the break on read/write flags
+ *  2) tests exclude_user == 0 and 1
+ *  3) test array matches (if DAWR is supported))
+ *  4) test different numbers of breakpoints matches
+ *
+ * Configure this breakpoint, then read and write the data a number of
+ * times. Then check the output count from perf is as expected.
+ *
+ * Based on:
+ *   http://ozlabs.org/~anton/junkcode/perf_events_example1.c
+ *
+ * Copyright (C) 2018 Michael Neuling, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "utils.h"
+
+#define MAX_LOOPS 1
+
+#define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
+
+static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
+				      int cpu, int group_fd,
+				      unsigned long flags)
+{
+	attr->size = sizeof(*attr);
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+static inline bool breakpoint_test(int len)
+{
+	struct perf_event_attr attr;
+	int fd;
+
+	/* setup counters */
+	memset(&attr, 0, sizeof(attr));
+	attr.disabled = 1;
+	attr.type = PERF_TYPE_BREAKPOINT;
+	attr.bp_type = HW_BREAKPOINT_R;
+	/* bp_addr can point anywhere but needs to be aligned */
+	attr.bp_addr = (__u64)(&attr) & 0xf800;
+	attr.bp_len = len;
+	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+	if (fd < 0)
+		return false;
+	close(fd);
+	return true;
+}
+
+static inline bool perf_breakpoint_supported(void)
+{
+	return breakpoint_test(4);
+}
+
+static inline bool dawr_supported(void)
+{
+	return breakpoint_test(DAWR_LENGTH_MAX);
+}
+
+static int runtestsingle(int readwriteflag, int exclude_user, int arraytest)
+{
+	int i,j;
+	struct perf_event_attr attr;
+	size_t res;
+	unsigned long long breaks, needed;
+	int readint;
+	int readintarraybig[2*DAWR_LENGTH_MAX/sizeof(int)];
+	int *readintalign;
+	volatile int *ptr;
+	int break_fd;
+	int loop_num = MAX_LOOPS - (rand() % 100); /* provide some variability */
+	volatile int *k;
+
+	/* align to 0x400 boundary as required by DAWR */
+	readintalign = (int *)(((unsigned long)readintarraybig + 0x7ff) &
+			       0xf800);
+
+	ptr = &readint;
+	if (arraytest)
+		ptr = &readintalign[0];
+
+	/* setup counters */
+	memset(&attr, 0, sizeof(attr));
+	attr.disabled = 1;
+	attr.type = PERF_TYPE_BREAKPOINT;
+	attr.bp_type = readwriteflag;
+	attr.bp_addr = (__u64)ptr;
+	attr.bp_len = sizeof(int);
+	if (arraytest)
+		attr.bp_len = DAWR_LENGTH_MAX;
+	attr.exclude_user
Re: [PATCH] powerpc/64: Fix build failure with GCC 8.1
On Mon, 2018-05-28 at 16:37 +, Christophe Leroy wrote:
>   CC      arch/powerpc/kernel/nvram_64.o
> arch/powerpc/kernel/nvram_64.c: In function 'nvram_create_partition':
> arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified bound 12
> equals destination size [-Werror=stringop-truncation]
>   strncpy(new_part->header.name, name, 12);
>   ^~~~
>
>   CC      arch/powerpc/kernel/trace/ftrace.o
> In function 'make_field',
>     inlined from 'ps3_repository_read_boot_dat_address' at
> arch/powerpc/platforms/ps3/repository.c:900:9:
> arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy' output
> truncated before terminating nul copying 8 bytes from a string of the same
> length [-Werror=stringop-truncation]
>   strncpy((char *)&n, text, 8);
>   ^~~~

That's not completely correct; here it's gcc's warning being over-zealous.

The old specs I could find define the format to be a 12-character string
*or* a less-than-12-character NUL-terminated string. So it's perfectly ok
to drop the trailing NUL if the name happens to be exactly 12 characters.

Cheers,
Ben.

> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/kernel/nvram_64.c          | 2 +-
>  arch/powerpc/platforms/ps3/repository.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> index ba681dac7b46..7507448cd904 100644
> --- a/arch/powerpc/kernel/nvram_64.c
> +++ b/arch/powerpc/kernel/nvram_64.c
> @@ -1039,7 +1039,7 @@ loff_t __init nvram_create_partition(const char *name, int sig,
>  	new_part->index = free_part->index;
>  	new_part->header.signature = sig;
>  	new_part->header.length = size;
> -	strncpy(new_part->header.name, name, 12);
> +	strncpy(new_part->header.name, name, sizeof(new_part->header.name) - 1);
>  	new_part->header.checksum = nvram_checksum(&new_part->header);
>
>  	rc = nvram_write_header(new_part);
> diff --git a/arch/powerpc/platforms/ps3/repository.c b/arch/powerpc/platforms/ps3/repository.c
> index 50dbaf24b1ee..b4d6628eec5e 100644
> --- a/arch/powerpc/platforms/ps3/repository.c
> +++ b/arch/powerpc/platforms/ps3/repository.c
> @@ -101,9 +101,9 @@ static u64 make_first_field(const char *text, u64 index)
>
>  static u64 make_field(const char *text, u64 index)
>  {
> -	u64 n;
> +	u64 n = 0;
>
> -	strncpy((char *)&n, text, 8);
> +	memcpy((char *)&n, text, min(sizeof(n), strlen(text)));
>  	return n + index;
>  }
>
Re: [PATCH][RFC] [powerpc] arch_ptrace() uses of access_ok() are pointless
> Maybe this is just an RFC, but:
>
>     CALL    ../arch/powerpc/kernel/systbl_chk.sh
> ../arch/powerpc/kernel/ptrace.c: In function ‘arch_ptrace’:
> ../arch/powerpc/kernel/ptrace.c:3086:4: error: expected ‘)’ before ‘return’
>     return -EFAULT;
>     ^~

... and the same a few lines later. What's more, those 'unlikely' are
pointless there. Fixed variant follows; only build-tested, though.

Make it use copy_{from,to}_user(), rather than access_ok() + __copy_...

Signed-off-by: Al Viro
---
 arch/powerpc/kernel/ptrace.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index d23cf632edf0..f557322621e0 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -3081,27 +3081,19 @@ long arch_ptrace(struct task_struct *child, long request,
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
-		if (!access_ok(VERIFY_WRITE, datavp,
-			       sizeof(struct ppc_debug_info)))
+		if (copy_to_user(datavp, &dbginfo,
+				 sizeof(struct ppc_debug_info)))
 			return -EFAULT;
-		ret = __copy_to_user(datavp, &dbginfo,
-				     sizeof(struct ppc_debug_info)) ?
-		      -EFAULT : 0;
-		break;
+		return 0;
 	}
 	case PPC_PTRACE_SETHWDEBUG: {
 		struct ppc_hw_breakpoint bp_info;
-		if (!access_ok(VERIFY_READ, datavp,
-			       sizeof(struct ppc_hw_breakpoint)))
-			return -EFAULT;
-		ret = __copy_from_user(&bp_info, datavp,
-				       sizeof(struct ppc_hw_breakpoint)) ?
-		      -EFAULT : 0;
-		if (!ret)
-			ret = ppc_set_hwdebug(child, &bp_info);
-		break;
+		if (copy_from_user(&bp_info, datavp,
+				   sizeof(struct ppc_hw_breakpoint)))
+			return -EFAULT;
+		return ppc_set_hwdebug(child, &bp_info);
 	}
 	case PPC_PTRACE_DELHWDEBUG: {
--
2.11.0
Re: [PATCH][RFC] [powerpc] arch_ptrace() uses of access_ok() are pointless
On Mon, May 28, 2018 at 12:34 AM, Al Viro wrote:
> make it use copy_{from,to}_user(), rather than access_ok() +
> __copy_...
>
> Signed-off-by: Al Viro
> ---
>  arch/powerpc/kernel/ptrace.c | 22 +++---
>  1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index d23cf632edf0..d8b0fd2fa3aa 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3081,27 +3081,19 @@ long arch_ptrace(struct task_struct *child, long request,
>  #endif /* CONFIG_HAVE_HW_BREAKPOINT */
>  #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
>
> -		if (!access_ok(VERIFY_WRITE, datavp,
> -			       sizeof(struct ppc_debug_info)))
> +		if (unlikely(copy_to_user(datavp, &dbginfo,
> +					  sizeof(struct ppc_debug_info)))

Maybe this is just an RFC, but:

    CALL    ../arch/powerpc/kernel/systbl_chk.sh
../arch/powerpc/kernel/ptrace.c: In function ‘arch_ptrace’:
../arch/powerpc/kernel/ptrace.c:3086:4: error: expected ‘)’ before ‘return’
    return -EFAULT;
    ^~

Missing closing parenthesis.

>  			return -EFAULT;
> -		ret = __copy_to_user(datavp, &dbginfo,
> -				     sizeof(struct ppc_debug_info)) ?
> -		      -EFAULT : 0;
> -		break;
> +		return 0;
>  	}
>
>  	case PPC_PTRACE_SETHWDEBUG: {
>  		struct ppc_hw_breakpoint bp_info;
>
> -		if (!access_ok(VERIFY_READ, datavp,
> -			       sizeof(struct ppc_hw_breakpoint)))
> -			return -EFAULT;
> -		ret = __copy_from_user(&bp_info, datavp,
> -				       sizeof(struct ppc_hw_breakpoint)) ?
> -		      -EFAULT : 0;
> -		if (!ret)
> -			ret = ppc_set_hwdebug(child, &bp_info);
> -		break;
> +		if (unlikely(copy_from_user(&bp_info, datavp,
> +					    sizeof(struct ppc_hw_breakpoint)))
> +			return -EFAULT;
> +		return ppc_set_hwdebug(child, &bp_info);
>  	}
>
>  	case PPC_PTRACE_DELHWDEBUG: {
> --
> 2.11.0
[PATCH] cpuidle/powernv : Add Description for cpuidle state
The names of the cpuidle states were being used as the state descriptions on POWER, because no descriptions were present in the device tree. This patch reads the descriptions for the idle states that have been added to the device tree. On POWER, the idle state descriptions can then be printed using "cpupower monitor -l" or "cpupower idle-info".

Signed-off-by: Abhishek Goel
---
The skiboot patch which adds description for idle states in device tree can be found here: https://patchwork.ozlabs.org/patch/921637/

drivers/cpuidle/cpuidle-powernv.c | 17 +
include/linux/cpuidle.h | 2 +-
2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 1a8234e706bc..d3915a965128 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -133,7 +133,7 @@ static int stop_loop(struct cpuidle_device *dev,
 static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
 	{ /* Snooze */
 		.name = "snooze",
-		.desc = "snooze",
+		.desc = "idle polling state",
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = snooze_loop },
@@ -206,6 +206,7 @@ static int powernv_cpuidle_driver_init(void)
 }

 static inline void add_powernv_state(int index, const char *name,
+				     const char *desc,
 				     unsigned int flags,
 				     int (*idle_fn)(struct cpuidle_device *,
 						    struct cpuidle_driver *,
@@ -215,7 +216,7 @@ static inline void add_powernv_state(int index, const char *name,
 				     u64 psscr_val, u64 psscr_mask)
 {
 	strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
-	strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
+	strlcpy(powernv_states[index].desc, desc, CPUIDLE_DESC_LEN);
 	powernv_states[index].flags = flags;
 	powernv_states[index].target_residency = target_residency;
 	powernv_states[index].exit_latency = exit_latency;
@@ -250,6 +251,7 @@ static int powernv_add_idle_states(void)
 	u64 psscr_val[CPUIDLE_STATE_MAX];
 	u64 psscr_mask[CPUIDLE_STATE_MAX];
 	const char *names[CPUIDLE_STATE_MAX];
+	const char *descs[CPUIDLE_STATE_MAX];
 	u32 has_stop_states = 0;
 	int i, rc;
 	u32 supported_flags = pnv_get_supported_cpuidle_states();
@@ -311,6 +313,11 @@ static int powernv_add_idle_states(void)
 		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
 		goto out;
 	}
+	if (of_property_read_string_array(power_mgt,
+		"ibm,cpu-idle-state-descs", descs, dt_idle_states) < 0) {
+		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-descs in DT\n");
+		goto out;
+	}

 	/*
 	 * If the idle states use stop instruction, probe for psscr values
@@ -414,10 +421,11 @@ static int powernv_add_idle_states(void)
 				target_residency = 100;
 			/* Add NAP state */
 			add_powernv_state(nr_idle_states, "Nap",
+					  "stop processor execution",
 					  CPUIDLE_FLAG_NONE, nap_loop,
 					  target_residency, exit_latency, 0, 0);
 		} else if (has_stop_states && !stops_timebase) {
-			add_powernv_state(nr_idle_states, names[i],
+			add_powernv_state(nr_idle_states, names[i], descs[i],
 					  CPUIDLE_FLAG_NONE, stop_loop,
 					  target_residency, exit_latency,
 					  psscr_val[i], psscr_mask[i]);
@@ -434,11 +442,12 @@ static int powernv_add_idle_states(void)
 				target_residency = 30;
 			/* Add FASTSLEEP state */
 			add_powernv_state(nr_idle_states, "FastSleep",
+					  "Core and L2 clock gating",
 					  CPUIDLE_FLAG_TIMER_STOP, fastsleep_loop,
 					  target_residency, exit_latency, 0, 0);
 		} else if (has_stop_states && stops_timebase) {
-			add_powernv_state(nr_idle_states, names[i],
+			add_powernv_state(nr_idle_states, names[i], descs[i],
 					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
 					  target_residency, exit_latency,
 					  psscr_val[i], psscr_mask[i]);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 1eefabf1621f..5094755cb132 100644
---
[PATCH] powerpc/64: Fix build failure with GCC 8.1
  CC      arch/powerpc/kernel/nvram_64.o
arch/powerpc/kernel/nvram_64.c: In function 'nvram_create_partition':
arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified bound 12 equals destination size [-Werror=stringop-truncation]
  strncpy(new_part->header.name, name, 12);
  ^~~~
  CC      arch/powerpc/kernel/trace/ftrace.o
In function 'make_field', inlined from 'ps3_repository_read_boot_dat_address' at arch/powerpc/platforms/ps3/repository.c:900:9:
arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy' output truncated before terminating nul copying 8 bytes from a string of the same length [-Werror=stringop-truncation]
  strncpy((char *)&n, text, 8);
  ^~~~

Signed-off-by: Christophe Leroy
---
arch/powerpc/kernel/nvram_64.c | 2 +-
arch/powerpc/platforms/ps3/repository.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index ba681dac7b46..7507448cd904 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -1039,7 +1039,7 @@ loff_t __init nvram_create_partition(const char *name, int sig,
 	new_part->index = free_part->index;
 	new_part->header.signature = sig;
 	new_part->header.length = size;
-	strncpy(new_part->header.name, name, 12);
+	strncpy(new_part->header.name, name, sizeof(new_part->header.name) - 1);
 	new_part->header.checksum = nvram_checksum(&new_part->header);

 	rc = nvram_write_header(new_part);
diff --git a/arch/powerpc/platforms/ps3/repository.c b/arch/powerpc/platforms/ps3/repository.c
index 50dbaf24b1ee..b4d6628eec5e 100644
--- a/arch/powerpc/platforms/ps3/repository.c
+++ b/arch/powerpc/platforms/ps3/repository.c
@@ -101,9 +101,9 @@ static u64 make_first_field(const char *text, u64 index)

 static u64 make_field(const char *text, u64 index)
 {
-	u64 n;
+	u64 n = 0;

-	strncpy((char *)&n, text, 8);
+	memcpy((char *)&n, text, min(sizeof(n), strlen(text)));

 	return n + index;
 }
--
2.13.3
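For readers outside the kernel tree, the two fixes above can be sketched in stand-alone C (the struct and helper names here are invented for illustration; `min()` from the kernel is replaced by a plain ternary). The first helper mirrors the nvram fix, copying at most `sizeof(name) - 1` bytes so the field stays NUL-terminated; the second mirrors the ps3 `make_field()` fix, packing up to 8 text bytes into a zero-initialized u64 with `memcpy()`, where no terminator is expected:

```c
#include <string.h>
#include <stdint.h>

/* hypothetical 12-byte name field, mirroring the nvram header */
struct part_header {
	char name[12];
};

/* copy at most sizeof(name)-1 bytes and keep the field
 * NUL-terminated, which is what the nvram fix achieves */
static void set_part_name(struct part_header *h, const char *name)
{
	memset(h->name, 0, sizeof(h->name));
	strncpy(h->name, name, sizeof(h->name) - 1);
}

/* like the fixed make_field(): pack up to 8 text bytes into a
 * u64; the data is not a C string, so no NUL is written */
static uint64_t make_field(const char *text, uint64_t index)
{
	uint64_t n = 0;
	size_t len = strlen(text);

	memcpy(&n, text, len < sizeof(n) ? len : sizeof(n));
	return n + index;
}
```

The key point GCC 8.1 is making: `strncpy()` with a bound equal to the destination size can silently drop the terminator, so either shrink the bound (and pre-zero the buffer) or switch to `memcpy()` when the destination is deliberately not a string.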
Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
On Mon, May 28, 2018 at 5:07 PM, Segher Boessenkool wrote:
> On Mon, May 28, 2018 at 02:39:37PM +, Christophe Leroy wrote:
>> In file included from arch/powerpc/kernel/syscalls.c:24:
>> ./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between
>> functions of incompatible types 'long int(long unsigned int, size_t,
>> long unsigned int, long unsigned int, long unsigned int, long
>> unsigned int)' {aka 'long int(long unsigned int, unsigned int, long
>> unsigned int, long unsigned int, long unsigned int, long unsigned
>> int)'} and 'long int(long int, long int, long int, long int, long
>> int, long int)' [-Wattribute-alias]
>
> So yes, these are actually different (int vs. long).
>
>> In file included from arch/powerpc/kernel/signal_32.c:29:
>> ./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias
>> between functions of incompatible types 'long int(struct ucontext *,
>> struct ucontext *, long int)' and 'long int(long int, long int, long
>> int)' [-Wattribute-alias]
>
> And this one is quite spectacularly different.

https://patchwork.kernel.org/patch/10375607/

> Try putting this before the wacko aliases:
>
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wattribute-alias"
>
> and this after:
>
> #pragma GCC diagnostic pop
>
> so that you will get this quite useful warning for all other code.
>
>
> Segher
[PATCH] powerpc/64s: Enhance the information in cpu_show_spectre_v1()
We now have barrier_nospec as mitigation so print it in cpu_show_spectre_v1() when enabled.

Signed-off-by: Michal Suchanek
---
arch/powerpc/kernel/security.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 0239383c7e4d..a0c32d53980b 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -120,7 +120,10 @@ ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, c
 	if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR))
 		return sprintf(buf, "Not affected\n");

-	return sprintf(buf, "Vulnerable\n");
+	if (barrier_nospec_enabled)
+		return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+	else
+		return sprintf(buf, "Vulnerable\n");
 }

 ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
--
2.13.6
Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
On Mon, May 28, 2018 at 02:39:37PM +, Christophe Leroy wrote: > In file included from arch/powerpc/kernel/syscalls.c:24: > ./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between > functions of incompatible types 'long int(long unsigned int, size_t, > long unsigned int, long unsigned int, long unsigned int, long > unsigned int)' {aka 'long int(long unsigned int, unsigned int, long > unsigned int, long unsigned int, long unsigned int, long unsigned > int)'} and 'long int(long int, long int, long int, long int, long > int, long int)' [-Wattribute-alias] So yes, these are actually different (int vs. long). > In file included from arch/powerpc/kernel/signal_32.c:29: > ./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias > between functions of incompatible types 'long int(struct ucontext *, > struct ucontext *, long int)' and 'long int(long int, long int, long > int)' [-Wattribute-alias] And this one is quite spectacularly different. Try putting this before the wacko aliases: #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wattribute-alias" and this after: #pragma GCC diagnostic pop so that you will get this quite useful warning for all other code. Segher
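Segher's suggestion can be tried in a stand-alone C file. The sketch below uses invented function names (not the kernel's macros) to reproduce the shape of the problem: an alias whose declared prototype does not match the aliased definition, which is exactly what `-Wattribute-alias` flags on GCC 8+. The pragmas silence the warning only around the deliberate mismatch, so the diagnostic stays active for the rest of the file:

```c
/* the aliased definition: all-long prototype, like __se_sys##name */
long __se_sys_demo(long a, long b)
{
	return a + b;
}

/* Suppress -Wattribute-alias only around the deliberately
 * mismatched alias, as suggested in the thread. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattribute-alias"
/* alias declared with a narrower (int) prototype */
long sys_demo(int a, int b) __attribute__((alias("__se_sys_demo")));
#pragma GCC diagnostic pop
```

Compiling with `gcc -Wall` on GCC 8+ warns without the pragmas and is silent with them; `__attribute__((alias))` requires an ELF target and the target symbol defined in the same translation unit.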
Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
On 28/05/2018 at 16:37, Segher Boessenkool wrote:
> On Mon, May 28, 2018 at 02:17:49PM +, Christophe Leroy wrote:
>> Latest GCC version emit many warnings similar to the following one.
>
> You didn't actually show an example?

Yes I forgot:

In file included from arch/powerpc/kernel/syscalls.c:24:
./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between functions of incompatible types 'long int(long unsigned int, size_t, long unsigned int, long unsigned int, long unsigned int, long unsigned int)' {aka 'long int(long unsigned int, unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
                 ^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
                                    ^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6'
 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~
./include/linux/syscalls.h:238:18: note: aliased declaration here
 asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
                 ^~~~
./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
                                    ^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6'
 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~

> This warning should detect serious problems, so don't disable it without first investigating please.
There's been a discussion on this topic; see https://lkml.org/lkml/2017/12/5/581

It says: "The new warning seems reasonable in principle, but it doesn't help us here, since we rely on the type mismatch to sanitize the system call arguments. After I reported this as GCC PR82435, a new -Wno-attribute-alias option was added that could be used to turn the warning off globally on the command line, but I'd prefer to do it a little more fine-grained."

> What do you call "latest version", btw? Trunk, aka 9.0? Or the most advanced release, 8.1? Or the latest release (which also is 8.1 :-) )

I encountered it with 8.1. According to the referred discussion, it is linked to GCC 8.

Christophe

> Segher
Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
On 05/28/2018 02:17 PM, Christophe Leroy wrote: Latest GCC version emit many warnings similar to the following one. Oops, forgot to include the failure text: In file included from arch/powerpc/kernel/syscalls.c:24: ./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between functions of incompatible types 'long int(long unsigned int, size_t, long unsigned int, long unsigned int, long unsigned int, long unsigned int)' {aka 'long int(long unsigned int, unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~ arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len, ^~~ ./include/linux/syscalls.h:238:18: note: aliased declaration here asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ ^~~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) 
SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~ arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len, ^~~ ./include/linux/syscalls.h:233:18: warning: 'sys_mmap' alias between functions of incompatible types 'long int(long unsigned int, size_t, long unsigned int, long unsigned int, long unsigned int, off_t)' {aka 'long int(long unsigned int, unsigned int, long unsigned int, long unsigned int, long unsigned int, long int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~ arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len, ^~~ ./include/linux/syscalls.h:238:18: note: aliased declaration here asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ ^~~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) 
SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~ arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len, ^~~ CC arch/powerpc/kernel/irq.o CC arch/powerpc/kernel/align.o CC arch/powerpc/kernel/signal_32.o In file included from arch/powerpc/kernel/signal_32.c:29: ./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias between functions of incompatible types 'long int(struct ucontext *, struct ucontext *, long int)' and 'long int(long int, long int, long int)' [-Wattribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ ./include/linux/syscalls.h:213:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__) ^~~ arch/powerpc/kernel/signal_32.c:1044:1: note: in expansion of macro 'SYSCALL_DEFINE3' SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, ^~~ ./include/linux/syscalls.h:238:18: note: aliased declaration here asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ ^~~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
On Mon, May 28, 2018 at 02:17:49PM +, Christophe Leroy wrote: > Latest GCC version emit many warnings similar to the following one. You didn't actually show an example? This warning should detect serious problems, so don't disable it without first investigating please. What do you call "latest version", btw? Trunk, aka 9.0? Or the most advanced release, 8.1? Or the latest release (which also is 8.1 :-) ) Segher
[PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning
The latest GCC versions emit many warnings similar to the one quoted in the follow-up. As arch/powerpc code is built with -Werror, this breaks the build with GCC 8.1.

This patch inhibits that warning.

Signed-off-by: Christophe Leroy
---
arch/powerpc/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 26ccd0b91512..d6c557530ce8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,6 +145,8 @@ CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
 CFLAGS-$(CONFIG_PPC32)	:= -ffixed-r2 $(MULTIPLEWORD)
 CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)

+CFLAGS-y += $(call cc-option,-Wno-attribute-alias)
+
 ifeq ($(CONFIG_PPC_BOOK3S_64),y)
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
--
2.13.3
Re: [PATCH] powerpc/64s: Clear PCR on boot
On Sat, 2018-05-26 at 20:45 -0700, Guenter Roeck wrote:
> I already have a patch, or at least one that does the trick for me.
> Getting qemu patched was not the problem. I just want to be sure that
> the problem is indeed a qemu problem.

Hey Guenter !

It's not quite the right patch though. The PCR is a hypervisor privileged register, your patch makes it supervisor accessible. I don't have all my stuff at hand to provide a "proper" or tested patch but it should look like

	spr_register_hv(env, SPR_PCR, "PCR",
	                SPR_NOACCESS, SPR_NOACCESS,
	                SPR_NOACCESS, SPR_NOACCESS,
	                &spr_read_generic, &spr_write_generic,
	                0x00000000);

Additionally the TCG ppc instruction decoder should be made to check the PCR for various instructions (that or use a specific write callback that affects the CPU flags) but that's less urgent.

Cheers,
Ben.

> Thanks,
> Guenter
>
> ---
> From 1617bac264b4c49d817b6947611affa9b73318f6 Mon Sep 17 00:00:00 2001
> From: Guenter Roeck
> Date: Fri, 25 May 2018 06:38:40 -0700
> Subject: [PATCH] PowerPC: Permit privileged access to SPR_PCR for POWER7+
>
> Without this access, Linux mainline bails out.
>
> Signed-off-by: Guenter Roeck
> ---
> target/ppc/translate_init.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 391b94b..3b09c49 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -7953,11 +7953,12 @@ static void gen_spr_power6_common(CPUPPCState *env)
>  #endif
>      /*
>       * Register PCR to report POWERPC_EXCP_PRIV_REG instead of
> -     * POWERPC_EXCP_INVAL_SPR.
> +     * POWERPC_EXCP_INVAL_SPR in userspace. Permit privileged
> +     * access.
>       */
>      spr_register(env, SPR_PCR, "PCR",
>                   SPR_NOACCESS, SPR_NOACCESS,
> -                 SPR_NOACCESS, SPR_NOACCESS,
> +                 &spr_read_generic, &spr_write_generic,
>                   0x00000000);
>  }
>
[PATCH v2] powerpc/32: implement strlen() in assembly
The generic implementation of strlen() reads strings byte by byte.

This patch implements strlen() in assembly for PPC32 based on a read of entire words, in the same spirit as what some other arches and glibc do.

For long strings, the time spent in strlen is reduced by 50-60%.

Signed-off-by: Christophe Leroy
---
Applies on top of series "[v5 0/3] powerpc/lib: Optimisation of memcmp() and __clear_user() for PPC32"

Changes in v2:
- Moved handling of unaligned strings outside of the main path as it is very unlikely.
- Removed the verification of the fourth byte in case none of the three first ones are NUL.

arch/powerpc/include/asm/string.h | 3 +++
arch/powerpc/lib/string_32.S | 41 +++
2 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..5ecfdb776f87 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -13,6 +13,9 @@
 #define __HAVE_ARCH_MEMCHR
 #define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
+#ifdef CONFIG_PPC32
+#define __HAVE_ARCH_STRLEN
+#endif

 extern char * strcpy(char *,const char *);
 extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 4fbaa046aa84..69593c917b35 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -47,6 +47,47 @@ _GLOBAL(memcmp)
 	blr
 EXPORT_SYMBOL(memcmp)

+_GLOBAL(strlen)
+	andi.	r9, r3, 3
+	addi	r10, r3, -4
+	bne-	1f
+2:	lis	r6, 0x8080
+	ori	r6, r6, 0x8080	/* r6 = 0x80808080 (himagic) */
+	rlwinm	r7, r6, 1, 0xffffffff	/* r7 = 0x01010101 (lomagic) */
+3:	lwzu	r9, 4(r10)
+	/* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
+	subf	r8, r7, r9
+	andc	r11, r6, r9
+	and.	r8, r8, r11
+	beq+	3b
+	rlwinm.	r8, r9, 0, 0xff000000
+	beq	20f
+	rlwinm.	r8, r9, 0, 0x00ff0000
+	beq	21f
+	rlwinm.	r8, r9, 0, 0x0000ff00
+	beq	22f
+	subf	r3, r3, r10
+	addi	r3, r3, 3
+	blr
+22:	subf	r3, r3, r10
+	addi	r3, r3, 2
+	blr
+21:	subf	r3, r3, r10
+	addi	r3, r3, 1
+	blr
+19:	addi	r10, r10, 3
+20:	subf	r3, r3, r10
+	blr
+
+1:	lbz	r9, 4(r10)
+	addi	r10, r10, 1
+	cmpwi	cr1, r9, 0
+	andi.	r9, r10, 3
+	beq	cr1, 19b
+	bne	1b
+	b	2b
+EXPORT_SYMBOL(strlen)
+
 CACHELINE_BYTES = L1_CACHE_BYTES
 LG_CACHELINE_BYTES = L1_CACHE_SHIFT
 CACHELINE_MASK = (L1_CACHE_BYTES-1)
--
2.13.3
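The magic-constant test in the patch's inner loop is the classic word-at-a-time "has a zero byte" trick. A stand-alone C sketch of the predicate (names invented here; 32-bit words, as in the PPC32 code):

```c
#include <stdint.h>

#define HIMAGIC 0x80808080u	/* r6 in the patch */
#define LOMAGIC 0x01010101u	/* r7 in the patch */

/* Nonzero iff at least one byte of x is NUL: subtracting 1 from a
 * byte borrows into bit 7 only when the byte is 0 (or >= 0x80,
 * which the "& ~x" term rules out). */
static int word_has_zero_byte(uint32_t x)
{
	return ((x - LOMAGIC) & ~x & HIMAGIC) != 0;
}
```

Once a word tests positive, the assembly narrows down which of the four bytes was the NUL with the `rlwinm.` mask checks, and computes the length from the word's offset.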
Re: [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
Hi Simon,

wei.guo.si...@gmail.com writes:
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> index f20e883..4ba7bb6 100644
> --- a/arch/powerpc/lib/memcmp_64.S
> +++ b/arch/powerpc/lib/memcmp_64.S
> @@ -174,6 +235,13 @@ _GLOBAL(memcmp)
>  	blr
>
>  .Llong:
> +#ifdef CONFIG_ALTIVEC
> +	/* Try to use vmx loop if length is equal or greater than 4K */
> +	cmpldi	cr6,r5,VMX_THRESH
> +	bge	cr6,.Lsameoffset_vmx_cmp
> +

Here we decide to use vmx, but we don't do any CPU feature checks.

> @@ -332,7 +400,94 @@ _GLOBAL(memcmp)
>  8:
>  	blr
>
> +#ifdef CONFIG_ALTIVEC
> +.Lsameoffset_vmx_cmp:
> +	/* Enter with src/dst addrs has the same offset with 8 bytes
> +	 * align boundary
> +	 */
> +	ENTER_VMX_OPS
> +	beq	cr1,.Llong_novmx_cmp
> +
> +3:
> +	/* need to check whether r4 has the same offset with r3
> +	 * for 16 bytes boundary.
> +	 */
> +	xor	r0,r3,r4
> +	andi.	r0,r0,0xf
> +	bne	.Ldiffoffset_vmx_cmp_start
> +
> +	/* len is no less than 4KB. Need to align with 16 bytes further.
> +	 */
> +	andi.	rA,r3,8
> +	LD	rA,0,r3
> +	beq	4f
> +	LD	rB,0,r4
> +	cmpld	cr0,rA,rB
> +	addi	r3,r3,8
> +	addi	r4,r4,8
> +	addi	r5,r5,-8
> +
> +	beq	cr0,4f
> +	/* save and restore cr0 */
> +	mfocrf	r5,64
> +	EXIT_VMX_OPS
> +	mtocrf	64,r5
> +	b	.LcmpAB_lightweight
> +
> +4:
> +	/* compare 32 bytes for each loop */
> +	srdi	r0,r5,5
> +	mtctr	r0
> +	clrldi	r5,r5,59
> +	li	off16,16
> +
> +.balign 16
> +5:
> +	lvx	v0,0,r3
> +	lvx	v1,0,r4
> +	vcmpequd.	v0,v0,v1

vcmpequd is only available on Power8 and later CPUs. Which means this will crash on Power7 or earlier.

Something like this should fix it I think.
diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S index 96eb08b2be2e..0a11ff14dcd9 100644 --- a/arch/powerpc/lib/memcmp_64.S +++ b/arch/powerpc/lib/memcmp_64.S @@ -236,9 +236,11 @@ _GLOBAL(memcmp) .Llong: #ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION /* Try to use vmx loop if length is equal or greater than 4K */ cmpldi cr6,r5,VMX_THRESH bge cr6,.Lsameoffset_vmx_cmp +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) .Llong_novmx_cmp: #endif There's another problem which is that old toolchains don't know about vcmpequd. To fix that we'll need to add a macro that uses .long to construct the instruction. cheers
Re: [PATCH] sound: Use octal not symbolic permissions
On Wed, 23 May 2018 21:20:59 +0200, Joe Perches wrote:
>
> Convert the S_ symbolic permissions to their octal equivalents as
> using octal and not symbolic permissions is preferred by many as more
> readable.
>
> see: https://lkml.org/lkml/2016/8/2/1945
>
> Done with automated conversion via:
> $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace
>
> Miscellanea:
>
> o Wrapped one multi-line call to a single line
>
> Signed-off-by: Joe Perches

Applied, thanks.

Takashi
Re: [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
On Fri, May 25, 2018 at 12:07:34PM +0800, wei.guo.si...@gmail.com wrote: > + /* save and restore cr0 */ > + mfocrf r5,64 > + EXIT_VMX_OPS > + mtocrf 64,r5 > + b .LcmpAB_lightweight That's cr1, not cr0. You can use mcrf instead, it is cheaper (esp. if you have it in a non-volatile CR field before so you need only one, if any). > + vcmpequb. v7,v9,v10 > + bnl cr6,.Ldiffoffset_vmx_diff_found In other places you say bf 24,... Dunno which is more readable, but please pick one? Segher
RE: [PATCH v11 00/26] Speculative page faults
A full run would take one or two weeks depending on the resources available. Could you pick some up, e.g. those that have a performance regression?

-----Original Message-----
From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of Laurent Dufour
Sent: Monday, May 28, 2018 4:55 PM
To: Song, HaiyanX
Cc: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi; linux-ker...@vger.kernel.org; linux...@kvack.org; ha...@linux.vnet.ibm.com; npig...@gmail.com; bsinghar...@gmail.com; paul...@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x...@kernel.org
Subject: Re: [PATCH v11 00/26] Speculative page faults

On 28/05/2018 10:22, Haiyan Song wrote:
> Hi Laurent,
>
> Yes, these tests are done on V9 patch.

Do you plan to give this V11 a run ?

>
> Best regards,
> Haiyan Song
>
> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>
>>> Some regression and improvements is found by LKP-tools(linux kernel
>>> performance) on V9 patch series tested on Intel 4s Skylake platform.
>>
>> Hi,
>>
>> Thanks for reporting this benchmark results, but you mentioned the
>> "V9 patch series" while responding to the v11 header series...
>> Were these tests done on v9 or v11 ?
>>
>> Cheers,
>> Laurent.
>>
>>>
>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 >>> patch series) Commit id: >>> base commit: d55f34411b1b126429a823d06c3124c16283231f >>> head commit: 0355322b3577eeab7669066df42c550a56801110 >>> Benchmark suite: will-it-scale >>> Download link: >>> https://github.com/antonblanchard/will-it-scale/tree/master/tests >>> Metrics: >>> will-it-scale.per_process_ops=processes/nr_cpu >>> will-it-scale.per_thread_ops=threads/nr_cpu >>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G) >>> THP: enable / disable >>> nr_task: 100% >>> >>> 1. Regressions: >>> a) THP enabled: >>> testcasebasechange head >>> metric >>> page_fault3/ enable THP 10092 -17.5% 8323 >>> will-it-scale.per_thread_ops >>> page_fault2/ enable THP 8300 -17.2% 6869 >>> will-it-scale.per_thread_ops >>> brk1/ enable THP 957.67 -7.6% 885 >>> will-it-scale.per_thread_ops >>> page_fault3/ enable THP172821-5.3%163692 >>> will-it-scale.per_process_ops >>> signal1/ enable THP 9125-3.2% 8834 >>> will-it-scale.per_process_ops >>> >>> b) THP disabled: >>> testcasebasechange head >>> metric >>> page_fault3/ disable THP10107 -19.1% 8180 >>> will-it-scale.per_thread_ops >>> page_fault2/ disable THP 8432 -17.8% 6931 >>> will-it-scale.per_thread_ops >>> context_switch1/ disable THP 215389-6.8%200776 >>> will-it-scale.per_thread_ops >>> brk1/ disable THP 939.67 -6.6% 877.33 >>> will-it-scale.per_thread_ops >>> page_fault3/ disable THP 173145-4.7%165064 >>> will-it-scale.per_process_ops >>> signal1/ disable THP 9162-3.9% 8802 >>> will-it-scale.per_process_ops >>> >>> 2. Improvements: >>> a) THP enabled: >>> testcasebasechange head >>> metric >>> malloc1/ enable THP 66.33+469.8% 383.67 >>> will-it-scale.per_thread_ops >>> writeseek3/ enable THP 2531 +4.5% 2646 >>> will-it-scale.per_thread_ops >>> signal1/ enable THP 989.33 +2.8% 1016 >>> will-it-scale.per_thread_ops >>> >>> b) THP disabled: >>> testcasebasechange head >>>
Re: [PATCH v6 1/4] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()
On Fri, May 25, 2018 at 12:07:33PM +0800, wei.guo.si...@gmail.com wrote: > _GLOBAL(memcmp) > cmpdi cr1,r5,0 > > - /* Use the short loop if both strings are not 8B aligned */ > - or r6,r3,r4 > + /* Use the short loop if the src/dst addresses are not > + * with the same offset of 8 bytes align boundary. > + */ > + xor r6,r3,r4 > andi. r6,r6,7 > > - /* Use the short loop if length is less than 32B */ > - cmpdi cr6,r5,31 > + /* Fall back to short loop if compare at aligned addrs > + * with less than 8 bytes. > + */ > + cmpdi cr6,r5,7 > > beq cr1,.Lzero > - bne .Lshort > - bgt cr6,.Llong > + bgt cr6,.Lno_short If this doesn't use cr0 anymore, you can do rlwinm r6,r6,0,7 instead of andi r6,r6,7 . > +.Lsameoffset_8bytes_make_align_start: > + /* attempt to compare bytes not aligned with 8 bytes so that > + * rest comparison can run based on 8 bytes alignment. > + */ > + andi. r6,r3,7 > + > + /* Try to compare the first double word which is not 8 bytes aligned: > + * load the first double word at (src & ~7UL) and shift left appropriate > + * bits before comparision. > + */ > + clrlwi r6,r3,29 > + rlwinm r6,r6,3,0,28 Those last two lines are together just rlwinm r6,r3,3,0x1c > + subfc. r5,r6,r5 Why subfc? You don't use the carry. > + rlwinm r6,r6,3,0,28 That's slwi r6,r6,3 > + bgt cr0,8f > + li r3,-1 > +8: > + blr blelr li r3,-1 blr (and more of the same things elsewhere). Segher
[PATCH v5 3/3] powerpc/lib: optimise PPC32 memcmp
At the time being, memcmp() compares two chunks of memory byte by byte.

This patch optimises the comparison by comparing word by word.

A small benchmark performed on an 8xx comparing two chunks of 512 bytes performed 10 times gives:

Before : 5852274 TB ticks
After  : 1488638 TB ticks

This is almost 4 times faster.

Signed-off-by: Christophe Leroy
---
arch/powerpc/lib/string_32.S | 37 +++--
1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 40a576d56ac7..4fbaa046aa84 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -16,17 +16,34 @@
 	.text

 _GLOBAL(memcmp)
-	cmpwi	cr0, r5, 0
-	beq-	2f
-	mtctr	r5
-	addi	r6,r3,-1
-	addi	r4,r4,-1
-1:	lbzu	r3,1(r6)
-	lbzu	r0,1(r4)
-	subf.	r3,r0,r3
-	bdnzt	2,1b
+	srawi.	r7, r5, 2		/* Divide len by 4 */
+	mr	r6, r3
+	beq-	3f
+	mtctr	r7
+	li	r7, 0
+1:	lwzx	r3, r6, r7
+	lwzx	r0, r4, r7
+	addi	r7, r7, 4
+	cmplw	cr0, r3, r0
+	bdnzt	eq, 1b
+	bne	5f
+3:	andi.	r3, r5, 3
+	beqlr
+	cmplwi	cr1, r3, 2
+	blt-	cr1, 4f
+	lhzx	r3, r6, r7
+	lhzx	r0, r4, r7
+	addi	r7, r7, 2
+	subf.	r3, r0, r3
+	beqlr	cr1
+	bnelr
+4:	lbzx	r3, r6, r7
+	lbzx	r0, r4, r7
+	subf.	r3, r0, r3
 	blr
-2:	li	r3,0
+5:	li	r3, 1
+	bgtlr
+	li	r3, -1
 	blr
 EXPORT_SYMBOL(memcmp)
--
2.13.3
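The same word-at-a-time idea can be sketched in portable C. This is a rough illustrative model only, not the kernel code: the assembly above can derive the result sign directly from a big-endian word compare (`cmplw`), while this sketch stays endian-neutral by finishing any differing word byte by byte:

```c
#include <string.h>
#include <stdint.h>

/* word-by-word memcmp sketch: compare 4-byte words for equality,
 * then finish the tail (and any differing word) byte by byte so
 * the sign of the result is correct regardless of endianness */
static int word_memcmp(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a, *pb = b;

	while (n >= 4) {
		uint32_t wa, wb;

		memcpy(&wa, pa, 4);	/* avoids unaligned-access UB */
		memcpy(&wb, pb, 4);
		if (wa != wb)
			break;		/* byte loop pinpoints the difference */
		pa += 4;
		pb += 4;
		n -= 4;
	}
	while (n--) {
		if (*pa != *pb)
			return *pa < *pb ? -1 : 1;
		pa++;
		pb++;
	}
	return 0;
}
```

Like the patch, the bulk of equal data is consumed a word at a time, and only the final (or first differing) few bytes are examined individually.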
[PATCH v5 2/3] powerpc/lib: optimise 32 bits __clear_user()
Rewrite clear_user() on the same principle as memset(0), making use of dcbz to clear complete cache lines.

This code is a copy/paste of memset(), with some modifications in order to retrieve the remaining number of bytes to be cleared, as it needs to be returned in case of error.

On a MPC885, throughput is almost doubled:

Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s

After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s

On a MPC8321, throughput is multiplied by 2.12:

Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s

After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/string_32.S | 85 +++++++++++++++++++++++++++-----------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 204db8a834fd..40a576d56ac7 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 	.text
 
@@ -29,44 +30,78 @@ _GLOBAL(memcmp)
 	blr
 EXPORT_SYMBOL(memcmp)
 
+CACHELINE_BYTES = L1_CACHE_BYTES
+LG_CACHELINE_BYTES = L1_CACHE_SHIFT
+CACHELINE_MASK = (L1_CACHE_BYTES-1)
+
 _GLOBAL(__clear_user)
-	addi	r6,r3,-4
-	li	r3,0
-	li	r5,0
-	cmplwi	0,r4,4
+/*
+ * Use dcbz on the complete cache lines in the destination
+ * to set them to zero. This requires that the destination
+ * area is cacheable.
+ */
+	cmplwi	cr0, r4, 4
+	mr	r10, r3
+	li	r3, 0
 	blt	7f
-	/* clear a single word */
-11:	stwu	r5,4(r6)
+
+11:	stw	r3, 0(r10)
 	beqlr
-	/* clear word sized chunks */
-	andi.	r0,r6,3
-	add	r4,r0,r4
-	subf	r6,r0,r6
-	srwi	r0,r4,2
-	andi.	r4,r4,3
+	andi.	r0, r10, 3
+	add	r11, r0, r4
+	subf	r6, r0, r10
+
+	clrlwi	r7, r6, 32 - LG_CACHELINE_BYTES
+	add	r8, r7, r11
+	srwi	r9, r8, LG_CACHELINE_BYTES
+	addic.	r9, r9, -1	/* total number of complete cachelines */
+	ble	2f
+	xori	r0, r7, CACHELINE_MASK & ~3
+	srwi.	r0, r0, 2
+	beq	3f
+	mtctr	r0
+4:	stwu	r3, 4(r6)
+	bdnz	4b
+3:	mtctr	r9
+	li	r7, 4
+10:	dcbz	r7, r6
+	addi	r6, r6, CACHELINE_BYTES
+	bdnz	10b
+	clrlwi	r11, r8, 32 - LG_CACHELINE_BYTES
+	addi	r11, r11, 4
+
+2:	srwi	r0, r11, 2
 	mtctr	r0
-	bdz	7f
-1:	stwu	r5,4(r6)
+	bdz	6f
+1:	stwu	r3, 4(r6)
 	bdnz	1b
-	/* clear byte sized chunks */
-7:	cmpwi	0,r4,0
+6:	andi.	r11, r11, 3
 	beqlr
-	mtctr	r4
-	addi	r6,r6,3
-8:	stbu	r5,1(r6)
+	mtctr	r11
+	addi	r6, r6, 3
+8:	stbu	r3, 1(r6)
 	bdnz	8b
 	blr
-90:	mr	r3,r4
+
+7:	cmpwi	cr0, r4, 0
+	beqlr
+	mtctr	r4
+	addi	r6, r10, -1
+9:	stbu	r3, 1(r6)
+	bdnz	9b
 	blr
-91:	mfctr	r3
-	slwi	r3,r3,2
-	add	r3,r3,r4
+
+90:	mr	r3, r4
 	blr
-92:	mfctr	r3
+91:	add	r3, r10, r4
+	subf	r3, r6, r3
 	blr
 
 	EX_TABLE(11b, 90b)
+	EX_TABLE(4b, 91b)
+	EX_TABLE(10b, 91b)
 	EX_TABLE(1b, 91b)
-	EX_TABLE(8b, 92b)
+	EX_TABLE(8b, 91b)
+	EX_TABLE(9b, 91b)
 EXPORT_SYMBOL(__clear_user)
-- 
2.13.3
[PATCH v5 1/3] powerpc/lib: move PPC32 specific functions out of string.S
In preparation of optimisation patches, move PPC32 specific memcmp() and __clear_user() into string_32.S

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/Makefile    |  5 +--
 arch/powerpc/lib/string.S    | 61 ---------------------
 arch/powerpc/lib/string_32.S | 72 ++++++++++++++++++++++++
 3 files changed, 75 insertions(+), 63 deletions(-)
 create mode 100644 arch/powerpc/lib/string_32.S

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 653901042ad7..2c9b8c0adf22 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -26,13 +26,14 @@ obj-$(CONFIG_PPC_BOOK3S_64)	+= copyuser_power7.o copypage_power7.o \
 			   memcpy_power7.o
 
 obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
-	   string_64.o memcpy_64.o memcmp_64.o pmem.o
+	   memcpy_64.o memcmp_64.o pmem.o
 
 obj64-$(CONFIG_SMP)	+= locks.o
 obj64-$(CONFIG_ALTIVEC)	+= vmx-helper.o
 obj64-$(CONFIG_KPROBES_SANITY_TEST)	+= test_emulate_step.o
 
-obj-y			+= checksum_$(BITS).o checksum_wrappers.o
+obj-y			+= checksum_$(BITS).o checksum_wrappers.o \
+			   string_$(BITS).o
 
 obj-y			+= sstep.o ldstfp.o quad.o
 obj64-y			+= quad.o
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 0378def28d41..b03c1a6861f4 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -56,23 +56,6 @@ _GLOBAL(strncmp)
 	blr
 EXPORT_SYMBOL(strncmp)
 
-#ifdef CONFIG_PPC32
-_GLOBAL(memcmp)
-	PPC_LCMPI 0,r5,0
-	beq-	2f
-	mtctr	r5
-	addi	r6,r3,-1
-	addi	r4,r4,-1
-1:	lbzu	r3,1(r6)
-	lbzu	r0,1(r4)
-	subf.	r3,r0,r3
-	bdnzt	2,1b
-	blr
-2:	li	r3,0
-	blr
-EXPORT_SYMBOL(memcmp)
-#endif
-
 _GLOBAL(memchr)
 	PPC_LCMPI 0,r5,0
 	beq-	2f
@@ -86,47 +69,3 @@ _GLOBAL(memchr)
 2:	li	r3,0
 	blr
 EXPORT_SYMBOL(memchr)
-
-#ifdef CONFIG_PPC32
-_GLOBAL(__clear_user)
-	addi	r6,r3,-4
-	li	r3,0
-	li	r5,0
-	cmplwi	0,r4,4
-	blt	7f
-	/* clear a single word */
-11:	stwu	r5,4(r6)
-	beqlr
-	/* clear word sized chunks */
-	andi.	r0,r6,3
-	add	r4,r0,r4
-	subf	r6,r0,r6
-	srwi	r0,r4,2
-	andi.	r4,r4,3
-	mtctr	r0
-	bdz	7f
-1:	stwu	r5,4(r6)
-	bdnz	1b
-	/* clear byte sized chunks */
-7:	cmpwi	0,r4,0
-	beqlr
-	mtctr	r4
-	addi	r6,r6,3
-8:	stbu	r5,1(r6)
-	bdnz	8b
-	blr
-90:	mr	r3,r4
-	blr
-91:	mfctr	r3
-	slwi	r3,r3,2
-	add	r3,r3,r4
-	blr
-92:	mfctr	r3
-	blr
-
-	EX_TABLE(11b, 90b)
-	EX_TABLE(1b, 91b)
-	EX_TABLE(8b, 92b)
-
-EXPORT_SYMBOL(__clear_user)
-#endif
diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
new file mode 100644
index ..204db8a834fd
--- /dev/null
+++ b/arch/powerpc/lib/string_32.S
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * String handling functions for PowerPC32
+ *
+ * Copyright (C) 1996 Paul Mackerras.
+ *
+ */
+
+#include
+#include
+#include
+#include
+
+	.text
+
+_GLOBAL(memcmp)
+	cmpwi	cr0, r5, 0
+	beq-	2f
+	mtctr	r5
+	addi	r6,r3,-1
+	addi	r4,r4,-1
+1:	lbzu	r3,1(r6)
+	lbzu	r0,1(r4)
+	subf.	r3,r0,r3
+	bdnzt	2,1b
+	blr
+2:	li	r3,0
+	blr
+EXPORT_SYMBOL(memcmp)
+
+_GLOBAL(__clear_user)
+	addi	r6,r3,-4
+	li	r3,0
+	li	r5,0
+	cmplwi	0,r4,4
+	blt	7f
+	/* clear a single word */
+11:	stwu	r5,4(r6)
+	beqlr
+	/* clear word sized chunks */
+	andi.	r0,r6,3
+	add	r4,r0,r4
+	subf	r6,r0,r6
+	srwi	r0,r4,2
+	andi.	r4,r4,3
+	mtctr	r0
+	bdz	7f
+1:	stwu	r5,4(r6)
+	bdnz	1b
+	/* clear byte sized chunks */
+7:	cmpwi	0,r4,0
+	beqlr
+	mtctr	r4
+	addi	r6,r6,3
+8:	stbu	r5,1(r6)
+	bdnz	8b
+	blr
+90:	mr	r3,r4
+	blr
+91:	mfctr	r3
+	slwi	r3,r3,2
+	add	r3,r3,r4
+	blr
+92:	mfctr	r3
+	blr
+
+	EX_TABLE(11b, 90b)
+	EX_TABLE(1b, 91b)
+	EX_TABLE(8b, 92b)
+
+EXPORT_SYMBOL(__clear_user)
-- 
2.13.3
[PATCH v5 0/3] powerpc/lib: Optimisation of memcmp() and __clear_user() for PPC32
This series intends to optimise memcmp() and __clear_user() for PPC32 in the same spirit as already done on PPC64.

The first patch moves PPC32 specific functions from string.S into a dedicated file named string_32.S
The second patch rewrites __clear_user() by using the dcbz instruction
The third patch rewrites memcmp() to compare 32-bit words instead of comparing byte per byte.

As shown in each individual commit log, the second and third patches provide significant improvement.

Changes in v5:
 - Removed the handling of Little Endian, as PPC32 kernels only support Big Endian at the time being, and because unaligned accesses should be handled differently in LE.

Changes in v4:
 - In memcmp(), dropped the special handling for when length is 0. Handling it through the small length path.

Changes in v3:
 - Fixed the sign of the result returned by memcmp() by using a logical comparison of u32 words and returning -1, 0 or 1 instead of doing a subtract.
 - In first patch, replaced PPC_LCMPI by cmpwi
 - Fixed licence in string_32.S
 - Removed the two last patches from the series. They will be handled later as they require further tests and analysis to properly identify their real benefit in all possible cases.

Changes in v2:
 - Moved out the patch removing the hot loop alignment on PPC32
 - Squashed the changes related to NUL size verification in a single patch
 - Reordered the patches in a more logical order
 - Modified the inlining patch to avoid a warning about the impossibility to version symbols.

Christophe Leroy (3):
  powerpc/lib: move PPC32 specific functions out of string.S
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/lib: optimise PPC32 memcmp

 arch/powerpc/lib/Makefile    |   5 +-
 arch/powerpc/lib/string.S    |  61 --------------
 arch/powerpc/lib/string_32.S | 124 +++++++++++++++++++++++++++
 3 files changed, 127 insertions(+), 63 deletions(-)
 create mode 100644 arch/powerpc/lib/string_32.S

-- 
2.13.3
Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb
On Tue, 22 May 2018 14:42:09 +0530 "Aneesh Kumar K.V" wrote:

> We used wrong page size (mmu_virtual_psize) when doing a tlbflush after
> pte update. This patch updates the flush to use hugetlb page size.
> The page size is derived from hugetlb hstate.
>
> Now that ptep_set_access_flags won't be called for hugetlb, remove
> the is_vm_hugetlb_page() check and add the assert of pte lock
> unconditionally.

radix__flush_tlb_page seems to check for a hugepage vma and flush the right page size in that case. So is there a case where the flush size actually goes wrong?

I do think if we could make hugetlb flushing always use the flush_hugetlb_page version and take the test out of flush_tlb_page that would be nicer code (I haven't checked if that's feasible after this patch), but that wouldn't be a bug-fix.

Thanks,
Nick
Re: [PATCH 2/2] powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
On Fri, 25 May 2018 21:19:17 +0530 "Aneesh Kumar K.V" wrote:

> When relaxing access (read -> read_write update), the pte needs to be
> marked invalid to handle a nest MMU bug. We also need to do a tlb flush
> after the pte is marked invalid, before updating the pte with the new
> access bits.
>
> We also move the tlb flush to the platform specific
> __ptep_set_access_flags. This will help us to get rid of unnecessary
> tlb flushes on BOOK3S 64 later. We don't do that in this patch. This
> also helps in avoiding multiple tlbies with a coprocessor attached.
>
> Signed-off-by: Aneesh Kumar K.V

I think these look good to me, but same comment: can you split the API change from the NMMU fix? The fix is just a couple of lines, and the rest of the API change should leave generated code almost unchanged, so it would be easier to review if these are split.

Thanks,
Nick
Re: [PATCH v11 00/26] Speculative page faults
On 28/05/2018 10:22, Haiyan Song wrote:
> Hi Laurent,
>
> Yes, these tests are done on V9 patch.

Do you plan to give this V11 a run ?

>
> Best regards,
> Haiyan Song
>
> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>
>>> Some regressions and improvements are found by LKP-tools (linux kernel
>>> performance) on the V9 patch series, tested on an Intel 4s Skylake platform.
>>
>> Hi,
>>
>> Thanks for reporting these benchmark results, but you mentioned the "V9 patch
>> series" while responding to the v11 header series...
>> Were these tests done on v9 or v11 ?
>>
>> Cheers,
>> Laurent.
>>
>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
>>> Commit id:
>>>     base commit: d55f34411b1b126429a823d06c3124c16283231f
>>>     head commit: 0355322b3577eeab7669066df42c550a56801110
>>> Benchmark suite: will-it-scale
>>> Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
>>> Metrics:
>>>     will-it-scale.per_process_ops=processes/nr_cpu
>>>     will-it-scale.per_thread_ops=threads/nr_cpu
>>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
>>> THP: enable / disable
>>> nr_task: 100%
>>>
>>> 1. Regressions:
>>>
>>> a) THP enabled:
>>> testcase                     base     change     head     metric
>>> page_fault3/ enable THP      10092    -17.5%     8323     will-it-scale.per_thread_ops
>>> page_fault2/ enable THP      8300     -17.2%     6869     will-it-scale.per_thread_ops
>>> brk1/ enable THP             957.67   -7.6%      885      will-it-scale.per_thread_ops
>>> page_fault3/ enable THP      172821   -5.3%      163692   will-it-scale.per_process_ops
>>> signal1/ enable THP          9125     -3.2%      8834     will-it-scale.per_process_ops
>>>
>>> b) THP disabled:
>>> testcase                     base     change     head     metric
>>> page_fault3/ disable THP     10107    -19.1%     8180     will-it-scale.per_thread_ops
>>> page_fault2/ disable THP     8432     -17.8%     6931     will-it-scale.per_thread_ops
>>> context_switch1/ disable THP 215389   -6.8%      200776   will-it-scale.per_thread_ops
>>> brk1/ disable THP            939.67   -6.6%      877.33   will-it-scale.per_thread_ops
>>> page_fault3/ disable THP     173145   -4.7%      165064   will-it-scale.per_process_ops
>>> signal1/ disable THP         9162     -3.9%      8802     will-it-scale.per_process_ops
>>>
>>> 2. Improvements:
>>>
>>> a) THP enabled:
>>> testcase                     base     change     head     metric
>>> malloc1/ enable THP          66.33    +469.8%    383.67   will-it-scale.per_thread_ops
>>> writeseek3/ enable THP       2531     +4.5%      2646     will-it-scale.per_thread_ops
>>> signal1/ enable THP          989.33   +2.8%      1016     will-it-scale.per_thread_ops
>>>
>>> b) THP disabled:
>>> testcase                     base     change     head     metric
>>> malloc1/ disable THP         90.33    +417.3%    467.33   will-it-scale.per_thread_ops
>>> read2/ disable THP           58934    +39.2%     82060    will-it-scale.per_thread_ops
>>> page_fault1/ disable THP     8607     +36.4%     11736    will-it-scale.per_thread_ops
>>> read1/ disable THP           314063   +12.7%     353934   will-it-scale.per_thread_ops
>>> writeseek3/ disable THP      2452     +12.5%     2759     will-it-scale.per_thread_ops
>>> signal1/ disable THP         971.33   +5.5%      1024     will-it-scale.per_thread_ops
>>>
>>> Notes: for the above values in column "change", a higher value means that
>>> the related testcase result on the head commit is better than that on the
>>> base commit for this benchmark.
>>>
>>> Best regards
>>> Haiyan Song
>>>
>>> From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of Laurent Dufour [lduf...@linux.vnet.ibm.com]
>>> Sent: Thursday, May 17, 2018 7:06 PM
>>> To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov;
Re: [PATCH v11 00/26] Speculative page faults
Hi Laurent,

Yes, these tests are done on V9 patch.

Best regards,
Haiyan Song

On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
> On 28/05/2018 07:23, Song, HaiyanX wrote:
> >
> > Some regressions and improvements are found by LKP-tools (linux kernel
> > performance) on the V9 patch series, tested on an Intel 4s Skylake platform.
>
> Hi,
>
> Thanks for reporting these benchmark results, but you mentioned the "V9 patch
> series" while responding to the v11 header series...
> Were these tests done on v9 or v11 ?
>
> Cheers,
> Laurent.
>
> > The regression result is sorted by the metric will-it-scale.per_thread_ops.
> > Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
> > Commit id:
> >     base commit: d55f34411b1b126429a823d06c3124c16283231f
> >     head commit: 0355322b3577eeab7669066df42c550a56801110
> > Benchmark suite: will-it-scale
> > Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
> > Metrics:
> >     will-it-scale.per_process_ops=processes/nr_cpu
> >     will-it-scale.per_thread_ops=threads/nr_cpu
> > test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
> > THP: enable / disable
> > nr_task: 100%
> >
> > 1. Regressions:
> >
> > a) THP enabled:
> > testcase                     base     change     head     metric
> > page_fault3/ enable THP      10092    -17.5%     8323     will-it-scale.per_thread_ops
> > page_fault2/ enable THP      8300     -17.2%     6869     will-it-scale.per_thread_ops
> > brk1/ enable THP             957.67   -7.6%      885      will-it-scale.per_thread_ops
> > page_fault3/ enable THP      172821   -5.3%      163692   will-it-scale.per_process_ops
> > signal1/ enable THP          9125     -3.2%      8834     will-it-scale.per_process_ops
> >
> > b) THP disabled:
> > testcase                     base     change     head     metric
> > page_fault3/ disable THP     10107    -19.1%     8180     will-it-scale.per_thread_ops
> > page_fault2/ disable THP     8432     -17.8%     6931     will-it-scale.per_thread_ops
> > context_switch1/ disable THP 215389   -6.8%      200776   will-it-scale.per_thread_ops
> > brk1/ disable THP            939.67   -6.6%      877.33   will-it-scale.per_thread_ops
> > page_fault3/ disable THP     173145   -4.7%      165064   will-it-scale.per_process_ops
> > signal1/ disable THP         9162     -3.9%      8802     will-it-scale.per_process_ops
> >
> > 2. Improvements:
> >
> > a) THP enabled:
> > testcase                     base     change     head     metric
> > malloc1/ enable THP          66.33    +469.8%    383.67   will-it-scale.per_thread_ops
> > writeseek3/ enable THP       2531     +4.5%      2646     will-it-scale.per_thread_ops
> > signal1/ enable THP          989.33   +2.8%      1016     will-it-scale.per_thread_ops
> >
> > b) THP disabled:
> > testcase                     base     change     head     metric
> > malloc1/ disable THP         90.33    +417.3%    467.33   will-it-scale.per_thread_ops
> > read2/ disable THP           58934    +39.2%     82060    will-it-scale.per_thread_ops
> > page_fault1/ disable THP     8607     +36.4%     11736    will-it-scale.per_thread_ops
> > read1/ disable THP           314063   +12.7%     353934   will-it-scale.per_thread_ops
> > writeseek3/ disable THP      2452     +12.5%     2759     will-it-scale.per_thread_ops
> > signal1/ disable THP         971.33   +5.5%      1024     will-it-scale.per_thread_ops
> >
> > Notes: for the above values in column "change", a higher value means that
> > the related testcase result on the head commit is better than that on the
> > base commit for this benchmark.
> >
> > Best regards
> > Haiyan Song
> >
> > From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of Laurent Dufour [lduf...@linux.vnet.ibm.com]
> > Sent: Thursday, May 17, 2018 7:06 PM
> > To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran;
> > Minchan
Re: [PATCH v11 00/26] Speculative page faults
On 28/05/2018 07:23, Song, HaiyanX wrote:
> Some regressions and improvements are found by LKP-tools (linux kernel
> performance) on the V9 patch series, tested on an Intel 4s Skylake platform.

Hi,

Thanks for reporting these benchmark results, but you mentioned the "V9 patch
series" while responding to the v11 header series...
Were these tests done on v9 or v11 ?

Cheers,
Laurent.

> The regression result is sorted by the metric will-it-scale.per_thread_ops.
> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
> Commit id:
>     base commit: d55f34411b1b126429a823d06c3124c16283231f
>     head commit: 0355322b3577eeab7669066df42c550a56801110
> Benchmark suite: will-it-scale
> Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
> Metrics:
>     will-it-scale.per_process_ops=processes/nr_cpu
>     will-it-scale.per_thread_ops=threads/nr_cpu
> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
> THP: enable / disable
> nr_task: 100%
>
> 1. Regressions:
>
> a) THP enabled:
> testcase                     base     change     head     metric
> page_fault3/ enable THP      10092    -17.5%     8323     will-it-scale.per_thread_ops
> page_fault2/ enable THP      8300     -17.2%     6869     will-it-scale.per_thread_ops
> brk1/ enable THP             957.67   -7.6%      885      will-it-scale.per_thread_ops
> page_fault3/ enable THP      172821   -5.3%      163692   will-it-scale.per_process_ops
> signal1/ enable THP          9125     -3.2%      8834     will-it-scale.per_process_ops
>
> b) THP disabled:
> testcase                     base     change     head     metric
> page_fault3/ disable THP     10107    -19.1%     8180     will-it-scale.per_thread_ops
> page_fault2/ disable THP     8432     -17.8%     6931     will-it-scale.per_thread_ops
> context_switch1/ disable THP 215389   -6.8%      200776   will-it-scale.per_thread_ops
> brk1/ disable THP            939.67   -6.6%      877.33   will-it-scale.per_thread_ops
> page_fault3/ disable THP     173145   -4.7%      165064   will-it-scale.per_process_ops
> signal1/ disable THP         9162     -3.9%      8802     will-it-scale.per_process_ops
>
> 2. Improvements:
>
> a) THP enabled:
> testcase                     base     change     head     metric
> malloc1/ enable THP          66.33    +469.8%    383.67   will-it-scale.per_thread_ops
> writeseek3/ enable THP       2531     +4.5%      2646     will-it-scale.per_thread_ops
> signal1/ enable THP          989.33   +2.8%      1016     will-it-scale.per_thread_ops
>
> b) THP disabled:
> testcase                     base     change     head     metric
> malloc1/ disable THP         90.33    +417.3%    467.33   will-it-scale.per_thread_ops
> read2/ disable THP           58934    +39.2%     82060    will-it-scale.per_thread_ops
> page_fault1/ disable THP     8607     +36.4%     11736    will-it-scale.per_thread_ops
> read1/ disable THP           314063   +12.7%     353934   will-it-scale.per_thread_ops
> writeseek3/ disable THP      2452     +12.5%     2759     will-it-scale.per_thread_ops
> signal1/ disable THP         971.33   +5.5%      1024     will-it-scale.per_thread_ops
>
> Notes: for the above values in column "change", a higher value means that
> the related testcase result on the head commit is better than that on the
> base commit for this benchmark.
>
> Best regards
> Haiyan Song
>
> From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of Laurent Dufour [lduf...@linux.vnet.ibm.com]
> Sent: Thursday, May 17, 2018 7:06 PM
> To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi
> Cc: linux-ker...@vger.kernel.org; linux...@kvack.org; ha...@linux.vnet.ibm.com; npig...@gmail.com; bsinghar...@gmail.com; paul...@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x...@kernel.org
> Subject: [PATCH v11 00/26] Speculative page faults
>
> This is a port on kernel 4.17 of the work done by
Re: [PATCH 07/14] powerpc: Add support for restartable sequences
- On May 24, 2018, at 3:03 AM, Michael Ellerman m...@ellerman.id.au wrote:

> Mathieu Desnoyers writes:
>> - On May 23, 2018, at 4:14 PM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
> ...
>>>
>>> Hi Boqun,
>>>
>>> I tried your patch in a ppc64 le environment, and it does not survive boot
>>> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.
>
> Sorry this code is super gross and hard to deal with.
>
>> The following fixup gets ppc64 to work:
>>
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -208,6 +208,7 @@ system_call_exit:
>> 	/* Check whether the syscall is issued inside a restartable sequence */
>> 	addi	r3,r1,STACK_FRAME_OVERHEAD
>> 	bl	rseq_syscall
>> +	ld	r3,RESULT(r1)
>> #endif
>> 	/*
>> 	 * Disable interrupts so current_thread_info()->flags can't change,
>
> I don't think that's safe.
>
> If you look above that, we have r3, r8 and r12 all live:
>
> .Lsyscall_exit:
> 	std	r3,RESULT(r1)
> 	CURRENT_THREAD_INFO(r12, r1)
>
> 	ld	r8,_MSR(r1)
> #ifdef CONFIG_PPC_BOOK3S
> 	/* No MSR:RI on BookE */
> 	andi.	r10,r8,MSR_RI
> 	beq-	.Lunrecov_restore
> #endif
>
> They're all volatile across function calls:
>
> http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240_68174.html
>
> The system_call_exit symbol is actually there for kprobes and cosmetic
> purposes. The actual syscall return flow starts at .Lsyscall_exit.
>
> So I think this would work:
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index db4df061c33a..e19f377a25e0 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,6 +184,14 @@ system_call:		/* label this so stack traces look sane */
>
> .Lsyscall_exit:
> 	std	r3,RESULT(r1)
> +
> +#ifdef CONFIG_DEBUG_RSEQ
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	rseq_syscall
> +	ld	r3,RESULT(r1)
> +#endif
> +
> 	CURRENT_THREAD_INFO(r12, r1)
>
> 	ld	r8,_MSR(r1)
>
> I'll try and get this series into my test setup at some point, been a
> bit busy lately :)

Yes, this was needed. I had this in my tree already, but there is still a kernel OOPS when running the rseq selftests on ppc64 with CONFIG_DEBUG_RSEQ=y.

My current dev tree is at:
https://github.com/compudj/linux-percpu-dev/tree/rseq/dev-local

So considering we are at rc7 now, should I plan on removing the powerpc bits for the merge window submission, or is there someone planning to spend time on fixing and testing the ppc integration before the merge window opens ?

Thanks,

Mathieu

> cheers

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
[PATCH] powerpc/Makefile: set -mcpu=860 flag for the 8xx
When compiled with GCC 8.1, vmlinux is significantly bigger than with GCC 4.8.

When looking at the generated code with objdump, we notice that all functions and loops get a 16 bytes alignment. This significantly increases the size of the kernel. It is pointless and even counterproductive, as on the 8xx a 'nop' also consumes one clock cycle.

Size of vmlinux with GCC 4.8:

   text	   data	    bss	    dec	    hex	filename
5801948	1626076	 457796	7885820	 7853fc	vmlinux

Size of vmlinux with GCC 8.1:

   text	   data	    bss	    dec	    hex	filename
6764592	1630652	 456476	8851720	 871108	vmlinux

Size of vmlinux with GCC 8.1 and this patch:

   text	   data	    bss	    dec	    hex	filename
6331544	1631756	 456476	8419776	 8079c0	vmlinux

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 6a050b42a9f2..26ccd0b91512 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -178,6 +178,7 @@ CFLAGS-$(CONFIG_POWER6_CPU) += $(call cc-option,-mcpu=power6)
 CFLAGS-$(CONFIG_POWER7_CPU) += $(call cc-option,-mcpu=power7)
 CFLAGS-$(CONFIG_POWER8_CPU) += $(call cc-option,-mcpu=power8)
 CFLAGS-$(CONFIG_POWER9_CPU) += $(call cc-option,-mcpu=power9)
+CFLAGS-$(CONFIG_PPC_8xx) += $(call cc-option,-mcpu=860)
 
 # Altivec option not allowed with e500mc64 in GCC.
 ifeq ($(CONFIG_ALTIVEC),y)
-- 
2.13.3