Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb

2018-05-28 Thread Aneesh Kumar K.V

On 05/29/2018 07:39 AM, Nicholas Piggin wrote:

On Tue, 29 May 2018 11:33:56 +1000
Michael Ellerman  wrote:


"Aneesh Kumar K.V"  writes:


We used the wrong page size (mmu_virtual_psize) when doing a TLB flush after
a PTE update. This patch updates the flush to use the hugetlb page size,
which is derived from the hugetlb hstate.


This sounds bad. Or is it not for some reason?


It's not all that bad because the flush is mostly superfluous (one of
my tlbie optimisation patches gets rid of it except for accelerators).



Either way a Fixes tag would be nice. Maybe:

   Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush")

I think this is only a problem on Radix, but the change log doesn't say.


huge_ptep_set_access_flags->ptep_set_access_flags->flush_tlb_page->

void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
{
#ifdef CONFIG_HUGETLB_PAGE
 if (is_vm_hugetlb_page(vma))
 return radix__flush_hugetlb_page(vma, vmaddr);
#endif
 radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
}

So I'm still not sure how the size is going wrong here. What am I
missing?



My mistake, I didn't look at the radix expansion. I was looking at 
huge_ptep_clear_flush() where we use flush_hugetlb_page().


-aneesh



Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb

2018-05-28 Thread Nicholas Piggin
On Tue, 29 May 2018 11:33:56 +1000
Michael Ellerman  wrote:

> "Aneesh Kumar K.V"  writes:
> 
> > We used the wrong page size (mmu_virtual_psize) when doing a TLB flush after
> > a PTE update. This patch updates the flush to use the hugetlb page size,
> > which is derived from the hugetlb hstate.  
> 
> This sounds bad. Or is it not for some reason?

It's not all that bad because the flush is mostly superfluous (one of
my tlbie optimisation patches gets rid of it except for accelerators).

> 
> Either way a Fixes tag would be nice. Maybe:
> 
>   Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush")
> 
> I think this is only a problem on Radix, but the change log doesn't say.

huge_ptep_set_access_flags->ptep_set_access_flags->flush_tlb_page->

void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
{
#ifdef CONFIG_HUGETLB_PAGE
if (is_vm_hugetlb_page(vma))
return radix__flush_hugetlb_page(vma, vmaddr);
#endif
radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
}

So I'm still not sure how the size is going wrong here. What am I
missing?

Thanks,
Nick


Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode

2018-05-28 Thread Vaibhav Jain
Michael Ellerman  writes:

> Vaibhav Jain  writes:
>
>> From: Vaibhav Jain 
>>
>> Currently we see a kernel-oops reported on Power-9 while attaching a
>> context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
>> to anything other than 'none'. The backtrace of the oops is of this
>> form:
>>
>> Unable to handle kernel paging request for data at address 0x0080
>> Faulting instruction address: 0xc0080bcf3b20
>> cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
>> pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
>> lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
>> sp: c0037f003a80
>>msr: 90009033
>>dar: 80
>>  dsisr: 4000
>>   current = 0xc0037f28
>>   paca= 0xc003e600   softe: 3irq_happened: 0x01
>> pid   = 3529, comm = afp_no_int
>> 
>> [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
>> [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 
>> [cxl]
>> [c0037f003b90] c0080bcf944c 
>> cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
>> [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
>> [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
>> [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
>> [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
>> [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
>> [c0037f003e30] c000b320 system_call+0x58/0x6c
>> --- Exception: c01 (System Call) at 10053bb0
>   ^^^
> This tells patch/git-am to drop the rest of the change log, which is not
> what we want.
>
> I tend to indent stack traces etc with two spaces, which avoids the
> problem. Or in this case we can just drop the line as it's not really
> that informative.
>
> I've fixed it up.
>
> cheers
>

Sorry for missing that and thanks for fixing it in the patch.

-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.



[PATCH] printk: make printk_safe_flush safe in NMI context

2018-05-28 Thread Hoeun Ryu
From: Hoeun Ryu 

 Make printk_safe_flush() safe in NMI context, and fold printk_safe_flush_on_panic()
into it. The prototype of printk_safe_flush() is changed to
"void printk_safe_flush(bool panic)".

 nmi_trigger_cpumask_backtrace() can be called in NMI context. For example, it
is called from watchdog_overflow_callback() when the hardlockup backtrace flag
(sysctl_hardlockup_all_cpu_backtrace) is set, and watchdog_overflow_callback()
runs in NMI context on some architectures.
 Calling printk_safe_flush() from nmi_trigger_cpumask_backtrace() eventually
tries to take logbuf_lock in vprintk_emit(), but logbuf_lock may already be
held by a preempted context (task or irq in this case) or by another CPU,
which can deadlock.
 By making printk_safe_flush() NMI-safe, the backtrace-triggering CPU simply
skips the flush when the lock is not available in NMI context. The messages in
the per-cpu NMI buffer of that CPU can be lost if the CPU is hard-locked
(because irqs are disabled there) and panic() is never called; in normal cases
the flush is merely delayed until the next irq work.

Suggested-by: Sergey Senozhatsky 
Signed-off-by: Hoeun Ryu 
---
 arch/powerpc/kernel/traps.c|  2 +-
 arch/powerpc/kernel/watchdog.c |  2 +-
 include/linux/printk.h |  9 ++
 kernel/kexec_core.c|  2 +-
 kernel/panic.c |  4 +--
 kernel/printk/printk_safe.c| 62 +-
 lib/nmi_backtrace.c|  2 +-
 7 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0904492..c50749c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -160,7 +160,7 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 6256dc3..3c9138b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -173,7 +173,7 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
	wd_smp_unlock(&flags);
 
-   printk_safe_flush();
+   printk_safe_flush(false);
/*
 * printk_safe_flush() seems to require another print
 * before anything actually goes out to console.
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 6d7e800..495fe26 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,8 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
 extern void printk_safe_init(void);
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
+extern void printk_safe_flush(bool panic);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -273,11 +272,7 @@ static inline void printk_safe_init(void)
 {
 }
 
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
+static inline void printk_safe_flush(bool panic)
 {
 }
 #endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..1b0876e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -961,7 +961,7 @@ void crash_kexec(struct pt_regs *regs)
	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..2f2c86c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -193,7 +193,7 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(NULL);
 
/*
@@ -218,7 +218,7 @@ void panic(const char *fmt, ...)
	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
/* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..35ea941 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -244,49 +244,49 @@ static void __printk_safe_flush(struct irq_work *work)
 }
 
 /**
- * printk_safe_flush - flush all per-cpu nmi buffers.
+ * printk_safe_flush - flush all per-cpu nmi buffers. It can be called even in NMI
+ *

Re: [PATCH] cpuidle/powernv : Add Description for cpuidle state

2018-05-28 Thread Stewart Smith
Abhishek Goel  writes:
> @@ -215,7 +216,7 @@ static inline void add_powernv_state(int index, const char *name,
>u64 psscr_val, u64 psscr_mask)
>  {
>   strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
> - strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
> + strlcpy(powernv_states[index].desc, desc, CPUIDLE_DESC_LEN);

We should still fall back to using name in the event of desc being null,
as not all firmware will expose the descriptions.

> @@ -311,6 +313,11 @@ static int powernv_add_idle_states(void)
>   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in 
> DT\n");
>   goto out;
>   }
> + if (of_property_read_string_array(power_mgt,
> + "ibm,cpu-idle-state-descs", descs, dt_idle_states) < 0) {
> + pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-descs in 
> DT\n");
> + goto out;
> + }

I don't think pr_warn is appropriate here, as for all current released
firmware we don't have that property. I think perhaps just silently
continuing on is okay, as we have to keep compatibility with that firmware.

> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -17,7 +17,7 @@
>
>  #define CPUIDLE_STATE_MAX10
>  #define CPUIDLE_NAME_LEN 16
> -#define CPUIDLE_DESC_LEN 32
> +#define CPUIDLE_DESC_LEN 60

Do we really get that long?

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb

2018-05-28 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> We used the wrong page size (mmu_virtual_psize) when doing a TLB flush after
> a PTE update. This patch updates the flush to use the hugetlb page size,
> which is derived from the hugetlb hstate.

This sounds bad. Or is it not for some reason?

Either way a Fixes tag would be nice. Maybe:

  Fixes: b3603e174fc8 ("powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush")

I think this is only a problem on Radix, but the change log doesn't say.

cheers

> Now that ptep_set_access_flags won't be called for hugetlb remove
> the is_vm_hugetlb_page() check and add the assert of pte lock
> unconditionally.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/hugetlb.h | 19 +++--
>  arch/powerpc/mm/pgtable.c  | 33 --
>  2 files changed, 34 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hugetlb.h 
> b/arch/powerpc/include/asm/hugetlb.h
> index 78540c074d70..b4404a6da74f 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -166,22 +166,9 @@ static inline pte_t huge_pte_wrprotect(pte_t pte)
>   return pte_wrprotect(pte);
>  }
>  
> -static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> -  unsigned long addr, pte_t *ptep,
> -  pte_t pte, int dirty)
> -{
> -#ifdef HUGETLB_NEED_PRELOAD
> - /*
> -  * The "return 1" forces a call of update_mmu_cache, which will write a
> -  * TLB entry.  Without this, platforms that don't do a write of the TLB
> -  * entry in the TLB miss handler asm will fault ad infinitum.
> -  */
> - ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> - return 1;
> -#else
> - return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> -#endif
> -}
> +extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +   unsigned long addr, pte_t *ptep,
> +   pte_t pte, int dirty);
>  
>  static inline pte_t huge_ptep_get(pte_t *ptep)
>  {
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 9f361ae571e9..e70af9939379 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -221,14 +221,43 @@ int ptep_set_access_flags(struct vm_area_struct *vma, 
> unsigned long address,
>   entry = set_access_flags_filter(entry, vma, dirty);
>   changed = !pte_same(*(ptep), entry);
>   if (changed) {
> - if (!is_vm_hugetlb_page(vma))
> - assert_pte_locked(vma->vm_mm, address);
> + assert_pte_locked(vma->vm_mm, address);
>   __ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
>   flush_tlb_page(vma, address);
>   }
>   return changed;
>  }
>  
> +#ifdef CONFIG_HUGETLB_PAGE
> +extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +   unsigned long addr, pte_t *ptep,
> +   pte_t pte, int dirty)
> +{
> +#ifdef HUGETLB_NEED_PRELOAD
> + /*
> +  * The "return 1" forces a call of update_mmu_cache, which will write a
> +  * TLB entry.  Without this, platforms that don't do a write of the TLB
> +  * entry in the TLB miss handler asm will fault ad infinitum.
> +  */
> + ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> + return 1;
> +#else
> + int changed;
> +
> + pte = set_access_flags_filter(pte, vma, dirty);
> + changed = !pte_same(*(ptep), pte);
> + if (changed) {
> +#ifdef CONFIG_DEBUG_VM
> + assert_spin_locked(&vma->vm_mm->page_table_lock);
> +#endif
> + __ptep_set_access_flags(vma->vm_mm, ptep, pte, addr);
> + flush_hugetlb_page(vma, addr);
> + }
> + return changed;
> +#endif
> +}
> +#endif /* CONFIG_HUGETLB_PAGE */
> +
>  #ifdef CONFIG_DEBUG_VM
>  void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>  {
> -- 
> 2.17.0


Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode

2018-05-28 Thread Michael Ellerman
Vaibhav Jain  writes:

> From: Vaibhav Jain 
>
> Currently we see a kernel-oops reported on Power-9 while attaching a
> context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
> to anything other than 'none'. The backtrace of the oops is of this
> form:
>
> Unable to handle kernel paging request for data at address 0x0080
> Faulting instruction address: 0xc0080bcf3b20
> cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
> pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
> lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
> sp: c0037f003a80
>msr: 90009033
>dar: 80
>  dsisr: 4000
>   current = 0xc0037f28
>   paca= 0xc003e600   softe: 3irq_happened: 0x01
> pid   = 3529, comm = afp_no_int
> 
> [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
> [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 
> [cxl]
> [c0037f003b90] c0080bcf944c 
> cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
> [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
> [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
> [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
> [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
> [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
> [c0037f003e30] c000b320 system_call+0x58/0x6c
> --- Exception: c01 (System Call) at 10053bb0
  ^^^
This tells patch/git-am to drop the rest of the change log, which is not
what we want.

I tend to indent stack traces etc with two spaces, which avoids the
problem. Or in this case we can just drop the line as it's not really
that informative.

I've fixed it up.

cheers


Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices

2018-05-28 Thread Benjamin Herrenschmidt
On Tue, 2018-05-29 at 09:48 +1000, Benjamin Herrenschmidt wrote:
> > Well it's not supposed to be much slower for the static case.
> > 
> > vhost has a cache so should be fine.
> > 
> > A while ago Paolo implemented a translation cache which should be
> > perfect for this case - most of the code got merged but
> > never enabled because of stability issues.
> > 
> > If all else fails, we could teach QEMU to handle the no-iommu case
> > as if VIRTIO_F_IOMMU_PLATFORM was off.
> 
> Any serious reason why not just getting that 2 line patch allowing our
> arch code to force virtio to use the DMA API ?
> 
> It's not particularly invasive and solves our problem rather nicely
> without adding overhead or additional knowledge to qemu/libvirt/mgmnt
> tools etc... that it doesn't need etc
> 
> The guest knows it's going secure so the guest arch code can do the
> right thing rather trivially.
> 
> Long term we should probably make virtio always use the DMA API anyway,
> and interpose "1:1" dma_ops for the traditional virtio case, that would
> reduce code clutter significantly. In that case, it would become just a
> matter of having a platform hook to override the dma_ops used.

To elaborate a bit:

What we are trying to solve here is entirely a guest problem, I don't
think involving qemu in the solution is the right thing to do.

The guest can only allow external parties (qemu, potentially PCI
devices, etc...) access to some restricted portions of memory (insecure
memory). Thus the guest need to do some bounce buffering/swiotlb type
tricks.

This is completely orthogonal to whether there is an actual iommu
between the guest and the device (or emulated device/virtio).

This is why I think the solution should reside in the guest kernel, by
proper manipulation (by the arch code) of the dma ops.

I don't think forcing the addition of an emulated iommu in the middle
just to work around the fact that virtio "cheats" and doesn't use the
dma API unless there is one, is the right "fix".

The right long term fix is to always use the DMA API, reducing code
path etc... and just have a single point where virtio can "chose"
alternate DMA ops (via an arch hook to deal with our case).

In the meantime, having the hook we propose gets us going, but if you
agree with the approach, we should also work on the long term approach.

Cheers,
Ben.


Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices

2018-05-28 Thread Benjamin Herrenschmidt
On Fri, 2018-05-25 at 20:45 +0300, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> > 
> > > I re-read that discussion and I'm still unclear on the
> > > original question, since I got several apparently
> > > conflicting answers.
> > > 
> > > I asked:
> > > 
> > >   Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > >   hypervisor side sufficient?
> > 
> > I thought I had replied to this...
> > 
> > There are a couple of reasons:
> > 
> > - First qemu doesn't know that the guest will switch to "secure mode"
> > in advance. There is no difference between a normal and a secure
> > partition until the partition does the magic UV call to "enter secure
> > mode" and qemu doesn't see any of it. So who can set the flag here ?
> 
> Not sure I understand. Just set the flag e.g. on qemu command line.
> I might be wrong, but these secure mode things usually
> a. require hypervisor side tricks anyway

The way our secure mode architecture is designed, there doesn't need at
this point to be any knowledge at qemu level whatsoever. Well at least
until we do migration but that's a different kettle of fish. In any
case, the guest starts normally (which means as a non-secure guest, and
thus expects normal virtio, our FW today doesn't handle
VIRTIO_F_IOMMU_PLATFORM, though granted, we can fix this), and later
that guest issues some special Ultravisor call that turns it into a
secure guest.

There is some involvement of the hypervisor, but not qemu at this
stage. We would very much like to avoid that, as it would be a hassle
for users to have to use different libvirt options etc... bcs the guest
might turn itself into a secure VM.

> > - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> > vhost) go through the emulated MMIO for every access to the guest,
> > which adds additional overhead.
> > 
> > Cheers,
> > Ben.
> 
> Well it's not supposed to be much slower for the static case.
> 
> vhost has a cache so should be fine.
> 
> A while ago Paolo implemented a translation cache which should be
> perfect for this case - most of the code got merged but
> never enabled because of stability issues.
> 
> If all else fails, we could teach QEMU to handle the no-iommu case
> as if VIRTIO_F_IOMMU_PLATFORM was off.

Any serious reason why not just getting that 2 line patch allowing our
arch code to force virtio to use the DMA API ?

It's not particularly invasive and solves our problem rather nicely
without adding overhead or additional knowledge to qemu/libvirt/mgmnt
tools etc... that it doesn't need etc

The guest knows it's going secure so the guest arch code can do the
right thing rather trivially.

Long term we should probably make virtio always use the DMA API anyway,
and interpose "1:1" dma_ops for the traditional virtio case, that would
reduce code clutter significantly. In that case, it would become just a
matter of having a platform hook to override the dma_ops used.

Cheers,
Ben.

> 
> 
> > > 
> > > 
> > > >  arch/powerpc/include/asm/dma-mapping.h |  6 ++
> > > >  arch/powerpc/platforms/pseries/iommu.c | 11 +++
> > > >  drivers/virtio/virtio_ring.c   | 10 ++
> > > >  3 files changed, 27 insertions(+)
> > > > 
> > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> > > > b/arch/powerpc/include/asm/dma-mapping.h
> > > > index 8fa3945..056e578 100644
> > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device 
> > > > *dev);
> > > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > > >  
> > > >  #endif /* __KERNEL__ */
> > > > +
> > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > +
> > > > +struct virtio_device;
> > > > +
> > > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > > >  #endif /* _ASM_DMA_MAPPING_H */
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > > > b/arch/powerpc/platforms/pseries/iommu.c
> > > > index 06f0296..a2ec15a 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -38,6 +38,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > > >  __setup("multitce=", disable_multitce);
> > > >  
> > > >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > > +
> > > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > +{
> > > > +   /*
> > > > +* On protected guest platforms, force virtio core to use DMA
> > > > +* MAP API for all virtio devices. But there can also be some
> > > > +* exceptions for individual devices like virtio balloon.
> > > > +*/
> > > > +

[PATCH v2] selftests/powerpc: Add perf breakpoint test

2018-05-28 Thread Michael Neuling
This tests perf hardware breakpoints (ie PERF_TYPE_BREAKPOINT) on
powerpc.

Signed-off-by: Michael Neuling 
---
v2:
  - fix failure noticed by mpe
  - rewrite to explicitly test all options (rather than randomly)
---
 .../selftests/powerpc/ptrace/.gitignore   |   1 +
 .../testing/selftests/powerpc/ptrace/Makefile |   2 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 195 ++
 3 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c

diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore 
b/tools/testing/selftests/powerpc/ptrace/.gitignore
index 9dcc16ea81..07ec449a27 100644
--- a/tools/testing/selftests/powerpc/ptrace/.gitignore
+++ b/tools/testing/selftests/powerpc/ptrace/.gitignore
@@ -9,3 +9,4 @@ ptrace-tm-vsx
 ptrace-tm-spd-vsx
 ptrace-tm-spr
 ptrace-hwbreak
+perf-hwbreak
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile 
b/tools/testing/selftests/powerpc/ptrace/Makefile
index 0e2f4601d1..aaf7044771 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 TEST_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
   ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx 
ptrace-tm-vsx \
-  ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak
+  ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak perf-hwbreak
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
new file mode 100644
index 00..60df0b5e62
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -0,0 +1,195 @@
+/*
+ * perf events self profiling example test case for hw breakpoints.
+ *
+ * This tests perf PERF_TYPE_BREAKPOINT parameters
+ * 1) tests all variants of the break on read/write flags
+ * 2) tests exclude_user == 0 and 1
+ * 3) test array matches (if DAWR is supported))
+ * 4) test different numbers of breakpoints matches
+ *
+ * Configure this breakpoint, then read and write the data a number of
+ * times. Then check the output count from perf is as expected.
+ *
+ * Based on:
+ *   http://ozlabs.org/~anton/junkcode/perf_events_example1.c
+ *
+ * Copyright (C) 2018 Michael Neuling, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "utils.h"
+
#define MAX_LOOPS 1
+
+#define DAWR_LENGTH_MAX ((0x3f + 1) * 8)
+
+static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
+ int cpu, int group_fd,
+ unsigned long flags)
+{
+   attr->size = sizeof(*attr);
+   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+static inline bool breakpoint_test(int len)
+{
+   struct perf_event_attr attr;
+   int fd;
+
+   /* setup counters */
+   memset(&attr, 0, sizeof(attr));
+   attr.disabled = 1;
+   attr.type = PERF_TYPE_BREAKPOINT;
+   attr.bp_type = HW_BREAKPOINT_R;
+   /* bp_addr can point anywhere but needs to be aligned */
+   attr.bp_addr = (__u64)(&attr) & 0xfffffffffffff800;
+   attr.bp_len = len;
+   fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+   if (fd < 0)
+   return false;
+   close(fd);
+   return true;
+}
+
+static inline bool perf_breakpoint_supported(void)
+{
+   return breakpoint_test(4);
+}
+
+static inline bool dawr_supported(void)
+{
+   return breakpoint_test(DAWR_LENGTH_MAX);
+}
+
+static int runtestsingle(int readwriteflag, int exclude_user, int arraytest)
+{
+   int i,j;
+   struct perf_event_attr attr;
+   size_t res;
+   unsigned long long breaks, needed;
+   int readint;
+   int readintarraybig[2*DAWR_LENGTH_MAX/sizeof(int)];
+   int *readintalign;
+   volatile int *ptr;
+   int break_fd;
+   int loop_num = MAX_LOOPS - (rand() % 100); /* provide some variability */
+   volatile int *k;
+
+   /* align to 0x400 boundary as required by DAWR */
+   readintalign = (int *)(((unsigned long)readintarraybig + 0x7ff) &
+  0xfffffffffffff800);
+
+   ptr = &readint;
+   if (arraytest)
+   ptr = &readintalign[0];
+
+   /* setup counters */
+   memset(&attr, 0, sizeof(attr));
+   attr.disabled = 1;
+   attr.type = PERF_TYPE_BREAKPOINT;
+   attr.bp_type = readwriteflag;
+   attr.bp_addr = (__u64)ptr;
+   attr.bp_len = sizeof(int);
+   if (arraytest)
+   attr.bp_len = DAWR_LENGTH_MAX;
+   attr.exclude_user 

Re: [PATCH] powerpc/64: Fix build failure with GCC 8.1

2018-05-28 Thread Benjamin Herrenschmidt
On Mon, 2018-05-28 at 16:37 +, Christophe Leroy wrote:
>   CC  arch/powerpc/kernel/nvram_64.o
> arch/powerpc/kernel/nvram_64.c: In function 'nvram_create_partition':
> arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified bound 12 
> equals destination size [-Werror=stringop-truncation]
>   strncpy(new_part->header.name, name, 12);
>   ^~~~
> 
>   CC  arch/powerpc/kernel/trace/ftrace.o
> In function 'make_field',
> inlined from 'ps3_repository_read_boot_dat_address' at 
> arch/powerpc/platforms/ps3/repository.c:900:9:
> arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy' output 
> truncated before terminating nul copying 8 bytes from a string of the same 
> length [-Werror=stringop-truncation]
> strncpy((char *)&n, text, 8);
>   ^~~~

That's not completely correct; here GCC's warning is being over-zealous. The
old specs I could find define the format as either a 12-character string *or*
a NUL-terminated string of fewer than 12 characters. So it's perfectly OK to
drop the trailing NUL if the name happens to be exactly 12 characters.

Cheers,
Ben.

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/nvram_64.c  | 2 +-
>  arch/powerpc/platforms/ps3/repository.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> index ba681dac7b46..7507448cd904 100644
> --- a/arch/powerpc/kernel/nvram_64.c
> +++ b/arch/powerpc/kernel/nvram_64.c
> @@ -1039,7 +1039,7 @@ loff_t __init nvram_create_partition(const char *name, 
> int sig,
>   new_part->index = free_part->index;
>   new_part->header.signature = sig;
>   new_part->header.length = size;
> - strncpy(new_part->header.name, name, 12);
> + strncpy(new_part->header.name, name, sizeof(new_part->header.name) - 1);
> + new_part->header.checksum = nvram_checksum(&new_part->header);
>  
>   rc = nvram_write_header(new_part);
> diff --git a/arch/powerpc/platforms/ps3/repository.c 
> b/arch/powerpc/platforms/ps3/repository.c
> index 50dbaf24b1ee..b4d6628eec5e 100644
> --- a/arch/powerpc/platforms/ps3/repository.c
> +++ b/arch/powerpc/platforms/ps3/repository.c
> @@ -101,9 +101,9 @@ static u64 make_first_field(const char *text, u64 index)
>  
>  static u64 make_field(const char *text, u64 index)
>  {
> - u64 n;
> + u64 n = 0;
>  
> - strncpy((char *)&n, text, 8);
> + memcpy((char *)&n, text, min(sizeof(n), strlen(text)));
>   return n + index;
>  }
>  


Re: [PATCH][RFC] [powerpc] arch_ptrace() uses of access_ok() are pointless

2018-05-28 Thread Al Viro


> Maybe this is just an RFC, but:
> 
>   CALL../arch/powerpc/kernel/systbl_chk.sh
> ../arch/powerpc/kernel/ptrace.c: In function ‘arch_ptrace’:
> ../arch/powerpc/kernel/ptrace.c:3086:4: error: expected ‘)’ before ‘return’
> return -EFAULT;
> ^~

and the same a few lines later.  What's more, those 'unlikely' are pointless
there.  Fixed variant follows; only build-tested, though.

make it use copy_{from,to}_user(), rather than access_ok() +
__copy_...

Signed-off-by: Al Viro 
---
 arch/powerpc/kernel/ptrace.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index d23cf632edf0..f557322621e0 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -3081,27 +3081,19 @@ long arch_ptrace(struct task_struct *child, long 
request,
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
 
-   if (!access_ok(VERIFY_WRITE, datavp,
-  sizeof(struct ppc_debug_info)))
+   if (copy_to_user(datavp, &dbginfo,
+sizeof(struct ppc_debug_info)))
return -EFAULT;
-   ret = __copy_to_user(datavp, &dbginfo,
-sizeof(struct ppc_debug_info)) ?
- -EFAULT : 0;
-   break;
+   return 0;
}
 
case PPC_PTRACE_SETHWDEBUG: {
struct ppc_hw_breakpoint bp_info;
 
-   if (!access_ok(VERIFY_READ, datavp,
-  sizeof(struct ppc_hw_breakpoint)))
-   return -EFAULT;
-   ret = __copy_from_user(&bp_info, datavp,
-  sizeof(struct ppc_hw_breakpoint)) ?
- -EFAULT : 0;
-   if (!ret)
-   ret = ppc_set_hwdebug(child, &bp_info);
-   break;
+   if (copy_from_user(&bp_info, datavp,
+  sizeof(struct ppc_hw_breakpoint)))
+ return -EFAULT;
+   return ppc_set_hwdebug(child, &bp_info);
}
 
case PPC_PTRACE_DELHWDEBUG: {
-- 
2.11.0



Re: [PATCH][RFC] [powerpc] arch_ptrace() uses of access_ok() are pointless

2018-05-28 Thread Mathieu Malaterre
On Mon, May 28, 2018 at 12:34 AM, Al Viro  wrote:
> make it use copy_{from,to}_user(), rather than access_ok() +
> __copy_...
>
> Signed-off-by: Al Viro 
> ---
>  arch/powerpc/kernel/ptrace.c | 22 +++---
>  1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index d23cf632edf0..d8b0fd2fa3aa 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3081,27 +3081,19 @@ long arch_ptrace(struct task_struct *child, long 
> request,
>  #endif /* CONFIG_HAVE_HW_BREAKPOINT */
>  #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
>
> -   if (!access_ok(VERIFY_WRITE, datavp,
> -  sizeof(struct ppc_debug_info)))
> +   if (unlikely(copy_to_user(datavp, &dbginfo,
> +sizeof(struct ppc_debug_info)))


Maybe this is just an RFC, but:

  CALL../arch/powerpc/kernel/systbl_chk.sh
../arch/powerpc/kernel/ptrace.c: In function ‘arch_ptrace’:
../arch/powerpc/kernel/ptrace.c:3086:4: error: expected ‘)’ before ‘return’
return -EFAULT;
^~

Missing closing parenthesis.

> return -EFAULT;
> -   ret = __copy_to_user(datavp, &dbginfo,
> -sizeof(struct ppc_debug_info)) ?
> - -EFAULT : 0;
> -   break;
> +   return 0;
> }
>
> case PPC_PTRACE_SETHWDEBUG: {
> struct ppc_hw_breakpoint bp_info;
>
> -   if (!access_ok(VERIFY_READ, datavp,
> -  sizeof(struct ppc_hw_breakpoint)))
> -   return -EFAULT;
> -   ret = __copy_from_user(&bp_info, datavp,
> -  sizeof(struct ppc_hw_breakpoint)) ?
> - -EFAULT : 0;
> -   if (!ret)
> -   ret = ppc_set_hwdebug(child, &bp_info);
> -   break;
> +   if (unlikely(copy_from_user(&bp_info, datavp,
> +  sizeof(struct ppc_hw_breakpoint)))
> + return -EFAULT;
> +   return ppc_set_hwdebug(child, &bp_info);
> }
>
> case PPC_PTRACE_DELHWDEBUG: {
> --
> 2.11.0
>


[PATCH] cpuidle/powernv : Add Description for cpuidle state

2018-05-28 Thread Abhishek Goel
The names of cpuidle states were being used as the descriptions of the
states on POWER, as no descriptions were provided in the device tree.
This patch reads the descriptions for idle states which have been added
to the device tree.
The descriptions for the idle states on POWER can be printed using
"cpupower monitor -l" or "cpupower idle-info".

Signed-off-by: Abhishek Goel 
---

The skiboot patch which adds description for idle states in device tree
can be found here: https://patchwork.ozlabs.org/patch/921637/

 drivers/cpuidle/cpuidle-powernv.c | 17 +
 include/linux/cpuidle.h   |  2 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 1a8234e706bc..d3915a965128 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -133,7 +133,7 @@ static int stop_loop(struct cpuidle_device *dev,
 static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
{ /* Snooze */
.name = "snooze",
-   .desc = "snooze",
+   .desc = "idle polling state",
.exit_latency = 0,
.target_residency = 0,
.enter = snooze_loop },
@@ -206,6 +206,7 @@ static int powernv_cpuidle_driver_init(void)
 }
 
 static inline void add_powernv_state(int index, const char *name,
+const char *desc,
 unsigned int flags,
 int (*idle_fn)(struct cpuidle_device *,
struct cpuidle_driver *,
@@ -215,7 +216,7 @@ static inline void add_powernv_state(int index, const char 
*name,
 u64 psscr_val, u64 psscr_mask)
 {
strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
-   strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
+   strlcpy(powernv_states[index].desc, desc, CPUIDLE_DESC_LEN);
powernv_states[index].flags = flags;
powernv_states[index].target_residency = target_residency;
powernv_states[index].exit_latency = exit_latency;
@@ -250,6 +251,7 @@ static int powernv_add_idle_states(void)
u64 psscr_val[CPUIDLE_STATE_MAX];
u64 psscr_mask[CPUIDLE_STATE_MAX];
const char *names[CPUIDLE_STATE_MAX];
+   const char *descs[CPUIDLE_STATE_MAX];
u32 has_stop_states = 0;
int i, rc;
u32 supported_flags = pnv_get_supported_cpuidle_states();
@@ -311,6 +313,11 @@ static int powernv_add_idle_states(void)
pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
goto out;
}
+   if (of_property_read_string_array(power_mgt,
+   "ibm,cpu-idle-state-descs", descs, dt_idle_states) < 0) {
+   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-descs in 
DT\n");
+   goto out;
+   }
 
/*
 * If the idle states use stop instruction, probe for psscr values
@@ -414,10 +421,11 @@ static int powernv_add_idle_states(void)
target_residency = 100;
/* Add NAP state */
add_powernv_state(nr_idle_states, "Nap",
+ "stop processor execution",
  CPUIDLE_FLAG_NONE, nap_loop,
  target_residency, exit_latency, 0, 0);
} else if (has_stop_states && !stops_timebase) {
-   add_powernv_state(nr_idle_states, names[i],
+   add_powernv_state(nr_idle_states, names[i], descs[i],
  CPUIDLE_FLAG_NONE, stop_loop,
  target_residency, exit_latency,
  psscr_val[i], psscr_mask[i]);
@@ -434,11 +442,12 @@ static int powernv_add_idle_states(void)
target_residency = 30;
/* Add FASTSLEEP state */
add_powernv_state(nr_idle_states, "FastSleep",
+ "Core and L2 clock gating",
  CPUIDLE_FLAG_TIMER_STOP,
  fastsleep_loop,
  target_residency, exit_latency, 0, 0);
} else if (has_stop_states && stops_timebase) {
-   add_powernv_state(nr_idle_states, names[i],
+   add_powernv_state(nr_idle_states, names[i], descs[i],
  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
  target_residency, exit_latency,
  psscr_val[i], psscr_mask[i]);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 1eefabf1621f..5094755cb132 100644
--- 

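For reference, the driver ends up expecting the new property alongside the existing state names. A hypothetical device-tree fragment illustrating the shape it would read (the property names come from the patch; the state names and description strings here are made up):

```dts
power-mgt {
        ibm,cpu-idle-state-names = "snooze", "stop0", "stop1";
        /* new property read by the patch above; free-form
         * description strings, one per idle state */
        ibm,cpu-idle-state-descs = "idle polling state",
                                   "shallow stop state",
                                   "deeper stop state";
};
```
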
[PATCH] powerpc/64: Fix build failure with GCC 8.1

2018-05-28 Thread Christophe Leroy
  CC  arch/powerpc/kernel/nvram_64.o
arch/powerpc/kernel/nvram_64.c: In function 'nvram_create_partition':
arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified bound 12 
equals destination size [-Werror=stringop-truncation]
  strncpy(new_part->header.name, name, 12);
  ^~~~

  CC  arch/powerpc/kernel/trace/ftrace.o
In function 'make_field',
inlined from 'ps3_repository_read_boot_dat_address' at 
arch/powerpc/platforms/ps3/repository.c:900:9:
arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy' output 
truncated before terminating nul copying 8 bytes from a string of the same 
length [-Werror=stringop-truncation]
  strncpy((char *)&n, text, 8);
  ^~~~

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/nvram_64.c  | 2 +-
 arch/powerpc/platforms/ps3/repository.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index ba681dac7b46..7507448cd904 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -1039,7 +1039,7 @@ loff_t __init nvram_create_partition(const char *name, 
int sig,
new_part->index = free_part->index;
new_part->header.signature = sig;
new_part->header.length = size;
-   strncpy(new_part->header.name, name, 12);
+   strncpy(new_part->header.name, name, sizeof(new_part->header.name) - 1);
new_part->header.checksum = nvram_checksum(&new_part->header);
 
rc = nvram_write_header(new_part);
diff --git a/arch/powerpc/platforms/ps3/repository.c 
b/arch/powerpc/platforms/ps3/repository.c
index 50dbaf24b1ee..b4d6628eec5e 100644
--- a/arch/powerpc/platforms/ps3/repository.c
+++ b/arch/powerpc/platforms/ps3/repository.c
@@ -101,9 +101,9 @@ static u64 make_first_field(const char *text, u64 index)
 
 static u64 make_field(const char *text, u64 index)
 {
-   u64 n;
+   u64 n = 0;
 
-   strncpy((char *)&n, text, 8);
+   memcpy((char *)&n, text, min(sizeof(n), strlen(text)));
return n + index;
 }
 
-- 
2.13.3



Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Mathieu Malaterre
On Mon, May 28, 2018 at 5:07 PM, Segher Boessenkool
 wrote:
> On Mon, May 28, 2018 at 02:39:37PM +, Christophe Leroy wrote:
>> In file included from arch/powerpc/kernel/syscalls.c:24:
>> ./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between
>> functions of incompatible types 'long int(long unsigned int,  size_t,
>> long unsigned int,  long unsigned int,  long unsigned int,  long
>> unsigned int)' {aka 'long int(long unsigned int,  unsigned int,  long
>> unsigned int,  long unsigned int,  long unsigned int,  long unsigned
>> int)'} and 'long int(long int,  long int,  long int,  long int,  long
>> int,  long int)' [-Wattribute-alias]
>
> So yes, these are actually different (int vs. long).
>
>> In file included from arch/powerpc/kernel/signal_32.c:29:
>> ./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias
>> between functions of incompatible types 'long int(struct ucontext *,
>> struct ucontext *, long int)' and 'long int(long int,  long int,  long
>> int)' [-Wattribute-alias]
>
> And this one is quite spectacularly different.

https://patchwork.kernel.org/patch/10375607/

>
> Try putting this before the wacko aliases:
>
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wattribute-alias"
>
> and this after:
>
> #pragma GCC diagnostic pop
>
> so that you will get this quite useful warning for all other code.
>
>
> Segher


[PATCH] powerpc/64s: Enhance the information in cpu_show_spectre_v1()

2018-05-28 Thread Michal Suchanek

We now have barrier_nospec as a mitigation, so print it in
cpu_show_spectre_v1() when it is enabled.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/security.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 0239383c7e4d..a0c32d53980b 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -120,7 +120,10 @@ ssize_t cpu_show_spectre_v1(struct device *dev, struct 
device_attribute *attr, c
if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR))
return sprintf(buf, "Not affected\n");
 
-   return sprintf(buf, "Vulnerable\n");
+   if (barrier_nospec_enabled)
+   return sprintf(buf, "Mitigation: __user pointer 
sanitization\n");
+   else
+   return sprintf(buf, "Vulnerable\n");
 }
 
 ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, 
char *buf)
-- 
2.13.6



Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Segher Boessenkool
On Mon, May 28, 2018 at 02:39:37PM +, Christophe Leroy wrote:
> In file included from arch/powerpc/kernel/syscalls.c:24:
> ./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between 
> functions of incompatible types 'long int(long unsigned int,  size_t, 
> long unsigned int,  long unsigned int,  long unsigned int,  long 
> unsigned int)' {aka 'long int(long unsigned int,  unsigned int,  long 
> unsigned int,  long unsigned int,  long unsigned int,  long unsigned 
> int)'} and 'long int(long int,  long int,  long int,  long int,  long 
> int,  long int)' [-Wattribute-alias]

So yes, these are actually different (int vs. long).

> In file included from arch/powerpc/kernel/signal_32.c:29:
> ./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias 
> between functions of incompatible types 'long int(struct ucontext *, 
> struct ucontext *, long int)' and 'long int(long int,  long int,  long 
> int)' [-Wattribute-alias]

And this one is quite spectacularly different.


Try putting this before the wacko aliases:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattribute-alias"

and this after:

#pragma GCC diagnostic pop

so that you will get this quite useful warning for all other code.


Segher


Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Christophe LEROY



Le 28/05/2018 à 16:37, Segher Boessenkool a écrit :

On Mon, May 28, 2018 at 02:17:49PM +, Christophe Leroy wrote:

The latest GCC version emits many warnings similar to the following one.


You didn't actually show an example?


Yes I forgot:

In file included from arch/powerpc/kernel/syscalls.c:24:
./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between 
functions of incompatible types 'long int(long unsigned int,  size_t, 
long unsigned int,  long unsigned int,  long unsigned int,  long 
unsigned int)' {aka 'long int(long unsigned int,  unsigned int,  long 
unsigned int,  long unsigned int,  long unsigned int,  long unsigned 
int)'} and 'long int(long int,  long int,  long int,  long int,  long 
int,  long int)' [-Wattribute-alias]

  asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
  ^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~
./include/linux/syscalls.h:238:18: note: aliased declaration here
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
  ^~~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~



This warning should detect serious problems, so don't disable it without
first investigating please.


There has been a discussion on this topic, see
https://lkml.org/lkml/2017/12/5/581


It says "The new warning seems reasonable in principle, but it doesn't
help us here, since we rely on the type mismatch to sanitize the
system call arguments. After I reported this as GCC PR82435, a new
-Wno-attribute-alias option was added that could be used to turn the
warning off globally on the command line, but I'd prefer to do it a
little more fine-grained"




What do you call "latest version", btw?  Trunk, aka 9.0?  Or the most
advanced release, 8.1?  Or the latest release (which also is 8.1 :-) )


I encountered it with 8.1.
According to the referenced discussion, it is linked to GCC 8.

Christophe




Segher



Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Christophe Leroy



On 05/28/2018 02:17 PM, Christophe Leroy wrote:

The latest GCC version emits many warnings similar to the following one.


Oops, forgot to include the failure text:

In file included from arch/powerpc/kernel/syscalls.c:24:
./include/linux/syscalls.h:233:18: warning: 'sys_mmap2' alias between 
functions of incompatible types 'long int(long unsigned int,  size_t, 
long unsigned int,  long unsigned int,  long unsigned int,  long 
unsigned int)' {aka 'long int(long unsigned int,  unsigned int,  long 
unsigned int,  long unsigned int,  long unsigned int,  long unsigned 
int)'} and 'long int(long int,  long int,  long int,  long int,  long 
int,  long int)' [-Wattribute-alias]

  asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
  ^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~
./include/linux/syscalls.h:238:18: note: aliased declaration here
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
  ^~~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
 ^~~
./include/linux/syscalls.h:233:18: warning: 'sys_mmap' alias between 
functions of incompatible types 'long int(long unsigned int,  size_t, 
long unsigned int,  long unsigned int,  long unsigned int,  off_t)' {aka 
'long int(long unsigned int,  unsigned int,  long unsigned int,  long 
unsigned int,  long unsigned int,  long int)'} and 'long int(long int, 
long int,  long int,  long int,  long int,  long int)' [-Wattribute-alias]

  asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
  ^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 ^~~
./include/linux/syscalls.h:238:18: note: aliased declaration here
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
  ^~~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:216:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 
'SYSCALL_DEFINE6'

 SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 ^~~
  CC  arch/powerpc/kernel/irq.o
  CC  arch/powerpc/kernel/align.o
  CC  arch/powerpc/kernel/signal_32.o
In file included from arch/powerpc/kernel/signal_32.c:29:
./include/linux/syscalls.h:233:18: warning: 'sys_swapcontext' alias 
between functions of incompatible types 'long int(struct ucontext *, 
struct ucontext *, long int)' and 'long int(long int,  long int,  long 
int)' [-Wattribute-alias]

  asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
  ^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  ^
./include/linux/syscalls.h:213:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
 #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, 
__VA_ARGS__)

^~~
arch/powerpc/kernel/signal_32.c:1044:1: note: in expansion of macro 
'SYSCALL_DEFINE3'

 SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 ^~~
./include/linux/syscalls.h:238:18: note: aliased declaration here
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
  ^~~~
./include/linux/syscalls.h:222:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'

  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
  

Re: [PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Segher Boessenkool
On Mon, May 28, 2018 at 02:17:49PM +, Christophe Leroy wrote:
> The latest GCC version emits many warnings similar to the following one.

You didn't actually show an example?

This warning should detect serious problems, so don't disable it without
first investigating please.

What do you call "latest version", btw?  Trunk, aka 9.0?  Or the most
advanced release, 8.1?  Or the latest release (which also is 8.1 :-) )


Segher


[PATCH] powerpc/Makefile: fix build failure by disabling attribute-alias warning

2018-05-28 Thread Christophe Leroy
The latest GCC version emits many warnings similar to the following one.

As arch/powerpc code is built with -Werror, this breaks the build with
GCC 8.1.

This patch inhibits that warning.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 26ccd0b91512..d6c557530ce8 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,6 +145,8 @@ CFLAGS-$(CONFIG_PPC64)  += $(call 
cc-option,-mno-pointers-to-nested-functions)
 CFLAGS-$(CONFIG_PPC32) := -ffixed-r2 $(MULTIPLEWORD)
 CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
 
+CFLAGS-y += $(call cc-option,-Wno-attribute-alias)
+
 ifeq ($(CONFIG_PPC_BOOK3S_64),y)
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
-- 
2.13.3



Re: [PATCH] powerpc/64s: Clear PCR on boot

2018-05-28 Thread Benjamin Herrenschmidt
On Sat, 2018-05-26 at 20:45 -0700, Guenter Roeck wrote:
> 
> I already have a patch, or at least one that does the trick for me.
> Getting qemu patched was not the problem. I just want to be sure that
> the problem is indeed a qemu problem.

Hey Guenter !

It's not quite the right patch though. The PCR is a hypervisor
privileged register; your patch makes it supervisor accessible.

I don't have all my stuff at hand to provide a "proper" or tested patch
but it should look like

spr_register_hv(env, SPR_PCR, "PCR",
SPR_NOACCESS, SPR_NOACCESS,
SPR_NOACCESS, SPR_NOACCESS,
&spr_read_generic, &spr_write_generic, 0);

Additionally the TCG ppc instruction decoder should be made to check
the PCR for various instructions (that, or use a specific write callback
that affects the CPU flags), but that's less urgent.

Cheers,
Ben.

> Thanks,
> Guenter
> 
> ---
> > From 1617bac264b4c49d817b6947611affa9b73318f6 Mon Sep 17 00:00:00 2001
> 
> From: Guenter Roeck 
> Date: Fri, 25 May 2018 06:38:40 -0700
> Subject: [PATCH] PowerPC: Permit privileged access to SPR_PCR for POWER7+
> 
> Without this access, Linux mainline bails out.
> 
> Signed-off-by: Guenter Roeck 
> ---
>  target/ppc/translate_init.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 391b94b..3b09c49 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -7953,11 +7953,12 @@ static void gen_spr_power6_common(CPUPPCState *env)
>  #endif
>  /*
>   * Register PCR to report POWERPC_EXCP_PRIV_REG instead of
> - * POWERPC_EXCP_INVAL_SPR.
> + * POWERPC_EXCP_INVAL_SPR in userspace. Permit privileged
> + * access.
>   */
>  spr_register(env, SPR_PCR, "PCR",
>   SPR_NOACCESS, SPR_NOACCESS,
> - SPR_NOACCESS, SPR_NOACCESS,
> + &spr_read_generic, &spr_write_generic,
>   0x);
>  }
>  


[PATCH v2] powerpc/32: implement strlen() in assembly

2018-05-28 Thread Christophe Leroy
The generic implementation of strlen() reads strings byte per byte.

This patch implements strlen() in assembly for PPC32 based on
a read of entire words, in the same spirit as what some other
arches and glibc do.

For long strings, the time spent in strlen is reduced by 50-60%

Signed-off-by: Christophe Leroy 
---
Applies on top of serie "[v5 0/3] powerpc/lib: Optimisation of memcmp() and 
__clear_user() for PPC32"

Changes in v2:
 - Moved handling of unaligned strings outside of the main path as it is very 
unlikely.
 - Removed the verification of the fourth byte in case none of the first
three are NUL.

 arch/powerpc/include/asm/string.h |  3 +++
 arch/powerpc/lib/string_32.S  | 41 +++
 2 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/string.h 
b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..5ecfdb776f87 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -13,6 +13,9 @@
 #define __HAVE_ARCH_MEMCHR
 #define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
+#ifdef CONFIG_PPC32
+#define __HAVE_ARCH_STRLEN
+#endif
 
 extern char * strcpy(char *,const char *);
 extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 4fbaa046aa84..69593c917b35 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -47,6 +47,47 @@ _GLOBAL(memcmp)
blr
 EXPORT_SYMBOL(memcmp)
 
+_GLOBAL(strlen)
+	andi.	r9, r3, 3
+	addi	r10, r3, -4
+	bne-	1f
+2:	lis	r6, 0x8080
+	ori	r6, r6, 0x8080	/* r6 = 0x80808080 (himagic) */
+	rlwinm	r7, r6, 1, 0xffffffff	/* r7 = 0x01010101 (lomagic) */
+3:	lwzu	r9, 4(r10)
+	/* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
+	subf	r8, r7, r9
+	andc	r11, r6, r9
+	and.	r8, r8, r11
+	beq+	3b
+	rlwinm.	r8, r9, 0, 0xff000000
+	beq	20f
+	rlwinm.	r8, r9, 0, 0x00ff0000
+	beq	21f
+	rlwinm.	r8, r9, 0, 0x0000ff00
+	beq	22f
+	subf	r3, r3, r10
+	addi	r3, r3, 3
+	blr
+22:	subf	r3, r3, r10
+	addi	r3, r3, 2
+	blr
+21:	subf	r3, r3, r10
+	addi	r3, r3, 1
+	blr
+19:	addi	r10, r10, 3
+20:	subf	r3, r3, r10
+	blr
+
+1:	lbz	r9, 4(r10)
+	addi	r10, r10, 1
+	cmpwi	cr1, r9, 0
+	andi.	r9, r10, 3
+	beq	cr1, 19b
+	bne	1b
+	b	2b
+EXPORT_SYMBOL(strlen)
+
 CACHELINE_BYTES = L1_CACHE_BYTES
 LG_CACHELINE_BYTES = L1_CACHE_SHIFT
 CACHELINE_MASK = (L1_CACHE_BYTES-1)
-- 
2.13.3



Re: [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision

2018-05-28 Thread Michael Ellerman
Hi Simon,

wei.guo.si...@gmail.com writes:
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> index f20e883..4ba7bb6 100644
> --- a/arch/powerpc/lib/memcmp_64.S
> +++ b/arch/powerpc/lib/memcmp_64.S
> @@ -174,6 +235,13 @@ _GLOBAL(memcmp)
>   blr
>  
>  .Llong:
> +#ifdef CONFIG_ALTIVEC
> + /* Try to use vmx loop if length is equal or greater than 4K */
> + cmpldi  cr6,r5,VMX_THRESH
> + bge cr6,.Lsameoffset_vmx_cmp
> +

Here we decide to use vmx, but we don't do any CPU feature checks.


> @@ -332,7 +400,94 @@ _GLOBAL(memcmp)
>  8:
>   blr
>  
> +#ifdef CONFIG_ALTIVEC
> +.Lsameoffset_vmx_cmp:
> + /* Enter with src/dst addrs has the same offset with 8 bytes
> +  * align boundary
> +  */
> + ENTER_VMX_OPS
> + beq cr1,.Llong_novmx_cmp
> +
> +3:
> + /* need to check whether r4 has the same offset with r3
> +  * for 16 bytes boundary.
> +  */
> + xor r0,r3,r4
> + andi.   r0,r0,0xf
> + bne .Ldiffoffset_vmx_cmp_start
> +
> + /* len is no less than 4KB. Need to align with 16 bytes further.
> +  */
> + andi.   rA,r3,8
> + LD  rA,0,r3
> + beq 4f
> + LD  rB,0,r4
> + cmpld   cr0,rA,rB
> + addi	r3,r3,8
> + addi	r4,r4,8
> + addi	r5,r5,-8
> +
> + beq cr0,4f
> + /* save and restore cr0 */
> + mfocrf  r5,64
> + EXIT_VMX_OPS
> + mtocrf  64,r5
> + b   .LcmpAB_lightweight
> +
> +4:
> + /* compare 32 bytes for each loop */
> + srdi	r0,r5,5
> + mtctr   r0
> + clrldi  r5,r5,59
> + li  off16,16
> +
> +.balign 16
> +5:
> + lvx v0,0,r3
> + lvx v1,0,r4
> + vcmpequd. v0,v0,v1

vcmpequd is only available on Power8 and later CPUs.

Which means this will crash on Power7 or earlier.

Something like this should fix it I think.

diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
index 96eb08b2be2e..0a11ff14dcd9 100644
--- a/arch/powerpc/lib/memcmp_64.S
+++ b/arch/powerpc/lib/memcmp_64.S
@@ -236,9 +236,11 @@ _GLOBAL(memcmp)
 
 .Llong:
 #ifdef CONFIG_ALTIVEC
+BEGIN_FTR_SECTION
/* Try to use vmx loop if length is equal or greater than 4K */
cmpldi  cr6,r5,VMX_THRESH
bge cr6,.Lsameoffset_vmx_cmp
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 .Llong_novmx_cmp:
 #endif


There's another problem which is that old toolchains don't know about
vcmpequd. To fix that we'll need to add a macro that uses .long to
construct the instruction.

cheers


Re: [PATCH] sound: Use octal not symbolic permissions

2018-05-28 Thread Takashi Iwai
On Wed, 23 May 2018 21:20:59 +0200,
Joe Perches wrote:
> 
> Convert the S_ symbolic permissions to their octal equivalents as
> using octal and not symbolic permissions is preferred by many as more
> readable.
> 
> see: https://lkml.org/lkml/2016/8/2/1945
> 
> Done with automated conversion via:
> $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace 
> 
> Miscellanea:
> 
> o Wrapped one multi-line call to a single line
> 
> Signed-off-by: Joe Perches 

Applied, thanks.


Takashi


Re: [PATCH v6 2/4] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision

2018-05-28 Thread Segher Boessenkool
On Fri, May 25, 2018 at 12:07:34PM +0800, wei.guo.si...@gmail.com wrote:
> + /* save and restore cr0 */
> + mfocrf  r5,64
> + EXIT_VMX_OPS
> + mtocrf  64,r5
> + b   .LcmpAB_lightweight

That's cr1, not cr0.  You can use mcrf instead, it is cheaper (esp. if
you have it in a non-volatile CR field before so you need only one, if any).

> + vcmpequb.  v7,v9,v10
> + bnl cr6,.Ldiffoffset_vmx_diff_found

In other places you say  bf 24,...  Dunno which is more readable, but
please pick one?


Segher


RE: [PATCH v11 00/26] Speculative page faults

2018-05-28 Thread Wang, Kemi
A full run would take one or two weeks depending on our available resources.
Could you pick some of them up, e.g. those that have a performance regression?

-Original Message-
From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of 
Laurent Dufour
Sent: Monday, May 28, 2018 4:55 PM
To: Song, HaiyanX 
Cc: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; 
Matthew Wilcox ; khand...@linux.vnet.ibm.com; 
aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; 
pau...@samba.org; Thomas Gleixner ; Ingo Molnar 
; h...@zytor.com; Will Deacon ; Sergey 
Re: [PATCH v6 1/4] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()

2018-05-28 Thread Segher Boessenkool
On Fri, May 25, 2018 at 12:07:33PM +0800, wei.guo.si...@gmail.com wrote:
>  _GLOBAL(memcmp)
>   cmpdi   cr1,r5,0
>  
> - /* Use the short loop if both strings are not 8B aligned */
> - or  r6,r3,r4
> + /* Use the short loop if the src/dst addresses are not
> +  * with the same offset of 8 bytes align boundary.
> +  */
> + xor r6,r3,r4
>   andi.   r6,r6,7
>  
> - /* Use the short loop if length is less than 32B */
> - cmpdi   cr6,r5,31
> + /* Fall back to short loop if compare at aligned addrs
> +  * with less than 8 bytes.
> +  */
> + cmpdi   cr6,r5,7
>  
>   beq cr1,.Lzero
> - bne .Lshort
> - bgt cr6,.Llong
> + bgt cr6,.Lno_short

If this doesn't use cr0 anymore, you can do  rlwinm r6,r6,0,7  instead of
andi r6,r6,7 .

> +.Lsameoffset_8bytes_make_align_start:
> + /* attempt to compare bytes not aligned with 8 bytes so that
> +  * rest comparison can run based on 8 bytes alignment.
> +  */
> + andi.   r6,r3,7
> +
> + /* Try to compare the first double word which is not 8 bytes aligned:
> +  * load the first double word at (src & ~7UL) and shift left appropriate
> +  * bits before comparision.
> +  */
> + clrlwi  r6,r3,29
> + rlwinm  r6,r6,3,0,28

Those last two lines are together just
  rlwinm r6,r3,3,0x1c

> + subfc.  r5,r6,r5

Why subfc?  You don't use the carry.

> + rlwinm  r6,r6,3,0,28

That's
  slwi r6,r6,3

> + bgt cr0,8f
> + li  r3,-1
> +8:
> + blr

  blelr
  li r3,-1
  blr

(and more of the same things elsewhere).


Segher


[PATCH v5 3/3] powerpc/lib: optimise PPC32 memcmp

2018-05-28 Thread Christophe Leroy
At the time being, memcmp() compares two chunks of memory
byte by byte.

This patch optimises the comparison by comparing word by word.

A small benchmark performed on an 8xx, comparing two chunks
of 512 bytes, repeated 10 times, gives:

Before: 5852274 TB ticks
After:  1488638 TB ticks

This is almost 4 times faster.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/string_32.S | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 40a576d56ac7..4fbaa046aa84 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -16,17 +16,34 @@
.text
 
 _GLOBAL(memcmp)
-   cmpwi   cr0, r5, 0
-   beq-    2f
-   mtctr   r5
-   addi    r6, r3, -1
-   addi    r4, r4, -1
-1: lbzu    r3, 1(r6)
-   lbzu    r0, 1(r4)
-   subf.   r3, r0, r3
-   bdnzt   2, 1b
+   srawi.  r7, r5, 2   /* Divide len by 4 */
+   mr      r6, r3
+   beq-    3f
+   mtctr   r7
+   li      r7, 0
+1: lwzx    r3, r6, r7
+   lwzx    r0, r4, r7
+   addi    r7, r7, 4
+   cmplw   cr0, r3, r0
+   bdnzt   eq, 1b
+   bne     5f
+3: andi.   r3, r5, 3
+   beqlr
+   cmplwi  cr1, r3, 2
+   blt-    cr1, 4f
+   lhzx    r3, r6, r7
+   lhzx    r0, r4, r7
+   addi    r7, r7, 2
+   subf.   r3, r0, r3
+   beqlr   cr1
+   bnelr
+4: lbzx    r3, r6, r7
+   lbzx    r0, r4, r7
+   subf.   r3, r0, r3
    blr
-2: li      r3, 0
+5: li      r3, 1
+   bgtlr
+   li      r3, -1
blr
 EXPORT_SYMBOL(memcmp)
 
-- 
2.13.3



[PATCH v5 2/3] powerpc/lib: optimise 32 bits __clear_user()

2018-05-28 Thread Christophe Leroy
Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.

This code is a copy/paste of memset(), with some modifications
in order to retrieve the remaining number of bytes to be cleared,
as it needs to be returned in case of error.

On a MPC885, throughput is almost doubled:

Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s

After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s

On a MPC8321, throughput is multiplied by 2.12:

Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s

After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/string_32.S | 85 +++-
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 204db8a834fd..40a576d56ac7 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
.text
 
@@ -29,44 +30,78 @@ _GLOBAL(memcmp)
blr
 EXPORT_SYMBOL(memcmp)
 
+CACHELINE_BYTES = L1_CACHE_BYTES
+LG_CACHELINE_BYTES = L1_CACHE_SHIFT
+CACHELINE_MASK = (L1_CACHE_BYTES-1)
+
 _GLOBAL(__clear_user)
-   addi    r6,r3,-4
-   li      r3,0
-   li      r5,0
-   cmplwi  0,r4,4
+/*
+ * Use dcbz on the complete cache lines in the destination
+ * to set them to zero.  This requires that the destination
+ * area is cacheable.
+ */
+   cmplwi  cr0, r4, 4
+   mr      r10, r3
+   li      r3, 0
    blt     7f
-   /* clear a single word */
-11:stwu    r5,4(r6)
+
+11:stw     r3, 0(r10)
    beqlr
-   /* clear word sized chunks */
-   andi.   r0,r6,3
-   add     r4,r0,r4
-   subf    r6,r0,r6
-   srwi    r0,r4,2
-   andi.   r4,r4,3
+   andi.   r0, r10, 3
+   add     r11, r0, r4
+   subf    r6, r0, r10
+
+   clrlwi  r7, r6, 32 - LG_CACHELINE_BYTES
+   add     r8, r7, r11
+   srwi    r9, r8, LG_CACHELINE_BYTES
+   addic.  r9, r9, -1  /* total number of complete cachelines */
+   ble     2f
+   xori    r0, r7, CACHELINE_MASK & ~3
+   srwi.   r0, r0, 2
+   beq     3f
+   mtctr   r0
+4: stwu    r3, 4(r6)
+   bdnz    4b
+3: mtctr   r9
+   li      r7, 4
+10:dcbz    r7, r6
+   addi    r6, r6, CACHELINE_BYTES
+   bdnz    10b
+   clrlwi  r11, r8, 32 - LG_CACHELINE_BYTES
+   addi    r11, r11, 4
+
+2: srwi    r0, r11, 2
    mtctr   r0
-   bdz     7f
-1: stwu    r5,4(r6)
+   bdz     6f
+1: stwu    r3, 4(r6)
    bdnz    1b
-   /* clear byte sized chunks */
-7: cmpwi   0,r4,0
+6: andi.   r11, r11, 3
    beqlr
-   mtctr   r4
-   addi    r6,r6,3
-8: stbu    r5,1(r6)
+   mtctr   r11
+   addi    r6, r6, 3
+8: stbu    r3, 1(r6)
    bdnz    8b
    blr
-90:mr      r3,r4
+
+7: cmpwi   cr0, r4, 0
+   beqlr
+   mtctr   r4
+   addi    r6, r10, -1
+9: stbu    r3, 1(r6)
+   bdnz    9b
    blr
-91:mfctr   r3
-   slwi    r3,r3,2
-   add     r3,r3,r4
+
+90:mr      r3, r4
    blr
-92:mfctr   r3
+91:add     r3, r10, r4
+   subf    r3, r6, r3
    blr

    EX_TABLE(11b, 90b)
+   EX_TABLE(4b, 91b)
+   EX_TABLE(10b, 91b)
    EX_TABLE(1b, 91b)
-   EX_TABLE(8b, 92b)
+   EX_TABLE(8b, 91b)
+   EX_TABLE(9b, 91b)
 
 EXPORT_SYMBOL(__clear_user)
-- 
2.13.3



[PATCH v5 1/3] powerpc/lib: move PPC32 specific functions out of string.S

2018-05-28 Thread Christophe Leroy
In preparation for the optimisation patches, move the PPC32-specific
memcmp() and __clear_user() into string_32.S.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/Makefile|  5 +--
 arch/powerpc/lib/string.S| 61 -
 arch/powerpc/lib/string_32.S | 72 
 3 files changed, 75 insertions(+), 63 deletions(-)
 create mode 100644 arch/powerpc/lib/string_32.S

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 653901042ad7..2c9b8c0adf22 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -26,13 +26,14 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
   memcpy_power7.o
 
 obj64-y            += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
-              string_64.o memcpy_64.o memcmp_64.o pmem.o
+              memcpy_64.o memcmp_64.o pmem.o
 
 obj64-$(CONFIG_SMP)        += locks.o
 obj64-$(CONFIG_ALTIVEC)    += vmx-helper.o
 obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
 
-obj-y  += checksum_$(BITS).o checksum_wrappers.o
+obj-y  += checksum_$(BITS).o checksum_wrappers.o \
+          string_$(BITS).o
 
 obj-y  += sstep.o ldstfp.o quad.o
 obj64-y            += quad.o
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 0378def28d41..b03c1a6861f4 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -56,23 +56,6 @@ _GLOBAL(strncmp)
blr
 EXPORT_SYMBOL(strncmp)
 
-#ifdef CONFIG_PPC32
-_GLOBAL(memcmp)
-   PPC_LCMPI 0,r5,0
-   beq-    2f
-   mtctr   r5
-   addi    r6,r3,-1
-   addi    r4,r4,-1
-1: lbzu    r3,1(r6)
-   lbzu    r0,1(r4)
-   subf.   r3,r0,r3
-   bdnzt   2,1b
-   blr
-2: li  r3,0
-   blr
-EXPORT_SYMBOL(memcmp)
-#endif
-
 _GLOBAL(memchr)
PPC_LCMPI 0,r5,0
 beq-    2f
@@ -86,47 +69,3 @@ _GLOBAL(memchr)
 2: li  r3,0
blr
 EXPORT_SYMBOL(memchr)
-
-#ifdef CONFIG_PPC32
-_GLOBAL(__clear_user)
-   addi    r6,r3,-4
-   li      r3,0
-   li      r5,0
-   cmplwi  0,r4,4
-   blt     7f
-   /* clear a single word */
-11:stwu    r5,4(r6)
-   beqlr
-   /* clear word sized chunks */
-   andi.   r0,r6,3
-   add     r4,r0,r4
-   subf    r6,r0,r6
-   srwi    r0,r4,2
-   andi.   r4,r4,3
-   mtctr   r0
-   bdz     7f
-1: stwu    r5,4(r6)
-   bdnz    1b
-   /* clear byte sized chunks */
-7: cmpwi   0,r4,0
-   beqlr
-   mtctr   r4
-   addi    r6,r6,3
-8: stbu    r5,1(r6)
-   bdnz    8b
-   blr
-90:mr      r3,r4
-   blr
-91:mfctr   r3
-   slwi    r3,r3,2
-   add     r3,r3,r4
-   blr
-92:mfctr   r3
-   blr
-
-   EX_TABLE(11b, 90b)
-   EX_TABLE(1b, 91b)
-   EX_TABLE(8b, 92b)
-
-EXPORT_SYMBOL(__clear_user)
-#endif
diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
new file mode 100644
index ..204db8a834fd
--- /dev/null
+++ b/arch/powerpc/lib/string_32.S
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * String handling functions for PowerPC32
+ *
+ * Copyright (C) 1996 Paul Mackerras.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+   .text
+
+_GLOBAL(memcmp)
+   cmpwi   cr0, r5, 0
+   beq-    2f
+   mtctr   r5
+   addi    r6,r3,-1
+   addi    r4,r4,-1
+1: lbzu    r3,1(r6)
+   lbzu    r0,1(r4)
+   subf.   r3,r0,r3
+   bdnzt   2,1b
+   blr
+2: li      r3,0
+   blr
+EXPORT_SYMBOL(memcmp)
+
+_GLOBAL(__clear_user)
+   addi    r6,r3,-4
+   li      r3,0
+   li      r5,0
+   cmplwi  0,r4,4
+   blt     7f
+   /* clear a single word */
+11:stwu    r5,4(r6)
+   beqlr
+   /* clear word sized chunks */
+   andi.   r0,r6,3
+   add     r4,r0,r4
+   subf    r6,r0,r6
+   srwi    r0,r4,2
+   andi.   r4,r4,3
+   mtctr   r0
+   bdz     7f
+1: stwu    r5,4(r6)
+   bdnz    1b
+   /* clear byte sized chunks */
+7: cmpwi   0,r4,0
+   beqlr
+   mtctr   r4
+   addi    r6,r6,3
+8: stbu    r5,1(r6)
+   bdnz    8b
+   blr
+90:mr      r3,r4
+   blr
+91:mfctr   r3
+   slwi    r3,r3,2
+   add     r3,r3,r4
+   blr
+92:mfctr   r3
+   blr
+
+   EX_TABLE(11b, 90b)
+   EX_TABLE(1b, 91b)
+   EX_TABLE(8b, 92b)
+
+EXPORT_SYMBOL(__clear_user)
-- 
2.13.3



[PATCH v5 0/3] powerpc/lib: Optimisation of memcmp() and __clear_user() for PPC32

2018-05-28 Thread Christophe Leroy
This series intends to optimise memcmp() and __clear_user() for PPC32
in the same spirit as already done on PPC64.

The first patch moves PPC32-specific functions from string.S into
a dedicated file named string_32.S.
The second patch rewrites __clear_user() using the dcbz instruction.
The third patch rewrites memcmp() to compare 32-bit words instead
of comparing byte by byte.

As shown in each individual commit log, the second and third patches
provide significant improvements.

Changes in v5:
- Removed the handling of Little Endian, as PPC32 kernels only support Big
Endian at the time being, and because unaligned accesses should be handled
differently in LE.

Changes in v4:
- In memcmp(), dropped the special handling for when length is 0. Handling it 
through the small length path.

Changes in v3:
- Fixed the sign of the result returned by memcmp() by using a logical
comparison of u32 words and returning -1, 0 or 1 instead of doing a subtraction.
- In first patch, replaced PPC_LCMPI by cmpwi
- Fixed licence in string_32.S
- Removed the last two patches from the series. They will be handled later as
they require further tests and analysis to properly identify their real benefit
in all possible cases.

Changes in v2:
- Moved out the patch removing the hot loop alignment on PPC32
- Squashed the changes related to NUL size verification in a single patch
- Reordered the patches in a more logical order
- Modified the inlining patch to avoid a warning about the impossibility of
versioning symbols.


Christophe Leroy (3):
  powerpc/lib: move PPC32 specific functions out of string.S
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/lib: optimise PPC32 memcmp

 arch/powerpc/lib/Makefile|   5 +-
 arch/powerpc/lib/string.S|  61 -
 arch/powerpc/lib/string_32.S | 124 +++
 3 files changed, 127 insertions(+), 63 deletions(-)
 create mode 100644 arch/powerpc/lib/string_32.S

-- 
2.13.3



Re: [PATCH] powerpc/mm/hugetlb: Use the correct page size when flushing hugepage tlb

2018-05-28 Thread Nicholas Piggin
On Tue, 22 May 2018 14:42:09 +0530
"Aneesh Kumar K.V"  wrote:

> We used the wrong page size (mmu_virtual_psize) when doing a tlb flush after
> a pte update. This patch updates the flush to use the hugetlb page size,
> which is derived from the hugetlb hstate.
> 
> Now that ptep_set_access_flags won't be called for hugetlb, remove
> the is_vm_hugetlb_page() check and add the pte lock assert
> unconditionally.

radix__flush_tlb_page seems to check for a hugepage vma and flush the
right page size in that case. So is there a case where the flush size
actually goes wrong?

I do think that if we could make hugetlb flushing always use the
flush_hugetlb_page version and take the test out of flush_tlb_page,
that would be nicer code (I haven't checked whether that's feasible
after this patch), but that wouldn't be a bug fix.

Thanks,
Nick


Re: [PATCH 2/2] powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang

2018-05-28 Thread Nicholas Piggin
On Fri, 25 May 2018 21:19:17 +0530
"Aneesh Kumar K.V"  wrote:

> When relaxing access (read -> read_write update), the pte needs to be marked
> invalid to handle a nest MMU bug. We also need to do a tlb flush after the
> pte is marked invalid, before updating the pte with the new access bits.
> 
> We also move the tlb flush to the platform-specific __ptep_set_access_flags.
> This will help us get rid of unnecessary tlb flushes on BOOK3S 64 later. We
> don't do that in this patch. This also helps avoid multiple tlbies with a
> coprocessor attached.
> 
> Signed-off-by: Aneesh Kumar K.V 

I think these look good to me, but same comment: can you split the API
change from the NMMU fix? The fix is just a couple of lines, and the
rest of the API change should leave generated code almost unchanged, so
it would be easier to review if these are split.

Thanks,
Nick


Re: [PATCH v11 00/26] Speculative page faults

2018-05-28 Thread Laurent Dufour
On 28/05/2018 10:22, Haiyan Song wrote:
> Hi Laurent,
> 
> Yes, these tests are done on V9 patch.

Do you plan to give this V11 a run ?

> 
> 
> Best regards,
> Haiyan Song
> 
> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>
>>> Some regressions and improvements were found by LKP-tools (Linux Kernel
>>> Performance) on the V9 patch series, tested on an Intel 4S Skylake platform.
>>
>> Hi,
>>
>> Thanks for reporting these benchmark results, but you mentioned the
>> "V9 patch series" while responding to the v11 header series...
>> Were these tests done on v9 or v11 ?
>>
>> Cheers,
>> Laurent.
>>
>>>
>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch 
>>> series)
>>> Commit id:
>>> base commit: d55f34411b1b126429a823d06c3124c16283231f
>>> head commit: 0355322b3577eeab7669066df42c550a56801110
>>> Benchmark suite: will-it-scale
>>> Download link:
>>> https://github.com/antonblanchard/will-it-scale/tree/master/tests
>>> Metrics:
>>> will-it-scale.per_process_ops=processes/nr_cpu
>>> will-it-scale.per_thread_ops=threads/nr_cpu
>>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
>>> THP: enable / disable
>>> nr_task: 100%
>>>
>>> 1. Regressions:
>>> a) THP enabled:
>>> testcase                      base      change     head      metric
>>> page_fault3/ enable THP       10092     -17.5%     8323      will-it-scale.per_thread_ops
>>> page_fault2/ enable THP       8300      -17.2%     6869      will-it-scale.per_thread_ops
>>> brk1/ enable THP              957.67    -7.6%      885       will-it-scale.per_thread_ops
>>> page_fault3/ enable THP       172821    -5.3%      163692    will-it-scale.per_process_ops
>>> signal1/ enable THP           9125      -3.2%      8834      will-it-scale.per_process_ops
>>>
>>> b) THP disabled:
>>> testcase                      base      change     head      metric
>>> page_fault3/ disable THP      10107     -19.1%     8180      will-it-scale.per_thread_ops
>>> page_fault2/ disable THP      8432      -17.8%     6931      will-it-scale.per_thread_ops
>>> context_switch1/ disable THP  215389    -6.8%      200776    will-it-scale.per_thread_ops
>>> brk1/ disable THP             939.67    -6.6%      877.33    will-it-scale.per_thread_ops
>>> page_fault3/ disable THP      173145    -4.7%      165064    will-it-scale.per_process_ops
>>> signal1/ disable THP          9162      -3.9%      8802      will-it-scale.per_process_ops
>>>
>>> 2. Improvements:
>>> a) THP enabled:
>>> testcase                      base      change     head      metric
>>> malloc1/ enable THP           66.33     +469.8%    383.67    will-it-scale.per_thread_ops
>>> writeseek3/ enable THP        2531      +4.5%      2646      will-it-scale.per_thread_ops
>>> signal1/ enable THP           989.33    +2.8%      1016      will-it-scale.per_thread_ops
>>>
>>> b) THP disabled:
>>> testcase                      base      change     head      metric
>>> malloc1/ disable THP          90.33     +417.3%    467.33    will-it-scale.per_thread_ops
>>> read2/ disable THP            58934     +39.2%     82060     will-it-scale.per_thread_ops
>>> page_fault1/ disable THP      8607      +36.4%     11736     will-it-scale.per_thread_ops
>>> read1/ disable THP            314063    +12.7%     353934    will-it-scale.per_thread_ops
>>> writeseek3/ disable THP       2452      +12.5%     2759      will-it-scale.per_thread_ops
>>> signal1/ disable THP          971.33    +5.5%      1024      will-it-scale.per_thread_ops
>>>
>>> Notes: for the above values in the "change" column, a higher value means
>>> that the related testcase result on the head commit is better than on the
>>> base commit for this benchmark.
>>>
>>>
>>> Best regards
>>> Haiyan Song
>>>
>>> 
>>> From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of 
>>> Laurent Dufour [lduf...@linux.vnet.ibm.com]
>>> Sent: Thursday, May 17, 2018 7:06 PM
>>> To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
>>> kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; 
>>> j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; 
>>> aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; 
>>> m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; 
>>> h...@zytor.com; Will Deacon; Sergey Senozhatsky; 
>>> sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; 

Re: [PATCH v11 00/26] Speculative page faults

2018-05-28 Thread Haiyan Song
Hi Laurent,

Yes, these tests are done on V9 patch.


Best regards,
Haiyan Song

On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
> On 28/05/2018 07:23, Song, HaiyanX wrote:
> > 
> > Some regressions and improvements were found by LKP-tools (Linux Kernel
> > Performance) on the V9 patch series, tested on an Intel 4S Skylake platform.
> 
> Hi,
> 
> Thanks for reporting these benchmark results, but you mentioned the "V9 patch
> series" while responding to the v11 header series...
> Were these tests done on v9 or v11 ?
> 
> Cheers,
> Laurent.
> 
> > 
> > The regression result is sorted by the metric will-it-scale.per_thread_ops.
> > Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch 
> > series)
> > Commit id:
> > base commit: d55f34411b1b126429a823d06c3124c16283231f
> > head commit: 0355322b3577eeab7669066df42c550a56801110
> > Benchmark suite: will-it-scale
> > Download link:
> > https://github.com/antonblanchard/will-it-scale/tree/master/tests
> > Metrics:
> > will-it-scale.per_process_ops=processes/nr_cpu
> > will-it-scale.per_thread_ops=threads/nr_cpu
> > test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
> > THP: enable / disable
> > nr_task: 100%
> > 
> > 1. Regressions:
> > a) THP enabled:
> > testcase                      base      change     head      metric
> > page_fault3/ enable THP       10092     -17.5%     8323      will-it-scale.per_thread_ops
> > page_fault2/ enable THP       8300      -17.2%     6869      will-it-scale.per_thread_ops
> > brk1/ enable THP              957.67    -7.6%      885       will-it-scale.per_thread_ops
> > page_fault3/ enable THP       172821    -5.3%      163692    will-it-scale.per_process_ops
> > signal1/ enable THP           9125      -3.2%      8834      will-it-scale.per_process_ops
> >
> > b) THP disabled:
> > testcase                      base      change     head      metric
> > page_fault3/ disable THP      10107     -19.1%     8180      will-it-scale.per_thread_ops
> > page_fault2/ disable THP      8432      -17.8%     6931      will-it-scale.per_thread_ops
> > context_switch1/ disable THP  215389    -6.8%      200776    will-it-scale.per_thread_ops
> > brk1/ disable THP             939.67    -6.6%      877.33    will-it-scale.per_thread_ops
> > page_fault3/ disable THP      173145    -4.7%      165064    will-it-scale.per_process_ops
> > signal1/ disable THP          9162      -3.9%      8802      will-it-scale.per_process_ops
> >
> > 2. Improvements:
> > a) THP enabled:
> > testcase                      base      change     head      metric
> > malloc1/ enable THP           66.33     +469.8%    383.67    will-it-scale.per_thread_ops
> > writeseek3/ enable THP        2531      +4.5%      2646      will-it-scale.per_thread_ops
> > signal1/ enable THP           989.33    +2.8%      1016      will-it-scale.per_thread_ops
> >
> > b) THP disabled:
> > testcase                      base      change     head      metric
> > malloc1/ disable THP          90.33     +417.3%    467.33    will-it-scale.per_thread_ops
> > read2/ disable THP            58934     +39.2%     82060     will-it-scale.per_thread_ops
> > page_fault1/ disable THP      8607      +36.4%     11736     will-it-scale.per_thread_ops
> > read1/ disable THP            314063    +12.7%     353934    will-it-scale.per_thread_ops
> > writeseek3/ disable THP       2452      +12.5%     2759      will-it-scale.per_thread_ops
> > signal1/ disable THP          971.33    +5.5%      1024      will-it-scale.per_thread_ops
> > 
> > Notes: for the above values in the "change" column, a higher value means
> > that the related testcase result on the head commit is better than on the
> > base commit for this benchmark.
> > 
> > 
> > Best regards
> > Haiyan Song
> > 
> > 
> > From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of 
> > Laurent Dufour [lduf...@linux.vnet.ibm.com]
> > Sent: Thursday, May 17, 2018 7:06 PM
> > To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
> > kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; 
> > j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; 
> > aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; 
> > m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; 
> > h...@zytor.com; Will Deacon; Sergey Senozhatsky; 
> > sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; 
> > Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; 
> > Minchan 

Re: [PATCH v11 00/26] Speculative page faults

2018-05-28 Thread Laurent Dufour
On 28/05/2018 07:23, Song, HaiyanX wrote:
> 
> Some regressions and improvements were found by LKP-tools (Linux Kernel
> Performance) on the V9 patch series, tested on an Intel 4S Skylake platform.

Hi,

Thanks for reporting these benchmark results, but you mentioned the "V9 patch
series" while responding to the v11 header series...
Were these tests done on v9 or v11 ?

Cheers,
Laurent.

> 
> The regression result is sorted by the metric will-it-scale.per_thread_ops.
> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch 
> series)
> Commit id:
> base commit: d55f34411b1b126429a823d06c3124c16283231f
> head commit: 0355322b3577eeab7669066df42c550a56801110
> Benchmark suite: will-it-scale
> Download link:
> https://github.com/antonblanchard/will-it-scale/tree/master/tests
> Metrics:
> will-it-scale.per_process_ops=processes/nr_cpu
> will-it-scale.per_thread_ops=threads/nr_cpu
> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
> THP: enable / disable
> nr_task: 100%
> 
> 1. Regressions:
> a) THP enabled:
> testcase                      base      change     head      metric
> page_fault3/ enable THP       10092     -17.5%     8323      will-it-scale.per_thread_ops
> page_fault2/ enable THP       8300      -17.2%     6869      will-it-scale.per_thread_ops
> brk1/ enable THP              957.67    -7.6%      885       will-it-scale.per_thread_ops
> page_fault3/ enable THP       172821    -5.3%      163692    will-it-scale.per_process_ops
> signal1/ enable THP           9125      -3.2%      8834      will-it-scale.per_process_ops
> 
> b) THP disabled:
> testcase                      base      change     head      metric
> page_fault3/ disable THP      10107     -19.1%     8180      will-it-scale.per_thread_ops
> page_fault2/ disable THP      8432      -17.8%     6931      will-it-scale.per_thread_ops
> context_switch1/ disable THP  215389    -6.8%      200776    will-it-scale.per_thread_ops
> brk1/ disable THP             939.67    -6.6%      877.33    will-it-scale.per_thread_ops
> page_fault3/ disable THP      173145    -4.7%      165064    will-it-scale.per_process_ops
> signal1/ disable THP          9162      -3.9%      8802      will-it-scale.per_process_ops
> 
> 2. Improvements:
> a) THP enabled:
> testcase                      base      change     head      metric
> malloc1/ enable THP           66.33     +469.8%    383.67    will-it-scale.per_thread_ops
> writeseek3/ enable THP        2531      +4.5%      2646      will-it-scale.per_thread_ops
> signal1/ enable THP           989.33    +2.8%      1016      will-it-scale.per_thread_ops
> 
> b) THP disabled:
> testcase                      base      change     head      metric
> malloc1/ disable THP          90.33     +417.3%    467.33    will-it-scale.per_thread_ops
> read2/ disable THP            58934     +39.2%     82060     will-it-scale.per_thread_ops
> page_fault1/ disable THP      8607      +36.4%     11736     will-it-scale.per_thread_ops
> read1/ disable THP            314063    +12.7%     353934    will-it-scale.per_thread_ops
> writeseek3/ disable THP       2452      +12.5%     2759      will-it-scale.per_thread_ops
> signal1/ disable THP          971.33    +5.5%      1024      will-it-scale.per_thread_ops
> 
> Notes: for the above values in the "change" column, a higher value means
> that the related testcase result on the head commit is better than on the
> base commit for this benchmark.
> 
> 
> Best regards
> Haiyan Song
> 
> 
> From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of 
> Laurent Dufour [lduf...@linux.vnet.ibm.com]
> Sent: Thursday, May 17, 2018 7:06 PM
> To: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
> kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; 
> Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; 
> b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas 
> Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; 
> sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; 
> Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; 
> Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi
> Cc: linux-ker...@vger.kernel.org; linux...@kvack.org; 
> ha...@linux.vnet.ibm.com; npig...@gmail.com; bsinghar...@gmail.com; 
> paul...@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; 
> x...@kernel.org
> Subject: [PATCH v11 00/26] Speculative page faults
> 
> This is a port on kernel 4.17 of the work done by 

Re: [PATCH 07/14] powerpc: Add support for restartable sequences

2018-05-28 Thread Mathieu Desnoyers
- On May 24, 2018, at 3:03 AM, Michael Ellerman m...@ellerman.id.au wrote:

> Mathieu Desnoyers  writes:
>> - On May 23, 2018, at 4:14 PM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
> ...
>>> 
>>> Hi Boqun,
>>> 
>>> I tried your patch in a ppc64 le environment, and it does not survive boot
>>> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.
> 
> 
> Sorry this code is super gross and hard to deal with.
> 
>> The following fixup gets ppc64 to work:
>>
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -208,6 +208,7 @@ system_call_exit:
>> /* Check whether the syscall is issued inside a restartable sequence */
>> addi    r3,r1,STACK_FRAME_OVERHEAD
>> bl      rseq_syscall
>> +   ld      r3,RESULT(r1)
>>  #endif
>> /*
>>  * Disable interrupts so current_thread_info()->flags can't change,
> 
> I don't think that's safe.
> 
> If you look above that, we have r3, r8 and r12 all live:
> 
> .Lsyscall_exit:
>   std r3,RESULT(r1)
>   CURRENT_THREAD_INFO(r12, r1)
> 
>   ld  r8,_MSR(r1)
> #ifdef CONFIG_PPC_BOOK3S
>   /* No MSR:RI on BookE */
>   andi.   r10,r8,MSR_RI
> beq-    .Lunrecov_restore
> #endif
> 
> 
> They're all volatile across function calls:
> 
>  
> http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240_68174.html
> 
> 
> The system_call_exit symbol is actually there for kprobes and cosmetic
> purposes. The actual syscall return flow starts at .Lsyscall_exit.
> 
> So I think this would work:
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index db4df061c33a..e19f377a25e0 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,6 +184,14 @@ system_call: /* label this so stack traces look sane */
> 
> .Lsyscall_exit:
>   std r3,RESULT(r1)
> +
> +#ifdef CONFIG_DEBUG_RSEQ
> + /* Check whether the syscall is issued inside a restartable sequence */
> + addi    r3,r1,STACK_FRAME_OVERHEAD
> + bl  rseq_syscall
> + ld  r3,RESULT(r1)
> +#endif
> +
>   CURRENT_THREAD_INFO(r12, r1)
> 
>   ld  r8,_MSR(r1)
> 
> 
> I'll try and get this series into my test setup at some point, been a
> bit busy lately :)

Yes, this was needed. I had this in my tree already, but there is still
a kernel OOPS when running the rseq selftests on ppc64 with CONFIG_DEBUG_RSEQ=y.

My current dev tree is at: 
https://github.com/compudj/linux-percpu-dev/tree/rseq/dev-local

So considering we are at rc7 now, should I plan on removing the powerpc bits
from the merge window submission, or is someone planning to spend time
fixing and testing the ppc integration before the merge window opens?

Thanks,

Mathieu


> 
> cheers

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[PATCH] powerpc/Makefile: set -mcpu=860 flag for the 8xx

2018-05-28 Thread Christophe Leroy
When compiled with GCC 8.1, vmlinux is significantly bigger than
with GCC 4.8.

When looking at the generated code with objdump, we notice that
all functions and loops get 16-byte alignment. This significantly
increases the size of the kernel. It is pointless and even
counterproductive, as on the 8xx a 'nop' also consumes one clock cycle.

Size of vmlinux with GCC 4.8:
   text    data     bss     dec     hex filename
5801948 1626076  457796 7885820  7853fc vmlinux

Size of vmlinux with GCC 8.1:
   text    data     bss     dec     hex filename
6764592 1630652  456476 8851720  871108 vmlinux

Size of vmlinux with GCC 8.1 and this patch:
   text    data     bss     dec     hex filename
6331544 1631756  456476 8419776  8079c0 vmlinux

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 6a050b42a9f2..26ccd0b91512 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -178,6 +178,7 @@ CFLAGS-$(CONFIG_POWER6_CPU) += $(call cc-option,-mcpu=power6)
 CFLAGS-$(CONFIG_POWER7_CPU) += $(call cc-option,-mcpu=power7)
 CFLAGS-$(CONFIG_POWER8_CPU) += $(call cc-option,-mcpu=power8)
 CFLAGS-$(CONFIG_POWER9_CPU) += $(call cc-option,-mcpu=power9)
+CFLAGS-$(CONFIG_PPC_8xx) += $(call cc-option,-mcpu=860)
 
 # Altivec option not allowed with e500mc64 in GCC.
 ifeq ($(CONFIG_ALTIVEC),y)
-- 
2.13.3