Re: [PATCH 3/5] powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernels

2017-09-13 Thread Naveen N. Rao
On 2017/09/13 05:36PM, Masami Hiramatsu wrote:
> On Thu, 14 Sep 2017 02:50:34 +0530
> "Naveen N. Rao"  wrote:
> 
> > Kamalesh pointed out that we are getting the below call traces with
> > livepatched functions when we enable CONFIG_PREEMPT:
> > 
> > [  495.470721] BUG: using __this_cpu_read() in preemptible [] code: cat/8394
> > [  495.471167] caller is is_current_kprobe_addr+0x30/0x90
> > [  495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G  K 4.13.0-rc7-nnr+ #95
> > [  495.471173] Call Trace:
> > [  495.471178] [c0008fd9b960] [c09f039c] dump_stack+0xec/0x160 (unreliable)
> > [  495.471184] [c0008fd9b9a0] [c059169c] check_preemption_disabled+0x15c/0x170
> > [  495.471187] [c0008fd9ba30] [c0046460] is_current_kprobe_addr+0x30/0x90
> > [  495.471191] [c0008fd9ba60] [c004e9a0] ftrace_call+0x1c/0xb8
> > [  495.471195] [c0008fd9bc30] [c0376fd8] seq_read+0x238/0x5c0
> > [  495.471199] [c0008fd9bcd0] [c03cfd78] proc_reg_read+0x88/0xd0
> > [  495.471203] [c0008fd9bd00] [c033e5d4] __vfs_read+0x44/0x1b0
> > [  495.471206] [c0008fd9bd90] [c03402ec] vfs_read+0xbc/0x1b0
> > [  495.471210] [c0008fd9bde0] [c0342138] SyS_read+0x68/0x110
> > [  495.471214] [c0008fd9be30] [c000bc6c] system_call+0x58/0x6c
> > 
> > Commit c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for
> > jprobes") introduced a helper is_current_kprobe_addr() to help determine
> > if the current function has been livepatched or if it has a jprobe
> > installed, both of which modify the NIP.
> > 
> > In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption
> > before calling into setjmp_pre_handler() which returns without disabling
> > pre-emption. This is done to ensure that the jprobe handler won't
> > disappear beneath us if the jprobe is unregistered between the
> > setjmp_pre_handler() and the subsequent longjmp_break_handler() called
> > from the jprobe handler. Due to this, we can use __this_cpu_read() in
> > is_current_kprobe_addr() with the pre-emption check as we know that
> > pre-emption will be disabled.
> > 
> > However, if this function has been livepatched, we are still doing this
> > check and when we do so, pre-emption won't necessarily be disabled. This
> > results in the call trace shown above.
> > 
> > Fix this by only invoking is_current_kprobe_addr() when pre-emption is
> > disabled. And since we now guard this within a pre-emption check, we can
> > instead use raw_cpu_read() to get the current_kprobe value skipping the
> > check done by __this_cpu_read().
> 
> Hmm, can you disable preempt temporarily at this function and read it?

Yes, but I felt this approach is more efficient, specifically for live 
patching. We don't normally expect preemption to be disabled while 
handling a livepatched function, so the simple 'if (!preemptible())'
check helps us bypass looking at current_kprobe entirely.

The check also ensures we can use raw_cpu_read() safely in other 
scenarios. Do you see any other concerns with this approach?
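
For reference, a minimal sketch of the alternative Masami suggests
(illustrative only, not from any posted patch):

	int is_current_kprobe_addr(unsigned long addr)
	{
		struct kprobe *p;
		int ret;

		/* Disable preemption so __this_cpu_read() is always legal */
		preempt_disable();
		p = __this_cpu_read(current_kprobe);
		ret = (p && (unsigned long)p->addr == addr) ? 1 : 0;
		preempt_enable();

		return ret;
	}

The tradeoff is that every call -- including the common livepatch case
-- would pay for the preempt_disable()/preempt_enable() pair, whereas
the '!preemptible()' form bails out immediately.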

Thanks,
Naveen

> 
> Thank you,
> 
> > 
> > Fixes: c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes")
> > Reported-by: Kamalesh Babulal 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  arch/powerpc/kernel/kprobes.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> > index e848fe2c93fb..db40b13fd3d1 100644
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -45,8 +45,12 @@ struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
> >  
> >  int is_current_kprobe_addr(unsigned long addr)
> >  {
> > -   struct kprobe *p = kprobe_running();
> > -   return (p && (unsigned long)p->addr == addr) ? 1 : 0;
> > +   if (!preemptible()) {
> > +   struct kprobe *p = raw_cpu_read(current_kprobe);
> > +   return (p && (unsigned long)p->addr == addr) ? 1 : 0;
> > +   }
> > +
> > +   return 0;
> >  }
> >  
> >  bool arch_within_kprobe_blacklist(unsigned long addr)
> > -- 
> > 2.14.1
> > 
> 
> 
> -- 
> Masami Hiramatsu 
> 



Re: [RFC PATCH 2/2] powerpc/powernv: implement NMI IPIs with OPAL_SIGNAL_SYSTEM_RESET

2017-09-13 Thread Alistair Popple
On Thu, 14 Sep 2017 04:32:28 PM Nicholas Piggin wrote:
> On Thu, 14 Sep 2017 12:24:49 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2017-09-13 at 23:13 +1000, Nicholas Piggin wrote:
> > > On Wed, 13 Sep 2017 02:05:53 +1000
> > > Nicholas Piggin  wrote:
> > >   
> > > > There are two complications. The first is that sreset from stop states
> > > > comes in with SRR1 set to do a powersave wakeup, with an sreset reason
> > > > encoded.
> > > > 
> > > > The second is that threads on the same core can't be signalled directly
> > > > so we must designate a bounce CPU to reflect the IPI back.  
> > > 
> > > Here is an updated Linux patch for the latest OPAL patch. This has
> > > a few assorted fixes as well to make it work nicely, I roll them into
> > > one patch here to make it easy to apply for testing the OPAL patch.  
> > 
> > Why can't you sreset threads of the same core on P9 ?
> 
> It looks like we can, I think I had some other bugs still not ironed
> out when I previously tested it.
> 
> That simplifies things a lot on the Linux side. It may be that the
> bounce is still required if we implement it on POWER8 using ramming,
> but I'll get the POWER9 code in first.

Right, the bouncing is still required on P8 because we need to ram instructions
and you can only ram instructions if all threads on a core are quiesced.

- Alistair

>
> Thanks,
> Nick



Re: [PATCH 2/5] powerpc/kprobes: Do not suppress instruction emulation if a single run failed

2017-09-13 Thread Naveen N. Rao
On 2017/09/13 04:53PM, Masami Hiramatsu wrote:
> On Thu, 14 Sep 2017 02:50:33 +0530
> "Naveen N. Rao"  wrote:
> 
> > Currently, we disable instruction emulation if emulate_step() fails for
> > any reason. However, such failures could be transient and specific to a
> > particular run. Instead, only disable instruction emulation if we have
> > never been able to emulate this. If we had emulated this instruction
> > successfully at least once, then we single step only this probe hit and
> > continue to try emulating the instruction in subsequent probe hits.
> 
> Hmm, would this mean that the instruction is emulatable or not depends
> on context? What kind of situation is considerable?

Yes, as an example, a load/store instruction can cause exceptions 
depending on the address. In some of those cases, we will have to single 
step the instruction, but we will be able to emulate in most scenarios.
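
To make the policy concrete, here is a sketch of the intended logic
(the boostable convention follows the try_to_emulate() code quoted in
patch 1/5; this is illustrative, not the literal patch):

	ret = emulate_step(regs, insn);
	if (ret > 0) {
		/* Emulation succeeded: note it, so a later failure
		 * is treated as transient rather than permanent */
		p->ainsn.boostable = 1;
	} else if (ret < 0) {
		/* Can neither emulate nor single step: unrecoverable */
		BUG();
	} else {
		/* ret == 0: emulation failed on this particular run */
		if (p->ainsn.boostable != 1)
			/* Never emulated successfully: disable emulation */
			p->ainsn.boostable = -1;
		/* Otherwise single step just this probe hit and keep
		 * trying to emulate on subsequent hits */
	}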

Thanks for the review!
- Naveen



Re: [RFC PATCH 2/2] powerpc/powernv: implement NMI IPIs with OPAL_SIGNAL_SYSTEM_RESET

2017-09-13 Thread Nicholas Piggin
On Thu, 14 Sep 2017 12:24:49 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2017-09-13 at 23:13 +1000, Nicholas Piggin wrote:
> > On Wed, 13 Sep 2017 02:05:53 +1000
> > Nicholas Piggin  wrote:
> >   
> > > There are two complications. The first is that sreset from stop states
> > > comes in with SRR1 set to do a powersave wakeup, with an sreset reason
> > > encoded.
> > > 
> > > The second is that threads on the same core can't be signalled directly
> > > so we must designate a bounce CPU to reflect the IPI back.  
> > 
> > Here is an updated Linux patch for the latest OPAL patch. This has
> > a few assorted fixes as well to make it work nicely, I roll them into
> > one patch here to make it easy to apply for testing the OPAL patch.  
> 
> Why can't you sreset threads of the same core on P9 ?

It looks like we can, I think I had some other bugs still not ironed
out when I previously tested it.

That simplifies things a lot on the Linux side. It may be that the
bounce is still required if we implement it on POWER8 using ramming,
but I'll get the POWER9 code in first.

Thanks,
Nick


Re: [PATCH 1/5] powerpc/kprobes: Some cosmetic updates to try_to_emulate()

2017-09-13 Thread Kamalesh Babulal

On Thursday 14 September 2017 02:50 AM, Naveen N. Rao wrote:

1. This is only used in kprobes.c, so make it static.
2. Remove the unnecessary (ret == 0) comparison in the else clause.

Signed-off-by: Naveen N. Rao 


Reviewed-by: Kamalesh Babulal 


---
 arch/powerpc/kernel/kprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 367494dc67d9..c2a6ab38a67f 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -239,7 +239,7 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);

-int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
+static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 {
int ret;
unsigned int insn = *p->ainsn.insn;
@@ -261,7 +261,7 @@ int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 */
printk("Can't step on instruction %x\n", insn);
BUG();
-   } else if (ret == 0)
+   } else
/* This instruction can't be boosted */
p->ainsn.boostable = -1;






Possible bug in commit f3b3f28493d9

2017-09-13 Thread Paul Mackerras
Commit f3b3f28493d9 ("powerpc/powernv/idle: Don't override
default/deepest directly in kernel", 2017-03-22) made the following
change in pnv_cpu_offline() in arch/powerpc/platforms/powernv/idle.c:

-   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
srr1 = power7_winkle();

Which seems to be saying that previously, on POWER9 we would always
call power9_idle_stop(), but now we might possibly call
power7_idle_insn() to do a nap or sleep or rvwinkle instruction.

Is this really what was meant?  Nap, sleep and rvwinkle are illegal
instructions on POWER9.  It looks to me as if that statement needs to
be restructured to do what the commit description says, which is to
call power9_idle_stop if we have found a valid stop state, and
otherwise to poll.
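
For illustration, the restructuring being suggested might look like
the sketch below (names taken from the quoted code; the polling
fallback is an assumption, not tested code):

	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
		/* stop is the only legal idle instruction on POWER9 */
		if (deepest_stop_found)
			srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
						pnv_deepest_stop_psscr_mask);
		else
			srr1 = 0;	/* no valid stop state: just poll */
	} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
		srr1 = power7_winkle();
	}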

At present we are relying on idle_states not having any of the
nap/sleep/winkle bits set in it.  Is that guaranteed on POWER9?  If so
it is at least deserving of a comment.

Paul.


Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:50 -0700
Ram Pai  wrote:

> powerpc needs an additional vma bit to support 32 keys.
> Till the additional vma bit lands in include/linux/mm.h
> we have to define  it  in powerpc specific header file.
> This is  needed to get pkeys working on power.
> 
> Signed-off-by: Ram Pai 
> ---

"This" being an arch specific hack for the additional bit?

Balbir


Re: [PATCH kernel] powerpc/powernv: Update comment about shifting IOV BAR

2017-09-13 Thread Benjamin Herrenschmidt
On Thu, 2017-09-14 at 13:18 +1000, Alexey Kardashevskiy wrote:
> On 14/09/17 13:07, Benjamin Herrenschmidt wrote:
> > On Thu, 2017-09-14 at 12:45 +1000, Alexey Kardashevskiy wrote:
> > > On 31/08/17 13:34, Alexey Kardashevskiy wrote:
> > > > From: Benjamin Herrenschmidt 
> > > 
> > > Oops, this was not right :)
> > > 
> > > Anyway, Ben, please comment. Thanks.
> > 
> > This is incorrect, we can do hotplug behind switches afaik.
> 
> Do we have an actual system which allows this? 

Tuleta no ?

> Anyway, what we do now is
> wrong and it needs what? Reserve that hole? I'd like to update the comment
> for now, at least, and state what bad thing can happen and what we expect.

The hole should be reserved unless another SR-IOV device can use it ...

> 
> > > 
> > > > 
> > > > From: Alexey Kardashevskiy 
> > > > 
> > > > This updates the comment about creating a hole in /proc/iomem which
> > > > should not be normally happening but it does in the powernv platform
> > > > due to the way MMIO M64 BARs are organised in the IODA2-capable hardware.
> > > > 
> > > > Signed-off-by: Alexey Kardashevskiy 
> > > > ---
> > > > 
> > > > It has been mentioned multiple times (last one -
> > > > https://www.spinics.net/lists/linux-pci/msg64084.html ) that the comment
> > > > is not informative enough for people not particularly familiar with
> > > > the POWER8 IO hardware.
> > > > 
> > > > This attempt aims to:
> > > > 1. explain why we shift the resource
> > > > 2. explain why nothing can use that hole as a resource while it is "free"
> > > > (I am not sure that this is the case actually)
> > > > 
> > > > Please comment, everyone, let's have this very well documented while
> > > > I remember these bits :) Thanks.
> > > > ---
> > > >  arch/powerpc/platforms/powernv/pci-ioda.c | 10 +++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > index 48de308224d6..c4a36ae78c95 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > @@ -1002,9 +1002,13 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> > > > }
> > > >  
> > > > /*
> > > > -* After doing so, there would be a "hole" in the /proc/iomem when
> > > > -* offset is a positive value. It looks like the device return some
> > > > -* mmio back to the system, which actually no one could use it.
> > > > +* Since M64 BAR shares segments among all possible 256 PEs,
> > > > +* we have to shift the beginning of PF IOV BAR to make it start from
> > > > +* the segment which belongs to the PE number assigned to the first VF.
> > > > +* This creates a "hole" in the /proc/iomem which could be used for
> > > > +* allocating other resources, however this is not expected to happen
> > > > +* on IODA as the only possibility would be a PCI hotplug and IODA
> > > > +* hardware only allows it on a slot with dedicated PHB.
> > > >  */
> > > > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > > > res = &dev->resource[i + PCI_IOV_RESOURCES];
> > > > 
> > > 
> > > 
> 
> 


Re: [PATCH 01/25] powerpc: initial pkey plumbing

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:49 -0700
Ram Pai  wrote:

> Basic  plumbing  to   initialize  the   pkey  system.
> Nothing is enabled yet. A later patch will enable it
> once all the infrastructure is in place.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/Kconfig   |   16 +++
>  arch/powerpc/include/asm/mmu_context.h |5 +++
>  arch/powerpc/include/asm/pkeys.h   |   45 
> 
>  arch/powerpc/kernel/setup_64.c |4 +++
>  arch/powerpc/mm/Makefile   |1 +
>  arch/powerpc/mm/hash_utils_64.c|1 +
>  arch/powerpc/mm/pkeys.c|   33 +++
>  7 files changed, 105 insertions(+), 0 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/pkeys.h
>  create mode 100644 arch/powerpc/mm/pkeys.c
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9fc3c0b..a4cd210 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -864,6 +864,22 @@ config SECCOMP
>  
> If unsure, say Y. Only embedded should say N here.
>  
> +config PPC64_MEMORY_PROTECTION_KEYS
> + prompt "PowerPC Memory Protection Keys"
> + def_bool y
> + # Note: only available in 64-bit mode
> + depends on PPC64

This is not sufficient, right? You need PPC_BOOK3S_64
for compile time at least?

> + select ARCH_USES_HIGH_VMA_FLAGS
> + select ARCH_HAS_PKEYS
> + ---help---
> +   Memory Protection Keys provides a mechanism for enforcing
> +   page-based protections, but without requiring modification of the
> +   page tables when an application changes protection domains.
> +
> +   For details, see Documentation/vm/protection-keys.txt
> +
> +   If unsure, say y.
> +
>  endmenu
>  
>  config ISA_DMA_API
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 3095925..7badf29 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -141,5 +141,10 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
>   /* by default, allow everything */
>   return true;
>  }
> +
> +#ifndef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> +#define pkey_initialize()
> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> +
>  #endif /* __KERNEL__ */
>  #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> new file mode 100644
> index 000..c02305a
> --- /dev/null
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -0,0 +1,45 @@
> +#ifndef _ASM_PPC64_PKEYS_H
> +#define _ASM_PPC64_PKEYS_H
> +
> +extern bool pkey_inited;
> +extern bool pkey_execute_disable_support;
> +#define ARCH_VM_PKEY_FLAGS 0
> +
> +static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
> +{
> + return (pkey == 0);
> +}
> +
> +static inline int mm_pkey_alloc(struct mm_struct *mm)
> +{
> + return -1;
> +}
> +
> +static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
> +{
> + return -EINVAL;
> +}
> +
> +/*
> + * Try to dedicate one of the protection keys to be used as an
> + * execute-only protection key.
> + */
> +static inline int execute_only_pkey(struct mm_struct *mm)
> +{
> + return 0;
> +}
> +
> +static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
> + int prot, int pkey)
> +{
> + return 0;
> +}
> +
> +static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> + unsigned long init_val)
> +{
> + return 0;
> +}
> +
> +extern void pkey_initialize(void);
> +#endif /*_ASM_PPC64_PKEYS_H */
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index b89c6aa..3b67014 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -316,6 +317,9 @@ void __init early_setup(unsigned long dt_ptr)
>   /* Initialize the hash table or TLB handling */
>   early_init_mmu();
>  
> + /* initialize the key subsystem */
> + pkey_initialize();
> +
>   /*
>* At this point, we can let interrupts switch to virtual mode
>* (the MMU has been setup), so adjust the MSR in the PACA to
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index fb844d2..927620a 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -43,3 +43,4 @@ obj-$(CONFIG_PPC_COPRO_BASE)+= copro_fault.o
>  obj-$(CONFIG_SPAPR_TCE_IOMMU)+= mmu_context_iommu.o
>  obj-$(CONFIG_PPC_PTDUMP) += dump_linuxpagetables.o
>  obj-$(CONFIG_PPC_HTDUMP) += dump_hashpagetable.o
> +obj-$(CONFIG_PPC64_MEMORY_PROTECTION_KEYS)   += pkeys.o
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0dff57b..67f62b5 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> 

Re: [PATCH 7/7] powerpc: capture the PTE format changes in the dump pte report

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:47 -0700
Ram Pai  wrote:

> The H_PAGE_F_SECOND and H_PAGE_F_GIX bits are not in the 64K main-PTE.
> Capture these changes in the dump pte report.
> 
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Ram Pai 
> ---

So we lose slot and secondary information for 64K PTE's with
this change?

Balbir


Re: [PATCH kernel] powerpc/powernv: Update comment about shifting IOV BAR

2017-09-13 Thread Alexey Kardashevskiy
On 14/09/17 13:07, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-14 at 12:45 +1000, Alexey Kardashevskiy wrote:
>> On 31/08/17 13:34, Alexey Kardashevskiy wrote:
>>> From: Benjamin Herrenschmidt 
>>
>> Oops, this was not right :)
>>
>> Anyway, Ben, please comment. Thanks.
> 
> This is incorrect, we can do hotplug behind switches afaik.

Do we have an actual system which allows this? Anyway, what we do now is
wrong and it needs what? Reserve that hole? I'd like to update the comment
for now, at least, and state what bad thing can happen and what we expect.


>>
>>>
>>> From: Alexey Kardashevskiy 
>>>
>>> This updates the comment about creating a hole in /proc/iomem which
>>> should not be normally happening but it does in the powernv platform
>>> due to the way MMIO M64 BARs are organised in the IODA2-capable hardware.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>
>>> It has been mentioned multiple times (last one -
>>> https://www.spinics.net/lists/linux-pci/msg64084.html ) that the comment
>>> is not informative enough for people not particularly familiar with
>>> the POWER8 IO hardware.
>>>
>>> This attempt aims to:
>>> 1. explain why we shift the resource
>>> 2. explain why nothing can use that hole as a resource while it is "free"
>>> (I am not sure that this is the case actually)
>>>
>>> Please comment, everyone, let's have this very well documented while
>>> I remember these bits :) Thanks.
>>> ---
>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 +++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index 48de308224d6..c4a36ae78c95 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -1002,9 +1002,13 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>> }
>>>  
>>> /*
>>> -* After doing so, there would be a "hole" in the /proc/iomem when
>>> -* offset is a positive value. It looks like the device return some
>>> -* mmio back to the system, which actually no one could use it.
>>> +* Since M64 BAR shares segments among all possible 256 PEs,
>>> +* we have to shift the beginning of PF IOV BAR to make it start from
>>> +* the segment which belongs to the PE number assigned to the first VF.
>>> +* This creates a "hole" in the /proc/iomem which could be used for
>>> +* allocating other resources, however this is not expected to happen
>>> +* on IODA as the only possibility would be a PCI hotplug and IODA
>>> +* hardware only allows it on a slot with dedicated PHB.
>>>  */
>>> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>> res = &dev->resource[i + PCI_IOV_RESOURCES];
>>>
>>
>>


-- 
Alexey


Re: [PATCH kernel] powerpc/powernv: Update comment about shifting IOV BAR

2017-09-13 Thread Benjamin Herrenschmidt
On Thu, 2017-09-14 at 12:45 +1000, Alexey Kardashevskiy wrote:
> On 31/08/17 13:34, Alexey Kardashevskiy wrote:
> > From: Benjamin Herrenschmidt 
> 
> Oops, this was not right :)
> 
> Anyway, Ben, please comment. Thanks.

This is incorrect, we can do hotplug behind switches afaik.
> 
> > 
> > From: Alexey Kardashevskiy 
> > 
> > This updates the comment about creating a hole in /proc/iomem which
> > should not be normally happening but it does in the powernv platform
> > due to the way MMIO M64 BARs are organised in the IODA2-capable hardware.
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > 
> > It has been mentioned multiple times (last one -
> > https://www.spinics.net/lists/linux-pci/msg64084.html ) that the comment
> > is not informative enough for people not particularly familiar with
> > the POWER8 IO hardware.
> > 
> > This attempt aims to:
> > 1. explain why we shift the resource
> > 2. explain why nothing can use that hole as a resource while it is "free"
> > (I am not sure that this is the case actually)
> > 
> > Please comment, everyone, let's have this very well documented while
> > I remember these bits :) Thanks.
> > ---
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 48de308224d6..c4a36ae78c95 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -1002,9 +1002,13 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> > }
> >  
> > /*
> > -* After doing so, there would be a "hole" in the /proc/iomem when
> > -* offset is a positive value. It looks like the device return some
> > -* mmio back to the system, which actually no one could use it.
> > +* Since M64 BAR shares segments among all possible 256 PEs,
> > +* we have to shift the beginning of PF IOV BAR to make it start from
> > +* the segment which belongs to the PE number assigned to the first VF.
> > +* This creates a "hole" in the /proc/iomem which could be used for
> > +* allocating other resources, however this is not expected to happen
> > +* on IODA as the only possibility would be a PCI hotplug and IODA
> > +* hardware only allows it on a slot with dedicated PHB.
> >  */
> > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > res = &dev->resource[i + PCI_IOV_RESOURCES];
> > 
> 
> 


[PATCH v2] ASoC: fsl_ssi: Calculate bit clock rate using slot number and width

2017-09-13 Thread Nicolin Chen
The set_sysclk() is now used to override the output bit clock rate.
But this is not a common way to implement a set_dai_sysclk(). And
this creates a problem when a general machine driver (simple-card
for example) tries to do set_dai_sysclk() by passing an input clock
rate for the baud clock instead of setting the bit clock rate as
fsl_ssi driver expected.

So this patch solves this problem by firstly removing set_sysclk()
since the hw_params() can calculate the bit clock rate. Secondly,
in order not to break those TDM use cases which previously might
have been using set_sysclk() to override the bit clock rate, this
patch changes the driver to calculate the bit clock rate using the
slot number and the slot width set via set_tdm_slot().

The patch also removes an obsolete comment of the dir parameter.
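
As a worked example of the new derivation (numbers are illustrative,
not from the patch): standard 2-channel I2S at 48 kHz keeps the fixed
32-bit width, giving 48000 * 2 * 32 = 3.072 MHz, while an 8-slot,
16-bit TDM configuration set via set_tdm_slot() gives
48000 * 8 * 16 = 6.144 MHz.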

Signed-off-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_ssi.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 64598d1..f2f51e06 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -197,12 +197,13 @@ struct fsl_ssi_soc_data {
  * @use_dma: DMA is used or FIQ with stream filter
  * @use_dual_fifo: DMA with support for both FIFOs used
  * @fifo_deph: Depth of the SSI FIFOs
+ * @slot_width: width of each DAI slot
+ * @slots: number of slots
  * @rxtx_reg_val: Specific register settings for receive/transmit configuration
  *
  * @clk: SSI clock
  * @baudclk: SSI baud clock for master mode
  * @baudclk_streams: Active streams that are using baudclk
- * @bitclk_freq: bitclock frequency set by .set_dai_sysclk
  *
  * @dma_params_tx: DMA transmit parameters
  * @dma_params_rx: DMA receive parameters
@@ -233,12 +234,13 @@ struct fsl_ssi_private {
bool use_dual_fifo;
bool has_ipg_clk_name;
unsigned int fifo_depth;
+   unsigned int slot_width;
+   unsigned int slots;
struct fsl_ssi_rxtx_reg_val rxtx_reg_val;
 
struct clk *clk;
struct clk *baudclk;
unsigned int baudclk_streams;
-   unsigned int bitclk_freq;
 
/* regcache for volatile regs */
u32 regcache_sfcsr;
@@ -700,8 +702,8 @@ static void fsl_ssi_shutdown(struct snd_pcm_substream *substream,
  * Note: This function can be only called when using SSI as DAI master
  *
  * Quick instruction for parameters:
- * freq: Output BCLK frequency = samplerate * 32 (fixed) * channels
- * dir: SND_SOC_CLOCK_OUT -> TxBCLK, SND_SOC_CLOCK_IN -> RxBCLK.
+ * freq: Output BCLK frequency = samplerate * slots * slot_width
+ *   (In 2-channel I2S Master mode, slot_width is fixed 32)
  */
 static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream,
struct snd_soc_dai *cpu_dai,
@@ -712,15 +714,21 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream,
int synchronous = ssi_private->cpu_dai_drv.symmetric_rates, ret;
u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i;
unsigned long clkrate, baudrate, tmprate;
+   unsigned int slots = params_channels(hw_params);
+   unsigned int slot_width = 32;
u64 sub, savesub = 10;
unsigned int freq;
bool baudclk_is_used;
 
-   /* Prefer the explicitly set bitclock frequency */
-   if (ssi_private->bitclk_freq)
-   freq = ssi_private->bitclk_freq;
-   else
-   freq = params_channels(hw_params) * 32 * params_rate(hw_params);
+   /* Override slots and slot_width if being specifically set... */
+   if (ssi_private->slots)
+   slots = ssi_private->slots;
+   /* ...but keep 32 bits if slots is 2 -- I2S Master mode */
+   if (ssi_private->slot_width && slots != 2)
+   slot_width = ssi_private->slot_width;
+
+   /* Generate bit clock based on the slot number and slot width */
+   freq = slots * slot_width * params_rate(hw_params);
 
/* Don't apply it to any non-baudclk circumstance */
if (IS_ERR(ssi_private->baudclk))
@@ -805,16 +813,6 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream,
return 0;
 }
 
-static int fsl_ssi_set_dai_sysclk(struct snd_soc_dai *cpu_dai,
-   int clk_id, unsigned int freq, int dir)
-{
-   struct fsl_ssi_private *ssi_private = snd_soc_dai_get_drvdata(cpu_dai);
-
-   ssi_private->bitclk_freq = freq;
-
-   return 0;
-}
-
 /**
  * fsl_ssi_hw_params - program the sample size
  *
@@ -1095,6 +1093,12 @@ static int fsl_ssi_set_dai_tdm_slot(struct snd_soc_dai *cpu_dai, u32 tx_mask,
struct regmap *regs = ssi_private->regs;
u32 val;
 
+   /* The word length should be 8, 10, 12, 16, 18, 20, 22 or 24 */
+   if (slot_width & 1 || slot_width < 8 || slot_width > 24) {
+   dev_err(cpu_dai->dev, "invalid slot width: %d\n", slot_width);
+   return -EINVAL;
+   }
+
/* The slot number should be >= 2 if using Network mode or I2S mod

Re: [PATCH kernel] powerpc/powernv: Update comment about shifting IOV BAR

2017-09-13 Thread Alexey Kardashevskiy
On 31/08/17 13:34, Alexey Kardashevskiy wrote:
> From: Benjamin Herrenschmidt 

Oops, this was not right :)

Anyway, Ben, please comment. Thanks.


> 
> From: Alexey Kardashevskiy 
> 
> This updates the comment about creating a hole in /proc/iomem which
> should not be normally happening but it does in the powernv platform
> due to the way MMIO M64 BARs are organised in the IODA2-capable hardware.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> It has been mentioned multiple times (last one -
> https://www.spinics.net/lists/linux-pci/msg64084.html ) that the comment
> is not informative enough for people not particularly familiar with
> the POWER8 IO hardware.
> 
> This attempt aims to:
> 1. explain why we shift the resource
> 2. explain why nothing can use that hole as a resource while it is "free"
> (I am not sure that this is the case actually)
> 
> Please comment, everyone, let's have this very well documented while
> I remember these bits :) Thanks.
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 48de308224d6..c4a36ae78c95 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1002,9 +1002,13 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   }
>  
>   /*
> -  * After doing so, there would be a "hole" in the /proc/iomem when
> -  * offset is a positive value. It looks like the device return some
> -  * mmio back to the system, which actually no one could use it.
> +  * Since M64 BAR shares segments among all possible 256 PEs,
> +  * we have to shift the beginning of PF IOV BAR to make it start from
> +  * the segment which belongs to the PE number assigned to the first VF.
> +  * This creates a "hole" in the /proc/iomem which could be used for
> +  * allocating other resources, however this is not expected to happen
> +  * on IODA as the only possibility would be a PCI hotplug and IODA
> +  * hardware only allows it on a slot with dedicated PHB.
>*/
>   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   res = &dev->resource[i + PCI_IOV_RESOURCES];
> 


-- 
Alexey


Re: [RFC PATCH 1/2] core: implement OPAL_SIGNAL_SYSTEM_RESET with POWER9 scoms

2017-09-13 Thread Benjamin Herrenschmidt
On Wed, 2017-09-13 at 23:27 +1000, Nicholas Piggin wrote:
> On Wed, 13 Sep 2017 09:18:34 +1000
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2017-09-13 at 02:05 +1000, Nicholas Piggin wrote:
> > > This implements a way to raise system reset interrupts on other
> > > cores. This has not yet been tested on DD2 or with deeper sleep
> > > states.  
> > 
> > Reminds me, we need to workaround a bug with XSCOMs on P9
> > 
> > PSCOMs to core in the range 20010A80-20010AB8 (list below) can fail
> > occasionally with an error of 4 (PCB_ADDRESS_ERROR). We need to
> > (silently) retry up to 32 times.
> 
> [snip]
> 
> So, just put a loop into xscom_read and xscom_write for those
> addresses for P9 chips?

Right. Well, the top bit of the address needs filtering since it's the
target core, ie, 0x20 is core 0, 0x21 is core 1 etc... to 0x37.

Cheers,
Ben.
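
For illustration, such a retry wrapper might look like the sketch
below (the error code 4 and the 32-retry bound are from the thread;
the low-level accessor name and the address decoding are assumptions,
not real OPAL code; assumes <stdint.h> and <stdbool.h>):

	#define PCB_ADDRESS_ERROR	4	/* error code quoted above */
	#define XSCOM_MAX_RETRIES	32	/* retry bound quoted above */

	/* The affected range is 20010A80-20010AB8 per core, where the
	 * top byte 0x20..0x37 selects the target core (decoding assumed) */
	static bool xscom_addr_is_quirky(uint64_t addr)
	{
		uint64_t core = (addr >> 24) & 0xff;
		uint64_t reg = addr & 0xffffff;

		return core >= 0x20 && core <= 0x37 &&
		       reg >= 0x010a80 && reg <= 0x010ab8;
	}

	static int64_t xscom_read_retry(uint32_t gcid, uint64_t addr, uint64_t *val)
	{
		int64_t rc = __xscom_read_raw(gcid, addr, val); /* assumed helper */
		int i;

		/* Silently retry PCB address errors on the quirky range only */
		for (i = 0; rc == PCB_ADDRESS_ERROR && xscom_addr_is_quirky(addr) &&
			    i < XSCOM_MAX_RETRIES; i++)
			rc = __xscom_read_raw(gcid, addr, val);

		return rc;
	}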




Re: [RFC PATCH 2/2] powerpc/powernv: implement NMI IPIs with OPAL_SIGNAL_SYSTEM_RESET

2017-09-13 Thread Benjamin Herrenschmidt
On Wed, 2017-09-13 at 23:13 +1000, Nicholas Piggin wrote:
> On Wed, 13 Sep 2017 02:05:53 +1000
> Nicholas Piggin  wrote:
> 
> > There are two complications. The first is that sreset from stop states
> > comes in with SRR1 set to do a powersave wakeup, with an sreset reason
> > encoded.
> > 
> > The second is that threads on the same core can't be signalled directly
> > so we must designate a bounce CPU to reflect the IPI back.
> 
> Here is an updated Linux patch for the latest OPAL patch. This has
> a few assorted fixes as well to make it work nicely, I roll them into
> one patch here to make it easy to apply for testing the OPAL patch.

Why can't you sreset threads of the same core on P9 ?

Cheers,
Ben.



[PATCH v2] powerpc/tm: Flush TM only if CPU has TM feature

2017-09-13 Thread Gustavo Romero
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
added code to access TM SPRs in flush_tmregs_to_thread(). However
flush_tmregs_to_thread() does not check if TM feature is available on
CPU before trying to access TM SPRs in order to copy live state to
thread structures. flush_tmregs_to_thread() is indeed guarded by
CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that the kernel
was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
a CPU without the TM feature available, thus resulting in the execution
of TM instructions that the CPU treats as illegal instructions.

The fix is just to add a proper check in flush_tmregs_to_thread()
that the CPU has the TM feature before accessing any TM-specific resource,
returning immediately if TM is not available on the CPU. Adding
that check in flush_tmregs_to_thread() instead of in places
where it is called, like in vsr_get() and vsr_set(), is better because
it avoids the same problem cropping up elsewhere.

Cc: sta...@vger.kernel.org # v4.13+
Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Gustavo Romero 
---
 arch/powerpc/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 07cd22e..f52ad5b 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -131,7 +131,7 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
 * in the appropriate thread structures from live.
 */
 
-   if (tsk != current)
+   if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current))
return;
 
if (MSR_TM_SUSPENDED(mfmsr())) {
-- 
2.7.4



Re: [PATCH 5/7] powerpc: Swizzle around 4K PTE bits to free up bit 5 and bit 6

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:45 -0700
Ram Pai  wrote:

> We need PTE bits 3, 4, 5, 6 and 57 to support protection-keys,
> because these are the bits we want to consolidate on across all
> configurations to support protection keys.
> 
> Bits 3, 4, 5 and 6 are currently used on 4K-pte kernels. But bits 9
> and 10 are available. Hence we use the two available bits and
> free up bits 5 and 6. We will still not be able to free up bits 3
> and 4. In the absence of any other free bits, we will have to
> stay satisfied with what we have :-(. This means we will not
> be able to support 32 protection keys, but only 8. The bit
> numbers are big-endian as defined in the ISA3.0.
>

Any chance that for 4k PTEs we can do slot searching for the PTE?
I guess that adds additional complexity.

Balbir Singh.


Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:44 -0700
Ram Pai  wrote:

> Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
> in the 64K backed HPTE pages. This along with the earlier
> patch will  entirely free  up the four bits from 64K PTE.
> The bit numbers are  big-endian as defined in the  ISA3.0
> 
> This patch  does  the  following change to 64K PTE backed
> by 64K HPTE.
> 
> H_PAGE_F_SECOND (S) which  occupied  bit  4  moves to the
>   second part of the pte to bit 60.
> H_PAGE_F_GIX (G,I,X) which occupied bits 5, 6 and 7 also
>   moves to the second part of the pte, to bits 61,
>   62 and 63 respectively.
> 
> since bit 7 is now freed up, we move H_PAGE_BUSY (B) from
> bit  9  to  bit  7.
> 
> The second part of the PTE will hold
> (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
> NOTE: none of the bits in the secondary PTE were previously
> used by the 64k-HPTE backed PTE.
> 
> Before the patch, the 64K HPTE backed 64k PTE format was
> as follows
> 
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
> 
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x| |S |G |I |X |x|B| |x|x||x|x|x|x| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> | | | | |  |  |  |  | | | | |..| | | | | <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> 
> After the patch, the 64k HPTE backed 64k PTE format is
> as follows
> 
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
> 
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x| |  |  |  |B |x| | |x|x||.|.|.|.| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> | | | | |  |  |  |  | | | | |..|S|G|I|X| <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> 
> The above PTE changes are applicable to hugetlbpages as well.
> 
> The patch does the following code changes:
> 
> a) moves the H_PAGE_F_SECOND and H_PAGE_F_GIX to the 4k PTE
>   header since they are no longer needed by the 64k PTEs.
> b) abstracts  out __real_pte() and __rpte_to_hidx() so the
>   caller  need not know the bit location of the slot.
> c) moves the slot bits to the secondary pte.
> 
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |3 ++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h |   29 +++-
>  arch/powerpc/include/asm/book3s/64/hash.h |3 --
>  arch/powerpc/mm/hash64_64k.c  |   23 ---
>  arch/powerpc/mm/hugetlbpage-hash64.c  |   18 ++-
>  5 files changed, 33 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index e66bfeb..dc153c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,9 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
>  
> +#define H_PAGE_F_GIX_SHIFT   56
> +#define H_PAGE_F_SECOND  _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
>  #define H_PAGE_BUSY  _RPAGE_RSV1 /* software: PTE & hash are busy */
>  
>  /* PTE flags to conserve for HPTE identification */
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index e038f1c..89ef5a9 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -12,7 +12,7 @@
>   */
>  #define H_PAGE_COMBO _RPAGE_RPN0 /* this is a combo 4k page */
>  #define H_PAGE_4K_PFN_RPAGE_RPN1 /* PFN is for a single 4k page */
> -#define H_PAGE_BUSY  _RPAGE_RPN42 /* software: PTE & hash are busy */
> +#define H_PAGE_BUSY  _RPAGE_RPN44 /* software: PTE & hash are busy */
>  
>  /*
>   * We need to differentiate between explicit huge page and THP huge
> @@ -21,8 +21,7 @@
>  #define H_PAGE_THP_HUGE  H_PAGE_4K_PFN
>  
>  /* PTE flags to conserve for HPTE identification */
> -#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_F_SECOND | \
> -  H_PAGE_F_GIX | H_PAGE_HASHPTE | H_PAGE_COMBO)
> +#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
>  /*
>   * we support 16 fragments per PTE page of 64K size.
>   */
> @@ -50,24 +49,22 @@ static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
>   unsigned long *hidxp;
>  
>   rpte.pte = pte;
> - rpte.hidx = 0;
> - if (pte_val(pte) & H_PAGE_COMBO) {
> - /*
> -  * Make sure we order the hidx load a

Re: [PATCH 3/7] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-09-13 Thread Balbir Singh
On Fri,  8 Sep 2017 15:44:43 -0700
Ram Pai  wrote:

> Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6,
> in the 4K backed HPTE pages. These bits continue to be used
> for 64K backed HPTE pages in this patch, but will be freed
> up in the next patch. The  bit  numbers are big-endian  as
> defined in the ISA3.0
> 
> The patch does the following change to the 4k hpte backed
> 64K PTE's format.
> 
> H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
>   below)
> V0 which occupied bit 4 is not used anymore.
> V1 which occupied bit 5 is not used anymore.
> V2 which occupied bit 6 is not used anymore.
> V3 which occupied bit 7 is not used anymore.
> 
> Before the patch, the 4k backed 64k PTE format was as follows
> 
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
> 
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x|B|V0|V1|V2|V3|x| | |x|x||x|x|x|x| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> 
> After the patch, the 4k backed 64k PTE format is as follows
> 
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
> 
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x| |  |  |  |  |x|B| |x|x||.|.|.|.| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> 
> the four bits S,G,I,X (one quadruplet per 4k HPTE) that
> cache the hash-bucket slot value are initialized to
> 1,1,1,1 indicating -- an invalid slot. If a HPTE gets
> cached in a 0xF slot (i.e. the 7th slot of the secondary
> hash bucket), it is released immediately. In other words,
> even though 0xF is a valid slot value in the hash
> bucket, we consider it invalid and release the slot and
> the HPTE. This gives us the opportunity to determine
> the validity of the S,G,I,X bits based on their contents
> and not on any of the bits V0,V1,V2 or V3 in the primary PTE.
> 
> When we release a HPTE cached in the 0xF slot we
> also release a legitimate slot in the primary
> hash bucket and unmap its corresponding HPTE. This
> is to ensure that we do get a HPTE cached in a slot
> of the primary hash bucket, the next time we retry.
> 
> Though treating the 0xF slot as invalid reduces the
> number of available slots in the hash bucket and may
> have an effect on the performance, the probability of
> hitting a 0xF slot is extremely low.
> 
> Compared to the current scheme, the above scheme
> reduces the number of false hash table updates
> significantly and has the added advantage of releasing
> four valuable PTE bits for other purposes.
> 
> NOTE: even though bits 3, 4, 5, 6, 7 are not used when
> the 64K PTE is backed by a 4k HPTE, they continue to be
> used if the PTE gets backed by a 64k HPTE. The next
> patch will decouple that as well, and truly release the
> bits.
> 
> This idea was jointly developed by Paul Mackerras,
> Aneesh, Michael Ellerman and myself.
> 

Acked-by: Balbir Singh 
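
For reference, a minimal sketch of the slot-validity convention
described in the commit message above (names are illustrative, not
from the patch):

	#define HIDX_INVALID	0xf	/* S,G,I,X = 1,1,1,1: no slot cached */

	/* A 4-bit hidx of 0xF means "no valid slot cached"; the 7th slot
	 * of the secondary bucket is deliberately never kept, so validity
	 * can be decided from the hidx contents alone, not from V0..V3 */
	static inline bool hidx_is_valid(unsigned long hidx)
	{
		return (hidx & 0xf) != HIDX_INVALID;
	}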


Re: [PATCH 1/2] powerpc/eeh: Create PHB PEs after EEH is initialized

2017-09-13 Thread Russell Currey
On Thu, 2017-09-07 at 16:35 +1000, Benjamin Herrenschmidt wrote:
> Otherwise we end up not yet having computed the right
> diag data size on powernv where EEH initialization
> is delayed, thus causing memory corruption later on
> when calling OPAL.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Acked-by: Russell Currey 


Re: [PATCH 2/2] powerpc/powernv: Rework EEH initialization on powernv

2017-09-13 Thread Russell Currey
On Thu, 2017-09-07 at 16:35 +1000, Benjamin Herrenschmidt wrote:
> Remove the post_init callback which is only used
> by powernv, we can just call it explicitly from
> the powernv code.
> 
> This partially kills the ability to "disable" eeh at
> runtime via debugfs as this was calling that same
> callback again, but this is both unused and broken
> in several ways. If we want to revive it, we need
> to create a dedicated enable/disable callback on the
> backend that does the right thing.
> 
> Let the bulk of eeh initialize normally at
> core_initcall() like it does on pseries by removing
> the hack in eeh_init() that delays it.
> 
> Instead we make sure our eeh->probe cleanly bails
> out of the PEs haven't been created yet and we force
> a re-probe where we used to call eeh_init() again.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Acked-by: Russell Currey 


Re: Regression for Power PC in 4.14-rc0 - bisected to commit 31bfdb036f12

2017-09-13 Thread Larry Finger

On 09/13/2017 07:37 PM, Andrew Donnellan wrote:

On 14/09/17 10:07, Larry Finger wrote:
When booting my PowerBook Aluminum G4, I get a pop-up screen that says "The 
system is running in low-graphics  mode. Your screen, graphics card, and input 
device settings could not be detected correctly. You will need to configure 
these yourself." This is a big-endian 74xx CPU. The lscpu command says it is a 
PowerBook5,6 model.


This problem has been bisected to commit 31bfdb036f12 ("powerpc: Use 
instruction emulation infrastructure to handle alignment faults"). I am 
confident of the accuracy of the bisection.


Attached is the configuration.


Try with https://patchwork.ozlabs.org/patch/813153/

(A selftest for the instruction emulation will make its way upstream 
eventually...)


That patch does fix my problem. Thanks for the quick response.

Larry


Re: [PATCH 5/5] powerpc/jprobes: Validate break handler invocation as being due to a jprobe_return()

2017-09-13 Thread Masami Hiramatsu
On Thu, 14 Sep 2017 02:50:36 +0530
"Naveen N. Rao"  wrote:

> Fix a circa 2005 FIXME by implementing a check to ensure that we
> actually got into the jprobe break handler() due to the trap in
> jprobe_return().
> 

Looks good to me.

Acked-by: Masami Hiramatsu 

Thanks!

> Signed-off-by: Naveen N. Rao 
> ---
> This is v2 of the patch posted previously (*) to change the WARN() to a 
> pr_debug() as suggested by Masami.
> (*) https://patchwork.ozlabs.org/patch/762670/
> 
> - Naveen
> 
>  arch/powerpc/kernel/kprobes.c | 20 +---
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index db40b13fd3d1..5b26d5202273 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -648,24 +648,22 @@ NOKPROBE_SYMBOL(setjmp_pre_handler);
>  
>  void __used jprobe_return(void)
>  {
> - asm volatile("trap" ::: "memory");
> + asm volatile("jprobe_return_trap:\n"
> +  "trap\n"
> +  ::: "memory");
>  }
>  NOKPROBE_SYMBOL(jprobe_return);
>  
> -static void __used jprobe_return_end(void)
> -{
> -}
> -NOKPROBE_SYMBOL(jprobe_return_end);
> -
>  int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
>  {
>   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>  
> - /*
> -  * FIXME - we should ideally be validating that we got here 'cos
> -  * of the "trap" in jprobe_return() above, before restoring the
> -  * saved regs...
> -  */
> + if (regs->nip != ppc_kallsyms_lookup_name("jprobe_return_trap")) {
> + pr_debug("longjmp_break_handler NIP (0x%lx) does not match jprobe_return_trap (0x%lx)\n",
> + regs->nip, ppc_kallsyms_lookup_name("jprobe_return_trap"));
> + return 0;
> + }
> +
>   memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
>   /* It's OK to start function graph tracing again */
>   unpause_graph_tracing();
> -- 
> 2.14.1
> 


-- 
Masami Hiramatsu 


Re: Regression for Power PC in 4.14-rc0 - bisected to commit 31bfdb036f12

2017-09-13 Thread Andrew Donnellan

On 14/09/17 10:07, Larry Finger wrote:
When booting my PowerBook Aluminum G4, I get a pop-up screen that says 
"The system is running in low-graphics  mode. Your screen, graphics 
card, and input device settings could not be detected correctly. You 
will need to configure these yourself." This is a big-endian 74xx CPU. 
The lscpu command says it is a PowerBook5,6 model.


This problem has been bisected to commit 31bfdb036f12 ("powerpc: Use 
instruction emulation infrastructure to handle alignment faults"). I am 
confident of the accuracy of the bisection.


Attached is the configuration.


Try with https://patchwork.ozlabs.org/patch/813153/

(A selftest for the instruction emulation will make its way upstream 
eventually...)


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 3/5] powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernels

2017-09-13 Thread Masami Hiramatsu
On Thu, 14 Sep 2017 02:50:34 +0530
"Naveen N. Rao"  wrote:

> Kamalesh pointed out that we are getting the below call traces with
> livepatched functions when we enable CONFIG_PREEMPT:
> 
> [  495.470721] BUG: using __this_cpu_read() in preemptible [] code: cat/8394
> [  495.471167] caller is is_current_kprobe_addr+0x30/0x90
> [  495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G  K 4.13.0-rc7-nnr+ #95
> [  495.471173] Call Trace:
> [  495.471178] [c0008fd9b960] [c09f039c] dump_stack+0xec/0x160 (unreliable)
> [  495.471184] [c0008fd9b9a0] [c059169c] check_preemption_disabled+0x15c/0x170
> [  495.471187] [c0008fd9ba30] [c0046460] is_current_kprobe_addr+0x30/0x90
> [  495.471191] [c0008fd9ba60] [c004e9a0] ftrace_call+0x1c/0xb8
> [  495.471195] [c0008fd9bc30] [c0376fd8] seq_read+0x238/0x5c0
> [  495.471199] [c0008fd9bcd0] [c03cfd78] proc_reg_read+0x88/0xd0
> [  495.471203] [c0008fd9bd00] [c033e5d4] __vfs_read+0x44/0x1b0
> [  495.471206] [c0008fd9bd90] [c03402ec] vfs_read+0xbc/0x1b0
> [  495.471210] [c0008fd9bde0] [c0342138] SyS_read+0x68/0x110
> [  495.471214] [c0008fd9be30] [c000bc6c] system_call+0x58/0x6c
> 
> Commit c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for
> jprobes") introduced a helper is_current_kprobe_addr() to help determine
> if the current function has been livepatched or if it has a jprobe
> installed, both of which modify the NIP.
> 
> In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption
> before calling into setjmp_pre_handler() which returns without disabling
> pre-emption. This is done to ensure that the jprobe handler won't
> disappear beneath us if the jprobe is unregistered between the
> setjmp_pre_handler() and the subsequent longjmp_break_handler() called
> from the jprobe handler. Due to this, we can use __this_cpu_read() in
> is_current_kprobe_addr() with the pre-emption check as we know that
> pre-emption will be disabled.
> 
> However, if this function has been livepatched, we are still doing this
> check and when we do so, pre-emption won't necessarily be disabled. This
> results in the call trace shown above.
> 
> Fix this by only invoking is_current_kprobe_addr() when pre-emption is
> disabled. And since we now guard this within a pre-emption check, we can
> instead use raw_cpu_read() to get the current_kprobe value skipping the
> check done by __this_cpu_read().

Hmm, can you disable preempt temporarily at this function and read it?

Thank you,

> 
> Fixes: c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes")
> Reported-by: Kamalesh Babulal 
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/kernel/kprobes.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index e848fe2c93fb..db40b13fd3d1 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -45,8 +45,12 @@ struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
>  
>  int is_current_kprobe_addr(unsigned long addr)
>  {
> - struct kprobe *p = kprobe_running();
> - return (p && (unsigned long)p->addr == addr) ? 1 : 0;
> + if (!preemptible()) {
> + struct kprobe *p = raw_cpu_read(current_kprobe);
> + return (p && (unsigned long)p->addr == addr) ? 1 : 0;
> + }
> +
> + return 0;
>  }
>  
>  bool arch_within_kprobe_blacklist(unsigned long addr)
> -- 
> 2.14.1
> 


-- 
Masami Hiramatsu 


Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-13 Thread Sergey Senozhatsky
Hi,

On (09/13/17 18:56), Laurent Dufour wrote:
> Hi Sergey,
> 
> On 13/09/2017 13:53, Sergey Senozhatsky wrote:
> > Hi,
> > 
> > On (09/08/17 20:06), Laurent Dufour wrote:
[..]
> > ok, so what I got on my box is:
> > 
> > vm_munmap()  -> down_write_killable(&mm->mmap_sem)
> >  do_munmap()
> >   __split_vma()
> >__vma_adjust()  -> write_seqcount_begin(&vma->vm_sequence)
> >-> write_seqcount_begin_nested(&next->vm_sequence, SINGLE_DEPTH_NESTING)
> > 
> > so this gives 3 dependencies  ->mmap_sem   ->   ->vm_seq
> >   ->vm_seq ->   ->vm_seq/1
> >   ->mmap_sem   ->   ->vm_seq/1
> > 
> > 
> > SyS_mremap() -> down_write_killable(¤t->mm->mmap_sem)
> >  move_vma()   -> write_seqcount_begin(&vma->vm_sequence)
> >   -> write_seqcount_begin_nested(&new_vma->vm_sequence, SINGLE_DEPTH_NESTING);
> >   move_page_tables()
> >__pte_alloc()
> > pte_alloc_one()
> >  __alloc_pages_nodemask()
> >   fs_reclaim_acquire()
> > 
> > 
> > I think here we have prepare_alloc_pages() call, that does
> > 
> > -> fs_reclaim_acquire(gfp_mask)
> > -> fs_reclaim_release(gfp_mask)
> > 
> > so that adds one more dependency  ->mmap_sem   ->   ->vm_seq    ->   fs_reclaim
> >                                   ->mmap_sem   ->   ->vm_seq/1  ->   fs_reclaim
> > 
> > 
> > now, under memory pressure we hit the slow path and perform direct
> > reclaim. direct reclaim is done under fs_reclaim lock, so we end up
> > with the following call chain
> > 
> > __alloc_pages_nodemask()
> >  __alloc_pages_slowpath()
> >   __perform_reclaim()   ->   fs_reclaim_acquire(gfp_mask);
> >try_to_free_pages()
> > shrink_node()
> >  shrink_active_list()
> >   rmap_walk_file()  ->   i_mmap_lock_read(mapping);
> > 
> > 
> > and this break the existing dependency. since we now take the leaf lock
> > (fs_reclaim) first and the the root lock (->mmap_sem).
> 
> Thanks for looking at this.
> I'm sorry, I must have missed something.

no prob :)


> My understanding is that there are 3 chains of locks:
>  1. from __vma_adjust() mmap_sem -> i_mmap_rwsem -> vm_seq
>  2. from move_vma() mmap_sem -> vm_seq -> fs_reclaim
>  3. from __alloc_pages_nodemask() fs_reclaim -> i_mmap_rwsem

yes, as far as lockdep warning suggests.

> So the solution would be to have in __vma_adjust()
>  mmap_sem -> vm_seq -> i_mmap_rwsem
> 
> But this will raised the following dependency from  unmap_mapping_range()
> unmap_mapping_range() -> i_mmap_rwsem
>  unmap_mapping_range_tree()
>   unmap_mapping_range_vma()
>zap_page_range_single()
> unmap_single_vma()
>  unmap_page_range()   -> vm_seq
> 
> And there is no way to get rid of it easily as in unmap_mapping_range()
> there is no VMA identified yet.
> 
> That's being said I can't see any clear way to get lock dependency cleaned
> here.
> Furthermore, this is not clear to me how a deadlock could happen as vm_seq
> is a sequence lock, and there is no way to get blocked here.

as far as I understand,
   seq locks can deadlock, technically. not on the write() side, but on
the read() side:

read_seqcount_begin()
 raw_read_seqcount_begin()
   __read_seqcount_begin()

and __read_seqcount_begin() spins for ever

   __read_seqcount_begin()
   {
repeat:
 ret = READ_ONCE(s->sequence);
 if (unlikely(ret & 1)) {
 cpu_relax();
 goto repeat;
 }
 return ret;
   }


so if there are two CPUs, one doing write_seqcount() and the other one
doing read_seqcount(), then what can happen is something like this:

CPU0                            CPU1

                                fs_reclaim_acquire()
write_seqcount_begin()
fs_reclaim_acquire()            read_seqcount_begin()
write_seqcount_end()

CPU0 can't reach write_seqcount_end() because it is now blocked on the
fs_reclaim lock held by CPU1, and CPU1 can't complete read_seqcount_begin()
because CPU0 did write_seqcount_begin() and is waiting on fs_reclaim_acquire().
Makes sense?
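
To make the interleaving concrete, here is a minimal self-contained sketch
of the same inversion with stand-in objects (a plain mutex standing in for
the fs_reclaim lockdep map, a bare seqcount_t for vm_sequence; this is an
illustration, not the kernel code):

#include <linux/mutex.h>
#include <linux/seqlock.h>

static DEFINE_MUTEX(fs_reclaim);        /* stand-in for the lockdep map */
static seqcount_t vm_sequence = SEQCNT_ZERO(vm_sequence);

static void cpu0_path(void)             /* e.g. the move_vma() side */
{
        write_seqcount_begin(&vm_sequence);     /* sequence is now odd */
        mutex_lock(&fs_reclaim);                /* blocks: CPU1 holds it */
        mutex_unlock(&fs_reclaim);
        write_seqcount_end(&vm_sequence);       /* never reached */
}

static void cpu1_path(void)             /* e.g. the direct reclaim side */
{
        unsigned seq;

        mutex_lock(&fs_reclaim);
        /* read_seqcount_begin() spins internally while the count is odd */
        seq = read_seqcount_begin(&vm_sequence);
        /* ... read the VMA ... */
        read_seqcount_retry(&vm_sequence, seq);
        mutex_unlock(&fs_reclaim);
}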

-ss


Regression for Power PC in 4.14-rc0 - bisected to commit 31bfdb036f12

2017-09-13 Thread Larry Finger
When booting my PowerBook Aluminum G4, I get a pop-up screen that says "The
system is running in low-graphics mode. Your screen, graphics card, and input
device settings could not be detected correctly. You will need to configure
these yourself." This is a big-endian 74xx CPU. The lscpu command says it is a
PowerBook5,6 model.


This problem has been bisected to commit 31bfdb036f12 ("powerpc: Use instruction 
emulation infrastructure to handle alignment faults"). I am confident of the 
accuracy of the bisection.


Attached is the configuration.

Thanks,

Larry



#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.13.0-rc2 Kernel Configuration
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
# CONFIG_FORCE_SMP is not set
# CONFIG_SMP is not set
# CONFIG_PPC_DOORBELL is not set
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_PPC32=y
CONFIG_32BIT=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_BITS_MIN=11
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_UDBG_16550=y
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_EPAPR_BOOT is not set
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_KERNEL_GZIP=y
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_RCU_NEED_SEGCBLIST is not set
CONFIG_BUILD_BIN2C=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_O

Re: [PATCH 4/5] powerpc/jprobes: Disable preemption when triggered through ftrace

2017-09-13 Thread Masami Hiramatsu
On Thu, 14 Sep 2017 02:50:35 +0530
"Naveen N. Rao"  wrote:

> KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is
> enabled:
> 
> [3.140410] Kprobe smoke test: started
> [3.149680] DEBUG_LOCKS_WARN_ON(val > preempt_count())
> [3.149684] [ cut here ]
> [3.149695] WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 
> preempt_count_sub+0xcc/0x140
> [3.149699] Modules linked in:
> [3.149705] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97
> [3.149709] task: c000fea8 task.stack: c000feb0
> [3.149713] NIP:  c011d3dc LR: c011d3d8 CTR: 
> c0a090d0
> [3.149718] REGS: c000feb03400 TRAP: 0700   Not tainted  
> (4.13.0-rc7-nnr+)
> [3.149722] MSR:  80021033   CR: 28000282  XER: 
> 
> [3.149732] CFAR: c015aa18 SOFTE: 0
> 
> [3.149786] NIP [c011d3dc] preempt_count_sub+0xcc/0x140
> [3.149790] LR [c011d3d8] preempt_count_sub+0xc8/0x140
> [3.149794] Call Trace:
> [3.149798] [c000feb03680] [c011d3d8] 
> preempt_count_sub+0xc8/0x140 (unreliable)
> [3.149804] [c000feb036e0] [c0046198] 
> kprobe_handler+0x228/0x4b0
> [3.149810] [c000feb03750] [c00269c8] 
> program_check_exception+0x58/0x3b0
> [3.149816] [c000feb037c0] [c000903c] 
> program_check_common+0x16c/0x170
> [3.149822] --- interrupt: 0 at kprobe_target+0x8/0x20
>LR = init_test_probes+0x248/0x7d0
> [3.149829] [c000feb03ab0] [c0e4f048] kp+0x0/0x80 (unreliable)
> [3.149835] [c000feb03b10] [c004ea60] 
> livepatch_handler+0x38/0x74
> [3.149841] [c000feb03ba0] [c0d0de54] init_kprobes+0x1d8/0x208
> [3.149846] [c000feb03c40] [c000daa8] 
> do_one_initcall+0x68/0x1d0
> [3.149852] [c000feb03d00] [c0ce44f0] 
> kernel_init_freeable+0x298/0x374
> [3.149857] [c000feb03dc0] [c000dd84] kernel_init+0x24/0x160
> [3.149863] [c000feb03e30] [c000bfec] 
> ret_from_kernel_thread+0x5c/0x70
> [3.149867] Instruction dump:
> [3.149871] 419effdc 3d22001b 39299240 8129 2f89 409effc8 3c82ffcb 
> 3c62ffcb
> [3.149879] 3884bc68 3863bc18 4803d5fd 6000 <0fe0> 4ba8 
> 6000 6000
> [3.149890] ---[ end trace 432dd46b4ce3d29f ]---
> [3.166003] Kprobe smoke test: passed successfully
> 
> The issue is that we aren't disabling preemption in
> kprobe_ftrace_handler(). Disable it.

Oops, right! A similar patch may be needed for x86 too.

Acked-by: Masami Hiramatsu 

Thanks!

> Fixes: ead514d5fb30a0 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/kernel/kprobes-ftrace.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
> b/arch/powerpc/kernel/kprobes-ftrace.c
> index 6c089d9757c9..2d81404f818c 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -65,6 +65,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
> parent_nip,
>   /* Disable irq for emulating a breakpoint and avoiding preempt */
>   local_irq_save(flags);
>   hard_irq_disable();
> + preempt_disable();
>  
>   p = get_kprobe((kprobe_opcode_t *)nip);
>   if (unlikely(!p) || kprobe_disabled(p))
> @@ -86,12 +87,18 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned 
> long parent_nip,
>   kcb->kprobe_status = KPROBE_HIT_ACTIVE;
>   if (!p->pre_handler || !p->pre_handler(p, regs))
>   __skip_singlestep(p, regs, kcb, orig_nip);
> - /*
> -  * If pre_handler returns !0, it sets regs->nip and
> -  * resets current kprobe.
> -  */
> + else {
> + /*
> +  * If pre_handler returns !0, it sets regs->nip and
> +  * resets current kprobe. In this case, we still need
> +  * to restore irq, but not preemption.
> +  */
> + local_irq_restore(flags);
> + return;
> + }
>   }
>  end:
> + preempt_enable_no_resched();
>   local_irq_restore(flags);
>  }
>  NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> -- 
> 2.14.1
> 


-- 
Masami Hiramatsu 


Re: [PATCH 2/5] powerpc/kprobes: Do not suppress instruction emulation if a single run failed

2017-09-13 Thread Masami Hiramatsu
On Thu, 14 Sep 2017 02:50:33 +0530
"Naveen N. Rao"  wrote:

> Currently, we disable instruction emulation if emulate_step() fails for
> any reason. However, such failures could be transient and specific to a
> particular run. Instead, only disable instruction emulation if we have
> never been able to emulate this. If we had emulated this instruction
> successfully at least once, then we single step only this probe hit and
> continue to try emulating the instruction in subsequent probe hits.

Hmm, would this mean that whether the instruction is emulatable or not
depends on context? What kind of situation is conceivable?

Thank you,

> 
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/kernel/kprobes.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index c2a6ab38a67f..e848fe2c93fb 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -261,9 +261,19 @@ static int try_to_emulate(struct kprobe *p, struct 
> pt_regs *regs)
>*/
>   printk("Can't step on instruction %x\n", insn);
>   BUG();
> - } else
> - /* This instruction can't be boosted */
> - p->ainsn.boostable = -1;
> + } else {
> + /*
> +  * If we haven't previously emulated this instruction, then it
> +  * can't be boosted. Note it down so we don't try to do so 
> again.
> +  *
> +  * If, however, we had emulated this instruction in the past,
> +  * then this is just an error with the current run. We return
> +  * 0 so that this is now single-stepped, but continue to try
> +  * emulating it in subsequent probe hits.
> +  */
> + if (unlikely(p->ainsn.boostable != 1))
> + p->ainsn.boostable = -1;
> + }
>  
>   return ret;
>  }
> -- 
> 2.14.1
> 


-- 
Masami Hiramatsu 


[PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-09-13 Thread Rob Landley
From: Rob Landley 

Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
/dev/console open after devtmpfs mount.

Add workaround for Debian bug that was copied by Ubuntu.

Signed-off-by: Rob Landley 
---

v2 discussion: http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05611.html

 drivers/base/Kconfig |   14 --
 fs/namespace.c   |   14 ++
 init/main.c  |   15 +--
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index f046d21..97352d4 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@ config DEVTMPFS_MOUNT
bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
depends on DEVTMPFS
help
- This will instruct the kernel to automatically mount the
- devtmpfs filesystem at /dev, directly after the kernel has
- mounted the root filesystem. The behavior can be overridden
- with the commandline parameter: devtmpfs.mount=0|1.
- This option does not affect initramfs based booting, here
- the devtmpfs filesystem always needs to be mounted manually
- after the rootfs is mounted.
- With this option enabled, it allows to bring up a system in
- rescue mode with init=/bin/sh, even when the /dev directory
- on the rootfs is completely empty.
+ Automatically mount devtmpfs at /dev on the root filesystem, which
+ lets the system come up in rescue mode with [rd]init=/bin/sh.
+ Override with devtmpfs.mount=0 on the commandline. An initramfs can
+ create a /dev dir as needed; other rootfs types need the mount point.
 
 config STANDALONE
bool "Select only drivers that don't need compile-time external 
firmware"
diff --git a/fs/namespace.c b/fs/namespace.c
index f8893dc..06057d7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2417,7 +2417,21 @@ static int do_add_mount(struct mount *newmnt, struct 
path *path, int mnt_flags)
err = -EBUSY;
if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
path->mnt->mnt_root == path->dentry)
+   {
+   if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT) &&
+   !strcmp(path->mnt->mnt_sb->s_type->name, "devtmpfs"))
+   {
+   /* Debian's kernel config enables DEVTMPFS_MOUNT, then
+  its initramfs setup script tries to mount devtmpfs
+  again, and if the second mount-over-itself fails
+  the script overmounts a tmpfs on /dev to hide the
+  existing contents, then boot fails with empty /dev. */
+   printk(KERN_WARNING "Debian bug workaround for devtmpfs overmount.");
+
+   err = 0;
+   }
goto unlock;
+   }
 
err = -EINVAL;
if (d_is_symlink(newmnt->mnt.mnt_root))
diff --git a/init/main.c b/init/main.c
index 0ee9c686..0d8e5ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,12 +1065,6 @@ static noinline void __init kernel_init_freeable(void)
 
do_basic_setup();
 
-   /* Open the /dev/console on the rootfs, this should never fail */
-   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-   pr_err("Warning: unable to open an initial console.\n");
-
-   (void) sys_dup(0);
-   (void) sys_dup(0);
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
@@ -1082,8 +1076,17 @@ static noinline void __init kernel_init_freeable(void)
if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
ramdisk_execute_command = NULL;
prepare_namespace();
+   } else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+   sys_mkdir("/dev", 0755);
+   devtmpfs_mount("/dev");
}
 
+   /* Open the /dev/console on the rootfs, this should never fail */
+   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+   pr_err("Warning: unable to open an initial console.\n");
+   (void) sys_dup(0);
+   (void) sys_dup(0);
+
/*
 * Ok, we have completed the initial bootup, and
 * we're essentially up and running. Get rid of the


Re: [PATCH 1/5] powerpc/kprobes: Some cosmetic updates to try_to_emulate()

2017-09-13 Thread Masami Hiramatsu
On Thu, 14 Sep 2017 02:50:32 +0530
"Naveen N. Rao"  wrote:

> 1. This is only used in kprobes.c, so make it static.
> 2. Remove the unnecessary (ret == 0) comparison in the else clause.
> 
> Signed-off-by: Naveen N. Rao 

Reviewed-by: Masami Hiramatsu 

Thanks!

> ---
>  arch/powerpc/kernel/kprobes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 367494dc67d9..c2a6ab38a67f 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -239,7 +239,7 @@ void arch_prepare_kretprobe(struct kretprobe_instance 
> *ri, struct pt_regs *regs)
>  }
>  NOKPROBE_SYMBOL(arch_prepare_kretprobe);
>  
> -int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
> +static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
>  {
>   int ret;
>   unsigned int insn = *p->ainsn.insn;
> @@ -261,7 +261,7 @@ int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
>*/
>   printk("Can't step on instruction %x\n", insn);
>   BUG();
> - } else if (ret == 0)
> + } else
>   /* This instruction can't be boosted */
>   p->ainsn.boostable = -1;
>  
> -- 
> 2.14.1
> 


-- 
Masami Hiramatsu 


Re: [PATCH] ASoC: fsl_ssi: Override bit clock rate based on slot number

2017-09-13 Thread Nicolin Chen
On Wed, Sep 13, 2017 at 10:02:20AM +0200, Arnaud Mouiche wrote:

> >Could you please give me a few sets of examples of how you set
> >set_sysclk(), set_tdm_slot() with the current driver? The idea
> >here is to figure out a way to calculate the bclk in hw_params
> >without getting set_sysclk() involved any more.

> Here is one, where a bclk = 4*16*fs is expected

> In another setup, there are 8 x 16 bits slots, whatever the number
> of active channels is.
> In this case bclk = 128 * fs
> The number of slots is completely arbitrary. Some slots can even be
> reserved for communication between codecs that don't communicate
> with linux.

In summary, bclk = sample rate * slots * slot_width;

I will update my patch soon.
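
Roughly along these lines; a minimal sketch of the hw_params side, not the
actual fsl_ssi change (the private-struct fields cached from set_tdm_slot()
are illustrative assumptions):

static int fsl_ssi_hw_params(struct snd_pcm_substream *substream,
                             struct snd_pcm_hw_params *hw_params,
                             struct snd_soc_dai *dai)
{
        struct fsl_ssi_private *ssi = snd_soc_dai_get_drvdata(dai);
        unsigned int slots = ssi->slots;            /* from set_tdm_slot() */
        unsigned int slot_width = ssi->slot_width;  /* from set_tdm_slot() */
        unsigned long bclk;

        if (!slots) {
                /* No TDM configured: fall back to channels * sample width */
                slots = params_channels(hw_params);
                slot_width = params_width(hw_params);
        }

        bclk = (unsigned long)params_rate(hw_params) * slots * slot_width;

        /* ... program the SSI clock dividers for bclk ... */
        return 0;
}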

> >Unfortunately, it looks like a work around to me. I understand
> >the idea of leaving set_sysclk() out there to override the bit
> >clock is convenient, but it is not a standard ALSA design and
> >may eventually introduce new problems like today.
> 
> I agree. I'm not conservative at all concerning this question.
> I don't see a way to remove set_sysclk without breaking current TDM
> users anyway, at least for those who don't have their code
> upstreamed.

Which TDM case would be broken by this removal? The only impact
that I can see is that the ASoC core returns an ENOTSUPP for a
set_sysclk() call now, which is something that a dai-link driver
should have taken care of anyway.
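
For instance, a dai-link / machine driver can simply tolerate the error; a
sketch of the calling side (clock id and rate here are arbitrary examples):

static int card_hw_params(struct snd_pcm_substream *substream,
                          struct snd_pcm_hw_params *params)
{
        struct snd_soc_pcm_runtime *rtd = substream->private_data;
        int ret;

        /* A DAI without a set_sysclk() op now yields -ENOTSUPP: ignore it */
        ret = snd_soc_dai_set_sysclk(rtd->cpu_dai, 0, 24576000,
                                     SND_SOC_CLOCK_OUT);
        if (ret < 0 && ret != -ENOTSUPP)
                return ret;

        return 0;
}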

> All information provided through snd_soc_dai_set_tdm_slot( cpu_dai,
> mask, mask, slots, width ) should be enough
> In this case, for TDM users
> 
>bclk = slots * width * fs   (where slots is != channels)

> will manage 99 % of the cases.
> And the remaining 1% will concern people who need to hack the kernel
> so widely they don't care about the set_sysclk removal.

A patch from those people will be always welcome.

> - fsl-asoc-card.c : *something will break since
> snd_soc_dai_set_sysclk returned code is checked*

I've already submitted a patch to ignore all ENOTSUPP.


Re: POWER9 PMU stops after idle workaround

2017-09-13 Thread Michael Ellerman
Yeah I messed up the subject when committing, from memory, so I had to rebase
and force push.

I think this is it:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=09539f9b123652e969894d6299ae0df2fe12cb5d

cheers



On 14 September 2017 2:11:16 am AEST, "Michal Suchánek"  
wrote:
>On Thu,  3 Aug 2017 20:19:38 +1000 (AEST)
>Michael Ellerman  wrote:
>
>> On Thu, 2017-07-20 at 01:53:22 UTC, Nicholas Piggin wrote:
>> > POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
>> > 
>> > A solution is to set then clear MMCRA[60] after wake from
>state-loss
>> > idle.
>> > 
>> > Signed-off-by: Nicholas Piggin 
>> > Acked-by: Madhavan Srinivasan 
>> > Reviewed-by: Vaidyanathan Srinivasan 
>> > Acked-by: Anton Blanchard   
>> 
>> Applied to powerpc fixes, thanks.
>> 
>> https://git.kernel.org/powerpc/c/bdd21ddb919d28f9c62cdc6286cac9
>> 
>> cheers
>
>Not there anymore. I see this only in the merge branch.
>
>Thanks
>
>Michal

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

[PATCH 5/5] powerpc/jprobes: Validate break handler invocation as being due to a jprobe_return()

2017-09-13 Thread Naveen N. Rao
Fix a circa 2005 FIXME by implementing a check to ensure that we
actually got into the jprobe break handler due to the trap in
jprobe_return().

Signed-off-by: Naveen N. Rao 
---
This is v2 of the patch posted previously (*) to change the WARN() to a 
pr_debug() as suggested by Masami.
(*) https://patchwork.ozlabs.org/patch/762670/

- Naveen

 arch/powerpc/kernel/kprobes.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index db40b13fd3d1..5b26d5202273 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -648,24 +648,22 @@ NOKPROBE_SYMBOL(setjmp_pre_handler);
 
 void __used jprobe_return(void)
 {
-   asm volatile("trap" ::: "memory");
+   asm volatile("jprobe_return_trap:\n"
+"trap\n"
+::: "memory");
 }
 NOKPROBE_SYMBOL(jprobe_return);
 
-static void __used jprobe_return_end(void)
-{
-}
-NOKPROBE_SYMBOL(jprobe_return_end);
-
 int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 {
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
 
-   /*
-* FIXME - we should ideally be validating that we got here 'cos
-* of the "trap" in jprobe_return() above, before restoring the
-* saved regs...
-*/
+   if (regs->nip != ppc_kallsyms_lookup_name("jprobe_return_trap")) {
+   pr_debug("longjmp_break_handler NIP (0x%lx) does not match jprobe_return_trap (0x%lx)\n",
+   regs->nip, ppc_kallsyms_lookup_name("jprobe_return_trap"));
+   return 0;
+   }
+
memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
/* It's OK to start function graph tracing again */
unpause_graph_tracing();
-- 
2.14.1



[PATCH 4/5] powerpc/jprobes: Disable preemption when triggered through ftrace

2017-09-13 Thread Naveen N. Rao
KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is
enabled:

[3.140410] Kprobe smoke test: started
[3.149680] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[3.149684] [ cut here ]
[3.149695] WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 
preempt_count_sub+0xcc/0x140
[3.149699] Modules linked in:
[3.149705] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97
[3.149709] task: c000fea8 task.stack: c000feb0
[3.149713] NIP:  c011d3dc LR: c011d3d8 CTR: c0a090d0
[3.149718] REGS: c000feb03400 TRAP: 0700   Not tainted  
(4.13.0-rc7-nnr+)
[3.149722] MSR:  80021033   CR: 28000282  XER: 

[3.149732] CFAR: c015aa18 SOFTE: 0

[3.149786] NIP [c011d3dc] preempt_count_sub+0xcc/0x140
[3.149790] LR [c011d3d8] preempt_count_sub+0xc8/0x140
[3.149794] Call Trace:
[3.149798] [c000feb03680] [c011d3d8] 
preempt_count_sub+0xc8/0x140 (unreliable)
[3.149804] [c000feb036e0] [c0046198] kprobe_handler+0x228/0x4b0
[3.149810] [c000feb03750] [c00269c8] 
program_check_exception+0x58/0x3b0
[3.149816] [c000feb037c0] [c000903c] 
program_check_common+0x16c/0x170
[3.149822] --- interrupt: 0 at kprobe_target+0x8/0x20
   LR = init_test_probes+0x248/0x7d0
[3.149829] [c000feb03ab0] [c0e4f048] kp+0x0/0x80 (unreliable)
[3.149835] [c000feb03b10] [c004ea60] livepatch_handler+0x38/0x74
[3.149841] [c000feb03ba0] [c0d0de54] init_kprobes+0x1d8/0x208
[3.149846] [c000feb03c40] [c000daa8] do_one_initcall+0x68/0x1d0
[3.149852] [c000feb03d00] [c0ce44f0] 
kernel_init_freeable+0x298/0x374
[3.149857] [c000feb03dc0] [c000dd84] kernel_init+0x24/0x160
[3.149863] [c000feb03e30] [c000bfec] 
ret_from_kernel_thread+0x5c/0x70
[3.149867] Instruction dump:
[3.149871] 419effdc 3d22001b 39299240 8129 2f89 409effc8 3c82ffcb 
3c62ffcb
[3.149879] 3884bc68 3863bc18 4803d5fd 6000 <0fe0> 4ba8 6000 
6000
[3.149890] ---[ end trace 432dd46b4ce3d29f ]---
[3.166003] Kprobe smoke test: passed successfully

The issue is that we aren't disabling preemption in
kprobe_ftrace_handler(). Disable it.

Fixes: ead514d5fb30a0 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes-ftrace.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
b/arch/powerpc/kernel/kprobes-ftrace.c
index 6c089d9757c9..2d81404f818c 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -65,6 +65,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
/* Disable irq for emulating a breakpoint and avoiding preempt */
local_irq_save(flags);
hard_irq_disable();
+   preempt_disable();
 
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
@@ -86,12 +87,18 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
if (!p->pre_handler || !p->pre_handler(p, regs))
__skip_singlestep(p, regs, kcb, orig_nip);
-   /*
-* If pre_handler returns !0, it sets regs->nip and
-* resets current kprobe.
-*/
+   else {
+   /*
+* If pre_handler returns !0, it sets regs->nip and
+* resets current kprobe. In this case, we still need
+* to restore irq, but not preemption.
+*/
+   local_irq_restore(flags);
+   return;
+   }
}
 end:
+   preempt_enable_no_resched();
local_irq_restore(flags);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
-- 
2.14.1



[PATCH 3/5] powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernels

2017-09-13 Thread Naveen N. Rao
Kamalesh pointed out that we are getting the below call traces with
livepatched functions when we enable CONFIG_PREEMPT:

[  495.470721] BUG: using __this_cpu_read() in preemptible [] code: 
cat/8394
[  495.471167] caller is is_current_kprobe_addr+0x30/0x90
[  495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G  K 
4.13.0-rc7-nnr+ #95
[  495.471173] Call Trace:
[  495.471178] [c0008fd9b960] [c09f039c] dump_stack+0xec/0x160 
(unreliable)
[  495.471184] [c0008fd9b9a0] [c059169c] 
check_preemption_disabled+0x15c/0x170
[  495.471187] [c0008fd9ba30] [c0046460] 
is_current_kprobe_addr+0x30/0x90
[  495.471191] [c0008fd9ba60] [c004e9a0] ftrace_call+0x1c/0xb8
[  495.471195] [c0008fd9bc30] [c0376fd8] seq_read+0x238/0x5c0
[  495.471199] [c0008fd9bcd0] [c03cfd78] proc_reg_read+0x88/0xd0
[  495.471203] [c0008fd9bd00] [c033e5d4] __vfs_read+0x44/0x1b0
[  495.471206] [c0008fd9bd90] [c03402ec] vfs_read+0xbc/0x1b0
[  495.471210] [c0008fd9bde0] [c0342138] SyS_read+0x68/0x110
[  495.471214] [c0008fd9be30] [c000bc6c] system_call+0x58/0x6c

Commit c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for
jprobes") introduced a helper is_current_kprobe_addr() to help determine
if the current function has been livepatched or if it has a jprobe
installed, both of which modify the NIP.

In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption
before calling into setjmp_pre_handler() which returns without disabling
pre-emption. This is done to ensure that the jprobe handler won't
disappear beneath us if the jprobe is unregistered between the
setjmp_pre_handler() and the subsequent longjmp_break_handler() called
from the jprobe handler. Due to this, we can use __this_cpu_read() in
is_current_kprobe_addr() with the pre-emption check as we know that
pre-emption will be disabled.

However, if this function has been livepatched, we are still doing this
check and when we do so, pre-emption won't necessarily be disabled. This
results in the call trace shown above.

Fix this by only invoking is_current_kprobe_addr() when pre-emption is
disabled. And since we now guard this within a pre-emption check, we can
instead use raw_cpu_read() to get the current_kprobe value skipping the
check done by __this_cpu_read().

Fixes: c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes")
Reported-by: Kamalesh Babulal 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index e848fe2c93fb..db40b13fd3d1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -45,8 +45,12 @@ struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, 
NULL}};
 
 int is_current_kprobe_addr(unsigned long addr)
 {
-   struct kprobe *p = kprobe_running();
-   return (p && (unsigned long)p->addr == addr) ? 1 : 0;
+   if (!preemptible()) {
+   struct kprobe *p = raw_cpu_read(current_kprobe);
+   return (p && (unsigned long)p->addr == addr) ? 1 : 0;
+   }
+
+   return 0;
 }
 
 bool arch_within_kprobe_blacklist(unsigned long addr)
-- 
2.14.1



[PATCH 2/5] powerpc/kprobes: Do not suppress instruction emulation if a single run failed

2017-09-13 Thread Naveen N. Rao
Currently, we disable instruction emulation if emulate_step() fails for
any reason. However, such failures could be transient and specific to a
particular run. Instead, only disable instruction emulation if we have
never been able to emulate this. If we had emulated this instruction
successfully at least once, then we single step only this probe hit and
continue to try emulating the instruction in subsequent probe hits.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index c2a6ab38a67f..e848fe2c93fb 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -261,9 +261,19 @@ static int try_to_emulate(struct kprobe *p, struct pt_regs 
*regs)
 */
printk("Can't step on instruction %x\n", insn);
BUG();
-   } else
-   /* This instruction can't be boosted */
-   p->ainsn.boostable = -1;
+   } else {
+   /*
+* If we haven't previously emulated this instruction, then it
+* can't be boosted. Note it down so we don't try to do so 
again.
+*
+* If, however, we had emulated this instruction in the past,
+* then this is just an error with the current run. We return
+* 0 so that this is now single-stepped, but continue to try
+* emulating it in subsequent probe hits.
+*/
+   if (unlikely(p->ainsn.boostable != 1))
+   p->ainsn.boostable = -1;
+   }
 
return ret;
 }
-- 
2.14.1



[PATCH 1/5] powerpc/kprobes: Some cosmetic updates to try_to_emulate()

2017-09-13 Thread Naveen N. Rao
1. This is only used in kprobes.c, so make it static.
2. Remove the unnecessary (ret == 0) comparison in the else clause.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 367494dc67d9..c2a6ab38a67f 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -239,7 +239,7 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, 
struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 
-int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
+static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 {
int ret;
unsigned int insn = *p->ainsn.insn;
@@ -261,7 +261,7 @@ int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 */
printk("Can't step on instruction %x\n", insn);
BUG();
-   } else if (ret == 0)
+   } else
/* This instruction can't be boosted */
p->ainsn.boostable = -1;
 
-- 
2.14.1



Re: [PATCH 2/7] powerpc: introduce pte_get_hash_gslot() helper

2017-09-13 Thread Ram Pai
On Wed, Sep 13, 2017 at 07:32:57PM +1000, Balbir Singh wrote:
> On Sat, Sep 9, 2017 at 8:44 AM, Ram Pai  wrote:
> > Introduce pte_get_hash_gslot(), which returns the slot number of the
> > HPTE in the global hash table.
> >
> > This function will come in handy as we work towards re-arranging the
> > PTE bits in the later patches.
> >
> > Reviewed-by: Aneesh Kumar K.V 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash.h |3 +++
> >  arch/powerpc/mm/hash_utils_64.c   |   18 ++
> >  2 files changed, 21 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
> > b/arch/powerpc/include/asm/book3s/64/hash.h
> > index f884520..060c059 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> > @@ -166,6 +166,9 @@ static inline int hash__pte_none(pte_t pte)
> > return (pte_val(pte) & ~H_PTE_NONE_MASK) == 0;
> >  }
> >
> > +unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
> > +   int ssize, real_pte_t rpte, unsigned int subpg_index);
> > +
> >  /* This low level function performs the actual PTE insertion
> >   * Setting the PTE depends on the MMU type and other factors. It's
> >   * an horrible mess that I'm not going to try to clean up now but
> > diff --git a/arch/powerpc/mm/hash_utils_64.c 
> > b/arch/powerpc/mm/hash_utils_64.c
> > index 67ec2e9..e68f053 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -1591,6 +1591,24 @@ static inline void tm_flush_hash_page(int local)
> >  }
> >  #endif
> >
> > +/*
> > + * return the global hash slot, corresponding to the given
> > + * pte, which contains the hpte.
> 
> Does this work with native/guest page tables? I guess both.

Yes, it is supposed to work with native as well as guest page tables.
The code has been this way for ages. This patch encapsulates
the logic in a standalone function.

> The comment sounds trivial, could you please elaborate more.
> Looking at the code, it seems like given a real pte, we use
> the hash value and hidx to figure out the slot value in the global
> slot information. This uses information in the software page
> tables. Is that correct?

Yes. This uses information passed to it by the caller; the information
is expected to be derived from the Linux page tables.

> Do we have to consider validity and
> present state here or is that guaranteed?

This function's job is to do the math and return the global slot based
on the input. It will return the calculated value regardless of the validity of
its inputs.

It's the caller's job to validate the pte and ensure that it is hashed
before meaningfully using the return value of this function.
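
As a purely hypothetical illustration of that division of labour (the
validity check belongs to the caller, not to the helper):

        /* Hypothetical caller: only trust the slot for a valid, hashed PTE */
        if (__rpte_sub_valid(rpte, subpg_index)) {
                unsigned long gslot;

                gslot = pte_get_hash_gslot(vpn, shift, ssize, rpte,
                                           subpg_index);
                /* ... update or invalidate the HPTE at gslot ... */
        }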

> 
> > + */
> > +unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
> > +   int ssize, real_pte_t rpte, unsigned int subpg_index)
> > +{
> > +   unsigned long hash, slot, hidx;
> > +
> > +   hash = hpt_hash(vpn, shift, ssize);
> > +   hidx = __rpte_to_hidx(rpte, subpg_index);
> > +   if (hidx & _PTEIDX_SECONDARY)
> > +   hash = ~hash;
> > +   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> > +   slot += hidx & _PTEIDX_GROUP_IX;
> > +   return slot;
> > +}
> > +
> >  /* WARNING: This is called from hash_low_64.S, if you change this 
> > prototype,
> >   *  do not forget to update the assembly call site !
> >   */
> 
> Balbir Singh.

-- 
Ram Pai



[RFC PATCH] powerpc/uprobes: Fixup si_addr if we took an exception while single stepping

2017-09-13 Thread Naveen N. Rao
If the single-stepped instruction causes an exception, we may end up
setting siginfo.si_addr to the address of the uprobe xol area. This is
not desirable since the address won't make sense for the process if it
wants to handle the exception. Fixup the si_addr field in such cases.

Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/uprobes.h |  7 +++
 arch/powerpc/kernel/traps.c|  4 
 arch/powerpc/kernel/uprobes.c  | 17 +
 3 files changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index 7422a999a39a..13fc6af3c1fd 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -23,6 +23,7 @@
  */
 
 #include 
+#include 
 #include 
 
 typedef ppc_opcode_t uprobe_opcode_t;
@@ -45,4 +46,10 @@ struct arch_uprobe_task {
unsigned long   saved_trap_nr;
 };
 
+#ifdef CONFIG_UPROBES
+extern void uprobe_fixup_exception(struct pt_regs *regs, siginfo_t *info);
+#else
+static inline void uprobe_fixup_exception(struct pt_regs *regs, siginfo_t *info) { }
+#endif
+
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ec74e203ee04..1bb858a37029 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -292,6 +293,9 @@ void _exception(int signr, struct pt_regs *regs, int code, 
unsigned long addr)
info.si_signo = signr;
info.si_code = code;
info.si_addr = (void __user *) addr;
+
+   uprobe_fixup_exception(regs, &info);
+
force_sig_info(signr, &info, current);
 }
 
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 5d105b8eeece..a361a56e6210 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -214,3 +215,19 @@ bool arch_uretprobe_is_alive(struct return_instance *ret, 
enum rp_check ctx,
else
return regs->gpr[1] < ret->stack;
 }
+
+void uprobe_fixup_exception(struct pt_regs *regs, siginfo_t *info)
+{
+   struct task_struct *t = current;
+   struct uprobe_task *utask = t->utask;
+
+   if (likely(!utask || !utask->active_uprobe))
+   return;
+
+   /*
+* We reset si_addr here.
+* regs->nip is reset on our way back through uprobe_deny_signal()
+*/
+   if (info->si_addr == (void __user *) utask->xol_vaddr)
+   info->si_addr = (void __user *) utask->vaddr;
+}
-- 
2.14.1



Re: [PATCH] uprobe: Warn if unable to install breakpoint

2017-09-13 Thread Naveen N. Rao
On 2017/09/01 03:09PM, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
> 
> > When we try to install a uprobe breakpoint in uprobe_mmap(), we ignore
> > all errors encountered in the process per this comment at the top of
> > the function:
> > /*
> >  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
> >  *
> >  * Currently we ignore all errors and always return 0, the callers
> >  * can't handle the failure anyway.
> >  */
> >
> > However, this is very confusing for users since no probe hits are
> > recorded nor is an error logged in dmesg.
> >
> > Fix this by logging an error in dmesg so that users can discover that
> > there was an issue with the uprobe. To facilitate use of uprobe_warn(),
> > we move that function to the top of the file.
> >
> > With this patch, we see a message similar to this in dmesg:
> > [  201.449213] uprobe: uprobe_t:9740 failed to setup probe at 0x95c 
> > (-524)
> >
> > Reported-by: Anton Blanchard 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  kernel/events/uprobes.c | 21 ++---
> >  1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 0e137f98a50c..587c591a535c 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -112,6 +112,12 @@ struct xol_area {
> > unsigned long   vaddr;  /* Page(s) of instruction slots */
> >  };
> >  
> > +static void uprobe_warn(struct task_struct *t, const char *msg)
> > +{
> > +   pr_warn("uprobe: %s:%d failed to %s\n",
> > +   current->comm, current->pid, msg);
> 
> That should probably be ratelimited no?

Uprobes can only be installed by root today, so it is not as bad. But, I 
do agree that it is good to ratelimit. I will send a subsequent patch to 
do this.
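
Something along these lines, presumably (a sketch; note it also prints the
passed-in task rather than current):

static void uprobe_warn(struct task_struct *t, const char *msg)
{
        pr_warn_ratelimited("uprobe: %s:%d failed to %s\n",
                            t->comm, t->pid, msg);
}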

Thanks for the review,
- Naveen



Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-13 Thread Laurent Dufour
Hi Sergey,

On 13/09/2017 13:53, Sergey Senozhatsky wrote:
> Hi,
> 
> On (09/08/17 20:06), Laurent Dufour wrote:
> [..]
>> @@ -903,6 +910,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>> long start,
>>  mm->map_count--;
>>  mpol_put(vma_policy(next));
>>  kmem_cache_free(vm_area_cachep, next);
>> +write_seqcount_end(&next->vm_sequence);
>>  /*
>>   * In mprotect's case 6 (see comments on vma_merge),
>>   * we must remove another next too. It would clutter
>> @@ -932,11 +940,14 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>> long start,
>>  if (remove_next == 2) {
>>  remove_next = 1;
>>  end = next->vm_end;
>> +write_seqcount_end(&vma->vm_sequence);
>>  goto again;
>> -}
>> -else if (next)
>> +} else if (next) {
>> +if (next != vma)
>> +write_seqcount_begin_nested(&next->vm_sequence,
>> +
>> SINGLE_DEPTH_NESTING);
>>  vma_gap_update(next);
>> -else {
>> +} else {
>>  /*
>>   * If remove_next == 2 we obviously can't
>>   * reach this path.
>> @@ -962,6 +973,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>> long start,
>>  if (insert && file)
>>  uprobe_mmap(insert);
>>  
>> +if (next && next != vma)
>> +write_seqcount_end(&next->vm_sequence);
>> +write_seqcount_end(&vma->vm_sequence);
> 
> 
> ok, so what I got on my box is:
> 
> vm_munmap()  -> down_write_killable(&mm->mmap_sem)
>  do_munmap()
>   __split_vma()
>__vma_adjust()  -> write_seqcount_begin(&vma->vm_sequence)
>-> write_seqcount_begin_nested(&next->vm_sequence, 
> SINGLE_DEPTH_NESTING)
> 
> so this gives 3 dependencies  ->mmap_sem   ->   ->vm_seq
>   ->vm_seq ->   ->vm_seq/1
>   ->mmap_sem   ->   ->vm_seq/1
> 
> 
> SyS_mremap() -> down_write_killable(&current->mm->mmap_sem)
>  move_vma()   -> write_seqcount_begin(&vma->vm_sequence)
>   -> write_seqcount_begin_nested(&new_vma->vm_sequence, 
> SINGLE_DEPTH_NESTING);
>   move_page_tables()
>__pte_alloc()
> pte_alloc_one()
>  __alloc_pages_nodemask()
>   fs_reclaim_acquire()
> 
> 
> I think here we have prepare_alloc_pages() call, that does
> 
> -> fs_reclaim_acquire(gfp_mask)
> -> fs_reclaim_release(gfp_mask)
> 
> so that adds one more dependency  ->mmap_sem   ->   ->vm_seq->   
> fs_reclaim
>   ->mmap_sem   ->   ->vm_seq/1  ->   
> fs_reclaim
> 
> 
> now, under memory pressure we hit the slow path and perform direct
> reclaim. direct reclaim is done under fs_reclaim lock, so we end up
> with the following call chain
> 
> __alloc_pages_nodemask()
>  __alloc_pages_slowpath()
>   __perform_reclaim()   ->   fs_reclaim_acquire(gfp_mask);
>try_to_free_pages()
> shrink_node()
>  shrink_active_list()
>   rmap_walk_file()  ->   i_mmap_lock_read(mapping);
> 
> 
> and this breaks the existing dependency, since we now take the leaf lock
> (fs_reclaim) first and then the root lock (->mmap_sem).

Thanks for looking at this.
I'm sorry, I must have missed something.

My understanding is that there are 3 chains of locks:
 1. from __vma_adjust(): mmap_sem -> i_mmap_rwsem -> vm_seq
 2. from move_vma(): mmap_sem -> vm_seq -> fs_reclaim
 3. from __alloc_pages_nodemask(): fs_reclaim -> i_mmap_rwsem

So the solution would be to have in __vma_adjust()
 mmap_sem -> vm_seq -> i_mmap_rwsem

But this will raise the following dependency from unmap_mapping_range():
unmap_mapping_range()   -> i_mmap_rwsem
 unmap_mapping_range_tree()
  unmap_mapping_range_vma()
   zap_page_range_single()
unmap_single_vma()
 unmap_page_range() -> vm_seq

And there is no way to get rid of it easily as in unmap_mapping_range()
there is no VMA identified yet.

That being said, I can't see any clear way to get the lock dependencies
cleaned up here.
Furthermore, it is not clear to me how a deadlock could happen, as vm_seq
is a sequence lock and there is no way to get blocked here.

Cheers,
Laurent.

> 
> well, seems to be the case.
> 
>   -ss
> 



Re: POWER9 PMU stops after idle workaround

2017-09-13 Thread Michal Suchánek
On Thu,  3 Aug 2017 20:19:38 +1000 (AEST)
Michael Ellerman  wrote:

> On Thu, 2017-07-20 at 01:53:22 UTC, Nicholas Piggin wrote:
> > POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
> > 
> > A solution is to set then clear MMCRA[60] after wake from state-loss
> > idle.
> > 
> > Signed-off-by: Nicholas Piggin 
> > Acked-by: Madhavan Srinivasan 
> > Reviewed-by: Vaidyanathan Srinivasan 
> > Acked-by: Anton Blanchard   
> 
> Applied to powerpc fixes, thanks.
> 
> https://git.kernel.org/powerpc/c/bdd21ddb919d28f9c62cdc6286cac9
> 
> cheers

Not there anymore. I see this only in the merge branch.

Thanks

Michal


[PATCH] powerpc/tm: Flush TM only if CPU has TM feature

2017-09-13 Thread Gustavo Romero
Commit cd63f3cf1d59 ("powerpc/tm: Fix saving of TM SPRs in core dump")
added code to access TM SPRs in flush_tmregs_to_thread(). However
flush_tmregs_to_thread() does not check if TM feature is available on
CPU before trying to access TM SPRs in order to copy live state to
thread structures. flush_tmregs_to_thread() is indeed guarded by
CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel
was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
a CPU without TM feature available, thus rendering the execution
of TM instructions that are treated by the CPU as illegal instructions.

The fix is just to add a proper check in flush_tmregs_to_thread()
that the CPU has the TM feature before accessing any TM-specific resource,
returning immediately if TM is not available on the CPU. Adding
that check in flush_tmregs_to_thread() instead of in the places
where it is called, like vsr_get() and vsr_set(), is better because
it avoids the same problem cropping up elsewhere.

Cc: sta...@vger.kernel.org # v4.13+
Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Gustavo Romero 
---
 arch/powerpc/kernel/ptrace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 07cd22e..18f547f 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -124,6 +124,8 @@ static const struct pt_regs_offset regoffset_table[] = {
 static void flush_tmregs_to_thread(struct task_struct *tsk)
 {
/*
+* If CPU has no TM support, return.
+*
 * If task is not current, it will have been flushed already to
 * it's thread_struct during __switch_to().
 *
@@ -131,7 +133,7 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
 * in the appropriate thread structures from live.
 */
 
-   if (tsk != current)
+   if (!cpu_has_feature(CPU_FTR_TM) || tsk != current)
return;
 
if (MSR_TM_SUSPENDED(mfmsr())) {
-- 
2.7.4



[PATCH] powerpc: Fix handling of alignment interrupt on dcbz instruction

2017-09-13 Thread Christian Zigotzky

On 13 September 2017 at 09:15AM, Michal Sojka wrote:

On Wed, Sep 13 2017, Michael Ellerman wrote:

Michal, Christian, can you please confirm this fixes the problems you
were seeing.

Yes, it fixes my problem. Thanks.

-Michal


It also fixes my problem. Many thanks!

-- Christian



Re: [RFC PATCH 1/2] core: implement OPAL_SIGNAL_SYSTEM_RESET with POWER9 scoms

2017-09-13 Thread Nicholas Piggin
On Wed, 13 Sep 2017 09:18:34 +1000
Benjamin Herrenschmidt  wrote:

> On Wed, 2017-09-13 at 02:05 +1000, Nicholas Piggin wrote:
> > This implements a way to raise system reset interrupts on other
> > cores. This has not yet been tested on DD2 or with deeper sleep
> > states.  
> 
> Reminds me, we need to work around a bug with XSCOMs on P9.
> 
> XSCOMs to core in the range 20010A80-20010AB8 (list below) can fail
> occasionally with an error of 4 (PCB_ADDRESS_ERROR). We need to
> (silently) retry up to 32 times.

[snip]

So, just put a loop into xscom_read and xscom_write for those
addresses for P9 chips?
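
Something like this sketch, say, plus the equivalent in xscom_write and a
check that the address falls in the affected range (the helper name and
error constant are assumptions for illustration, not the actual skiboot
code):

#define PCB_ADDRESS_ERROR       4       /* per the description above */
#define XSCOM_MAX_RETRIES       32

static int64_t xscom_read_retry(uint32_t gcid, uint64_t pcb_addr,
                                uint64_t *val)
{
        int64_t rc;
        int i;

        for (i = 0; i < XSCOM_MAX_RETRIES; i++) {
                rc = xscom_read(gcid, pcb_addr, val);
                if (rc != PCB_ADDRESS_ERROR)
                        break;  /* success, or a different error */
        }
        return rc;
}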

Thanks,
Nick


Re: [RFC PATCH 2/2] powerpc/powernv: implement NMI IPIs with OPAL_SIGNAL_SYSTEM_RESET

2017-09-13 Thread Nicholas Piggin
On Wed, 13 Sep 2017 02:05:53 +1000
Nicholas Piggin  wrote:

> There are two complications. The first is that sreset from stop states
> come in with SRR1 set to do a powersave wakeup, with an sreset reason
> encoded.
> 
> The second is that threads on the same core can't be signalled directly
> so we must designate a bounce CPU to reflect the IPI back.

Here is an updated Linux patch for the latest OPAL patch. This has
a few assorted fixes as well to make it work nicely, I roll them into
one patch here to make it easy to apply for testing the OPAL patch.

Thanks,
Nick

---
 arch/powerpc/include/asm/opal-api.h|  1 +
 arch/powerpc/include/asm/opal.h|  2 +
 arch/powerpc/kernel/irq.c  | 18 ++
 arch/powerpc/kernel/watchdog.c | 30 +++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h   |  1 +
 arch/powerpc/platforms/powernv/setup.c |  3 +
 arch/powerpc/platforms/powernv/smp.c   | 89 ++
 arch/powerpc/xmon/xmon.c   | 17 +++--
 9 files changed, 151 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 450a60b81d2a..e39f4236b413 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -188,6 +188,7 @@
 #define OPAL_XIVE_DUMP 142
 #define OPAL_XIVE_RESERVED3143
 #define OPAL_XIVE_RESERVED4144
+#define OPAL_SIGNAL_SYSTEM_RESET145
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726c23304a57..7d7613c49f2b 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -281,6 +281,8 @@ int opal_get_power_shift_ratio(u32 handle, int token, u32 
*psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
 
+int64_t opal_signal_system_reset(int32_t cpu);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4e65bf82f5e0..5f2c0367bab2 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -407,10 +407,28 @@ static const u8 srr1_to_lazyirq[0x10] = {
PACA_IRQ_HMI,
0, 0, 0, 0, 0 };
 
+/*
+ * System reset does not have to wait for Linux interrupts
+ * to be re-enabled, so just replay it now.
+ */
+static noinline void replay_system_reset(void)
+{
+   struct pt_regs regs;
+
+   ppc_save_regs(®s);
+
+   get_paca()->in_nmi = 1;
+   system_reset_exception(®s);
+   get_paca()->in_nmi = 0;
+}
+
 void irq_set_pending_from_srr1(unsigned long srr1)
 {
unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18;
 
+   if (unlikely(idx == 4))
+   replay_system_reset();
+
/*
 * The 0 index (SRR1[42:45]=b) must always evaluate to 0,
 * so this can be called unconditionally with srr1 wake reason.
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 2f6eadd9408d..a6aa85b0cdeb 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -61,6 +61,7 @@ static DEFINE_PER_CPU(u64, wd_timer_tb);
  */
 static unsigned long __wd_smp_lock;
 static cpumask_t wd_smp_cpus_pending;
+static cpumask_t wd_smp_cpus_stuck_tmp;
 static cpumask_t wd_smp_cpus_stuck;
 static u64 wd_smp_last_reset_tb;
 
@@ -97,8 +98,7 @@ static void wd_lockup_ipi(struct pt_regs *regs)
else
dump_stack();
 
-   if (hardlockup_panic)
-   nmi_panic(regs, "Hard LOCKUP");
+   /* Do not panic from here because that can recurse into NMI IPI layer */
 }
 
 static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb)
@@ -136,16 +136,29 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
/*
 * Try to trigger the stuck CPUs.
+*
+* There is a bit of a hack for OPAL here because it can not
+* signal sibling threads. Don't try to signal those or mark
+* them stuck, in the hope that another core will notice.
 */
+   cpumask_clear(&wd_smp_cpus_stuck_tmp);
for_each_cpu(c, &wd_smp_cpus_pending) {
if (c == cpu)
continue;
-   smp_send_nmi_ipi(c, wd_lockup_ipi, 100);
+   if (firmware_has_feature(FW_FEATURE_OPAL)) {
+   if (cpumask_test_cpu(c, cpu_sibling_mask(cpu)))
+   continue;
+   }
+   cpumask_set_cpu(c, &wd_smp_cpus_stuck_tmp);
+   if (!sysctl_hardlockup_all_cp

Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-13 Thread Sergey Senozhatsky
Hi,

On (09/08/17 20:06), Laurent Dufour wrote:
[..]
> @@ -903,6 +910,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
> long start,
>   mm->map_count--;
>   mpol_put(vma_policy(next));
>   kmem_cache_free(vm_area_cachep, next);
> + write_seqcount_end(&next->vm_sequence);
>   /*
>* In mprotect's case 6 (see comments on vma_merge),
>* we must remove another next too. It would clutter
> @@ -932,11 +940,14 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
> long start,
>   if (remove_next == 2) {
>   remove_next = 1;
>   end = next->vm_end;
> + write_seqcount_end(&vma->vm_sequence);
>   goto again;
> - }
> - else if (next)
> + } else if (next) {
> + if (next != vma)
> + write_seqcount_begin_nested(&next->vm_sequence,
> + 
> SINGLE_DEPTH_NESTING);
>   vma_gap_update(next);
> - else {
> + } else {
>   /*
>* If remove_next == 2 we obviously can't
>* reach this path.
> @@ -962,6 +973,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
> long start,
>   if (insert && file)
>   uprobe_mmap(insert);
>  
> + if (next && next != vma)
> + write_seqcount_end(&next->vm_sequence);
> + write_seqcount_end(&vma->vm_sequence);


ok, so what I got on my box is:

vm_munmap()  -> down_write_killable(&mm->mmap_sem)
 do_munmap()
  __split_vma()
   __vma_adjust()  -> write_seqcount_begin(&vma->vm_sequence)
   -> write_seqcount_begin_nested(&next->vm_sequence, 
SINGLE_DEPTH_NESTING)

so this gives 3 dependencies  ->mmap_sem   ->   ->vm_seq
  ->vm_seq ->   ->vm_seq/1
  ->mmap_sem   ->   ->vm_seq/1


SyS_mremap() -> down_write_killable(&current->mm->mmap_sem)
 move_vma()   -> write_seqcount_begin(&vma->vm_sequence)
  -> write_seqcount_begin_nested(&new_vma->vm_sequence, 
SINGLE_DEPTH_NESTING);
  move_page_tables()
   __pte_alloc()
pte_alloc_one()
 __alloc_pages_nodemask()
  fs_reclaim_acquire()


I think here we have prepare_alloc_pages() call, that does

-> fs_reclaim_acquire(gfp_mask)
-> fs_reclaim_release(gfp_mask)

so that adds one more dependency  ->mmap_sem   ->   ->vm_seq    ->   fs_reclaim
                                  ->mmap_sem   ->   ->vm_seq/1  ->   fs_reclaim


now, under memory pressure we hit the slow path and perform direct
reclaim. direct reclaim is done under fs_reclaim lock, so we end up
with the following call chain

__alloc_pages_nodemask()
 __alloc_pages_slowpath()
  __perform_reclaim()   ->   fs_reclaim_acquire(gfp_mask);
   try_to_free_pages()
shrink_node()
 shrink_active_list()
  rmap_walk_file()  ->   i_mmap_lock_read(mapping);


and this breaks the existing dependency, since we now take the leaf lock
(fs_reclaim) first and then the root lock (->mmap_sem).


well, seems to be the case.

-ss


[PATCH] crypto: talitos - fix hashing

2017-09-13 Thread Christophe Leroy
md5sum on some files gives the wrong result.

Example:

With the md5sum from libkcapi:
c15115c05bad51113f81bdaee735dd09  test

With the original md5sum:
bbdf41d80ba7e8b2b7be3a0772be76cb  test

This patch fixes this issue

Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index f6f811a1bb..d47b3eb1f0 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1769,7 +1769,7 @@ static int common_nonsnoop_hash(struct talitos_edesc 
*edesc,
 
sg_count = edesc->src_nents ?: 1;
if (is_sec1 && sg_count > 1)
-   sg_copy_to_buffer(areq->src, sg_count, edesc->buf, length);
+   sg_copy_to_buffer(req_ctx->psrc, sg_count, edesc->buf, length);
else
sg_count = dma_map_sg(dev, req_ctx->psrc, sg_count,
  DMA_TO_DEVICE);
-- 
2.13.3



[PATCH] crypto: talitos - fix sha224

2017-09-13 Thread Christophe Leroy
Kernel crypto tests report the following error at startup

[2.752626] alg: hash: Test 4 failed for sha224-talitos
[2.757907] : 30 e2 86 e2 e7 8a dd 0d d7 eb 9f d5 83 fe f1 b0
0010: 2d 5a 6c a5 f9 55 ea fd 0e 72 05 22

This patch fixes it

Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index d47b3eb1f0..e2d323fa24 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1756,9 +1756,9 @@ static int common_nonsnoop_hash(struct talitos_edesc 
*edesc,
req_ctx->swinit = 0;
} else {
desc->ptr[1] = zero_entry;
-   /* Indicate next op is not the first. */
-   req_ctx->first = 0;
}
+   /* Indicate next op is not the first. */
+   req_ctx->first = 0;
 
/* HMAC key */
if (ctx->keylen)
-- 
2.13.3



Re: [PATCH 2/7] powerpc: introduce pte_get_hash_gslot() helper

2017-09-13 Thread Balbir Singh
On Sat, Sep 9, 2017 at 8:44 AM, Ram Pai  wrote:
> Introduce pte_get_hash_gslot(), which returns the slot number of the
> HPTE in the global hash table.
>
> This function will come in handy as we work towards re-arranging the
> PTE bits in the later patches.
>
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash.h |3 +++
>  arch/powerpc/mm/hash_utils_64.c   |   18 ++
>  2 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
> b/arch/powerpc/include/asm/book3s/64/hash.h
> index f884520..060c059 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -166,6 +166,9 @@ static inline int hash__pte_none(pte_t pte)
> return (pte_val(pte) & ~H_PTE_NONE_MASK) == 0;
>  }
>
> +unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
> +   int ssize, real_pte_t rpte, unsigned int subpg_index);
> +
>  /* This low level function performs the actual PTE insertion
>   * Setting the PTE depends on the MMU type and other factors. It's
>   * an horrible mess that I'm not going to try to clean up now but
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 67ec2e9..e68f053 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -1591,6 +1591,24 @@ static inline void tm_flush_hash_page(int local)
>  }
>  #endif
>
> +/*
> + * return the global hash slot, corresponding to the given
> + * pte, which contains the hpte.

Does this work with native/guest page tables? I guess both.
The comment sounds trivial; could you please elaborate more?
Looking at the code, it seems that, given a real pte, we use
the hash value and hidx to figure out the slot in the global
hash table. This uses information in the software page
tables. Is that correct? Do we have to consider validity and
present state here, or is that guaranteed?
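As a toy re-run of that arithmetic (the hash, mask and hidx values below are
made up; the group constants match the kernel's):

#include <stdio.h>

#define HPTES_PER_GROUP		8
#define _PTEIDX_SECONDARY	0x8
#define _PTEIDX_GROUP_IX	0x7

int main(void)
{
	unsigned long hash = 0x2a;		/* made-up primary hash  */
	unsigned long htab_hash_mask = 0xff;	/* made-up mask          */
	unsigned long hidx = 0xb;		/* secondary + group 3   */
	unsigned long slot;

	if (hidx & _PTEIDX_SECONDARY)
		hash = ~hash;		/* secondary hash is the complement */
	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
	slot += hidx & _PTEIDX_GROUP_IX;
	printf("global slot = %lu\n", slot);	/* prints 1707 */
	return 0;
}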

> + */
> +unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
> +   int ssize, real_pte_t rpte, unsigned int subpg_index)
> +{
> +   unsigned long hash, slot, hidx;
> +
> +   hash = hpt_hash(vpn, shift, ssize);
> +   hidx = __rpte_to_hidx(rpte, subpg_index);
> +   if (hidx & _PTEIDX_SECONDARY)
> +   hash = ~hash;
> +   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> +   slot += hidx & _PTEIDX_GROUP_IX;
> +   return slot;
> +}
> +
>  /* WARNING: This is called from hash_low_64.S, if you change this prototype,
>   *  do not forget to update the assembly call site !
>   */

Balbir Singh.


Re: [PATCH v2 3/5] powerpc/mce: Hookup derror (load/store) UE errors

2017-09-13 Thread Nicholas Piggin
On Wed, 13 Sep 2017 16:26:59 +1000
Balbir Singh  wrote:

> On Wed, Sep 13, 2017 at 4:21 PM, Nicholas Piggin  wrote:
> > On Wed, 13 Sep 2017 16:10:47 +1000
> > Balbir Singh  wrote:
> >  
> >> Extract physical_address for UE errors by walking the page
> >> tables for the mm and address at the NIP, to extract the
> >> instruction. Then use the instruction to find the effective
> >> address via analyse_instr().
> >>
> >> We might have page table walking races, but we expect them to
> >> be rare, the physical address extraction is best effort. The idea
> >> is to then hook up this infrastructure to memory failure eventually.  
> >
> > This all looks pretty good to me, you can probably update these
> > changelogs now because you are hooking it into memory failure.  
> 
> Yep, the 'eventually' can probably go; I meant in the next patch.
> The following patch then hooks this up into memory_failure().
> 
> >
> > I wonder if it would be worth skipping the instruction analysis and
> > page table walk if we've recursed up to the maximum MCE depth, just
> > in case we're hitting MCEs in part of that code or data.  
> 
> Yep, good idea. Would you be OK if we did this after this small series
> got merged?

I don't mind much, but given that it's all new code, I'd have thought
adding the check would be pretty easy.

if (get_paca()->in_mce == 4) {}

(Probably with the '4' appropriately #defined out of here and the
exception-64s.S code)
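Concretely, a hedged sketch of that guard (MAX_MCE_DEPTH is a hypothetical
name for the '4', and the early-return value is illustrative):

#define MAX_MCE_DEPTH	4

	/* At maximum recursion depth, skip the best-effort instruction
	 * analysis and page table walk entirely. */
	if (get_paca()->in_mce >= MAX_MCE_DEPTH)
		return false;	/* no physical address extracted */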

> Since that would mean that we got a UE error
> while processing our third machine check exception, I think the
> probability of us running into that is low, but I'd definitely like to do
> that once these changes are merged.

If we're getting UEs in the machine check code or walking kernel
page tables though, it will just keep recurring. Unlikely yes, but
it's still a slight regression.

Thanks,
Nick


Re: [PATCH v3 1/2] powerpc/mm: Export flush_all_mm()

2017-09-13 Thread Frederic Barrat



On 13/09/2017 at 06:04, Alistair Popple wrote:

+static inline void hash__local_flush_all_mm(struct mm_struct *mm)
+{
+   /*
+* There's no Page Walk Cache for hash, so what is needed is
+* the same as flush_tlb_mm(), which doesn't really make sense
+* with hash. So the only thing we could do is flush the
+* entire LPID! Punt for now, as it's not being used.
+*/


Do you think it is worth putting a WARN_ON_ONCE here if we're asserting this
isn't used on hash?


I toyed with the idea. The reason I didn't add it was that 
hash__local_flush_tlb_mm() and hash__flush_tlb_mm() don't have one, yet 
it's also not supported there. And I had faith that a developer thinking 
about using it would see the comment.
I was actually pretty close to having hash__local_flush_all_mm() call 
hash__local_flush_tlb_mm() directly, since the "all" stands for "pwc and 
tlb" and pwc doesn't exist for hash. But that doesn't do us any good for 
the time being.
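For illustration, a minimal sketch of that alternative (not the posted
patch):

static inline void hash__local_flush_all_mm(struct mm_struct *mm)
{
	/*
	 * "all" means "PWC + TLB" and hash has no Page Walk Cache,
	 * so delegate to the TLB flush, which is itself a no-op on
	 * hash today.
	 */
	hash__local_flush_tlb_mm(mm);
}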


Michael: any preference?

Side note: I'm under the impression that flush_tlb_mm() may be called on 
hash, even though it does nothing: kernel/fork.c, dup_mmap(), and 
potentially another call site (found through cscope).


  Fred



Otherwise looks good and is also needed for NPU.

Reviewed-By: Alistair Popple 


+}
+
+static inline void hash__flush_all_mm(struct mm_struct *mm)
+{
+   /*
+* There's no Page Walk Cache for hash, so what is needed is
+* the same as flush_tlb_mm(), which doesn't really make sense
+* with hash. So the only thing we could do is flush the
+* entire LPID! Punt for now, as it's not being used.
+*/
+}
+
  static inline void hash__local_flush_tlb_page(struct vm_area_struct *vma,
  unsigned long vmaddr)
  {
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 9b433a624bf3..af06c6fe8a9f 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -21,17 +21,20 @@ extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long sta
 extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
+extern void radix__local_flush_all_mm(struct mm_struct *mm);
 extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
					      int psize);
 extern void radix__tlb_flush(struct mmu_gather *tlb);
 #ifdef CONFIG_SMP
 extern void radix__flush_tlb_mm(struct mm_struct *mm);
+extern void radix__flush_all_mm(struct mm_struct *mm);
 extern void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
					int psize);
 #else
 #define radix__flush_tlb_mm(mm)		radix__local_flush_tlb_mm(mm)
+#define radix__flush_all_mm(mm)		radix__local_flush_all_mm(mm)
 #define radix__flush_tlb_page(vma,addr)	radix__local_flush_tlb_page(vma,addr)
 #define radix__flush_tlb_page_psize(mm,addr,p)	radix__local_flush_tlb_page_psize(mm,addr,p)
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 72b925f97bab..70760d018bcd 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -57,6 +57,13 @@ static inline void local_flush_tlb_page(struct vm_area_struct *vma,
 	return hash__local_flush_tlb_page(vma, vmaddr);
 }
 
+static inline void local_flush_all_mm(struct mm_struct *mm)
+{
+	if (radix_enabled())
+		return radix__local_flush_all_mm(mm);
+	return hash__local_flush_all_mm(mm);
+}
+
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 	if (radix_enabled())
@@ -79,9 +86,17 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
		return radix__flush_tlb_page(vma, vmaddr);
	return hash__flush_tlb_page(vma, vmaddr);
 }
+
+static inline void flush_all_mm(struct mm_struct *mm)
+{
+	if (radix_enabled())
+		return radix__flush_all_mm(mm);
+	return hash__flush_all_mm(mm);
+}
 #else
 #define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
 #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
+#define flush_all_mm(mm)		local_flush_all_mm(mm)
 #endif /* CONFIG_SMP */
  /*
   * flush the page walk cache for the address
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b3e849c4886e..5a1f46eff3a2 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -144,7 +144,7 @@ void radix__local_flush_tlb_mm(struct mm_struct *mm)

Re: [PATCH] ASoC: fsl_ssi: Override bit clock rate based on slot number

2017-09-13 Thread Arnaud Mouiche



On 12/09/2017 23:32, Nicolin Chen wrote:

On Tue, Sep 12, 2017 at 04:35:13PM +0200, Arnaud Mouiche wrote:


- * freq: Output BCLK frequency = samplerate * 32 (fixed) * channels
- * dir: SND_SOC_CLOCK_OUT -> TxBCLK, SND_SOC_CLOCK_IN -> RxBCLK.
+ * freq: Output BCLK frequency = samplerate * 32 (fixed) * slots (or channels)

Slots are not necessarily 32 bits wide.
Indeed, a simple grep of snd_soc_dai_set_tdm_slot shows 16, 24, 32
and 0 bit widths in use (I don't know the real meaning of 0, BTW...).
So, it would be good to also remember the slot width given in the
snd_soc_dai_set_tdm_slot() call.

RM says that the word length is fixed to 32 in I2S Master mode.
So I used it here. But I think using it with the slots could be
wrong here as you stated. What kind of clock rates (bit and lr)
does a TDM case have?

The problem is that the slot width isn't handled in set_tdm_slot()
at all. So I tried to ignore it. But we probably do need it
when calculating things with the slot number.

Could you please give me a few set of examples of how you set
set_sysclk(), set_tdm_slot() with the current driver? The idea
here is to figure out a way to calculate the bclk in hw_params
without getting set_sysclk() involved any more.


Here is one, where a bclk = 4*16*fs is expected

static int imx_hifi_hw_params(struct snd_pcm_substream *substream,
			      struct snd_pcm_hw_params *params)
{
	struct snd_soc_pcm_runtime *rtd = substream->private_data;
	struct imx_priv *priv = &card_priv;
	struct device *dev = &priv->pdev->dev;
	struct snd_soc_dai *cpu_dai = rtd->cpu_dai;
	unsigned int freq;
	int ret = 0;
	int ch;

	/* configuring cpu_dai
	 * - bitclk (fclk is computed automatically):
	 *   16 bit * 4 (max) channels * sampling rate
	 * - tdm mask to select the active channels
	 */
	freq = 4 * 16 * params_rate(params);
	if (freq != priv->current_freq) {
		/* clk_id and direction are not taken into account by
		 * the fsl_ssi driver */
		ret = snd_soc_dai_set_sysclk(cpu_dai, 0, freq, 0);
		if (ret) {
			dev_err(dev, "failed to set cpu dai bitclk: %u\n", freq);
			return ret;
		}
		priv->current_freq = freq;
	}
	ch = params_channels(params);
	if (ch != priv->current_ch) {
		ret = snd_soc_dai_set_tdm_slot(cpu_dai, (1 << ch) - 1,
					       (1 << ch) - 1, 4, 16);
		if (ret) {
			dev_err(dev, "failed to set cpu dai tdm slots: ch=%d\n", ch);
			return ret;
		}
		priv->current_ch = ch;
	}
	return 0;
}

In another setup, there are 8 x 16-bit slots, whatever the number of 
active channels is.

In this case bclk = 128 * fs.
The number of slots is completely arbitrary. Some slots can even be 
reserved for communication between codecs that don't communicate with Linux.





Anyway, there is something wrong in the snd_soc_dai_set_sysclk
usage by fsl_ssi.
Let's take a look again at the description:

/**
  * snd_soc_dai_set_sysclk - configure DAI system or master clock.
  * @dai: DAI
  * @clk_id: DAI specific clock ID
  * @freq: new clock frequency in Hz
  * @dir: new clock direction - input/output.
  *
  * Configures the DAI master (MCLK) or system (SYSCLK) clocking.
  */
int snd_soc_dai_set_sysclk(struct snd_soc_dai *dai, int clk_id,
 unsigned int freq, int dir)

So, it can be used to configure 2 different clocks (and more) with
different usages.

Lukasz complains that simple-card is configuring MCLK incorrectly,
but simple-card only wants to configure the SYSCLK, whereas fsl_ssi
understands "configure the MCLK..." (fsl_ssi doesn't check the clk_id
at all).

It's more than a clock_id in my opinion. The driver now actually sets
the bit clock rate via set_sysclk() rather than the MCLK rate.


I think the problem is here.
I would propose a new clk_id:

 #define FSL_SSI_SYSCLK_MCLK 1

And let fsl_ssi_set_dai_sysclk(xxx, FSL_SSI_SYSCLK_MCLK, ...)
override the computed mclk.
This will leave a way for obscure TDM users to specify a
random mclk frequency for obscure reasons...
Unfortunately, such generic clock_ids (sysclk, mclk) have never
been widely defined.
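A hedged sketch of how that could look (the handler shape and the
mclk_override field are illustrative, not current fsl_ssi code):

#define FSL_SSI_SYSCLK_MCLK	1

static int fsl_ssi_set_dai_sysclk(struct snd_soc_dai *dai, int clk_id,
				  unsigned int freq, int dir)
{
	struct fsl_ssi_private *ssi_private = snd_soc_dai_get_drvdata(dai);

	/* Only an explicit MCLK request overrides the computed rate;
	 * generic SYSCLK calls (e.g. from simple-card) are ignored. */
	if (clk_id != FSL_SSI_SYSCLK_MCLK)
		return 0;

	ssi_private->mclk_override = freq;	/* hypothetical field */
	return 0;
}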

Unfortunately, it looks like a workaround to me. I understand
that leaving set_sysclk() out there to override the bit
clock is convenient, but it is not a standard ALSA design and
may eventually introduce new problems like today's.


I agree. I'm not conservative at all concerning this question.
I don't see a way to remove set_sysclk without breaking current TDM 
users anyway, at least for those who don't have their code upstreamed.



All the information provided through snd_soc_dai_set_tdm_slot(cpu_dai,
mask, mask, slots, width) should be enough.

In this case, for TDM users,

   bclk = slots * width * fs   (where slots != channels)

will manage 99 % of the cases.
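A minimal sketch of that rule as it might look in hw_params(), assuming
hypothetical slots/slot_width fields remembered from set_tdm_slot():

static unsigned int fsl_ssi_bclk_rate(struct fsl_ssi_private *ssi_private,
				      struct snd_pcm_hw_params *params)
{
	/* TDM users: bclk = slots * width * fs. Otherwise fall back
	 * to the fixed 32-bit word length per channel of I2S master
	 * mode. */
	unsigned int slots = ssi_private->slots ?: params_channels(params);
	unsigned int width = ssi_private->slot_width ?: 32;

	return slots * width * params_rate(params);
}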

Re: [PATCH 1/7] powerpc: introduce pte_set_hash_slot() helper

2017-09-13 Thread Balbir Singh
On Sat, Sep 9, 2017 at 8:44 AM, Ram Pai  wrote:
> Introduce pte_set_hash_slot(). It sets the (H_PAGE_F_SECOND|H_PAGE_F_GIX)
> bits at the appropriate location in the PTE of a 4K PTE. For a
> 64K PTE, it sets the bits in the second part of the PTE. Though
> the implementation for the former just needs the slot parameter, it does
> take some additional parameters to keep the prototype consistent.
>
> This function will be handy as we work towards re-arranging the
> bits in the later patches.
>
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |   15 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h |   25 +
>  2 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index 0c4e470..8909039 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -48,6 +48,21 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
>  }
>  #endif
>
> +/*
> + * 4k pte format is different from 64k pte format. Saving the
> + * hash_slot is just a matter of returning the pte bits that need to
> + * be modified. On 64k pte, things are a little more involved and
> + * hence needs many more parameters to accomplish the same.
> + * However we want to abstract this out from the caller by keeping
> + * the prototype consistent across the two formats.
> + */
> +static inline unsigned long pte_set_hash_slot(pte_t *ptep, real_pte_t rpte,
> +   unsigned int subpg_index, unsigned long slot)
> +{
> +   return (slot << H_PAGE_F_GIX_SHIFT) &
> +   (H_PAGE_F_SECOND | H_PAGE_F_GIX);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>
>  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 9732837..6652669 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -74,6 +74,31 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
> return (pte_val(rpte.pte) >> H_PAGE_F_GIX_SHIFT) & 0xf;
>  }
>
> +/*
> + * Commit the hash slot and return pte bits that needs to be modified.
> + * The caller is expected to modify the pte bits accordingly and
> + * commit the pte to memory.
> + */
> +static inline unsigned long pte_set_hash_slot(pte_t *ptep, real_pte_t rpte,
> +   unsigned int subpg_index, unsigned long slot)
> +{
> +   unsigned long *hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> +
> +   rpte.hidx &= ~(0xfUL << (subpg_index << 2));
> +   *hidxp = rpte.hidx  | (slot << (subpg_index << 2));
> +   /*
> +* Commit the hidx bits to memory before returning.
> +* Anyone reading  pte  must  ensure hidx bits are
> +* read  only  after  reading the pte by using the

Can you lose the 'only' and make it "read after reading the pte"?
'read only' is easy to confuse with read-only.

> +* read-side  barrier  smp_rmb(). __real_pte() can
> +* help ensure that.
> +*/
> +   smp_wmb();
> +
> +   /* no pte bits to be modified, return 0x0UL */
> +   return 0x0UL;
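As a toy model of the nibble update above (made-up values; the smp_wmb()
pairing is omitted; compiles and runs standalone):

#include <stdio.h>

/* Pack a 4-bit slot hint per subpage into one unsigned long, as the
 * quoted function does with the second half of the 64K PTE. */
static unsigned long set_hidx(unsigned long hidx, unsigned int subpg_index,
			      unsigned long slot)
{
	hidx &= ~(0xfUL << (subpg_index << 2));		/* clear the old nibble */
	return hidx | (slot << (subpg_index << 2));	/* store the new one    */
}

int main(void)
{
	/* subpage 3, slot hint 0x5 -> nibble at bits 15:12 */
	printf("%#lx\n", set_hidx(0, 3, 0x5));		/* prints 0x5000 */
	return 0;
}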

Acked-by: Balbir Singh 

Balbir Singh.


Re: [PATCH] powerpc: Fix handling of alignment interrupt on dcbz instruction

2017-09-13 Thread Michael Ellerman
Michal, Christian, can you please confirm this fixes the problems you were 
seeing.

cheers

On 13 September 2017 2:51:24 pm AEST, Paul Mackerras  wrote:
>This fixes the emulation of the dcbz instruction in the alignment
>interrupt handler.  The error was that we were comparing just the
>instruction type field of op.type rather than the whole thing, so
>the comparison "type != CACHEOP + DCBZ" was always true: the masked
>type can only ever equal CACHEOP, never CACHEOP + DCBZ, and dcbz
>emulation always bailed out with -EINVAL.
>
>Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure
>to handle alignment faults")
>Signed-off-by: Paul Mackerras 
>---
> arch/powerpc/kernel/align.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
>index 26b9994d27ee..43ef25156480 100644
>--- a/arch/powerpc/kernel/align.c
>+++ b/arch/powerpc/kernel/align.c
>@@ -341,7 +341,7 @@ int fix_alignment(struct pt_regs *regs)
> 
>   type = op.type & INSTR_TYPE_MASK;
>   if (!OP_IS_LOAD_STORE(type)) {
>-  if (type != CACHEOP + DCBZ)
>+  if (op.type != CACHEOP + DCBZ)
>   return -EINVAL;
>   PPC_WARN_ALIGNMENT(dcbz, regs);
>   r = emulate_dcbz(op.ea, regs);
>-- 
>2.11.0

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Re: [PATCH] powerpc: Fix handling of alignment interrupt on dcbz instruction

2017-09-13 Thread Michal Sojka
On Wed, Sep 13 2017, Michael Ellerman wrote:
> Michal, Christian, can you please confirm this fixes the problems you
> were seeing.

Yes, it fixes my problem. Thanks.

-Michal