Re: [PATCH] powerpc: dts: use #include "..." to include local DT

2017-06-13 Thread Michael Ellerman
Masahiro Yamada  writes:
> 2017-06-13 19:21 GMT+09:00 Michael Ellerman :
>> Masahiro Yamada  writes:
>>>
>>> (+Anatolij Gustschin )
>>>
>>> Ping.
>>> I am not 100% sure who is responsible for this,
>>> but could somebody take a look at this patch, please?
>>
>> Have you tested it actually works?
>>
>> It sounds reasonable, and if it behaves as you describe there is no
>> change in behaviour, right?
>
> I do not have access to hardware,
> but it is pretty easy to test this patch.
>
> $ make O=foo ARCH=powerpc CROSS_COMPILE=powerpc-linux-  dts/ac14xx.dtb
>
> gave me the DTB output.
>
> The binary comparison matched with/without this patch,
> so I am sure there is no change in behavior.
>
> Likewise for mpc5121ads and pdm360ng.

Thanks.

Acked-by: Michael Ellerman 


cheers


Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Ram Pai
On Wed, Jun 14, 2017 at 10:43:30AM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:
> >Hi Aneesh,
> >
> >On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
> >>"Naveen N. Rao"  writes:
> >>
> >>>On P9, trying to use data breakpoints throws the splat shown below (*).
> >>>This is because the check for a data breakpoint in DSISR is in
> >>>do_hash_page(). Move this check to handle_page_fault() so as to catch
> >>>data breakpoints in both hash and radix MMU modes.
> >>>
> >>>While at it, also remove the label '11' that was made redundant by
> >>>commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
> >>>off")
> >>>
> >>>(*)
> >>> Unable to handle kernel paging request for data at address 
> >>> 0xc0e19218
> >>> Faulting instruction address: 0xc01155e8
> >>> cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
> >>> pc: c01155e8: find_pid_ns+0x48/0xe0
> >>> lr: c0116ac4: find_task_by_vpid+0x44/0x90
> >>> sp: c000ef1e7da0
> >>> msr: 90009033
> >>> dar: c0e19218
> >>> dsisr: 40
> >>> current = 0xc000f1f59700
> >>> paca= 0xcfd4 softe: 0 irq_happened: 0x01
> >>> pid   = 1192, comm = sh
> >>> Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 
> >>> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 
> >>> 16:52:49 UTC 2017
> >>> enter ? for help
> >>> [c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
> >>> [c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
> >>> [c000ef1e7e30] c000ba6c system_call+0x38/0xfc
> >>> --- Exception: c01 (System Call) at 7fff94480890
> >>> SP (7fffd91e7260) is in userspace
> >>>
> >>>Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
> >>>isolate hash related code")
> >>>Reported-by: Shriya R. Kulkarni 
> >>>Signed-off-by: Naveen N. Rao 
> >>>---
> >>>  arch/powerpc/kernel/exceptions-64s.S | 8 
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> >>>b/arch/powerpc/kernel/exceptions-64s.S
> >>>index ae418b85c17c..17ee701b8336 100644
> >>>--- a/arch/powerpc/kernel/exceptions-64s.S
> >>>+++ b/arch/powerpc/kernel/exceptions-64s.S
> >>>@@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
> >>>   .balign IFETCH_ALIGN_BYTES
> >>>  do_hash_page:
> >>>  #ifdef CONFIG_PPC_STD_MMU_64
> >>>-  andis.  r0,r4,0xa410/* weird error? */
> >>>+  andis.  r0,r4,0xa450/* weird error? */
> >>
> >>Can we convert that to a #define value. Ram did try to do that here.
> >>
> >>https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
> >
> >Hmm... I feel it will be good to do that as part of Ram's series since
> >he has already coded it up :)
> >
> >Ram's patches will anyway require a rebase and the change I do here for
> >detecting DAWR already has a #define, so it should be a simple matter of
> >including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
> >
> >But, if you really feel that I should make that change here, please do
> >let me know and I will re-spin with those changes.
> >
> 
> The thing is that change from 0xa410 to 0xa450 is not clear at all.
> And it needs proper documentation. IMHO the best way to do that is
> switch to #define name for that constant.

Naveen,

Feel free to take the macro from my patch. I think the magic
number is a little ugly. The earlier it goes the better.

My patch set will probably go through a couple of iterations. So I will
rebase it on top of your changes anyway.

RP



Re: [PATCH] recordmcount.pl: Add ppc64le to list of supported architectures

2017-06-13 Thread Kamalesh Babulal

On Wednesday 14 June 2017 10:23 AM, Michael Ellerman wrote:

>> I don't get this, the arch should always be powerpc.
>
> Right. Something else is fubar for that to happen, we should fix
> whatever it is.


Agreed. Overriding ARCH by reading the underlying architecture will
not work, as the expectation is to have ARCH=powerpc for all powerpc
platforms. Sorry for the noise, kindly ignore this patch.


--
cheers,
Kamalesh.



Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:
>> On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
>>> "Naveen N. Rao"  writes:
>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>>>> b/arch/powerpc/kernel/exceptions-64s.S
>>>> index ae418b85c17c..17ee701b8336 100644
>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>> @@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
>>>>    .balign IFETCH_ALIGN_BYTES
>>>>   do_hash_page:
>>>>   #ifdef CONFIG_PPC_STD_MMU_64
>>>> -  andis.  r0,r4,0xa410/* weird error? */
>>>> +  andis.  r0,r4,0xa450/* weird error? */
>>>
>>> Can we convert that to a #define value. Ram did try to do that here.
>>>
>>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
>> 
>> Hmm... I feel it will be good to do that as part of Ram's series since
>> he has already coded it up :)
>> 
>> Ram's patches will anyway require a rebase and the change I do here for
>> detecting DAWR already has a #define, so it should be a simple matter of
>> including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
>> 
>> But, if you really feel that I should make that change here, please do
>> let me know and I will re-spin with those changes.
>
> The thing is that change from 0xa410 to 0xa450 is not clear at all. And 
> it needs proper documentation. IMHO the best way to do that is switch to 
> #define name for that constant.

Not in this patch. It needs to be backported, so it should be as minimal
as possible.

The change from 0xa410 to 0xa450 does need a mention in the changelog,
I'll add that.

cheers
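
A minimal sketch of why the immediate changes from 0xa410 to 0xa450,
assuming andis. ANDs r4 with the 16-bit immediate shifted into the upper
halfword. The macro and function names are hypothetical and are not taken
from the patch; the single bit that differs between the two masks appears
to be the one the thread refers to as DSISR_DABRMATCH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two immediates that appear in the diff.
 * andis. places its 16-bit immediate in the upper halfword. */
#define HASH_SKIP_MASK_OLD	(0xa410u << 16)
#define HASH_SKIP_MASK_NEW	(0xa450u << 16)

/* The only bit the patch adds to the mask (assumed to be the data
 * breakpoint match bit discussed in the thread). */
#define HASH_SKIP_ADDED_BIT	(HASH_SKIP_MASK_OLD ^ HASH_SKIP_MASK_NEW)

/* Mirrors "andis. r0,r4,imm": true if any masked bit is set in dsisr. */
static bool dsisr_hits_mask(uint32_t dsisr, uint32_t mask)
{
	/* andis. can only test bits in the upper halfword. */
	assert((mask & 0xffffu) == 0);
	return (dsisr & mask) != 0;
}
```

With the old mask, a DSISR value carrying only the added bit is not
caught by the test; with the new mask it is.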


Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Anshuman Khandual
On 06/14/2017 12:12 AM, Naveen N. Rao wrote:
> On P9, trying to use data breakpoints throws the splat shown below (*).
> This is because the check for a data breakpoint in DSISR is in
> do_hash_page(). Move this check to handle_page_fault() so as to catch
> data breakpoints in both hash and radix MMU modes.

Why can't we check for DSISR inside do_hash_page() on P9?



Re: [PATCH] kernel/kprobes: Add test to validate pt_regs

2017-06-13 Thread Masami Hiramatsu
On Wed, 14 Jun 2017 11:40:08 +0900
Masami Hiramatsu  wrote:

> On Fri,  9 Jun 2017 00:53:08 +0530
> "Naveen N. Rao"  wrote:
> 
> > Add a test to verify that the registers passed in pt_regs on kprobe
> > (trap), optprobe (jump) and kprobe_on_ftrace (ftrace_caller) are
> > > accurate. The tests are exercised if KPROBES_SANITY_TEST is enabled.
> 
> Great!
> 
> > 
> > Implemented for powerpc64. Other architectures will have to implement
> > the relevant arch_* helpers and define HAVE_KPROBES_REGS_SANITY_TEST.
> 
> Hmm, why don't you define that in arch/powerpc/Kconfig ?
> Also, could you split this into 3 patches for each case ?
> 
> > 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  arch/powerpc/include/asm/kprobes.h  |   4 +
> >  arch/powerpc/lib/Makefile   |   3 +-
> >  arch/powerpc/lib/test_kprobe_regs.S |  62 
> >  arch/powerpc/lib/test_kprobes.c | 115 ++
> >  include/linux/kprobes.h |  11 +++
> >  kernel/test_kprobes.c   | 183 
> > 
> >  6 files changed, 377 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/powerpc/lib/test_kprobe_regs.S
> >  create mode 100644 arch/powerpc/lib/test_kprobes.c
> > 
> > diff --git a/arch/powerpc/include/asm/kprobes.h 
> > b/arch/powerpc/include/asm/kprobes.h
> > index 566da372e02b..10c91d3132a1 100644
> > --- a/arch/powerpc/include/asm/kprobes.h
> > +++ b/arch/powerpc/include/asm/kprobes.h
> > @@ -124,6 +124,10 @@ static inline int skip_singlestep(struct kprobe *p, 
> > struct pt_regs *regs,
> > return 0;
> >  }
> >  #endif
> > +#if defined(CONFIG_KPROBES_SANITY_TEST) && defined(CONFIG_PPC64)
> > +#define HAVE_KPROBES_REGS_SANITY_TEST
> > +void arch_kprobe_regs_set_ptregs(struct pt_regs *regs);
> > +#endif
> >  #else
> >  static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
> >  static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
> > diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> > index 3c3146ba62da..8a0bb8e20179 100644
> > --- a/arch/powerpc/lib/Makefile
> > +++ b/arch/powerpc/lib/Makefile
> > @@ -27,7 +27,8 @@ obj64-y   += copypage_64.o copyuser_64.o mem_64.o 
> > hweight_64.o \
> >  
> >  obj64-$(CONFIG_SMP)+= locks.o
> >  obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o
> > -obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
> > +obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o 
> > test_kprobe_regs.o \
> > +  test_kprobes.o
> >  
> >  obj-y  += checksum_$(BITS).o checksum_wrappers.o
> >  
> > diff --git a/arch/powerpc/lib/test_kprobe_regs.S 
> > b/arch/powerpc/lib/test_kprobe_regs.S
> > new file mode 100644
> > index ..4e95eca6dcd3
> > --- /dev/null
> > +++ b/arch/powerpc/lib/test_kprobe_regs.S
> > @@ -0,0 +1,62 @@
> > +/*
> > + * test_kprobe_regs: architectural helpers for validating pt_regs
> > + *  received on a kprobe.
> > + *
> > + * Copyright 2017 Naveen N. Rao 
> > + *   IBM Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; version 2
> > + * of the License.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +_GLOBAL(arch_kprobe_regs_function)
> > +   mflrr0
> > +   std r0, LRSAVE(r1)
> > +   stdur1, -SWITCH_FRAME_SIZE(r1)
> > +
> > +   /* Tell pre handler about our pt_regs location */
> > +   addir3, r1, STACK_FRAME_OVERHEAD
> > +   bl  arch_kprobe_regs_set_ptregs
> > +
> > +   /* Load back our true LR */
> > +   ld  r0, (SWITCH_FRAME_SIZE + LRSAVE)(r1)
> > +   mtlrr0
> > +
> > +   /* Save all SPRs that we care about */
> > +   mfctr   r0
> > +   std r0, _CTR(r1)
> > +   mflrr0
> > +   std r0, _LINK(r1)
> > +   mfspr   r0, SPRN_XER
> > +   std r0, _XER(r1)
> > +   mfcrr0
> > +   std r0, _CCR(r1)
> > +
> > +   /* Now, save all GPRs */
> > +   SAVE_2GPRS(0, r1)
> > +   SAVE_10GPRS(2, r1)
> > +   SAVE_10GPRS(12, r1)
> > +   SAVE_10GPRS(22, r1)
> > +
> > +   /* We're now ready to be probed */
> > +.global arch_kprobe_regs_probepoint
> > +arch_kprobe_regs_probepoint:
> > +   nop
> > +
> > +#ifdef CONFIG_KPROBES_ON_FTRACE
> > +   /* Let's also test KPROBES_ON_FTRACE */
> > +   bl  kprobe_regs_kp_on_ftrace_target
> > +   nop
> > +#endif
> > +
> > +   /* All done */
> > +   addir1, r1, SWITCH_FRAME_SIZE
> > +   ld  r0, LRSAVE(r1)
> > +   mtlrr0
> > +   blr
> > diff --git a/arch/powerpc/lib/test_kprobes.c 
> > b/arch/powerpc/lib/test_kprobes.c
> > new file mode 100644
> > index ..23f7a7ffcdd6
> > --- /dev/null
> > +++ b/arch/powerpc/lib/test_kprobes.c
> > @@ -0,0 +1,115 @@
> > +/*
> > + * test_kprobes: architectural helpers for validating pt_regs
> > + *  received on a kprobe.
> > + *
> > + * Copyright 2017 Naveen N. Rao 
> > + *

Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-13 Thread Balbir Singh
On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann 
wrote:

> On a related note, we are discussing the addition of 2 new device-tree
> properties with Pete Heyrman and his fellows that should simplify the
> determination of the set of required nodes.
>
> * One property would provide the total/max number of nodes needed by the
>   kernel on the current hardware.
>

Yes, that would be nice to have


> * A second property would provide the total/max number of nodes that the
>   kernel could use on any system to which it could be migrated.
>
>
Not sure about this one, are you suggesting more memory can be added
depending on the migration target?



> These properties aren't available, yet, and it takes time to define new
> properties in the PAPR and have them implemented in pHyp and the kernel.
> As an intermediary step, the systems which are doing a lot of dynamic
> hot-add/hot-remove configuration could provide equivalent information to
> the PowerPC kernel with a command line parameter.  The 'numa.c' code
> would then read this value and fill in the necessary entries in the
> 'node_possible_map'.
>
> Would you foresee any problems with using such a feature?
>


Balbir Singh


Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-13 Thread Balbir Singh
On Wed, Jun 14, 2017 at 3:25 PM, Balbir Singh  wrote:
>
>
> On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann 
> wrote:
>>
>> On a related note, we are discussing the addition of 2 new device-tree
>> properties with Pete Heyrman and his fellows that should simplify the
>> determination of the set of required nodes.
>>
>> * One property would provide the total/max number of nodes needed by the
>>   kernel on the current hardware.
>
>

Yes, that would be nice to have

>
>>
>> * A second property would provide the total/max number of nodes that the
>>   kernel could use on any system to which it could be migrated.
>>
>

Not sure about this one, are you suggesting more memory can be added
depending on the migration target?

>
>
>>
>> These properties aren't available, yet, and it takes time to define new
>> properties in the PAPR and have them implemented in pHyp and the kernel.
>> As an intermediary step, the systems which are doing a lot of dynamic
>> hot-add/hot-remove configuration could provide equivalent information to
>> the PowerPC kernel with a command line parameter.  The 'numa.c' code
>> would then read this value and fill in the necessary entries in the
>> 'node_possible_map'.
>>
>> Would you foresee any problems with using such a feature?
>
>

Sorry my mailer goofed up, resending

Balbir Singh


Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Aneesh Kumar K.V



On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:

>Hi Aneesh,
>
>On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
>>"Naveen N. Rao"  writes:
>>
>>>On P9, trying to use data breakpoints throws the splat shown below (*).
>>>This is because the check for a data breakpoint in DSISR is in
>>>do_hash_page(). Move this check to handle_page_fault() so as to catch
>>>data breakpoints in both hash and radix MMU modes.
>>>
>>>While at it, also remove the label '11' that was made redundant by
>>>commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
>>>off")
>>>
>>>(*)
>>> Unable to handle kernel paging request for data at address 
>>> 0xc0e19218
>>> Faulting instruction address: 0xc01155e8
>>> cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
>>> pc: c01155e8: find_pid_ns+0x48/0xe0
>>> lr: c0116ac4: find_task_by_vpid+0x44/0x90
>>> sp: c000ef1e7da0
>>> msr: 90009033
>>> dar: c0e19218
>>> dsisr: 40
>>> current = 0xc000f1f59700
>>> paca= 0xcfd4 softe: 0 irq_happened: 0x01
>>> pid   = 1192, comm = sh
>>> Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 
>>> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 
>>> 16:52:49 UTC 2017
>>> enter ? for help
>>> [c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
>>> [c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
>>> [c000ef1e7e30] c000ba6c system_call+0x38/0xfc
>>> --- Exception: c01 (System Call) at 7fff94480890
>>> SP (7fffd91e7260) is in userspace
>>>
>>>Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
>>>isolate hash related code")
>>>Reported-by: Shriya R. Kulkarni 
>>>Signed-off-by: Naveen N. Rao 
>>>---
>>>  arch/powerpc/kernel/exceptions-64s.S | 8 
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>>diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>>>b/arch/powerpc/kernel/exceptions-64s.S
>>>index ae418b85c17c..17ee701b8336 100644
>>>--- a/arch/powerpc/kernel/exceptions-64s.S
>>>+++ b/arch/powerpc/kernel/exceptions-64s.S
>>>@@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
>>>   .balign IFETCH_ALIGN_BYTES
>>>  do_hash_page:
>>>  #ifdef CONFIG_PPC_STD_MMU_64
>>>-  andis.  r0,r4,0xa410/* weird error? */
>>>+  andis.  r0,r4,0xa450/* weird error? */
>>
>>Can we convert that to a #define value. Ram did try to do that here.
>>
>>https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
>
>Hmm... I feel it will be good to do that as part of Ram's series since
>he has already coded it up :)
>
>Ram's patches will anyway require a rebase and the change I do here for
>detecting DAWR already has a #define, so it should be a simple matter of
>including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
>
>But, if you really feel that I should make that change here, please do
>let me know and I will re-spin with those changes.
>

The thing is that change from 0xa410 to 0xa450 is not clear at all. And 
it needs proper documentation. IMHO the best way to do that is switch to 
#define name for that constant.


-aneesh



Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Naveen N. Rao
Hi Aneesh,

On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
> "Naveen N. Rao"  writes:
> 
> > On P9, trying to use data breakpoints throws the splat shown below (*).
> > This is because the check for a data breakpoint in DSISR is in
> > do_hash_page(). Move this check to handle_page_fault() so as to catch
> > data breakpoints in both hash and radix MMU modes.
> >
> > While at it, also remove the label '11' that was made redundant by
> > commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
> > off")
> >
> > (*)
> > Unable to handle kernel paging request for data at address 
> > 0xc0e19218
> > Faulting instruction address: 0xc01155e8
> > cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
> > pc: c01155e8: find_pid_ns+0x48/0xe0
> > lr: c0116ac4: find_task_by_vpid+0x44/0x90
> > sp: c000ef1e7da0
> > msr: 90009033
> > dar: c0e19218
> > dsisr: 40
> > current = 0xc000f1f59700
> > paca= 0xcfd4 softe: 0 irq_happened: 0x01
> > pid   = 1192, comm = sh
> > Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 
> > 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 16:52:49 
> > UTC 2017
> > enter ? for help
> > [c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
> > [c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
> > [c000ef1e7e30] c000ba6c system_call+0x38/0xfc
> > --- Exception: c01 (System Call) at 7fff94480890
> > SP (7fffd91e7260) is in userspace
> >
> > Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
> > isolate hash related code")
> > Reported-by: Shriya R. Kulkarni 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index ae418b85c17c..17ee701b8336 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
> > .balign IFETCH_ALIGN_BYTES
> >  do_hash_page:
> >  #ifdef CONFIG_PPC_STD_MMU_64
> > -   andis.  r0,r4,0xa410/* weird error? */
> > +   andis.  r0,r4,0xa450/* weird error? */
> 
> Can we convert that to a #define value. Ram did try to do that here.
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html

Hmm... I feel it will be good to do that as part of Ram's series since 
he has already coded it up :)

Ram's patches will anyway require a rebase and the change I do here for 
detecting DAWR already has a #define, so it should be a simple matter of 
including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.

But, if you really feel that I should make that change here, please do 
let me know and I will re-spin with those changes.


Thanks for the review!
- Naveen



Re: [PATCH] recordmcount.pl: Add ppc64le to list of supported architectures

2017-06-13 Thread Kamalesh Babulal

On Wednesday 14 June 2017 04:22 AM, Balbir Singh wrote:

> On Tue, Jun 13, 2017 at 4:49 PM, Kamalesh Babulal <
> kamal...@linux.vnet.ibm.com> wrote:
>
>> Module make on ppc64le, fails with:
>>
>> make -C /root/kernel/linux M=/root/.kpatch/tmp/patch
>> kpatch-data-read-mostly.ko
>> make[1]: Entering directory '/root/kernel/linux'
>>   CC [M]  /root/.kpatch/tmp/patch/patch-hook.o
>> Arch ppc64le is not supported with CONFIG_FTRACE_MCOUNT_RECORD at
>> ./scripts/recordmcount.pl line 379.
>>
>> Fix it by adding 'ppc64le' to list of supported architectures
>> in recordmcount.pl script.
>>
>> Signed-off-by: Kamalesh Babulal 
>> Cc: Michael Ellerman 
>> Cc: Balbir Singh 
>> ---
>>  scripts/recordmcount.pl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
>> index 1633c3e..683b8b5 100755
>> --- a/scripts/recordmcount.pl
>> +++ b/scripts/recordmcount.pl
>> @@ -264,7 +264,7 @@ if ($arch eq "x86_64") {
>>  $ld .= " -m shlelf_linux";
>>  $objcopy .= " -O elf32-sh-linux";
>>
>> -} elsif ($arch eq "powerpc") {
>> +} elsif ($arch eq "powerpc" || $arch eq "ppc64le") {
>>
>
> I don't get this, the arch should always be powerpc. Where did you get the
> ppc64le from? Am I missing anything?
>
> Balbir Singh.

Thanks for the review. True, the top-level Makefile derives ARCH from
SUBARCH, where ppc64le is replaced by powerpc. The out-of-tree module
build fails because ARCH gets overridden with the underlying arch type.


--
cheers,
Kamalesh.
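
A minimal sketch of the mapping described above, in which a machine name
such as ppc64le is replaced by powerpc before it is used as ARCH. The
helper name is hypothetical, only the powerpc case from this thread is
covered, and treating every name that starts with "ppc" as powerpc is an
assumption rather than a quote from the Makefile.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: maps a machine name (for example the output of
 * `uname -m`) to the value expected in ARCH. Names that do not start
 * with "ppc" are returned unchanged. */
static const char *arch_from_machine(const char *machine)
{
	assert(machine != NULL);

	/* Assumption: ppc, ppc64 and ppc64le all map to powerpc. */
	if (strncmp(machine, "ppc", 3) == 0)
		return "powerpc";

	return machine;
}
```

A build that skips this step and passes the raw machine name through
would hand "ppc64le" to recordmcount.pl, which matches the failure
reported in the original mail.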



Re: [PATCH V3] cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

2017-06-13 Thread Michael Ellerman
Christophe Lombard  writes:

> A previous set of patches "cxl: Add support for Coherent Accelerator
> Interface Architecture 2.0" introduced new support for CAPI
> cards.

Which commit is that?

cheers

> These patches have been tested in a simulation environment and
> quite a few of them have been tested on real hardware.
>
> This patch brings new fixes after a series of tests carried out on
> new equipment:
> * Add POWER9 definition.
> * Re-enable any masked interrupts when the AFU is not activated after
>   resetting the AFU.
> * Remove the API cxl_is_psl8/9 which is no longer useful.
> * Do not dump CAPI1 registers.
> * Rewrite cxl_is_page_fault() function.
> * Do not register SLB callback on P9.
>
> Changelog[v3]
>  - Rebase to latest upstream.
>  - Update the patch's header.
>  - Add new test in cxl_is_page_fault().
>
> Changelog[v2]
>  - Rebase to latest upstream.
>  - Update cxl_is_page_fault() to handle the checkout response status.
>  - Add comments.
>
> Signed-off-by: Christophe Lombard 
> ---
>  drivers/misc/cxl/context.c |  6 +++---
>  drivers/misc/cxl/cxl.h | 18 +-
>  drivers/misc/cxl/fault.c   | 23 +++
>  drivers/misc/cxl/main.c| 17 +
>  drivers/misc/cxl/native.c  | 29 +
>  drivers/misc/cxl/pci.c | 11 ---
>  6 files changed, 57 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
> index 4472ce1..8c32040 100644
> --- a/drivers/misc/cxl/context.c
> +++ b/drivers/misc/cxl/context.c
> @@ -45,7 +45,7 @@ int cxl_context_init(struct cxl_context *ctx, struct 
> cxl_afu *afu, bool master)
>   mutex_init(&ctx->mapping_lock);
>   ctx->mapping = NULL;
>  
> - if (cxl_is_psl8(afu)) {
> + if (cxl_is_power8()) {
>   spin_lock_init(&ctx->sste_lock);
>  
>   /*
> @@ -189,7 +189,7 @@ int cxl_context_iomap(struct cxl_context *ctx, struct 
> vm_area_struct *vma)
>   if (start + len > ctx->afu->adapter->ps_size)
>   return -EINVAL;
>  
> - if (cxl_is_psl9(ctx->afu)) {
> + if (cxl_is_power9()) {
>   /*
>* Make sure there is a valid problem state
>* area space for this AFU.
> @@ -324,7 +324,7 @@ static void reclaim_ctx(struct rcu_head *rcu)
>  {
>   struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);
>  
> - if (cxl_is_psl8(ctx->afu))
> + if (cxl_is_power8())
>   free_page((u64)ctx->sstp);
>   if (ctx->ff_page)
>   __free_page(ctx->ff_page);
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index c8568ea..a03f8e7 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -357,6 +357,7 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
>  #define CXL_PSL9_DSISR_An_PF_RGP  0x0090ULL  /* PTE not found 
> (Radix Guest (parent)) 0b1001 */
>  #define CXL_PSL9_DSISR_An_PF_HRH  0x0094ULL  /* PTE not found 
> (HPT/Radix Host)   0b10010100 */
>  #define CXL_PSL9_DSISR_An_PF_STEG 0x009CULL  /* PTE not found 
> (STEG VA)  0b10011100 */
> +#define CXL_PSL9_DSISR_An_URTCH   0x00B4ULL  /* Unsupported 
> Radix Tree Configuration 0b10110100 */
>  
>  /** CXL_PSL_TFC_An 
> **/
>  #define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation 
> fault */
> @@ -844,24 +845,15 @@ static inline bool cxl_is_power8(void)
>  
>  static inline bool cxl_is_power9(void)
>  {
> - /* intermediate solution */
> - if (!cxl_is_power8() &&
> -(cpu_has_feature(CPU_FTRS_POWER9) ||
> - cpu_has_feature(CPU_FTR_POWER9_DD1)))
> + if (pvr_version_is(PVR_POWER9))
>   return true;
>   return false;
>  }
>  
> -static inline bool cxl_is_psl8(struct cxl_afu *afu)
> +static inline bool cxl_is_power9_dd1(void)
>  {
> - if (afu->adapter->caia_major == 1)
> - return true;
> - return false;
> -}
> -
> -static inline bool cxl_is_psl9(struct cxl_afu *afu)
> -{
> - if (afu->adapter->caia_major == 2)
> + if ((pvr_version_is(PVR_POWER9)) &&
> + cpu_has_feature(CPU_FTR_POWER9_DD1))
>   return true;
>   return false;
>  }
> diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
> index 538..c79e39b 100644
> --- a/drivers/misc/cxl/fault.c
> +++ b/drivers/misc/cxl/fault.c
> @@ -187,7 +187,7 @@ static struct mm_struct *get_mem_context(struct 
> cxl_context *ctx)
>  
>  static bool cxl_is_segment_miss(struct cxl_context *ctx, u64 dsisr)
>  {
> - if ((cxl_is_psl8(ctx->afu)) && (dsisr & CXL_PSL_DSISR_An_DS))
> + if ((cxl_is_power8() && (dsisr & CXL_PSL_DSISR_An_DS)))
>   return true;
>  
>   return false;
> @@ -195,16 +195,23 @@ static bool cxl_is_segment_miss(struct cxl_context 
> *ctx, u64 dsisr)

Re: [PATCH] recordmcount.pl: Add ppc64le to list of supported architectures

2017-06-13 Thread Michael Ellerman
Balbir Singh  writes:
> On Tue, Jun 13, 2017 at 4:49 PM, Kamalesh Babulal <
> kamal...@linux.vnet.ibm.com> wrote:
>> diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
>> index 1633c3e..683b8b5 100755
>> --- a/scripts/recordmcount.pl
>> +++ b/scripts/recordmcount.pl
>> @@ -264,7 +264,7 @@ if ($arch eq "x86_64") {
>>  $ld .= " -m shlelf_linux";
>>  $objcopy .= " -O elf32-sh-linux";
>>
>> -} elsif ($arch eq "powerpc") {
>> +} elsif ($arch eq "powerpc" || $arch eq "ppc64le") {
>>
>
> I don't get this, the arch should always be powerpc.

Right. Something else is fubar for that to happen, we should fix
whatever it is.

cheers


[PATCH] powernv/npu-dma.c: Remove spurious WARN_ON when a PCI device has no of_node

2017-06-13 Thread Alistair Popple
"4c3b89e powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev"
introduced explicit warnings in pnv_pci_get_npu_dev() when a PCIe device
has no associated device-tree node. However, not all PCIe devices have an
of_node and pnv_pci_get_npu_dev() gets indirectly called at least once for
every PCIe device in the system. This results in spurious WARN_ON()s, so
remove the warning.

The same situation should not exist for pnv_pci_get_gpu_dev() as any NPU
based PCIe device requires a device-tree node.

Signed-off-by: Alistair Popple 
Reported-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 78fa939..e6f444b 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -75,7 +75,8 @@ struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, 
int index)
if (WARN_ON(!gpdev))
return NULL;
 
-   if (WARN_ON(!gpdev->dev.of_node))
+   /* Not all PCI devices have device-tree nodes */
+   if (!gpdev->dev.of_node)
return NULL;
 
/* Get assoicated PCI device */
-- 
2.1.4



Re: [PATCH] powerpc/xive: Fix offset for store EOI MMIOs

2017-06-13 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> Architecturally we should apply a 0x400 offset for these. Not doing
> it will break future HW implementations.

Can you elaborate a bit?

You're changing a write to 0x0 to be a write to 0x400, which at face
value appears like it breaks something, or is already broken?

cheers

> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index c8a822a..c23ff43 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -94,11 +94,13 @@ struct xive_q {
>   * store at 0 and some ESBs support doing a trigger via a
>   * separate trigger page.
>   */
> -#define XIVE_ESB_GET 0x800
> -#define XIVE_ESB_SET_PQ_00   0xc00
> -#define XIVE_ESB_SET_PQ_01   0xd00
> -#define XIVE_ESB_SET_PQ_10   0xe00
> -#define XIVE_ESB_SET_PQ_11   0xf00
> +#define XIVE_ESB_STORE_EOI   0x400 /* Store */
> +#define XIVE_ESB_LOAD_EOI0x000 /* Load */
> +#define XIVE_ESB_GET 0x800 /* Load */
> +#define XIVE_ESB_SET_PQ_00   0xc00 /* Load */
> +#define XIVE_ESB_SET_PQ_01   0xd00 /* Load */
> +#define XIVE_ESB_SET_PQ_10   0xe00 /* Load */
> +#define XIVE_ESB_SET_PQ_11   0xf00 /* Load */
>  
>  #define XIVE_ESB_VAL_P   0x2
>  #define XIVE_ESB_VAL_Q   0x1
> diff --git a/arch/powerpc/kvm/book3s_xive_template.c 
> b/arch/powerpc/kvm/book3s_xive_template.c
> index 023a311..4636ca6 100644
> --- a/arch/powerpc/kvm/book3s_xive_template.c
> +++ b/arch/powerpc/kvm/book3s_xive_template.c
> @@ -69,7 +69,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct 
> xive_irq_data *xd)
>  {
>   /* If the XIVE supports the new "store EOI facility, use it */
>   if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
> - __x_writeq(0, __x_eoi_page(xd));
> + __x_writeq(0, __x_eoi_page(xd) + XIVE_ESB_STORE_EOI);
>   else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
>   opal_int_eoi(hw_irq);
>   } else {
> @@ -89,7 +89,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct 
> xive_irq_data *xd)
>* properly.
>*/
>   if (xd->flags & XIVE_IRQ_FLAG_LSI)
> - __x_readq(__x_eoi_page(xd));
> + __x_readq(__x_eoi_page(xd) + XIVE_ESB_LOAD_EOI);
>   else {
>   eoi_val = GLUE(X_PFX,esb_load)(xd, XIVE_ESB_SET_PQ_00);
>  
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index 9138250..8f5e303 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -297,7 +297,7 @@ void xive_do_source_eoi(u32 hw_irq, struct xive_irq_data 
> *xd)
>  {
>   /* If the XIVE supports the new "store EOI facility, use it */
>   if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
> - out_be64(xd->eoi_mmio, 0);
> + out_be64(xd->eoi_mmio + XIVE_ESB_STORE_EOI, 0);
>   else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
>   /*
>* The FW told us to call it. This happens for some
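
A minimal sketch of what the offset change means, using an ordinary
array in place of the ESB MMIO page. The two offsets come from the quoted
patch; the page size, the struct and the helper are hypothetical
stand-ins, and real code would use the MMIO accessors shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the quoted patch. */
#define XIVE_ESB_STORE_EOI	0x400	/* Store */
#define XIVE_ESB_LOAD_EOI	0x000	/* Load */

/* Hypothetical stand-in for one ESB page, viewed as 64-bit slots. */
#define FAKE_ESB_PAGE_SIZE	0x1000
struct fake_esb_page {
	uint64_t slot[FAKE_ESB_PAGE_SIZE / sizeof(uint64_t)];
};

/* Store-EOI after the patch: the write lands at base + 0x400 rather
 * than at base + 0x000. */
static void fake_store_eoi(struct fake_esb_page *page, uint64_t val)
{
	assert(page != NULL);
	page->slot[XIVE_ESB_STORE_EOI / sizeof(uint64_t)] = val;
}
```

In this model the slot at offset 0x000 is left untouched by a store EOI,
which is the behavioural difference Michael is asking about.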



Re: [PATCH V2 2/2] powerpc/powernv : Add support for OPAL-OCC command/response interface

2017-06-13 Thread Cyril Bur
On Tue, 2017-06-13 at 23:26 +0530, Shilpasri G Bhat wrote:
> In P9, OCC (On-Chip-Controller) supports a shared memory based
> command-response interface. Within the shared memory there is an OPAL
> command buffer and OCC response buffer that can be used to send
> inband commands to OCC. This patch adds a platform driver to support
> the command/response interface between OCC and the host.
> 

Hi Shilpasri,

I have another question about printing of errors, see below. Otherwise
looks better.

When you're dropping the locks I wonder why you couldn't just use
atomic_set().

Finally, as you say below, if a user does a read() without having done
a write(), there isn't really any guarantee as to what they get. They
will get the previous response if there had been a write() call, but
what about if there had never been a write() call?
Could this be considered a security problem? Imagine that one program
is using the file descriptor while a malicious program keeps trying to
open the device, and then the program holding the file descriptor
crashes. What information might the malicious program get?

More inline,

Cyril
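
To make the read()-before-write() concern concrete, here is a rough
userspace sketch of one option: hand out a response at most once and
scrub it afterwards. The struct and function names are made up for
illustration and are not taken from the patch, and C11 atomics stand in
for the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device state; not the layout used in the patch. */
struct occ_dev {
	atomic_bool rsp_valid;	/* true only after a completed command */
	char rsp[64];
	size_t rsp_len;
};

/* Called when a command completes: publish the response. */
static void occ_store_response(struct occ_dev *d, const char *buf, size_t len)
{
	if (len > sizeof(d->rsp))
		len = sizeof(d->rsp);
	memcpy(d->rsp, buf, len);
	d->rsp_len = len;
	atomic_store(&d->rsp_valid, true);
}

/* read(): returns the number of bytes copied, or -1 if nothing is pending. */
static long occ_read_response(struct occ_dev *d, char *out, size_t out_len)
{
	size_t n;

	/* Atomically claim the response so a second reader gets nothing. */
	if (!atomic_exchange(&d->rsp_valid, false))
		return -1;

	n = d->rsp_len < out_len ? d->rsp_len : out_len;
	memcpy(out, d->rsp, n);
	/* Scrub the buffer so stale data cannot leak to a later opener. */
	memset(d->rsp, 0, sizeof(d->rsp));
	d->rsp_len = 0;
	return (long)n;
}
```

With something like this, a read() with no preceding write() fails
cleanly instead of returning whatever the previous user left behind.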

> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from V2:
> - Remove spinlock and use atomic_t for setting and clearing flags
> - Fix endian swapping
> - Use pa() and va() before and after opal call for accessing buffer
>   data
> - Replace (u8 *) with __be64 for buffer pointers
> - User reads the previous OCC response if the user does a read()
>   before a write(). Is this wrong?
> - Add WARN_ON check for nr_occs > 254
> 
>  arch/powerpc/include/asm/opal-api.h|  41 +++-
>  arch/powerpc/include/asm/opal.h|   3 +
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-occ.c  | 314 +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/opal.c  |   8 +
>  6 files changed, 367 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index cb3e624..011d86c 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -42,6 +42,10 @@
>  #define OPAL_I2C_STOP_ERR-24
>  #define OPAL_XIVE_PROVISIONING   -31
>  #define OPAL_XIVE_FREE_ACTIVE-32
> +#define OPAL_OCC_INVALID_STATE   -33
> +#define OPAL_OCC_BUSY-34
> +#define OPAL_OCC_CMD_TIMEOUT -35
> +#define OPAL_OCC_RSP_MISMATCH-36
>  
>  /* API Tokens (in r0) */
>  #define OPAL_INVALID_CALL   -1
> @@ -190,7 +194,8 @@
>  #define OPAL_NPU_INIT_CONTEXT146
>  #define OPAL_NPU_DESTROY_CONTEXT 147
>  #define OPAL_NPU_MAP_LPAR148
> -#define OPAL_LAST148
> +#define OPAL_OCC_COMMAND 149
> +#define OPAL_LAST149
>  
>  /* Device tree flags */
>  
> @@ -829,6 +834,40 @@ struct opal_prd_msg_header {
>  
>  struct opal_prd_msg;
>  
> +enum occ_cmd {
> + OCC_CMD_AMESTER_PASS_THRU = 0,
> + OCC_CMD_CLEAR_SENSOR_DATA,
> + OCC_CMD_SET_POWER_CAP,
> + OCC_CMD_SET_POWER_SHIFTING_RATIO,
> + OCC_CMD_SELECT_SENSOR_GROUPS,
> + OCC_CMD_LAST
> +};
> +
> +struct opal_occ_cmd_rsp_msg {
> + __be64 cdata;
> + __be64 rdata;
> + __be16 cdata_size;
> + __be16 rdata_size;
> + u8 cmd;
> + u8 request_id;
> + u8 status;
> +};
> +
> +struct opal_occ_cmd_data {
> + __be16 size;
> + u8 cmd;
> + u8 data[];
> +};
> +
> +struct opal_occ_rsp_data {
> + __be16 size;
> + u8 status;
> + u8 data[];
> +};
> +
> +#define MAX_OPAL_CMD_DATA_LENGTH4090
> +#define MAX_OCC_RSP_DATA_LENGTH 8698
> +
>  #define OCC_RESET   0
>  #define OCC_LOAD1
>  #define OCC_THROTTLE2
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 03ed493..e55ed79 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -346,6 +346,9 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  
>  void opal_wake_poller(void);
>  
> +int64_t opal_occ_command(int chip_id, struct opal_occ_cmd_rsp_msg *msg,
> +  bool retry);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_OPAL_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index b5d98cb..f5f0902 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o 
> opal-async.o idle.o
>  obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o 
> opal-flash.o
>  obj-y+= rng.o opal-elog.o opal-dump.o 
> opal-sysparam.o opal-sensor.o
>  o

Re: [PATCH 0/8] Support for 24x7 hcall interface version 2

2017-06-13 Thread Sukadev Bhattiprolu
Thiago Jung Bauermann [bauer...@linux.vnet.ibm.com] wrote:
> Hello,
> 
> The hypervisor interface to access 24x7 performance counters (which collect
> performance information from system power on to system power off) has been
> extended in POWER9 adding new fields to the request and result element
> structures.
> 
> Also, results for some domains now return more than one result element and
> those need to be added to get a total count.
> 
> The first two patches fix bugs in the existing code. The following 4
> patches are code improvements and the last two finally implement support
> for the changes in POWER9 described above.
> 
> POWER8 systems only support version 1 of the interface, while POWER9
> systems only support version 2. I tested these patches on POWER8 to verify
> that there are no regressions, and also on POWER9 DD1.
> 
> Thiago Jung Bauermann (8):
>   powerpc/perf/hv-24x7: Fix passing of catalog version number
>   powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check
>   powerpc/perf/hv-24x7: Properly iterate through results
>   powerpc-perf/hx-24x7: Don't log failed hcall twice
>   powerpc/perf/hv-24x7: Fix return value of hcalls
>   powerpc/perf/hv-24x7: Minor improvements
>   powerpc/perf/hv-24x7: Support v2 of the hypervisor API
>   powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
> 
>  arch/powerpc/perf/hv-24x7.c| 255 +
>  arch/powerpc/perf/hv-24x7.h|  70 +++--
>  arch/powerpc/platforms/pseries/Kconfig |   2 +-
>  3 files changed, 255 insertions(+), 72 deletions(-)

Reviewed-by: Sukadev Bhattiprolu 
> 
> -- 
> 2.7.4
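
As a small illustration of the "add the result elements to get a total
count" point in the cover letter, the aggregation amounts to something
like the sketch below. The function name and the flat array of counts
are assumptions made for the example; the real result-element layout is
defined in hv-24x7.h and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum the counter values of all result elements returned for one
 * request. 'counts' holds one (host-endian) value per element.
 */
static uint64_t sum_result_elements(const uint64_t *counts, size_t nr_elements)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr_elements; i++)
		total += counts[i];
	return total;
}
```

With a single element (the version 1 case) this degenerates to
returning that element's value.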



[PATCH v2 3/3] powerpc/powernv/pci: Add support for PHB4 diagnostics

2017-06-13 Thread Russell Currey
As with P7IOC and PHB3, add kernel-side support for decoding and printing
diagnostic data for PHB4.

Signed-off-by: Russell Currey 
---
No changes from v1

 arch/powerpc/include/asm/opal-api.h  |  75 -
 arch/powerpc/platforms/powernv/pci.c | 105 +++
 2 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index cb3e6242a78c..0b543f0f54f5 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -667,12 +667,14 @@ enum {
 
 enum {
OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1,
-   OPAL_PHB_ERROR_DATA_TYPE_PHB3 = 2
+   OPAL_PHB_ERROR_DATA_TYPE_PHB3 = 2,
+   OPAL_PHB_ERROR_DATA_TYPE_PHB4 = 3
 };
 
 enum {
OPAL_P7IOC_NUM_PEST_REGS = 128,
-   OPAL_PHB3_NUM_PEST_REGS = 256
+   OPAL_PHB3_NUM_PEST_REGS = 256,
+   OPAL_PHB4_NUM_PEST_REGS = 512
 };
 
 struct OpalIoPhbErrorCommon {
@@ -802,6 +804,75 @@ struct OpalIoPhb3ErrorData {
__be64 pestB[OPAL_PHB3_NUM_PEST_REGS];
 };
 
+struct OpalIoPhb4ErrorData {
+   struct OpalIoPhbErrorCommon common;
+
+   __be32 brdgCtl;
+
+   /* PHB4 cfg regs */
+   __be32 deviceStatus;
+   __be32 slotStatus;
+   __be32 linkStatus;
+   __be32 devCmdStatus;
+   __be32 devSecStatus;
+
+   /* cfg AER regs */
+   __be32 rootErrorStatus;
+   __be32 uncorrErrorStatus;
+   __be32 corrErrorStatus;
+   __be32 tlpHdr1;
+   __be32 tlpHdr2;
+   __be32 tlpHdr3;
+   __be32 tlpHdr4;
+   __be32 sourceId;
+
+   /* PHB4 ETU Error Regs */
+   __be64 nFir;/* 000 */
+   __be64 nFirMask;/* 003 */
+   __be64 nFirWOF; /* 008 */
+   __be64 phbPlssr;/* 120 */
+   __be64 phbCsr;  /* 110 */
+   __be64 lemFir;  /* C00 */
+   __be64 lemErrorMask;/* C18 */
+   __be64 lemWOF;  /* C40 */
+   __be64 phbErrorStatus;  /* C80 */
+   __be64 phbFirstErrorStatus; /* C88 */
+   __be64 phbErrorLog0;/* CC0 */
+   __be64 phbErrorLog1;/* CC8 */
+   __be64 phbTxeErrorStatus;   /* D00 */
+   __be64 phbTxeFirstErrorStatus;  /* D08 */
+   __be64 phbTxeErrorLog0; /* D40 */
+   __be64 phbTxeErrorLog1; /* D48 */
+   __be64 phbRxeArbErrorStatus;/* D80 */
+   __be64 phbRxeArbFirstErrorStatus;   /* D88 */
+   __be64 phbRxeArbErrorLog0;  /* DC0 */
+   __be64 phbRxeArbErrorLog1;  /* DC8 */
+   __be64 phbRxeMrgErrorStatus;/* E00 */
+   __be64 phbRxeMrgFirstErrorStatus;   /* E08 */
+   __be64 phbRxeMrgErrorLog0;  /* E40 */
+   __be64 phbRxeMrgErrorLog1;  /* E48 */
+   __be64 phbRxeTceErrorStatus;/* E80 */
+   __be64 phbRxeTceFirstErrorStatus;   /* E88 */
+   __be64 phbRxeTceErrorLog0;  /* EC0 */
+   __be64 phbRxeTceErrorLog1;  /* EC8 */
+
+   /* PHB4 REGB Error Regs */
+   __be64 phbPblErrorStatus;   /* 1900 */
+   __be64 phbPblFirstErrorStatus;  /* 1908 */
+   __be64 phbPblErrorLog0; /* 1940 */
+   __be64 phbPblErrorLog1; /* 1948 */
+   __be64 phbPcieDlpErrorLog1; /* 1AA0 */
+   __be64 phbPcieDlpErrorLog2; /* 1AA8 */
+   __be64 phbPcieDlpErrorStatus;   /* 1AB0 */
+   __be64 phbRegbErrorStatus;  /* 1C00 */
+   __be64 phbRegbFirstErrorStatus; /* 1C08 */
+   __be64 phbRegbErrorLog0;/* 1C40 */
+   __be64 phbRegbErrorLog1;/* 1C48 */
+
+   __be64 pestA[OPAL_PHB4_NUM_PEST_REGS];
+   __be64 pestB[OPAL_PHB4_NUM_PEST_REGS];
+};
+
 enum {
OPAL_REINIT_CPUS_HILE_BE= (1 << 0),
OPAL_REINIT_CPUS_HILE_LE= (1 << 1),
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 209ad47a3383..7905d179d036 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -426,6 +426,108 @@ static void pnv_pci_dump_phb3_diag_data(struct 
pci_controller *hose,
pnv_pci_dump_pest(data->pestA, data->pestB, OPAL_PHB3_NUM_PEST_REGS);
 }
 
+static void pnv_pci_dump_phb4_diag_data(struct pci_controller *hose,
+   struct OpalIoPhbErrorCommon *common)
+{
+   struct OpalIoPhb4ErrorData *data;
+
+   data = (struct OpalIoPhb4ErrorData*)common;
+   pr_info("PHB4 PHB#%d Diag-data (Version: %d)\n",
+   hose->global_number, be32_to_cpu(common->version));
+   if (data->brdgCtl)
+   pr_info("brdgCtl:%08x

[PATCH v2 1/3] powerpc/powernv/pci: Reduce spam when dumping PEST

2017-06-13 Thread Russell Currey
Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs
are affected, especially in the case where the whole PHB is frozen and 512
lines get printed.  Check for duplicates when dumping the PEST to reduce
useless output.

For example:

PE[0f8] A/B: 9726 8080d0f8
PE[0f9] A/B: 8000 
PE[..0fe] A/B: as above
PE[0ff] A/B: 8440002b 

instead of:

PE[0f8] A/B: 9726 8080d0f8
PE[0f9] A/B: 8000 
PE[0fa] A/B: 8000 
PE[0fb] A/B: 8000 
PE[0fc] A/B: 8000 
PE[0fd] A/B: 8000 
PE[0fe] A/B: 8000 
PE[0ff] A/B: 8440002b 

and you can imagine how much worse it can get for 512 PEs.
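
For illustration only, the collapsing logic can be sketched in plain
userspace C as below. This is a variation, not the code in the patch:
it takes host-endian values, uses a have_prev flag instead of a
sentinel, and also closes a run that reaches the end of the table.
STOPPED_STATE is assumed to be the top bit, matching the old ">> 63"
test.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "stopped" is bit 63, as in the old ">> 63" check. */
#define STOPPED_STATE	(UINT64_C(1) << 63)

/*
 * Print stopped PEs, collapsing runs of identical A/B pairs into one
 * "as above" line. Returns the number of lines written.
 */
static int dump_pest(FILE *out, const uint64_t *a, const uint64_t *b, int n)
{
	uint64_t prev_a = 0, prev_b = 0;
	int have_prev = 0, dup = 0, lines = 0;
	int i;

	for (i = 0; i < n; i++) {
		int stopped = ((a[i] | b[i]) & STOPPED_STATE) != 0;

		if (have_prev && a[i] == prev_a && b[i] == prev_b) {
			/* Same as the previous entry: just remember the run. */
			if (stopped)
				dup = 1;
			continue;
		}
		if (dup) {
			/* The run ended at the previous index. */
			fprintf(out, "PE[..%03x] A/B: as above\n", i - 1);
			lines++;
			dup = 0;
		}
		prev_a = a[i];
		prev_b = b[i];
		have_prev = 1;
		if (stopped) {
			fprintf(out, "PE[%03x] A/B: %016" PRIx64 " %016" PRIx64 "\n",
				i, a[i], b[i]);
			lines++;
		}
	}
	if (dup) {
		/* A run of duplicates reached the end of the table. */
		fprintf(out, "PE[..%03x] A/B: as above\n", n - 1);
		lines++;
	}
	return lines;
}
```

Fed eight entries shaped like the example above (one distinct, six
identical, one distinct), this prints four lines rather than eight.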

Signed-off-by: Russell Currey 
---
v2: Made a constant instead of ">> 63" grossness thanks to mpe & ajd
---
 arch/powerpc/platforms/powernv/pci.c | 51 ++--
 arch/powerpc/platforms/powernv/pci.h |  3 +++
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 935ccb249a8a..40071ad0bc42 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -227,11 +227,39 @@ void pnv_teardown_msi_irqs(struct pci_dev *pdev)
 }
 #endif /* CONFIG_PCI_MSI */
 
+/* Nicely print the contents of the PE State Tables (PEST). */
+static void pnv_pci_dump_pest(__be64 pestA[], __be64 pestB[], int pest_size)
+{
+   __be64 prevA = ULONG_MAX, prevB = ULONG_MAX;
+   bool dup = false;
+   int i;
+
+   for (i = 0; i < pest_size; i++) {
+   __be64 peA = be64_to_cpu(pestA[i]);
+   __be64 peB = be64_to_cpu(pestB[i]);
+
+   if (peA != prevA || peB != prevB) {
+   if (dup) {
+   pr_info("PE[..%03x] A/B: as above\n", i-1);
+   dup = false;
+   }
+   prevA = peA;
+   prevB = peB;
+   if (peA & PNV_IODA_STOPPED_STATE ||
+   peB & PNV_IODA_STOPPED_STATE)
+   pr_info("PE[%03x] A/B: %016llx %016llx\n",
+   i, peA, peB);
+   } else if (!dup && (peA & PNV_IODA_STOPPED_STATE ||
+   peB & PNV_IODA_STOPPED_STATE)) {
+   dup = true;
+   }
+   }
+}
+
 static void pnv_pci_dump_p7ioc_diag_data(struct pci_controller *hose,
 struct OpalIoPhbErrorCommon *common)
 {
struct OpalIoP7IOCPhbErrorData *data;
-   int i;
 
data = (struct OpalIoP7IOCPhbErrorData *)common;
pr_info("P7IOC PHB#%x Diag-data (Version: %d)\n",
@@ -308,22 +336,13 @@ static void pnv_pci_dump_p7ioc_diag_data(struct 
pci_controller *hose,
be64_to_cpu(data->dma1ErrorLog0),
be64_to_cpu(data->dma1ErrorLog1));
 
-   for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
-   if ((be64_to_cpu(data->pestA[i]) >> 63) == 0 &&
-   (be64_to_cpu(data->pestB[i]) >> 63) == 0)
-   continue;
-
-   pr_info("PE[%3d] A/B: %016llx %016llx\n",
-   i, be64_to_cpu(data->pestA[i]),
-   be64_to_cpu(data->pestB[i]));
-   }
+   pnv_pci_dump_pest(data->pestA, data->pestB, OPAL_P7IOC_NUM_PEST_REGS);
 }
 
 static void pnv_pci_dump_phb3_diag_data(struct pci_controller *hose,
struct OpalIoPhbErrorCommon *common)
 {
struct OpalIoPhb3ErrorData *data;
-   int i;
 
data = (struct OpalIoPhb3ErrorData*)common;
pr_info("PHB3 PHB#%x Diag-data (Version: %d)\n",
@@ -404,15 +423,7 @@ static void pnv_pci_dump_phb3_diag_data(struct 
pci_controller *hose,
be64_to_cpu(data->dma1ErrorLog0),
be64_to_cpu(data->dma1ErrorLog1));
 
-   for (i = 0; i < OPAL_PHB3_NUM_PEST_REGS; i++) {
-   if ((be64_to_cpu(data->pestA[i]) >> 63) == 0 &&
-   (be64_to_cpu(data->pestB[i]) >> 63) == 0)
-   continue;
-
-   pr_info("PE[%3d] A/B: %016llx %016llx\n",
-   i, be64_to_cpu(data->pestA[i]),
-   be64_to_cpu(data->pestB[i]));
-   }
+   pnv_pci_dump_pest(data->pestA, data->pestB, OPAL_PHB3_NUM_PEST_REGS);
 }
 
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 18c8a2fa03b8..6abc77dd9261 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci

[PATCH v2 2/3] powerpc/powernv/pci: Dynamically allocate PHB diag data

2017-06-13 Thread Russell Currey
Diagnostic data for PHBs currently works by allocating a fixed-size buffer.
This is simple, but either wastes memory (though only a few kilobytes) or
in the case of PHB4 isn't enough to fit the whole data blob.

For machines that don't describe the diagnostic data size in the device
tree, use the hardcoded buffer size as before.  For those that do, only
allocate exactly what's needed.

In the special case of P7IOC (which has two types of diag data), the larger
should be specified in the device tree.
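
The sizing rule described above boils down to roughly the following,
shown here as a standalone userspace sketch rather than the kernel
code. The struct, the helper names and the default value are
assumptions for the example; dt_size == 0 models a device tree that
does not describe the diag data size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PNV_PCI_DIAG_BUF_SIZE; the value here is assumed. */
#define DIAG_BUF_SIZE_DEFAULT	8192

/* Hypothetical, cut-down PHB descriptor. */
struct phb {
	size_t diag_data_size;
	void *diag_data;
};

/* Use the device tree size if present, else the hardcoded fallback. */
static int phb_alloc_diag(struct phb *phb, size_t dt_size)
{
	phb->diag_data_size = dt_size ? dt_size : DIAG_BUF_SIZE_DEFAULT;
	phb->diag_data = calloc(1, phb->diag_data_size);
	return phb->diag_data ? 0 : -1;
}

/* The EEH aux buffer must fit the largest PHB's diag data. */
static size_t max_diag_size(const struct phb *phbs, size_t n)
{
	size_t max = DIAG_BUF_SIZE_DEFAULT;
	size_t i;

	for (i = 0; i < n; i++)
		if (phbs[i].diag_data_size > max)
			max = phbs[i].diag_data_size;
	return max;
}
```

The second helper mirrors the max_diag_size loop that this patch adds
to pnv_eeh_init().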

Signed-off-by: Russell Currey 
---
No changes from v1

 arch/powerpc/platforms/powernv/eeh-powernv.c | 16 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c| 15 ---
 arch/powerpc/platforms/powernv/pci.c |  6 +++---
 arch/powerpc/platforms/powernv/pci.h | 10 +++---
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index d12ea7b9fd47..3f48f6df1cf3 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -48,6 +48,7 @@ static int pnv_eeh_init(void)
 {
struct pci_controller *hose;
struct pnv_phb *phb;
+   int max_diag_size = PNV_PCI_DIAG_BUF_SIZE;
 
if (!firmware_has_feature(FW_FEATURE_OPAL)) {
pr_warn("%s: OPAL is required !\n",
@@ -69,6 +70,9 @@ static int pnv_eeh_init(void)
if (phb->model == PNV_PHB_MODEL_P7IOC)
eeh_add_flag(EEH_ENABLE_IO_FOR_LOG);
 
+   if (phb->diag_data_size > max_diag_size)
+   max_diag_size = phb->diag_data_size;
+
/*
 * PE#0 should be regarded as valid by EEH core
 * if it's not the reserved one. Currently, we
@@ -82,6 +86,8 @@ static int pnv_eeh_init(void)
break;
}
 
+   eeh_set_pe_aux_size(max_diag_size);
+
return 0;
 }
 
@@ -540,7 +546,7 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
s64 rc;
 
rc = opal_pci_get_phb_diag_data2(phb->opal_id, pe->data,
-PNV_PCI_DIAG_BUF_SIZE);
+phb->diag_data_size);
if (rc != OPAL_SUCCESS)
pr_warn("%s: Failure %lld getting PHB#%x diag-data\n",
__func__, rc, pe->phb->global_number);
@@ -1314,7 +1320,8 @@ static void pnv_eeh_dump_hub_diag_common(struct 
OpalIoP7IOCErrorData *data)
 static void pnv_eeh_get_and_dump_hub_diag(struct pci_controller *hose)
 {
struct pnv_phb *phb = hose->private_data;
-   struct OpalIoP7IOCErrorData *data = &phb->diag.hub_diag;
+   struct OpalIoP7IOCErrorData *data =
+   (struct OpalIoP7IOCErrorData*)phb->diag_data;
long rc;
 
rc = opal_pci_get_hub_diag_data(phb->hub_id, data, sizeof(*data));
@@ -1549,10 +1556,10 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 
/* Dump PHB diag-data */
rc = opal_pci_get_phb_diag_data2(phb->opal_id,
-   phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+   phb->diag_data, phb->diag_data_size);
if (rc == OPAL_SUCCESS)
pnv_pci_dump_phb_diag_data(hose,
-   phb->diag.blob);
+   phb->diag_data);
 
/* Try best to clear it */
opal_pci_eeh_freeze_clear(phb->opal_id,
@@ -1795,7 +1802,6 @@ static int __init eeh_powernv_init(void)
 {
int ret = -EINVAL;
 
-   eeh_set_pe_aux_size(PNV_PCI_DIAG_BUF_SIZE);
ret = eeh_ops_register(&pnv_eeh_ops);
if (!ret)
pr_info("EEH: PowerNV platform initialized\n");
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 283caf1070c9..96d0156f48db 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3123,13 +3123,13 @@ static int pnv_pci_diag_data_set(void *data, u64 val)
phb = hose->private_data;
 
/* Retrieve the diag data from firmware */
-   ret = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
- PNV_PCI_DIAG_BUF_SIZE);
+   ret = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag_data,
+ phb->diag_data_size);
if (ret != OPAL_SUCCESS)
return -EIO;
 
/* Print the diag data to the kernel log */
-   pnv_pci_dump_phb_diag_data(phb->hose, phb->diag.blob);
+   pnv_pci_dump_phb_diag_data(phb->hose, phb->diag_data);
return 0;
 }
 
@@ -3725,6 +3725,15 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
  

Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-13 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> On Tue, 2017-06-13 at 20:04 +1000, Michael Ellerman wrote:
>> > Good idea.  Writing to CTRL register can change only the RUN field.
>> > Was this any different in older generations?
>> 
>> No AFAICS back to 2.02.
>> 
>> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
>> > routine.
>> 
>> Doing the read/modify write is forward compatible vs a new writable
>> field, whereas writing the whole register with a known value is not.
>
> At this stage I wouldn't worry too much about it. What we can do is
> write a pre-cooked value (from reading it earlier once at boot) if we
> are paranoid or just do what Nick does and put the onus on future
> designs that might want to re-use it for other things to add mode
> bits to configure the new feature in.

Fine by me.

cheers
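
For reference, the three options discussed in this thread look roughly
like this when modelled on a plain integer instead of the real SPR.
The bit value used for RUN and the function names are assumptions for
the sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the RUN bit, for illustration only. */
#define CTRL_RUN	UINT64_C(0x1)

/* Read/modify/write: keeps any other field a future design might add. */
static uint64_t ctrl_set_run_rmw(uint64_t current_ctrl)
{
	return current_ctrl | CTRL_RUN;
}

/* Blind write of a known value: no read needed, but other fields are lost. */
static uint64_t ctrl_set_run_blind(void)
{
	return CTRL_RUN;
}

/* Middle ground: write a value captured once at boot, with RUN set. */
static uint64_t ctrl_set_run_precooked(uint64_t boot_ctrl)
{
	return boot_ctrl | CTRL_RUN;
}
```

Only the blind write drops a hypothetical extra field; the pre-cooked
variant keeps whatever was set at boot without paying for a read on
every update.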


Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Aneesh Kumar K.V
"Naveen N. Rao"  writes:

> On P9, trying to use data breakpoints throws the splat shown below (*).
> This is because the check for a data breakpoint in DSISR is in
> do_hash_page(). Move this check to handle_page_fault() so as to catch
> data breakpoints in both hash and radix MMU modes.
>
> While at it, also remove the label '11' that was made redundant by
> commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
> off")
>
> (*)
> Unable to handle kernel paging request for data at address 
> 0xc0e19218
> Faulting instruction address: 0xc01155e8
> cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
> pc: c01155e8: find_pid_ns+0x48/0xe0
> lr: c0116ac4: find_task_by_vpid+0x44/0x90
> sp: c000ef1e7da0
> msr: 90009033
> dar: c0e19218
> dsisr: 40
> current = 0xc000f1f59700
> paca= 0xcfd4 softe: 0 irq_happened: 0x01
> pid   = 1192, comm = sh
> Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 
> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 16:52:49 
> UTC 2017
> enter ? for help
> [c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
> [c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
> [c000ef1e7e30] c000ba6c system_call+0x38/0xfc
> --- Exception: c01 (System Call) at 7fff94480890
> SP (7fffd91e7260) is in userspace
>
> Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
> isolate hash related code")
> Reported-by: Shriya R. Kulkarni 
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index ae418b85c17c..17ee701b8336 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
>   .balign IFETCH_ALIGN_BYTES
>  do_hash_page:
>  #ifdef CONFIG_PPC_STD_MMU_64
> - andis.  r0,r4,0xa410/* weird error? */
> + andis.  r0,r4,0xa450/* weird error? */

Can we convert that to a #define value? Ram did try to do that here:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
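
Something along these lines is what a named mask could look like,
written as a compile-checked userspace sketch. Only DSISR_DABRMATCH is
a name used in the patch; the other bit names and the two mask names
are placeholders invented here, and this is not necessarily what Ram's
patch does.

```c
#include <assert.h>
#include <stdint.h>

#define DSISR_DABRMATCH		UINT32_C(0x00400000)	/* named in the patch */

/* Placeholder names for the bits making up the old 0xa410 mask. */
#define DSISR_ERR_BIT_0		UINT32_C(0x80000000)
#define DSISR_ERR_BIT_2		UINT32_C(0x20000000)
#define DSISR_ERR_BIT_5		UINT32_C(0x04000000)
#define DSISR_ERR_BIT_11	UINT32_C(0x00100000)

#define DSISR_SKIP_HASH_OLD	(DSISR_ERR_BIT_0 | DSISR_ERR_BIT_2 | \
				 DSISR_ERR_BIT_5 | DSISR_ERR_BIT_11)
#define DSISR_SKIP_HASH_NEW	(DSISR_SKIP_HASH_OLD | DSISR_DABRMATCH)

/* Upper 16 bits, i.e. what "@h" feeds to andis. */
#define HI16(x)			((uint16_t)((x) >> 16))

_Static_assert(HI16(DSISR_SKIP_HASH_OLD) == 0xa410, "old immediate");
_Static_assert(HI16(DSISR_SKIP_HASH_NEW) == 0xa450, "new immediate");

/* Non-zero if do_hash_page should branch straight to handle_page_fault. */
static int skips_hash_page(uint32_t dsisr)
{
	return (dsisr & DSISR_SKIP_HASH_NEW) != 0;
}
```

The static asserts confirm that the only difference between 0xa410 and
0xa450 is the DABRMATCH bit.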


>   bne-handle_page_fault   /* if not, try to insert a HPTE */
> - andis.  r0,r4,DSISR_DABRMATCH@h
> - bne-handle_dabr_fault
>   CURRENT_THREAD_INFO(r11, r1)
>   lwz r0,TI_PREEMPT(r11)  /* If we're in an "NMI" */
>   andis.  r0,r0,NMI_MASK@h/* (i.e. an irq when soft-disabled) */
> @@ -1442,7 +1440,9 @@ do_hash_page:
>  
>  /* Here we have a page fault that hash_page can't handle. */
>  handle_page_fault:
> -11:  ld  r4,_DAR(r1)
> + andis.  r0,r4,DSISR_DABRMATCH@h
> + bne-handle_dabr_fault
> + ld  r4,_DAR(r1)
>   ld  r5,_DSISR(r1)
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  do_page_fault
> -- 
> 2.12.2


-aneesh



[PATCH kernel] powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path

2017-06-13 Thread Alexey Kardashevskiy
When trapped on WARN_ON(), report_bug() is expected to return
BUG_TRAP_TYPE_WARN so the caller could increment NIP by 4 and continue.
The __builtin_constant_p() path of PPC's WARN_ON() calls (indirectly)
__WARN_FLAGS(), which has BUGFLAG_WARNING set; however, the other branch
does not, which makes report_bug() report a bug rather than a warning.
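
A simplified model of the decision makes the problem visible. The flag
values below are what I believe the generic bug.h uses, but treat them
as assumptions; classify() is a stand-in for the relevant part of
report_bug(), not a copy of it.

```c
#include <assert.h>

/* Assumed to match the generic definitions; for illustration only. */
#define BUGFLAG_WARNING		(1 << 0)
#define BUGFLAG_TAINT(taint)	((taint) << 8)
#define TAINT_WARN		9

enum bug_trap_type {
	BUG_TRAP_TYPE_NONE = 0,
	BUG_TRAP_TYPE_WARN = 1,
	BUG_TRAP_TYPE_BUG = 2,
};

/* Stand-in for report_bug(): only BUGFLAG_WARNING selects the WARN path. */
static enum bug_trap_type classify(unsigned int flags)
{
	return (flags & BUGFLAG_WARNING) ? BUG_TRAP_TYPE_WARN
					 : BUG_TRAP_TYPE_BUG;
}
```

With only BUGFLAG_TAINT(TAINT_WARN) in the entry, bit 0 is clear and
the trap is classified as a bug, so the caller does not step over the
trap instruction and continue.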

Fixes: 19d436268dde95389 ("debug: Add _ONCE() logic to report_bug()")
Signed-off-by: Alexey Kardashevskiy 
---

Actually 19d436268dde95389 replaced __WARN_TAINT() with __WARN_FLAGS()
and dropped BUGFLAG_TAINT(); this is not mentioned in the commit log,
so it is unclear:
1) why
2) whether this particular patch should be doing
   BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)
 or
   BUGFLAG_WARNING|(flags)

Any ideas? Thanks.


---
 arch/powerpc/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index f2c562a0a427..0151af6c2a50 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -104,7 +104,7 @@
"1: "PPC_TLNEI" %4,0\n" \
_EMIT_BUG_ENTRY \
: : "i" (__FILE__), "i" (__LINE__), \
- "i" (BUGFLAG_TAINT(TAINT_WARN)),  \
+ "i" (BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)),\
  "i" (sizeof(struct bug_entry)),   \
  "r" (__ret_warn_on)); \
}   \
-- 
2.11.0



[PATCH] of: update ePAPR references to point to Devicetree Specification

2017-06-13 Thread frowand . list
From: Frank Rowand 

The Devicetree Specification has superseded the ePAPR as the
base specification for bindings.  Update files in Documentation
to reference the new document.

Some files are not updated because there is no hypervisor chapter
in the Devicetree Specification:
   Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt
   Documentation/virtual/kvm/api.txt
   Documentation/virtual/kvm/ppc-pv.txt

Signed-off-by: Frank Rowand 
---
 Documentation/devicetree/bindings/arm/cci.txt   | 12 ++--
 Documentation/devicetree/bindings/arm/cpus.txt  | 13 +++--
 Documentation/devicetree/bindings/arm/idle-states.txt   |  4 ++--
 Documentation/devicetree/bindings/arm/l2c2x0.txt|  4 ++--
 Documentation/devicetree/bindings/arm/topology.txt  |  4 ++--
 Documentation/devicetree/bindings/bus/simple-pm-bus.txt |  2 +-
 Documentation/devicetree/bindings/chosen.txt|  3 ++-
 Documentation/devicetree/bindings/common-properties.txt |  2 +-
 Documentation/devicetree/bindings/crypto/fsl-sec4.txt   |  4 ++--
 Documentation/devicetree/bindings/crypto/fsl-sec6.txt   |  4 ++--
 .../devicetree/bindings/interrupt-controller/open-pic.txt   |  5 ++---
 Documentation/devicetree/bindings/net/ethernet.txt  |  9 ++---
 Documentation/devicetree/bindings/powerpc/fsl/cpus.txt  |  6 +++---
 Documentation/devicetree/bindings/powerpc/fsl/l2cache.txt   |  2 +-
 Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt  |  4 ++--
 Documentation/devicetree/bindings/powerpc/fsl/srio.txt  |  3 ++-
 Documentation/devicetree/booting-without-of.txt |  2 +-
 Documentation/devicetree/usage-model.txt|  2 +-
 Documentation/xtensa/mmu.txt|  6 +++---
 19 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/cci.txt 
b/Documentation/devicetree/bindings/arm/cci.txt
index 0f2153e8fa7e..cc7621b204f4 100644
--- a/Documentation/devicetree/bindings/arm/cci.txt
+++ b/Documentation/devicetree/bindings/arm/cci.txt
@@ -11,9 +11,9 @@ clusters, through memory mapped interface, with a global 
control register
 space and multiple sets of interface control registers, one per slave
 interface.
 
-Bindings for the CCI node follow the ePAPR standard, available from:
+Bindings for the CCI node follow the Devicetree Specification, available from:
 
-www.power.org/documentation/epapr-version-1-1/
+https://www.devicetree.org/specifications/
 
 with the addition of the bindings described in this document which are
 specific to ARM.
@@ -50,10 +50,10 @@ specific to ARM.
as a tuple of cells, containing child address,
parent address and the size of the region in the
child address space.
-   Definition: A standard property. Follow rules in the ePAPR for
-   hierarchical bus addressing. CCI interfaces
-   addresses refer to the parent node addressing
-   scheme to declare their register bases.
+   Definition: A standard property. Follow rules in the Devicetree
+   Specification for hierarchical bus addressing. CCI
+   interfaces addresses refer to the parent node
+   addressing scheme to declare their register bases.
 
CCI interconnect node can define the following child nodes:
 
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
b/Documentation/devicetree/bindings/arm/cpus.txt
index 1030f5f50207..283c520a2224 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -6,9 +6,9 @@ The device tree allows to describe the layout of CPUs in a 
system through
 the "cpus" node, which in turn contains a number of subnodes (ie "cpu")
 defining properties for every cpu.
 
-Bindings for CPU nodes follow the ePAPR v1.1 standard, available from:
+Bindings for CPU nodes follow the Devicetree Specification, available from:
 
-https://www.power.org/documentation/epapr-version-1-1/
+https://www.devicetree.org/specifications/
 
 with updates for 32-bit and 64-bit ARM systems provided in this document.
 
@@ -16,8 +16,8 @@ with updates for 32-bit and 64-bit ARM systems provided in 
this document.
 Convention used in this document
 
 
-This document follows the conventions described in the ePAPR v1.1, with
-the addition:
+This document follows the conventions described in the Devicetree
+Specification, with the addition:
 
 - square brackets define bitfields, eg reg[7:0] value of the bitfield in
   the reg property contained in bits 7 down to 0
@@ -26,8 +26,9 @@ the addition:
 cpus and cpu node bindings definition
 =
 
-The ARM architecture, in accordance with the ePAPR, requires the cpus and

Re: [PATCH] kernel/kprobes: Add test to validate pt_regs

2017-06-13 Thread Masami Hiramatsu
On Fri,  9 Jun 2017 00:53:08 +0530
"Naveen N. Rao"  wrote:

> Add a test to verify that the registers passed in pt_regs on kprobe
> (trap), optprobe (jump) and kprobe_on_ftrace (ftrace_caller) are
> accurate. The tests are exercised if KPROBES_SANITY_TEST is enabled.

Great!

> 
> Implemented for powerpc64. Other architectures will have to implement
> the relevant arch_* helpers and define HAVE_KPROBES_REGS_SANITY_TEST.

Hmm, why don't you define that in arch/powerpc/Kconfig?
Also, could you split this into 3 patches, one for each case?
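
For readers following along, the core of such a test is a comparison
between the register values the helper saved and the ones the probe
handler received. A minimal userspace sketch of that comparison is
below; the struct is a cut-down, hypothetical stand-in for pt_regs and
the function name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_GPRS	32

/* Hypothetical, cut-down stand-in for struct pt_regs. */
struct regs_snapshot {
	unsigned long gpr[NR_GPRS];
	unsigned long ctr, link, xer, ccr;
};

/*
 * Returns -1 if both snapshots match. Otherwise returns the index of
 * the first differing GPR, or NR_GPRS + k for the k-th SPR in the
 * order ctr, link, xer, ccr.
 */
static int regs_first_mismatch(const struct regs_snapshot *want,
			       const struct regs_snapshot *got)
{
	int i;

	for (i = 0; i < NR_GPRS; i++)
		if (want->gpr[i] != got->gpr[i])
			return i;
	if (want->ctr != got->ctr)
		return NR_GPRS + 0;
	if (want->link != got->link)
		return NR_GPRS + 1;
	if (want->xer != got->xer)
		return NR_GPRS + 2;
	if (want->ccr != got->ccr)
		return NR_GPRS + 3;
	return -1;
}
```

Reporting the first mismatching register, rather than just pass/fail,
makes a failure in any of the three probe paths easier to track down.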

> 
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/include/asm/kprobes.h  |   4 +
>  arch/powerpc/lib/Makefile   |   3 +-
>  arch/powerpc/lib/test_kprobe_regs.S |  62 
>  arch/powerpc/lib/test_kprobes.c | 115 ++
>  include/linux/kprobes.h |  11 +++
>  kernel/test_kprobes.c   | 183 
> 
>  6 files changed, 377 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/lib/test_kprobe_regs.S
>  create mode 100644 arch/powerpc/lib/test_kprobes.c
> 
> diff --git a/arch/powerpc/include/asm/kprobes.h 
> b/arch/powerpc/include/asm/kprobes.h
> index 566da372e02b..10c91d3132a1 100644
> --- a/arch/powerpc/include/asm/kprobes.h
> +++ b/arch/powerpc/include/asm/kprobes.h
> @@ -124,6 +124,10 @@ static inline int skip_singlestep(struct kprobe *p, 
> struct pt_regs *regs,
>   return 0;
>  }
>  #endif
> +#if defined(CONFIG_KPROBES_SANITY_TEST) && defined(CONFIG_PPC64)
> +#define HAVE_KPROBES_REGS_SANITY_TEST
> +void arch_kprobe_regs_set_ptregs(struct pt_regs *regs);
> +#endif
>  #else
>  static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
>  static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 3c3146ba62da..8a0bb8e20179 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -27,7 +27,8 @@ obj64-y += copypage_64.o copyuser_64.o mem_64.o 
> hweight_64.o \
>  
>  obj64-$(CONFIG_SMP)  += locks.o
>  obj64-$(CONFIG_ALTIVEC)  += vmx-helper.o
> -obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
> +obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o 
> test_kprobe_regs.o \
> +test_kprobes.o
>  
>  obj-y+= checksum_$(BITS).o checksum_wrappers.o
>  
> diff --git a/arch/powerpc/lib/test_kprobe_regs.S 
> b/arch/powerpc/lib/test_kprobe_regs.S
> new file mode 100644
> index ..4e95eca6dcd3
> --- /dev/null
> +++ b/arch/powerpc/lib/test_kprobe_regs.S
> @@ -0,0 +1,62 @@
> +/*
> + * test_kprobe_regs: architectural helpers for validating pt_regs
> + *received on a kprobe.
> + *
> + * Copyright 2017 Naveen N. Rao 
> + * IBM Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +_GLOBAL(arch_kprobe_regs_function)
> + mflrr0
> + std r0, LRSAVE(r1)
> + stdur1, -SWITCH_FRAME_SIZE(r1)
> +
> + /* Tell pre handler about our pt_regs location */
> + addir3, r1, STACK_FRAME_OVERHEAD
> + bl  arch_kprobe_regs_set_ptregs
> +
> + /* Load back our true LR */
> + ld  r0, (SWITCH_FRAME_SIZE + LRSAVE)(r1)
> + mtlrr0
> +
> + /* Save all SPRs that we care about */
> + mfctr   r0
> + std r0, _CTR(r1)
> + mflrr0
> + std r0, _LINK(r1)
> + mfspr   r0, SPRN_XER
> + std r0, _XER(r1)
> + mfcrr0
> + std r0, _CCR(r1)
> +
> + /* Now, save all GPRs */
> + SAVE_2GPRS(0, r1)
> + SAVE_10GPRS(2, r1)
> + SAVE_10GPRS(12, r1)
> + SAVE_10GPRS(22, r1)
> +
> + /* We're now ready to be probed */
> +.global arch_kprobe_regs_probepoint
> +arch_kprobe_regs_probepoint:
> + nop
> +
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + /* Let's also test KPROBES_ON_FTRACE */
> + bl  kprobe_regs_kp_on_ftrace_target
> + nop
> +#endif
> +
> + /* All done */
> + addir1, r1, SWITCH_FRAME_SIZE
> + ld  r0, LRSAVE(r1)
> + mtlrr0
> + blr
> diff --git a/arch/powerpc/lib/test_kprobes.c b/arch/powerpc/lib/test_kprobes.c
> new file mode 100644
> index ..23f7a7ffcdd6
> --- /dev/null
> +++ b/arch/powerpc/lib/test_kprobes.c
> @@ -0,0 +1,115 @@
> +/*
> + * test_kprobes: architectural helpers for validating pt_regs
> + *received on a kprobe.
> + *
> + * Copyright 2017 Naveen N. Rao 
> + * IBM Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +
> +#define pr_fmt(

[PATCH] powerpc/xive: Fix offset for store EOI MMIOs

2017-06-13 Thread Benjamin Herrenschmidt
Architecturally we should apply a 0x400 offset for these. Not doing
it will break future HW implementations.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/xive.h | 12 +++-
 arch/powerpc/kvm/book3s_xive_template.c |  4 ++--
 arch/powerpc/sysdev/xive/common.c   |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index c8a822a..c23ff43 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -94,11 +94,13 @@ struct xive_q {
  * store at 0 and some ESBs support doing a trigger via a
  * separate trigger page.
  */
-#define XIVE_ESB_GET   0x800
-#define XIVE_ESB_SET_PQ_00 0xc00
-#define XIVE_ESB_SET_PQ_01 0xd00
-#define XIVE_ESB_SET_PQ_10 0xe00
-#define XIVE_ESB_SET_PQ_11 0xf00
+#define XIVE_ESB_STORE_EOI 0x400 /* Store */
+#define XIVE_ESB_LOAD_EOI  0x000 /* Load */
+#define XIVE_ESB_GET   0x800 /* Load */
+#define XIVE_ESB_SET_PQ_00 0xc00 /* Load */
+#define XIVE_ESB_SET_PQ_01 0xd00 /* Load */
+#define XIVE_ESB_SET_PQ_10 0xe00 /* Load */
+#define XIVE_ESB_SET_PQ_11 0xf00 /* Load */
 
 #define XIVE_ESB_VAL_P 0x2
 #define XIVE_ESB_VAL_Q 0x1
diff --git a/arch/powerpc/kvm/book3s_xive_template.c b/arch/powerpc/kvm/book3s_xive_template.c
index 023a311..4636ca6 100644
--- a/arch/powerpc/kvm/book3s_xive_template.c
+++ b/arch/powerpc/kvm/book3s_xive_template.c
@@ -69,7 +69,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct xive_irq_data *xd)
 {
/* If the XIVE supports the new "store EOI facility, use it */
if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
-   __x_writeq(0, __x_eoi_page(xd));
+   __x_writeq(0, __x_eoi_page(xd) + XIVE_ESB_STORE_EOI);
else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
opal_int_eoi(hw_irq);
} else {
@@ -89,7 +89,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct xive_irq_data *xd)
 * properly.
 */
if (xd->flags & XIVE_IRQ_FLAG_LSI)
-   __x_readq(__x_eoi_page(xd));
+   __x_readq(__x_eoi_page(xd) + XIVE_ESB_LOAD_EOI);
else {
eoi_val = GLUE(X_PFX,esb_load)(xd, XIVE_ESB_SET_PQ_00);
 
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 9138250..8f5e303 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -297,7 +297,7 @@ void xive_do_source_eoi(u32 hw_irq, struct xive_irq_data *xd)
 {
/* If the XIVE supports the new "store EOI facility, use it */
if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
-   out_be64(xd->eoi_mmio, 0);
+   out_be64(xd->eoi_mmio + XIVE_ESB_STORE_EOI, 0);
else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
/*
 * The FW told us to call it. This happens for some
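
As a rough illustration of the offsets introduced in the patch above, the following user-space sketch maps each ESB operation to the offset added to the ESB page base. The offset values are taken from the asm/xive.h hunk; the esb_op enum and the esb_mmio_addr() helper are hypothetical names used only for this example and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as defined in the asm/xive.h hunk of the patch */
#define XIVE_ESB_STORE_EOI 0x400 /* Store */
#define XIVE_ESB_LOAD_EOI  0x000 /* Load */
#define XIVE_ESB_GET       0x800 /* Load */
#define XIVE_ESB_SET_PQ_00 0xc00 /* Load */

/* Hypothetical selector, for illustration only */
enum esb_op { ESB_OP_STORE_EOI, ESB_OP_LOAD_EOI, ESB_OP_GET, ESB_OP_SET_PQ_00 };

/* Return the address to access for 'op' within the ESB page at 'base' */
static uintptr_t esb_mmio_addr(uintptr_t base, enum esb_op op)
{
	switch (op) {
	case ESB_OP_STORE_EOI: return base + XIVE_ESB_STORE_EOI;
	case ESB_OP_LOAD_EOI:  return base + XIVE_ESB_LOAD_EOI;
	case ESB_OP_GET:       return base + XIVE_ESB_GET;
	case ESB_OP_SET_PQ_00: return base + XIVE_ESB_SET_PQ_00;
	}
	assert(!"unknown ESB operation");
	return base;
}
```

With a page base of 0x10000, a store EOI would then target 0x10400 while a load EOI still targets the base itself, which is the behaviour change the commit message describes.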



Re: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

2017-06-13 Thread Sukadev Bhattiprolu
Thiago Jung Bauermann [bauer...@linux.vnet.ibm.com] wrote:
> POWER9 introduces a new version of the hypervisor API to access the 24x7
> perf counters. The new version changed some of the structures used for
> requests and results.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/perf/hv-24x7.c| 145 +++--
>  arch/powerpc/perf/hv-24x7.h|  59 --
>  arch/powerpc/platforms/pseries/Kconfig |   2 +-
>  3 files changed, 173 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 043cbc78be98..95c44f1d2fd2 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -27,6 +28,9 @@
>  #include "hv-24x7-catalog.h"
>  #include "hv-common.h"
> 
> +/* Version of the 24x7 hypervisor API that we should use in this machine. */
> +static int interface_version;
> +
>  static bool domain_is_valid(unsigned domain)
>  {
>   switch (domain) {
> @@ -74,7 +78,11 @@ static const char *domain_name(unsigned domain)
> 
>  static bool catalog_entry_domain_is_valid(unsigned domain)
>  {
> - return is_physical_domain(domain);
> + /* POWER8 doesn't support virtual domains. */
> + if (interface_version == 1)
> + return is_physical_domain(domain);
> + else
> + return domain_is_valid(domain);
>  }
> 
>  /*
> @@ -166,9 +174,12 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
>  DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>  DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
> 
> -#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE - \
> +#define MAX_NUM_REQUESTS_V1  ((H24x7_DATA_BUFFER_SIZE - \
> + sizeof(struct hv_24x7_request_buffer)) \
> + / H24x7_REQUEST_SIZE_V1)
> +#define MAX_NUM_REQUESTS_V2  ((H24x7_DATA_BUFFER_SIZE - \
>   sizeof(struct hv_24x7_request_buffer)) \
> - / sizeof(struct hv_24x7_request))
> + / H24x7_REQUEST_SIZE_V2)

Nit: Can we define MAX_NUM_REQUESTS(version), with a version parameter? It
will...
> 
>  static char *event_name(struct hv_24x7_event_data *ev, int *len)
>  {
> @@ -1052,7 +1063,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
>   memset(request_buffer, 0, H24x7_DATA_BUFFER_SIZE);
>   memset(result_buffer, 0, H24x7_DATA_BUFFER_SIZE);
> 
> - request_buffer->interface_version = HV_24X7_IF_VERSION_CURRENT;
> + request_buffer->interface_version = interface_version;
>   /* memset above set request_buffer->num_requests to 0 */
>  }
> 
> @@ -1077,7 +1088,7 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
>   if (ret) {
>   struct hv_24x7_request *req;
> 
> - req = &request_buffer->requests[0];
> + req = request_buffer->requests;
>   pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => ret 
> 0x%lx (%ld) detail=0x%x failing ix=%x\n",
> req->performance_domain, req->data_offset,
> req->starting_ix, req->starting_lpar_ix,
> @@ -1101,9 +1112,13 @@ static int add_event_to_24x7_request(struct perf_event *event,
>  {
>   u16 idx;
>   int i;
> + size_t req_size;
>   struct hv_24x7_request *req;
> 
> - if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
> + if ((request_buffer->interface_version == 1
> +  && request_buffer->num_requests >= MAX_NUM_REQUESTS_V1)
> + || (request_buffer->interface_version > 1
> + && request_buffer->num_requests >= MAX_NUM_REQUESTS_V2)) {
>   pr_devel("Too many requests for 24x7 HCALL %d\n",

...simplify this check to

if (request_buffer->num_requests >= MAX_NUM_REQUESTS(version))

>   request_buffer->num_requests);
>   return -EINVAL;
> @@ -1120,8 +1135,11 @@ static int add_event_to_24x7_request(struct perf_event *event,
>   idx = event_get_vcpu(event);
>   }
> 
> + req_size = request_buffer->interface_version == 1 ?
> +H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2;
> +

Maybe similarly, with H24x7_REQUEST_SIZE(version) ?
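
Something along these lines, perhaps. This is only a sketch and has not been compiled against the tree: the numeric sizes are placeholders chosen for illustration (the real values come from hv-24x7.h), REQUEST_BUFFER_HEADER stands in for sizeof(struct hv_24x7_request_buffer), and nth_request() is a made-up helper showing the version-dependent stride.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration; the real ones live in hv-24x7.h */
#define H24x7_DATA_BUFFER_SIZE	4096
#define H24x7_REQUEST_SIZE_V1	16
#define H24x7_REQUEST_SIZE_V2	32
#define REQUEST_BUFFER_HEADER	16	/* stands in for sizeof(struct hv_24x7_request_buffer) */

/* Size of one request entry for a given interface version */
#define H24x7_REQUEST_SIZE(version) \
	((version) == 1 ? H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2)

/* Number of request entries that fit in the buffer for that version */
#define MAX_NUM_REQUESTS(version) \
	((H24x7_DATA_BUFFER_SIZE - REQUEST_BUFFER_HEADER) / H24x7_REQUEST_SIZE(version))

/* Address of the i-th request when the entry stride depends on the version */
static void *nth_request(void *requests, size_t i, int version)
{
	assert(i < (size_t)MAX_NUM_REQUESTS(version));
	return (char *)requests + i * H24x7_REQUEST_SIZE(version);
}
```

The limit check and the req_size computation would then both collapse to a single macro call taking request_buffer->interface_version.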

>   i = request_buffer->num_requests++;
> - req = &request_buffer->requests[i];
> + req = (void *) request_buffer->requests + i * req_size;
> 
>   req->performance_domain = event_get_domain(event);
>   req->data_size = cpu_to_be16(8);
> @@ -1131,14 +1149,97 @@ static int add_event_to_24x7_request(struct perf_event *event,
>   req->starting_ix = cpu_to_be16(idx);
>   req-

Re: [PATCH] recordmcount.pl: Add ppc64le to list of supported architectures

2017-06-13 Thread Balbir Singh
On Tue, Jun 13, 2017 at 4:49 PM, Kamalesh Babulal <
kamal...@linux.vnet.ibm.com> wrote:

> Module make on ppc64le, fails with:
>
> make -C /root/kernel/linux M=/root/.kpatch/tmp/patch
> kpatch-data-read-mostly.ko
> make[1]: Entering directory '/root/kernel/linux'
>   CC [M]  /root/.kpatch/tmp/patch/patch-hook.o
> Arch ppc64le is not supported with CONFIG_FTRACE_MCOUNT_RECORD at
> ./scripts/recordmcount.pl line 379.
>
> Fix it by adding 'ppc64le' to list of supported architectures
> in recordmcount.pl script.
>
> Signed-off-by: Kamalesh Babulal 
> Cc: Michael Ellerman 
> Cc: Balbir Singh 
> ---
>  scripts/recordmcount.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
> index 1633c3e..683b8b5 100755
> --- a/scripts/recordmcount.pl
> +++ b/scripts/recordmcount.pl
> @@ -264,7 +264,7 @@ if ($arch eq "x86_64") {
>  $ld .= " -m shlelf_linux";
>  $objcopy .= " -O elf32-sh-linux";
>
> -} elsif ($arch eq "powerpc") {
> +} elsif ($arch eq "powerpc" || $arch eq "ppc64le") {
>

I don't get this, the arch should always be powerpc. Where did you get the
ppc64le from? Am I missing anything?

Balbir Singh.


Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-13 Thread Michael Bringmann
On a related note, we are discussing the addition of 2 new device-tree
properties with Pete Heyrman and his colleagues that should simplify the
determination of the set of required nodes.

* One property would provide the total/max number of nodes needed by the kernel
  on the current hardware.
* A second property would provide the total/max number of nodes that the kernel
  could use on any system to which it could be migrated.

These properties aren't available yet, and it takes time to define new
properties in the PAPR and have them implemented in pHyp and the kernel.  As an
intermediate step, the systems which are doing a lot of dynamic
hot-add/hot-remove configuration could provide equivalent information to the
PowerPC kernel with a command line parameter.  The 'numa.c' code would then
read this value and fill in the necessary entries in the 'node_possible_map'.
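
To make the intermediate step a little more concrete, here is a minimal user-space sketch. It assumes a hypothetical parameter whose value is simply the maximum number of nodes; MAX_NODES, possible_map_from_param() and the 64-bit mask are stand-ins for the kernel's nodemask machinery and are not the actual numa.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 64	/* illustrative limit, not the kernel's MAX_NUMNODES */

/*
 * Parse a node count from a command-line value and return a bitmask with
 * one bit set per possible node. Returns 0 if the value is malformed or
 * out of range, so the caller can fall back to device-tree discovery.
 */
static uint64_t possible_map_from_param(const char *val)
{
	char *end;
	unsigned long n;

	assert(val != NULL);
	n = strtoul(val, &end, 10);
	if (end == val || *end != '\0' || n == 0 || n > MAX_NODES)
		return 0;
	return n == MAX_NODES ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```

A value of "8" would mark nodes 0 through 7 as possible, which would cover the nodes 0, 1, 6 and 7 seen in the associativity lookup arrays quoted below.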

Would you foresee any problems with using such a feature?

Thanks.

On 06/13/2017 05:45 AM, Michael Ellerman wrote:
> Michael Bringmann  writes:
> 
>> Here is the information from 2 different kernels.  I have not yet been able
>> to retrieve the information matching yesterday's attachments, as those dumps
>> were acquired in April.
>>  
>> Attached please find 2 dumps of similar material from kernels running with my
>> current patches (Linux 4.4, Linux 4.12).
> 
> OK thanks.
> 
> I'd actually like to see the dmesg output from a kernel *without* your
> patches.
> 
> Looking at the device tree properties:
> 
> ltcalpine2-lp9:/proc/device-tree/ibm,dynamic-reconfiguration-memory # lsprop ibm,associativity-lookup-arrays
> ibm,associativity-lookup-arrays
>0004 = 4 arrays
>  0004 = of 4 entries each
>     
>  0001 0001
>   0003 0006 0006
>   0003 0007 0007
> 
> 
> Which does tell us that nodes 0, 1, 6 and 7 exist.
> 
> So your idea of looking at that and setting any node found in there
> online should work.
> 
> My only worry is that behaviour appears to be completely undocumented in
> PAPR, ie. PAPR explicitly says that property only needs to contain
> values for LMBs present at boot.
> 
> But possibly we can talk to the PowerVM/PAPR guys and have that changed
> so that it becomes something we can rely on.
> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [RFC PATCH 1/7 v1]powerpc: Free up four PTE bits to accommodate memory keys

2017-06-13 Thread Ram Pai
On Tue, Jun 13, 2017 at 10:22:43AM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > Rearrange  PTE   bits to  free  up  bits 3, 4, 5  and  6  for
> > memory keys. Bit 3, 4, 5, 6 and 57  shall  be used for memory
> > keys.
> >
> > The patch does the following change to the 64K PTE format
> >
> > H_PAGE_BUSY moves from bit 3 to bit 7
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> >
> > The second part of the PTE will hold
> >a (H_PAGE_F_SECOND|H_PAGE_F_GIX)  for  64K page backed pte,
> >and sixteen (H_PAGE_F_SECOND|H_PAGE_F_GIX)  for 4k  backed
> > pte.
> >
> > The four  bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF indicating an invalid slot. If a hashpage does
> > get  allocated  to  the  0xF  slot, it is released and not used. In
> > other words, even  though  0xF  is  a valid slot we discard it  and
> > consider it as invalid slot(HPTE_SOFT_INVALID). This  gives  us  an
> > opportunity to  not  depend on a bit in the primary PTE in order to
> > determine the validity of a slot.
> >
> > When  we  release  a  0xF slot we also release a legitimate primary
> > slot  and  unmap  that  entry. This  is  to  ensure  that we do get
> > a legitimate non-0xF slot the next time we retry for a slot.
> >
> > Though treating 0xF slot as invalid reduces the number of available
> > slots and may have an effect on the performance, the probability
> > of hitting a 0xF is extremely low.
> >
> > Compared  to the current scheme, the above described scheme reduces
> > the number of false hash table updates  significantly  and  has the
> > added  advantage  of  releasing  four  valuable  PTE bits for other
> > purpose.
> >
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> >
> > 4K PTE format remains unchanged currently.
> >
> 
> Can you also split this patch into two. One which changes
> __hash_page_4k() ie, linux pte format w.r.t 4k hash pte. Second patch
> with changes w.r.t __hash_page_64k() ie, pte format w.r.t 64k hash pte.

ok. A v2 version of the patch series will be out in a day or two.
RP
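
For readers following the slot discussion in the quoted commit message, here is a minimal user-space sketch of the idea: sixteen 4-bit slot values packed into one 64-bit word (the "second part" of the PTE for a 4K-backed page), all initialised to 0xF to mean "invalid". The helper names and the nibble ordering are assumptions made for illustration and are not what the kernel patch uses.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_INVALID	0xFu	/* called HPTE_SOFT_INVALID in the commit message */
#define SLOTS_PER_PTE	16	/* one per 4K subpage of a 64K page */

/* All sixteen slots start out invalid: sixteen nibbles of 0xF */
static uint64_t slots_init(void)
{
	return UINT64_MAX;
}

/* Read the 4-bit slot recorded for subpage 'idx' */
static unsigned int slot_get(uint64_t second, unsigned int idx)
{
	assert(idx < SLOTS_PER_PTE);
	return (second >> (idx * 4)) & 0xF;
}

/* Record 'slot' for subpage 'idx'; a real 0xF slot must be discarded by the caller */
static uint64_t slot_set(uint64_t second, unsigned int idx, unsigned int slot)
{
	assert(idx < SLOTS_PER_PTE && slot <= 0xF);
	second &= ~(UINT64_C(0xF) << (idx * 4));
	return second | ((uint64_t)slot << (idx * 4));
}

/* A subpage has a usable HPTE only if its slot is not the 0xF sentinel */
static int slot_is_valid(uint64_t second, unsigned int idx)
{
	return slot_get(second, idx) != SLOT_INVALID;
}
```

Because validity is read from the slot value itself, no extra bit in the primary PTE is needed, which is the point the commit message makes about 0xF.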



Re: [RFC PATCH 1/7 v1]powerpc: Free up four PTE bits to accommodate memory keys

2017-06-13 Thread Ram Pai
On Tue, Jun 13, 2017 at 07:32:24AM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > On Mon, Jun 12, 2017 at 12:27:44PM +0530, Aneesh Kumar K.V wrote:
> >> Ram Pai  writes:
> >> 
> >> > Rearrange  PTE   bits to  free  up  bits 3, 4, 5  and  6  for
> >> > memory keys. Bit 3, 4, 5, 6 and 57  shall  be used for memory
> >> > keys.
> >> >
> >> > The patch does the following change to the 64K PTE format
> >> >
> >> > H_PAGE_BUSY moves from bit 3 to bit 7
> >> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> >> >  of the pte.
> >> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> >> >  second part of the pte.
> >> >
> >> > The second part of the PTE will hold
> >> >a (H_PAGE_F_SECOND|H_PAGE_F_GIX)  for  64K page backed pte,
> >> >and sixteen (H_PAGE_F_SECOND|H_PAGE_F_GIX)  for 4k  backed
> >> >  pte.
> >> >
> >> > The four  bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> >> > are initialized to 0xF indicating an invalid slot. If a hashpage does
> >> > get  allocated  to  the  0xF  slot, it is released and not used. In
> >> > other words, even  though  0xF  is  a valid slot we discard it  and
> >> > consider it as invalid slot(HPTE_SOFT_INVALID). This  gives  us  an
> >> > opportunity to  not  depend on a bit in the primary PTE in order to
> >> > determine the validity of a slot.
> >> 
> >> Do we need to do this for 64K hptes ? H_PAGE_HASHPTE indicates whether a
> >> slot is valid or not. For 4K hptes, we do need this right ? ie, only
> >> when H_PAGE_COMBO is set we need to consider 0xf as an invalid slot
> >
> > for 64k hptes; you are right, we do not use 0xF to
> > track the validity of a slot. We just depend on H_PAGE_HASHPTE flag.
> >
> > for 4k hptes, we need to depend on both H_PAGE_HASHPTE as well as the
> > value of the slot.  H_PAGE_HASHPTE tells if there exists any valid
> > 4k HPTEs, and the 4-bit values in the second-part-of-the-pte tells
> > us if they are valid values.
> >
> > However in either case we do not need H_PAGE_COMBO. That flag is not
> > used for ptes.  But we continue to use that flag for pmd to track
> > hugepages, which is why I have not entirely divorced H_PAGE_COMBO from
> > the 64K pagesize case.
> 
> 
> Really? Maybe I am missing that in the patch. H_PAGE_COMBO indicates
> whether a 64K linux page is mapped via 4k hash page table entries. I
> don't see you changing that in __hash_page_4k()
> 
> we still continue to do.
> 
>   new_pte = old_pte | H_PAGE_BUSY | _PAGE_ACCESSED | H_PAGE_COMBO;
> 
>   /*
>* Check if the pte was already inserted into the hash table
>* as a 64k HW page, and invalidate the 64k HPTE if so.
>*/
>   if (!(old_pte & H_PAGE_COMBO)) {
>   flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
> 

ok. My memory blanked.  In this patch we continue to depend on the COMBO
flag to distinguish between ptes backed by 4k hptes and 64k hptes. So the
H_PAGE_COMBO flag cannot go for now in this patch, which means
_PAGE_HPTEFLAGS must continue to have the H_PAGE_COMBO flag too.

I had a patch to get rid of the COMBO bit too, which was dropped.  I will
regenerate a separate patch to rid __hash_page_4k() of the COMBO bit.
That will divorce the COMBO bit entirely from the 64K pte.

> 
> 
> >
> >> 
> >> 
> >> >
> >> > When  we  release  a  0xF slot we also release a legitimate primary
> >> > slot  and  unmap  that  entry. This  is  to  ensure  that we do get
> >> > a legitimate non-0xF slot the next time we retry for a slot.
> >> 
> >> Can you explain this more ? what is a primary slot here ?
> >
> > I may not be using the right terminology here. But when I say slot, I
> > mean the four bits that tell the position of the hpte in the hash
> > buckets. Bit 0 indicates whether it is the primary or secondary hash
> > bucket, and bits 1, 2, 3 indicate the entry in the hash bucket.
> >
> > So when I say primary slot, I mean an entry in the primary hash bucket.
> >
> > The idea is, when hpte_insert returns a hpte which is cached in slot 7
> > of the secondary bucket, i.e. 0xF, we discard it, and also release a
> > random entry from the primary bucket, so that on retry we can get that
> > entry.
> >
> >
> >> 
> >> >
> >> > Though treating 0xF slot as invalid reduces the number of available
> >> > slots and may have an effect on the performance, the probability
> >> > of hitting a 0xF is extremely low.
> >> >
> >> > Compared  to the current scheme, the above described scheme reduces
> >> > the number of false hash table updates  significantly  and  has the
> >> > added  advantage  of  releasing  four  valuable  PTE bits for other
> >> > purpose.
> >> >
> >> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> >> > Ellerman and myself.
> >> >
> >> > 4K PTE format remains unchanged currently.
> >> >
> >> > Signed-off-by: Ram Pai 
> >> > ---
> >> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 12 +
> >> >  arch/powerpc/include/asm/book3s/64/hash-64k.h | 38 ++
> >> >

Re: WARNING: CPU: 2 PID: 7 at kernel/workqueue.c:2041 process_one_work

2017-06-13 Thread Tejun Heo
On Tue, May 30, 2017 at 01:24:06PM +0530, Abdul Haleem wrote:
> Hi,
> 
> Test : stress-ng
> Machine : Power 8 Bare Metal
> Kernel : 4.12.0-rc3
> Config : attached
> gcc version: 4.8.5
> 
> 
> In file kernel/workqueue.c at line 2041
> 
> /* ensure we're on the correct CPU */
> WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
>  raw_smp_processor_id() != pool->cpu);
> 
> 
> WARN_ON_ONCE() is being triggered on Linus mainline kernel (4.12.0-rc3)
> when running stress-ng test.
> 
> 
> request_module: runaway loop modprobe net-pf-10
> request_module: runaway loop modprobe net-pf-10
> hrtimer: interrupt took 500359 ns
> RT Watchdog Timeout (hard): stress-ng-rlimi[80236]

Paul and Steven reported the same problem a while back.  Debugging on
the following thread.

 http://lkml.kernel.org/r/<20170501165747.ga...@linux.vnet.ibm.com>

Thanks.

-- 
tejun


[PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-13 Thread Naveen N. Rao
On P9, trying to use data breakpoints throws the splat shown below (*).
This is because the check for a data breakpoint in DSISR is in
do_hash_page(). Move this check to handle_page_fault() so as to catch
data breakpoints in both hash and radix MMU modes.

While at it, also remove the label '11' that was made redundant by
commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
off")

(*)
Unable to handle kernel paging request for data at address 0xc0e19218
Faulting instruction address: 0xc01155e8
cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
pc: c01155e8: find_pid_ns+0x48/0xe0
lr: c0116ac4: find_task_by_vpid+0x44/0x90
sp: c000ef1e7da0
msr: 90009033
dar: c0e19218
dsisr: 40
current = 0xc000f1f59700
paca= 0xcfd4 softe: 0 irq_happened: 0x01
pid   = 1192, comm = sh
Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 16:52:49 UTC 2017
enter ? for help
[c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
[c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
[c000ef1e7e30] c000ba6c system_call+0x38/0xfc
--- Exception: c01 (System Call) at 7fff94480890
SP (7fffd91e7260) is in userspace

Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
isolate hash related code")
Reported-by: Shriya R. Kulkarni 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/exceptions-64s.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae418b85c17c..17ee701b8336 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
.balign IFETCH_ALIGN_BYTES
 do_hash_page:
 #ifdef CONFIG_PPC_STD_MMU_64
-   andis.  r0,r4,0xa410/* weird error? */
+   andis.  r0,r4,0xa450/* weird error? */
bne-    handle_page_fault   /* if not, try to insert a HPTE */
-   andis.  r0,r4,DSISR_DABRMATCH@h
-   bne-    handle_dabr_fault
CURRENT_THREAD_INFO(r11, r1)
lwz r0,TI_PREEMPT(r11)  /* If we're in an "NMI" */
andis.  r0,r0,NMI_MASK@h/* (i.e. an irq when soft-disabled) */
@@ -1442,7 +1440,9 @@ do_hash_page:
 
 /* Here we have a page fault that hash_page can't handle. */
 handle_page_fault:
-11:ld  r4,_DAR(r1)
+   andis.  r0,r4,DSISR_DABRMATCH@h
+   bne-    handle_dabr_fault
+   ld  r4,_DAR(r1)
ld  r5,_DSISR(r1)
addi    r3,r1,STACK_FRAME_OVERHEAD
bl  do_page_fault
-- 
2.12.2
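
The dispatch order described in the commit message can be modelled in plain C roughly as follows. This is a user-space sketch, not kernel code: the fault_path enum is invented for the example, and the DSISR_DABRMATCH value of 0x00400000 is an assumption, chosen because it is the bit by which the andis. mask grows from 0xa410 to 0xa450 in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

#define DSISR_DABRMATCH	0x00400000u	/* assumed value: data breakpoint match */
#define DSISR_OLD_MASK	0xa4100000u	/* andis. r0,r4,0xa410 */
#define DSISR_NEW_MASK	0xa4500000u	/* andis. r0,r4,0xa450 */

/* Hypothetical labels for where the fault ends up */
enum fault_path { PATH_HASH_PAGE, PATH_PAGE_FAULT, PATH_DABR_FAULT };

/* Models the new handle_page_fault entry: the breakpoint check comes first */
static enum fault_path handle_page_fault(uint32_t dsisr)
{
	return (dsisr & DSISR_DABRMATCH) ? PATH_DABR_FAULT : PATH_PAGE_FAULT;
}

/* Hash MMU: any masked DSISR bit skips the HPTE insert */
static enum fault_path do_hash_page(uint32_t dsisr)
{
	/* The only change to the mask is the added breakpoint bit */
	assert((DSISR_NEW_MASK ^ DSISR_OLD_MASK) == DSISR_DABRMATCH);
	if (dsisr & DSISR_NEW_MASK)
		return handle_page_fault(dsisr);
	return PATH_HASH_PAGE;
}
```

In radix mode the fault reaches handle_page_fault() without going through do_hash_page(), so placing the check there covers both MMU modes.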



[PATCH V2 2/2] powerpc/powernv : Add support for OPAL-OCC command/response interface

2017-06-13 Thread Shilpasri G Bhat
In P9, OCC (On-Chip-Controller) supports a shared memory based
command-response interface. Within the shared memory there is an OPAL
command buffer and an OCC response buffer that can be used to send
inband commands to OCC. This patch adds a platform driver to support
the command/response interface between OCC and the host.

Signed-off-by: Shilpasri G Bhat 
---
Changes from V1:
- Remove spinlock and use atomic_t for setting and clearing flags
- Fix endian swapping
- Use pa() and va() before and after opal call for accessing buffer
  data
- Replace (u8 *) with __be64 for buffer pointers
- User reads the previous OCC response if the user does a read()
  before a write(). Is this wrong?
- Add WARN_ON check for nr_occs > 254

 arch/powerpc/include/asm/opal-api.h|  41 +++-
 arch/powerpc/include/asm/opal.h|   3 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 314 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   8 +
 6 files changed, 367 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index cb3e624..011d86c 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,10 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_OCC_INVALID_STATE -33
+#define OPAL_OCC_BUSY  -34
+#define OPAL_OCC_CMD_TIMEOUT   -35
+#define OPAL_OCC_RSP_MISMATCH  -36
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -190,7 +194,8 @@
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
-#define OPAL_LAST  148
+#define OPAL_OCC_COMMAND   149
+#define OPAL_LAST  149
 
 /* Device tree flags */
 
@@ -829,6 +834,40 @@ struct opal_prd_msg_header {
 
 struct opal_prd_msg;
 
+enum occ_cmd {
+   OCC_CMD_AMESTER_PASS_THRU = 0,
+   OCC_CMD_CLEAR_SENSOR_DATA,
+   OCC_CMD_SET_POWER_CAP,
+   OCC_CMD_SET_POWER_SHIFTING_RATIO,
+   OCC_CMD_SELECT_SENSOR_GROUPS,
+   OCC_CMD_LAST
+};
+
+struct opal_occ_cmd_rsp_msg {
+   __be64 cdata;
+   __be64 rdata;
+   __be16 cdata_size;
+   __be16 rdata_size;
+   u8 cmd;
+   u8 request_id;
+   u8 status;
+};
+
+struct opal_occ_cmd_data {
+   __be16 size;
+   u8 cmd;
+   u8 data[];
+};
+
+struct opal_occ_rsp_data {
+   __be16 size;
+   u8 status;
+   u8 data[];
+};
+
+#define MAX_OPAL_CMD_DATA_LENGTH4090
+#define MAX_OCC_RSP_DATA_LENGTH 8698
+
 #define OCC_RESET   0
 #define OCC_LOAD1
 #define OCC_THROTTLE2
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 03ed493..e55ed79 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -346,6 +346,9 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+int64_t opal_occ_command(int chip_id, struct opal_occ_cmd_rsp_msg *msg,
+bool retry);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..f5f0902 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-occ.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-occ.c b/arch/powerpc/platforms/powernv/opal-occ.c
new file mode 100644
index 000..912cdc4
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-occ.c
@@ -0,0 +1,314 @@
+/*
+ * Copyright IBM Corporation 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) "opal-occ:
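
Since the changelog mentions endian handling, here is a small user-space sketch of how a command could be laid out following struct opal_occ_cmd_data from the patch: a big-endian 16-bit size, a one-byte command, then the payload. The occ_cmd_encode() helper is hypothetical, and treating 'size' as the payload length only is an assumption; the patch text shown here does not settle that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OPAL_CMD_DATA_LENGTH 4090	/* from the patch */

/*
 * Serialise a command in the layout of struct opal_occ_cmd_data.
 * Assumption: 'size' counts payload bytes only.
 * Returns the number of bytes written, or 0 if the payload is too long
 * or the output buffer is too small.
 */
static size_t occ_cmd_encode(uint8_t *out, size_t out_len, uint8_t cmd,
			     const uint8_t *data, uint16_t len)
{
	assert(out != NULL);
	if (len > MAX_OPAL_CMD_DATA_LENGTH || out_len < (size_t)len + 3)
		return 0;
	out[0] = (uint8_t)(len >> 8);	/* most significant byte first */
	out[1] = (uint8_t)(len & 0xff);
	out[2] = cmd;
	if (len)
		memcpy(out + 3, data, len);
	return (size_t)len + 3;
}
```

Writing the size byte by byte keeps the layout identical on big-endian and little-endian hosts, which is the property the __be16 annotation is there to enforce.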

[PATCH V2 1/2] powerpc/powernv: Get a unique token for async completions

2017-06-13 Thread Shilpasri G Bhat
This patch adds support to get a unique token for async completion
requests. This will be used for creating non-repetitive request
handles for consecutive requests in the OPAL-OCC command/response
interface.

Signed-off-by: Shilpasri G Bhat 
---
No changes from V1

 arch/powerpc/include/asm/opal.h |  1 +
 arch/powerpc/platforms/powernv/opal-async.c | 46 +
 2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c..03ed493 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -293,6 +293,7 @@ extern int opal_message_notifier_unregister(enum opal_msg_type msg_type,
 
 extern int __opal_async_get_token(void);
 extern int opal_async_get_token_interruptible(void);
+extern int opal_async_get_unique_token_interruptible(int last_token);
 extern int __opal_async_release_token(int token);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index 83bebee..8caeea2 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -73,6 +73,52 @@ int opal_async_get_token_interruptible(void)
 }
 EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 
+static int __opal_async_get_new_token(int last_token)
+{
+   unsigned long flags;
+   int token;
+
+   spin_lock_irqsave(&opal_async_comp_lock, flags);
+   token = find_next_bit(opal_async_complete_map, opal_max_async_tokens,
+ last_token + 1);
+   if (token >= opal_max_async_tokens) {
+   token = find_first_bit(opal_async_complete_map,
+  opal_max_async_tokens);
+   if (token >= opal_max_async_tokens || token == last_token) {
+   token = -EBUSY;
+   goto out;
+   }
+   }
+
+   pr_debug("%s token = %d\n", __func__, token);
+   if (__test_and_set_bit(token, opal_async_token_map)) {
+   token = -EBUSY;
+   goto out;
+   }
+
+   __clear_bit(token, opal_async_complete_map);
+
+out:
+   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
+   return token;
+}
+
+int opal_async_get_unique_token_interruptible(int last_token)
+{
+   int token;
+
+   /* Wait until a token is available */
+   if (down_interruptible(&opal_async_sem))
+   return -ERESTARTSYS;
+
+   token = __opal_async_get_new_token(last_token);
+   if (token < 0)
+   up(&opal_async_sem);
+
+   return token;
+}
+EXPORT_SYMBOL_GPL(opal_async_get_unique_token_interruptible);
+
 int __opal_async_release_token(int token)
 {
unsigned long flags;
-- 
1.8.3.1
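
The search order used by __opal_async_get_new_token() above can be shown in isolation with a user-space sketch. It uses a 64-bit word in place of the kernel bitmap, leaves out the locking and the token_map bookkeeping, and next_token() is a made-up name for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the next set bit strictly after 'last' in an 'nbits'-wide map,
 * wrapping to the start if needed. Returns -1 if no bit other than
 * 'last' is available.
 */
static int next_token(uint64_t map, int nbits, int last)
{
	int i;

	assert(nbits > 0 && nbits <= 64 && last >= -1 && last < nbits);
	for (i = last + 1; i < nbits; i++)	/* like find_next_bit() */
		if (map & (UINT64_C(1) << i))
			return i;
	for (i = 0; i < last; i++)		/* wrapped search, skipping 'last' */
		if (map & (UINT64_C(1) << i))
			return i;
	return -1;
}
```

Starting the search after the previous token is what keeps consecutive requests from reusing the same handle, even when that token has already been released.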



[PATCH V2 0/2] Add support for OCC command/response interface

2017-06-13 Thread Shilpasri G Bhat
In P9, OCC (On-Chip-Controller) can be sent commands inband via a shared
memory based command-response interface.  This patch adds a platform
driver to support the OCC command-response interface.

The skiboot patch for the interface is posted here:
https://lists.ozlabs.org/pipermail/skiboot/2017-June/007705.html

Shilpasri G Bhat (2):
  powerpc/powernv: Get a unique token for async completions
  powerpc/powernv : Add support for OPAL-OCC command/response interface

 arch/powerpc/include/asm/opal-api.h|  41 +++-
 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-async.c|  46 
 arch/powerpc/platforms/powernv/opal-occ.c  | 314 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   8 +
 7 files changed, 414 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

-- 
1.8.3.1



Re: [PATCH v2] perf: libdw support for powerpc [ping]

2017-06-13 Thread Ravi Bangoria
Hi Mark,

On Tuesday 13 June 2017 05:14 PM, Mark Wielaard wrote:
> I see the same on very short runs. But when doing a slightly longer run,
> even just using ls -lahR, which does some more work, then I do see user
> backtraces. They are still missing for some of the early samples though.
> It is as if there is a stack/memory address mismatch when the probe is
> "too early" in ld.so.
>
> Could you do a test run on some program that does some more work to see
> if you never get any user stack traces, or if you only not get them for
> some specific probes?

Thanks for checking. I tried a proper workload this time, but I still
don't see any userspace callchain getting unwound.

  $ ./perf record --call-graph=dwarf -- zip -q -r temp.zip .
  [ perf record: Woken up 2891 times to write data ]
  [ perf record: Captured and wrote 723.290 MB perf.data (87934 samples) ]


With libdw:

 $ LD_LIBRARY_PATH=/home/ravi/elfutils-git/usr/local/lib:\
/home/ravi/elfutils-git/usr/local/lib/elfutils/:$LD_LIBRARY_PATH\
./perf script

  zip 16699  6857.354633:  37371 cycles:u:
   ecedc xmon_core (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   8c4fc __hash_page_64K (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   83450 hash_preload (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   7cc34 update_mmu_cache (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  330064 alloc_set_pte (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  330efc do_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  334580 __handle_mm_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  335040 handle_mm_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   7bf94 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   7bec4 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   7be78 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
   1a4f8 handle_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)

  zip 16699  6857.354663: 300677 cycles:u:

  zip 16699  6857.354895: 584131 cycles:u:

  zip 16699  6857.355312: 589687 cycles:u:

  zip 16699  6857.355606: 560142 cycles:u:


With libunwind:

$ ./perf script

  zip 16699  6857.354633:  37371 cycles:u:
  ecedc xmon_core (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  8c4fc __hash_page_64K (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  83450 hash_preload (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  7cc34 update_mmu_cache (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
 330064 alloc_set_pte (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
 330efc do_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
 334580 __handle_mm_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
 335040 handle_mm_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  7bf94 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  7bec4 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  7be78 do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  1a4f8 handle_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
  1920 _start (/usr/lib64/ld-2.17.so)

  zip 16699  6857.354663: 300677 cycles:u:
  fa38 _dl_new_object (/usr/lib64/ld-2.17.so)
  3073 dl_main (/usr/lib64/ld-2.17.so)
 2045b _dl_sysdep_start (/usr/lib64/ld-2.17.so)
  1c7f _dl_start_final (/usr/lib64/ld-2.17.so)
  5ce7 _dl_start (/usr/lib64/ld-2.17.so)
  1937 _start (/usr/lib64/ld-2.17.so)

  zip 16699  6857.354895: 584131 cycles:u:
 103d0 _dl_relocate_object (/usr/lib64/ld-2.17.so)

  zip 16699  6857.355312: 589687 cycles:u:
  df68 do_lookup_x (/usr/lib64/ld-2.17.so)
  e8d7 _dl_lookup_symbol_x (/usr/lib64/ld-2.17.so)
 14bb3 _dl_fixup (/usr/lib64/ld-2.17.so)
 1ef37 _dl_runtime_resolve (/usr/lib64/ld-2.17.so)
 20bf7 copy_args (/usr/bin/zip)
  286f main (/usr/bin/zip)
 2497f generic_start_main.isra.0 (/usr/lib64/libc-2.17.so)
 24b73 __libc_start_main (/usr/lib64/libc-2.17.so)

  zip 16699  6857.355606: 560142 cycles:u:
 84764 _IO_getc (/usr

[PATCH V3] cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

2017-06-13 Thread Christophe Lombard
A previous set of patches, "cxl: Add support for Coherent Accelerator
Interface Architecture 2.0", introduced new support for the CAPI
cards. Those patches were tested in a simulation environment, and
many of them were also tested on real hardware.

This patch brings new fixes after a series of tests carried out on
new equipment:
* Add POWER9 definition.
* Re-enable any masked interrupts when the AFU is not activated after
  resetting the AFU.
* Remove the cxl_is_psl8/9 API, which is no longer useful.
* Do not dump CAPI1 registers.
* Rewrite cxl_is_page_fault() function.
* Do not register slb callback on P9.

Changelog[v3]
 - Rebase to latest upstream.
 - Update the patch's header.
 - Add new test in cxl_is_page_fault().

Changelog[v2]
 - Rebase to latest upstream.
 - Update cxl_is_page_fault() to handle the checkout response status.
 - Add comments.

Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/context.c |  6 +++---
 drivers/misc/cxl/cxl.h | 18 +-
 drivers/misc/cxl/fault.c   | 23 +++
 drivers/misc/cxl/main.c| 17 +
 drivers/misc/cxl/native.c  | 29 +
 drivers/misc/cxl/pci.c | 11 ---
 6 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 4472ce1..8c32040 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -45,7 +45,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master)
mutex_init(&ctx->mapping_lock);
ctx->mapping = NULL;
 
-   if (cxl_is_psl8(afu)) {
+   if (cxl_is_power8()) {
spin_lock_init(&ctx->sste_lock);
 
/*
@@ -189,7 +189,7 @@ int cxl_context_iomap(struct cxl_context *ctx, struct vm_area_struct *vma)
if (start + len > ctx->afu->adapter->ps_size)
return -EINVAL;
 
-   if (cxl_is_psl9(ctx->afu)) {
+   if (cxl_is_power9()) {
/*
 * Make sure there is a valid problem state
 * area space for this AFU.
@@ -324,7 +324,7 @@ static void reclaim_ctx(struct rcu_head *rcu)
 {
struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);
 
-   if (cxl_is_psl8(ctx->afu))
+   if (cxl_is_power8())
free_page((u64)ctx->sstp);
if (ctx->ff_page)
__free_page(ctx->ff_page);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index c8568ea..a03f8e7 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -357,6 +357,7 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
 #define CXL_PSL9_DSISR_An_PF_RGP  0x0090ULL  /* PTE not found (Radix Guest (parent)) 0b1001 */
 #define CXL_PSL9_DSISR_An_PF_HRH  0x0094ULL  /* PTE not found (HPT/Radix Host)   0b10010100 */
 #define CXL_PSL9_DSISR_An_PF_STEG 0x009CULL  /* PTE not found (STEG VA)  0b10011100 */
+#define CXL_PSL9_DSISR_An_URTCH   0x00B4ULL  /* Unsupported Radix Tree Configuration 0b10110100 */
 
 /** CXL_PSL_TFC_An **/
 #define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation fault */
@@ -844,24 +845,15 @@ static inline bool cxl_is_power8(void)
 
 static inline bool cxl_is_power9(void)
 {
-   /* intermediate solution */
-   if (!cxl_is_power8() &&
-  (cpu_has_feature(CPU_FTRS_POWER9) ||
-   cpu_has_feature(CPU_FTR_POWER9_DD1)))
+   if (pvr_version_is(PVR_POWER9))
return true;
return false;
 }
 
-static inline bool cxl_is_psl8(struct cxl_afu *afu)
+static inline bool cxl_is_power9_dd1(void)
 {
-   if (afu->adapter->caia_major == 1)
-   return true;
-   return false;
-}
-
-static inline bool cxl_is_psl9(struct cxl_afu *afu)
-{
-   if (afu->adapter->caia_major == 2)
+   if ((pvr_version_is(PVR_POWER9)) &&
+   cpu_has_feature(CPU_FTR_POWER9_DD1))
return true;
return false;
 }
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 538..c79e39b 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -187,7 +187,7 @@ static struct mm_struct *get_mem_context(struct cxl_context *ctx)
 
 static bool cxl_is_segment_miss(struct cxl_context *ctx, u64 dsisr)
 {
-   if ((cxl_is_psl8(ctx->afu)) && (dsisr & CXL_PSL_DSISR_An_DS))
+   if ((cxl_is_power8() && (dsisr & CXL_PSL_DSISR_An_DS)))
return true;
 
return false;
@@ -195,16 +195,23 @@ static bool cxl_is_segment_miss(struct cxl_context *ctx, u64 dsisr)
 
 static bool cxl_is_page_fault(struct cxl_context *ctx, u64 dsisr)
 {
-   if ((cxl_is_psl8(ctx->afu)) && (dsisr & CXL_PSL_DSISR_An_DM))
-   return true;
+   u64 crs; /* Translation Checkout Response Status */
 
-   if (

Re: [next-20170609] WARNING: CPU: 3 PID: 71167 at lib/idr.c:157 idr_replace

2017-06-13 Thread Tejun Heo
Cc'ing David Airlie.

This comes from the drm driver calling idr_replace() with a negative id.
Probably a silly bug in an error handling path?

Thanks.
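
One way a negative id could reach idr_replace() is an unsigned 32-bit
handle that exceeds INT_MAX being converted to the int id parameter.
That cause is only an assumption here, not something the trace proves.
A minimal sketch of a guard, where handle_fits_idr_id() is a
hypothetical helper and not an existing drm or idr function:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when an unsigned 32-bit handle can be
 * passed as a non-negative int id. Comparing before converting avoids
 * relying on the implementation-defined unsigned-to-int conversion.
 */
static bool handle_fits_idr_id(uint32_t handle)
{
	return handle <= (uint32_t)INT_MAX;
}
```

A caller could reject the request with an error when the helper
returns false, before the id is ever handed to the idr code.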

On Mon, Jun 12, 2017 at 08:10:54PM +0530, Abdul Haleem wrote:
> Hi,
> 
> WARN_ON_ONCE is being called from idr_replace() function in file
> lib/idr.c at line 157
> 
> struct radix_tree_node *node;
> void __rcu **slot = NULL;
> void *entry;
> 
> if (WARN_ON_ONCE(id < 0))
> return ERR_PTR(-EINVAL);
> if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
> return ERR_PTR(-EINVAL);
> 
> entry = __radix_tree_lookup(&idr->idr_rt, id, &node, &slot);
> 
> 
> Test: Trinity (https://github.com/kernelslacker/trinity)
> Machine : Power 8 PowerVM LPAR
> Kernel : 4.12.0-rc4-next-20170606
> gcc : version 5.2.1
> config : attached
> 
> trace logs:
> [ cut here ]
> WARNING: CPU: 3 PID: 71167 at lib/idr.c:157 idr_replace+0x100/0x110
> Modules linked in: xts(E) ip_set(E) ipmi_powernv(E) ipmi_devintf(E)
> shpchp(E) ibmpowernv(E) ofpart(E) uio_pdrv_genirq(E) sg(E) ses(E)
> at24(E) tg3(E) bnx2x(E) ahci(E) loop(E) xt_CHECKSUM(E) ipt_MASQUERADE(E)
> nf_nat_masquerade_ipv4(E) tun(E) kvm_hv(E) kvm_pr(E) kvm(E)
> ip6t_rpfilter(E) ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
> nf_reject_ipv6(E) xt_conntrack(E) nfnetlink(E) ebtable_nat(E)
> ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_nat(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E) ip6table_mangle(E)
> ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_conntrack_ipv4(E)
> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E)
> iptable_mangle(E) iptable_security(E) iptable_raw(E) ebtable_filter(E)
> ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E)
> i2c_dev(E)
> [29316.280682]  ghash_generic(E) gf128mul(E) vmx_crypto(E) enclosure(E)
> scsi_transport_sas(E) nvmem_core(E) opal_prd(E) ipmi_msghandler(E)
> powernv_rng(E) powernv_flash(E) uio(E) rtc_opal(E) mtd(E) i2c_opal(E)
> nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E)
> ip_tables(E) ext4(E) jbd2(E) fscrypto(E) mbcache(E) sd_mod(E) mdio(E)
> libcrc32c(E) ptp(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E)
> syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E)
> aacraid(E) libahci(E) libata(E) i2c_core(E) pps_core(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: xts]
> CPU: 3 PID: 71167 Comm: trinity-c43 Tainted: GE
> 4.12.0-rc4-next-20170609-autotest #1
> task: c03bd0799500 task.stack: c011e81f
> NIP: c04d20a0 LR: dfc16a98 CTR: c04d1fa0
> REGS: c011e81f38d0 TRAP: 0700   Tainted: GE
> (4.12.0-rc4-next-20170609-autotest)
> MSR: 90029033 
>   CR: 28002428  XER: 2000  
> CFAR: c04d1fd4 SOFTE: 1 
> GPR00: dfc16a98 c011e81f3b50 c106d800 c0334c89de38 
> GPR04:  d7d7d7d7 d7d7d7d7  
> GPR08: c011e81f4000  8003 dfc47760 
> GPR12: c04d1fa0 cfac1f80  10030d70 
> GPR16: 10030f38  dfc17150 0008 
> GPR20: dfc4f4e0 7fff7996 0009  
> GPR24:  c011e81f3c50 0008 dfc61958 
> GPR28: c0334c89de50 c0334c89de38 d7d7d7d7 d7d7d7d7 
> NIP [c04d20a0] idr_replace+0x100/0x110
> LR [dfc16a98] drm_gem_handle_delete+0x58/0x120 [drm]
> Call Trace:
> [c011e81f3b50] [c011e81f3bf0] 0xc011e81f3bf0 (unreliable)
> [c011e81f3ba0] [dfc16a98] drm_gem_handle_delete+0x58/0x120 [drm]
> [c011e81f3bf0] [dfc17e80] drm_ioctl+0x270/0x4e0 [drm]
> [c011e81f3d40] [c0344108] do_vfs_ioctl+0xc8/0x8c0
> [c011e81f3de0] [c03449c4] SyS_ioctl+0xc4/0xe0
> [c011e81f3e30] [c000af84] system_call+0x38/0xe0
> Instruction dump:
> 38210050 7f83e378 e8010010 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6
> 4e800020 0fe0 3860ffea 4b94 <0fe0> 3860ffea 4b88 6042
> ---[ end trace 5158244f52496ab9 ]---
> _exception: 47 callbacks suppressed
> 
> 
> -- 
> Regards
> 
> Abdul Haleem
> IBM Linux Technology Centre
> 
> 

> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/powerpc 4.11.0-rc7 Kernel Configuration
> #
> CONFIG_PPC64=y
> 
> #
> # Processor support
> #
> CONFIG_PPC_BOOK3S_64=y
> # CONFIG_PPC_BOOK3E_64 is not set
> # CONFIG_POWER7_CPU is not set
> CONFIG_POWER8_CPU=y
> CONFIG_PPC_BOOK3S=y
> CONFIG_PPC_FPU=y
> CONFIG_ALTIVEC=y
> CONFIG_VSX=y
> # CONFIG_PPC_ICSWX is not set
> CONFIG_PPC_STD_MMU=y
> CONFIG_PPC_STD_MMU_64=y
> CONFIG_PPC_RADIX_MMU=y
> CONFIG_PPC_MM_SLICES=y
> CONFIG_PPC_HAVE_PMU_SUPPORT=y
> CONFIG_PPC_PERF_CTRS=y
> CONFIG_SMP=y
> CONFIG_NR_CPUS=2048
> CONFIG_PPC_DOORBELL=y
> # CONFIG_CPU_BIG_ENDIAN is not set
> CONFIG_CPU_LITTLE_ENDIAN=y
> CONFIG_PPC64_BOOT_WRAPPER=y
> CONFIG_64BIT=y
> CONFIG_ARCH

Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-13 Thread Benjamin Herrenschmidt
On Tue, 2017-06-13 at 20:04 +1000, Michael Ellerman wrote:
> > Good idea.  Writing to CTRL register can change only the RUN field.
> > Was this any different in older generations?
> 
> No AFAICS back to 2.02.
> 
> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> > routine.
> 
> Doing the read/modify write is forward compatible vs a new writable
> field, whereas writing the whole register with a known value is not.

At this stage I wouldn't worry too much about it. If we are paranoid,
we can write a pre-cooked value (read once at boot). Otherwise we just
do what Nick does and put the onus on future designs that might want
to re-use the register for other things to add mode bits that
configure the new feature in.

Ben.
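
The "pre-cooked value" idea can be sketched in user space roughly as
follows. The register is simulated by a plain variable, and
sim_mfspr_ctrl(), sim_mtspr_ctrl(), ctrl_boot_template and
ctrl_capture_at_boot() are hypothetical names for illustration, not
kernel interfaces. Only the run latch bit position follows the thread.

```c
#include <stdint.h>

#define CTRL_RUNLATCH 0x1ULL	/* bit 63 in IBM numbering, the lowest bit */

/* Simulated CTRL register; real code would use mfspr/mtspr. */
static uint64_t sim_ctrl;

static uint64_t sim_mfspr_ctrl(void) { return sim_ctrl; }
static void sim_mtspr_ctrl(uint64_t v) { sim_ctrl = v; }

/* Register contents read once at boot, with the run latch masked out. */
static uint64_t ctrl_boot_template;

static void ctrl_capture_at_boot(void)
{
	ctrl_boot_template = sim_mfspr_ctrl() & ~CTRL_RUNLATCH;
}

/* Hot paths: one write each, no read of the register. */
static void runlatch_on(void)
{
	sim_mtspr_ctrl(ctrl_boot_template | CTRL_RUNLATCH);
}

static void runlatch_off(void)
{
	sim_mtspr_ctrl(ctrl_boot_template);
}
```

This keeps whatever other bits were set at boot, but it would still
miss a future field that changes at run time.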



Re: [PATCH] powerpc/configs: fix default values for NF_CT_PROTO_*

2017-06-13 Thread Davide Caratti
On Tue, 2017-06-13 at 20:49 +1000, Michael Ellerman wrote:
> Davide Caratti  writes:
> 
> > NF_CT_PROTO_{SCTP,UDPLITE,DCCP} can't be set to 'm' anymore, since they
> > have been redefined as 'bool': fix defconfig for linkstation, mvme5100 and
> > ppc6xx platforms accordingly.
> 
> Since when? ie. which commit changed the symbols to bool from tristate?
> 
> cheers

hello Michael,

the commits are:

a85406afeb3e ("netfilter: conntrack: built-in support for SCTP")
c51d39010a1b ("netfilter: conntrack: built-in support for DCCP")
9b91c96c5d1f ("netfilter: conntrack: built-in support for UDPlite")

they were causing a "warning symbol value 'm' invalid" in kconfig: sorry
for not noticing this before. I see that ARM and MIPS already have a fix:

5aff1d245e8c ("ARM: defconfigs: make NF_CT_PROTO_SCTP and
NF_CT_PROTO_UDPLITE built-in")
9ddc16ad8e0b ("MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE
change")

but (some) ppc and tile defconfig were still missing the correction:
that's why I submitted this patch (and a similar one,
http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg1413473.html ,
for tile).

thank you for looking at this,
regards
--
davide








Re: [PATCH 03/13] powerpc/64s: idle process interrupts from system reset wakeup

2017-06-13 Thread Nicholas Piggin
On Tue, 13 Jun 2017 23:05:47 +1000
Nicholas Piggin  wrote:

> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index f06112cf8734..8366bdc69988 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -32,6 +32,7 @@
>  #ifndef __ASSEMBLY__
>  
>  extern void __replay_interrupt(unsigned int vector);
> +extern void __replay_wakeup_interrupt(unsigned long srr1);
>  
>  extern void timer_interrupt(struct pt_regs *);
>  extern void performance_monitor_exception(struct pt_regs *regs);

Oops, just noticed this remnant from an earlier version. Please
ignore this hunk.

Thanks,
Nick


Re: [PATCH] powerpc: dts: use #include "..." to include local DT

2017-06-13 Thread Anatolij Gustschin
Hi,

On Wed, 24 May 2017 14:12:24 +0900
Masahiro Yamada yamada.masah...@socionext.com wrote:

>Most DT files in PowerPC use #include "..." to make the preprocessor
>include DT files from the same directory, but 3 exceptional files
>use #include <...> for that.
>
>Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from
>dtc_cpp_flags.
>
>Signed-off-by: Masahiro Yamada 

Tested-by: Anatolij Gustschin 


Re: [PATCH] powerpc: dts: use #include "..." to include local DT

2017-06-13 Thread Anatolij Gustschin
On Tue, 13 Jun 2017 20:21:45 +1000
Michael Ellerman m...@ellerman.id.au wrote:

>Masahiro Yamada  writes:
...
>> Ping.
>> I am not 100% sure who is responsible for this,
>> but somebody, could take a look at this patch, please?  
>
>Have you tested it actually works?
>
>It sounds reasonable, and if it behaves as you describe there is no
>change in behaviour, right?

yes, these dtbs build with this patch and I've tested with
mpc5121ads.dtb.

Thanks,
Anatolij


[PATCH 13/13] powerpc/64s: idle runlatch switch is done with MSR[EE]=0

2017-06-13 Thread Nicholas Piggin
2*mfmsr and 2*mtmsr can be avoided in the idle sleep/wake code
because we know that MSR[EE] is clear.

Acked-by: Vaidyanathan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/idle.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 1028df82cd2f..2abee070373f 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -291,9 +291,9 @@ static unsigned long __power7_idle_type(unsigned long type)
if (!prep_irq_for_idle_irqsoff())
return 0;
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
srr1 = power7_idle_insn(type);
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
fini_irq_for_idle_irqsoff();
 
@@ -328,9 +328,9 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
psscr = mfspr(SPRN_PSSCR);
psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
srr1 = power9_idle_stop(psscr);
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
fini_irq_for_idle_irqsoff();
 
@@ -365,7 +365,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
unsigned long srr1;
u32 idle_states = pnv_get_supported_cpuidle_states();
 
-   ppc64_runlatch_off();
+   __ppc64_runlatch_off();
 
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
unsigned long psscr;
@@ -392,7 +392,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
HMT_medium();
}
 
-   ppc64_runlatch_on();
+   __ppc64_runlatch_on();
 
return srr1;
 }
-- 
2.11.0



[PATCH 12/13] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-13 Thread Nicholas Piggin
The CTRL register is read-only except bit 63 which is the run latch
control. This means it can be updated with a mtspr rather than
mfspr/mtspr.

Reviewed-by: Vaidyanathan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a44ea034c226 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1960,12 +1960,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 void notrace __ppc64_runlatch_on(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
-
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl |= CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
 
+   mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
ti->local_flags |= _TLF_RUNLATCH;
 }
 
@@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
 void notrace __ppc64_runlatch_off(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
 
ti->local_flags &= ~_TLF_RUNLATCH;
-
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl &= ~CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
+   mtspr(SPRN_CTRLT, 0);
 }
 #endif /* CONFIG_PPC64 */
 
-- 
2.11.0
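
The trade-off discussed for this patch can be modelled in user space.
The sketch below is not kernel code: struct sim_ctrl and sim_write()
are hypothetical, and the "writable" mask stands in for the hardware
rule that only the run latch bit accepts writes.

```c
#include <stdint.h>

#define CTRL_RUNLATCH 0x1ULL	/* bit 63 in IBM numbering, the lowest bit */

/* Simulated register: only bits set in 'writable' can be changed. */
struct sim_ctrl {
	uint64_t value;
	uint64_t writable;
};

static void sim_write(struct sim_ctrl *c, uint64_t v)
{
	c->value = (c->value & ~c->writable) | (v & c->writable);
}

/* Old code path: read, set the bit, write back. */
static void runlatch_on_rmw(struct sim_ctrl *c)
{
	uint64_t ctrl = c->value;

	ctrl |= CTRL_RUNLATCH;
	sim_write(c, ctrl);
}

/* New code path: write a known constant. */
static void runlatch_on_direct(struct sim_ctrl *c)
{
	sim_write(c, CTRL_RUNLATCH);
}
```

With writable == CTRL_RUNLATCH both paths leave the register in the
same state. They diverge only if a second writable bit is added, which
is the forward-compatibility concern raised in review.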



[PATCH 11/13] powerpc/64s: cpuidle no memory barrier after break from idle

2017-06-13 Thread Nicholas Piggin
A memory barrier is not required after the task wakes up; it is
needed only if we clear the polling flag before waking. The case
where we have work to do is the important one, so optimise
for it.

Reviewed-by: Vaidyanathan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 11 +--
 drivers/cpuidle/cpuidle-pseries.c | 11 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9d03326ac05e..37b0698b7193 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
-   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
return index;
 }
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a404f352d284..e9b3853d93ea 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -71,13 +71,20 @@ static int snooze_loop(struct cpuidle_device *dev,
while (!need_resched()) {
HMT_low();
HMT_very_low();
-   if (snooze_timeout_en && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
idle_loop_epilog(in_purr);
 
-- 
2.11.0
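
The two exit paths of the snooze loop can be modelled with C11 atomics.
This is a user-space sketch under assumptions: the FLAG_* names and
both functions are hypothetical, atomic_thread_fence() stands in for
smp_mb(), and the flags word stands in for the thread flags.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_POLLING		0x1u	/* models TIF_POLLING_NRFLAG */
#define FLAG_NEED_RESCHED	0x2u	/* models TIF_NEED_RESCHED */

/*
 * Timeout exit: need_resched was not seen yet. Stop advertising
 * polling, then fence so that the following test of NEED_RESCHED is
 * ordered after the clear. Returns the result of that test.
 */
static bool exit_on_timeout(atomic_uint *flags)
{
	atomic_fetch_and_explicit(flags, ~FLAG_POLLING, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(flags, memory_order_relaxed) &
	       FLAG_NEED_RESCHED;
}

/* Wakeup exit: NEED_RESCHED was already observed, so no fence here. */
static void exit_on_wakeup(atomic_uint *flags)
{
	atomic_fetch_and_explicit(flags, ~FLAG_POLLING, memory_order_relaxed);
}
```

The common case, leaving the loop because there is work to do, takes
the second path and pays no barrier.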



[PATCH 10/13] powerpc/64s: cpuidle read mostly for common globals

2017-06-13 Thread Nicholas Piggin
Ensure these don't get put into bouncing cachelines.

Reviewed-by: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 10 +-
 drivers/cpuidle/cpuidle-pseries.c |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 50b3c2e0306f..9d03326ac05e 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
 
 struct stop_psscr_table {
u64 val;
u64 mask;
 };
 
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
+static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
 
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 7b12bb2ea70f..a404f352d284 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
-- 
2.11.0
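
In the kernel, __read_mostly groups such variables into a separate data
section. The same idea can be approximated in portable C11 by giving
read-mostly and frequently written data their own cache lines. The
struct names and the 64-byte line size below are assumptions for the
sketch, not values from the patch.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed line size for this sketch */

/* Written once at init, then only read in the idle loop. */
struct idle_config {
	int max_idle_state;
	uint64_t snooze_timeout;
	bool snooze_timeout_en;
};

/* Hypothetical counter that is updated often at run time. */
struct idle_stats {
	uint64_t wakeups;
};

/*
 * Each object starts on its own cache line, so stores to 'stats' do
 * not invalidate the line that holds 'config' on other CPUs.
 */
static alignas(CACHELINE) struct idle_config config;
static alignas(CACHELINE) struct idle_stats stats;
```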



[PATCH 09/13] powerpc/64s: cpuidle set polling before enabling irqs

2017-06-13 Thread Nicholas Piggin
local_irq_enable can cause interrupts to be taken which could
take a significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage as far as possible.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 4 +++-
 drivers/cpuidle/cpuidle-pseries.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 79152676f62b..50b3c2e0306f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
u64 snooze_exit_time;
 
-   local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
 
+   local_irq_enable();
+
snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb();
+
return index;
 }
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
unsigned long in_purr;
u64 snooze_exit_time;
 
+   set_thread_flag(TIF_POLLING_NRFLAG);
+
idle_loop_prolog(&in_purr);
local_irq_enable();
-   set_thread_flag(TIF_POLLING_NRFLAG);
snooze_exit_time = get_tb() + snooze_timeout;
 
while (!need_resched()) {
-- 
2.11.0
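
Why the polling flag saves an IPI can be seen from the waker's side. The
following is a simplified user-space model, not the scheduler's actual
code: wake_idle_cpu() and the FLAG_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_POLLING		0x1u	/* models TIF_POLLING_NRFLAG */
#define FLAG_NEED_RESCHED	0x2u	/* models TIF_NEED_RESCHED */

/*
 * Waker side: request a reschedule and report whether an IPI is still
 * needed. If the target advertised polling, it will notice the flag
 * on its own and no IPI has to be sent.
 */
static bool wake_idle_cpu(atomic_uint *target_flags)
{
	unsigned int old = atomic_fetch_or(target_flags, FLAG_NEED_RESCHED);

	return !(old & FLAG_POLLING);
}
```

The earlier the idle task sets the polling flag, the longer the window
in which a waker gets the cheap path.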



[PATCH 08/13] powerpc/64s: idle hmi wakeup is unlikely

2017-06-13 Thread Nicholas Piggin
In a busy system, idle wakeups can be expected from IPIs and device
interrupts, so mark the HMI wakeup branch as unlikely.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/idle_book3s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 6305d4d7a268..32b76fb28352 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -306,7 +306,7 @@ FTR_SECTION_ELSE_NESTED(66);
\
rlwinm  r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */ \
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
-   bne 20f;\
+   bne+ 20f;\
/* Invoke opal call to handle hmi */\
ld  r2,PACATOC(r13);\
ld  r1,PACAR1(r13); \
-- 
2.11.0
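
A C rendering of the same check may help readers who do not read
PowerPC assembly. The rotate amount (45-31), the 0xf mask and the 0xa
compare value are taken from the hunk above. The function names are
hypothetical, and __builtin_expect is a GCC/Clang extension playing the
role of the bne+ hint.

```c
#include <stdbool.h>
#include <stdint.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

#define WAKE_REASON_HMI	0xau	/* value tested by cmpwi r0,0xa */

/* 32-bit rotate left, as rlwinm does on the low word. */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* rlwinm r0,r12,45-31,0xf: rotate by 14, keep the low four bits. */
static uint32_t wake_reason(uint64_t srr1)
{
	return rotl32((uint32_t)srr1, 45 - 31) & 0xfu;
}

/* The HMI case is the rare one, so hint the compiler accordingly. */
static bool woke_for_hmi(uint64_t srr1)
{
	if (unlikely(wake_reason(srr1) == WAKE_REASON_HMI))
		return true;
	return false;
}
```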



[PATCH 07/13] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths

2017-06-13 Thread Nicholas Piggin
Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.

In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.

This also balances the return prediction for the idle call, by
doing blr rather than rfid to return to the idle caller.

On POWER9, with snooze disabled, 2-process context switch performance
across different cores increases by 2%.
---
 arch/powerpc/kernel/exceptions-64s.S|  1 +
 arch/powerpc/kernel/idle_book3s.S   | 57 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 -
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ada0a20ef46c..b5ea2863aed9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -130,6 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x100)
 
 #ifdef CONFIG_PPC_P7_NAP
 EXC_COMMON_BEGIN(system_reset_idle_common)
+   mfspr   r12,SPRN_SRR1
b   pnv_powersave_wakeup
 #endif
 
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 35cf5bb7daed..6305d4d7a268 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -111,7 +111,7 @@ core_idle_lock_held:
  * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
  *- Requested PSSCR value in POWER9
  *
- * Address of idle handler to 'rfid' to in r4
+ * Address of idle handler to branch to in realmode in r4
  */
 pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -121,14 +121,14 @@ pnv_powersave_common:
 * need to save PC, some CR bits and the NV GPRs,
 * but for now an interrupt frame will do.
 */
+   mtctr   r4
+
mflr r0
std r0,16(r1)
stdu r1,-INT_FRAME_SIZE(r1)
std r0,_LINK(r1)
std r0,_NIP(r1)
 
-   mfmsr   r9
-
/* We haven't lost state ... yet */
li  r0,0
stb r0,PACA_NAPSTATELOST(r13)
@@ -138,7 +138,6 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
mfcr r5
std r5,_CCR(r1)
-   std r9,_MSR(r1)
std r1,PACAR1(r13)
 
/*
@@ -148,12 +147,8 @@ pnv_powersave_common:
 * the MMU context to the guest.
 */
LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
-   li  r6, MSR_RI
-   andc r6, r9, r6
-   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r4
-   mtspr   SPRN_SRR1, r7
-   rfid
+   mtmsrd  r7,0
+   bctr
 
.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
@@ -305,11 +300,10 @@ _GLOBAL(power7_idle_insn)
b   pnv_powersave_common
 
 #define CHECK_HMI_INTERRUPT\
-   mfspr   r0,SPRN_SRR1;   \
 BEGIN_FTR_SECTION_NESTED(66);  \
-   rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
+   rlwinm  r0,r12,45-31,0xf;  /* extract wake reason field (P8) */ \
 FTR_SECTION_ELSE_NESTED(66);   \
-   rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
+   rlwinm  r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */ \
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
bne 20f;\
@@ -388,17 +382,17 @@ pnv_powersave_wakeup_mce:
 
/*
 * Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
-* reason into SRR1, which allows reuse of the system reset wakeup
+* reason into r12, which allows reuse of the system reset wakeup
 * code without being mistaken for another type of wakeup.
 */
-   oris r3,r3,SRR1_WAKEMCE_RESVD@h
-   mtspr   SPRN_SRR1,r3
+   oris r12,r3,SRR1_WAKEMCE_RESVD@h
 
b   pnv_powersave_wakeup
 
 /*
  * Called from reset vector for powersave wakeups.
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ * r12 - SRR1
  */
 .global pnv_powersave_wakeup
 pnv_powersave_wakeup:
@@ -408,8 +402,10 @@ BEGIN_FTR_SECTION
 BEGIN_FTR_SECTION_NESTED(70)
bl  power9_dd1_recover_paca
 END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
+   ld  r1,PACAR1(r13)
bl  pnv_restore_hyp_resource_arch300
 FTR_SECTION_ELSE
+   ld  r1,PACAR1(r13)
bl  pnv_restore_hyp_resource_arch207
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 
@@ -429,7 +425,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 #endif
 
/* Return SRR1 from power7_nap() */
-   mfspr   r3,SPRN_SRR1
+   mr  r3,r12
blt cr3,pnv_wakeup_noloss
b   pnv_wakeup_l

[PATCH 06/13] powerpc/64s: idle branch to handler with virtual mode offset

2017-06-13 Thread Nicholas Piggin
Branch to the system reset idle wakeup handlers in real mode with the
0xc... kernel address applied. This simplifies the wakeup handler,
which can then avoid rfid when switching to virtual mode.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 13 +
 arch/powerpc/kernel/exceptions-64s.S |  6 --
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 183d73b6ed99..33473cbc0986 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -236,6 +236,19 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
+/*
+ * Branch to label using its 0xC000 address. This results in instruction
+ * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
+ * on using mtmsr rather than rfid.
+ *
+ * This could set the 0xc bits for !RELOCATABLE as an immediate, rather than
+ * load KBASE for a slight optimisation.
+ */
+#define BRANCH_TO_C000(reg, label) \
+   __LOAD_HANDLER(reg, label); \
+   mtctr   reg;\
+   bctr
+
 #ifdef CONFIG_RELOCATABLE
 #define BRANCH_TO_COMMON(reg, label)   \
__LOAD_HANDLER(reg, label); \
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 31a9114860c4..ada0a20ef46c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -99,7 +99,9 @@ EXC_VIRT_NONE(0x4000, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
/*
 * If running native on arch 2.06 or later, check if we are waking up
-* from nap/sleep/winkle, and branch to idle handler.
+* from nap/sleep/winkle, and branch to idle handler. The idle wakeup
+* handler initially runs in real mode, but we branch to the 0xc000...
+* address so we can turn on relocation with mtmsr.
 */
 #define IDLETEST(n)\
BEGIN_FTR_SECTION ; \
@@ -107,7 +109,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
rlwinm. r10,r10,47-31,30,31 ;   \
beq-1f ;\
cmpwi   cr3,r10,2 ; \
-   BRANCH_TO_COMMON(r10, system_reset_idle_common) ;   \
+   BRANCH_TO_C000(r10, system_reset_idle_common) ; \
 1: \
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #else
-- 
2.11.0
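
A toy sketch of why a single 0xc... address can serve both MSR[IR]=0 and MSR[IR]=1, as the comment on BRANCH_TO_C000 describes. It assumes that real-mode fetches ignore the high-order effective-address bits; the mask width, the base constant and both helper names are illustrative assumptions, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: start of the kernel linear mapping. */
#define KERNEL_BASE            0xc000000000000000ULL
/* Assumed for illustration: real mode ignores the top four EA bits. */
#define REAL_MODE_IGNORED_BITS 0xf000000000000000ULL

/* Hypothetical helper: the address an instruction fetch ends up using. */
static uint64_t fetch_address(uint64_t ea, bool msr_ir)
{
	if (msr_ir)
		return ea;                       /* relocation on: EA used as is */
	return ea & ~REAL_MODE_IGNORED_BITS;     /* real mode: top bits dropped */
}

/* Hypothetical helper: the 0xc... form of a handler's physical offset. */
static uint64_t to_c000(uint64_t phys_offset)
{
	return phys_offset | KERNEL_BASE;
}
```

In this model the same value is a valid fetch address before and after relocation is turned on, which is what lets mtmsr replace rfid.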



[PATCH 05/13] powerpc/64s: interrupt replay balance the return branch predictor

2017-06-13 Thread Nicholas Piggin
The __replay_interrupt code is branched to with bl, but the caller is
returned to directly with rfid from the interrupt.

Instead, rfid to a stub that returns to the caller with blr, which
should keep the return branch predictor balanced.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index a04ee0d7f88e..31a9114860c4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1579,6 +1579,10 @@ doorbell_super_common_msgclr:
  * Note: While MSR:EE is off, we need to make sure that _MSR
  * in the generated frame has EE set to 1 or the exception
  * handler will not properly re-enable them.
+ *
+ * Note that we don't specify LR as the NIP (return address) for
+ * the interrupt because that would unbalance the return branch
+ * predictor.
  */
 _GLOBAL(__replay_interrupt)
/* We are going to jump to the exception common code which
@@ -1586,7 +1590,7 @@ _GLOBAL(__replay_interrupt)
 * we don't give a damn about, so we don't bother storing them.
 */
mfmsr   r12
-   mflrr11
+   LOAD_REG_ADDR(r11, .L__replay_interrupt_return)
mfcrr9
ori r12,r12,MSR_EE
cmpwi   r3,0x900
@@ -1604,4 +1608,6 @@ FTR_SECTION_ELSE
cmpwi   r3,0xa00
beq doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+.L__replay_interrupt_return:
blr
+
-- 
2.11.0
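
A toy model of the imbalance the commit message describes, assuming a predictor that pushes on bl, pops on blr and is left untouched by rfid. The structure, the depth and all function names are hypothetical; real hardware predictors are more involved.

```c
#include <stddef.h>

#define LINK_STACK_DEPTH 16

/* Toy return-address predictor: calls push, returns pop. */
struct link_stack {
	unsigned long entries[LINK_STACK_DEPTH];
	size_t depth;
};

/* A bl pushes the predicted return address. */
static void model_bl(struct link_stack *ls, unsigned long return_addr)
{
	if (ls->depth < LINK_STACK_DEPTH)
		ls->entries[ls->depth++] = return_addr;
}

/* A blr pops the most recent prediction. */
static void model_blr(struct link_stack *ls)
{
	if (ls->depth > 0)
		ls->depth--;
}

/* An rfid is modelled as leaving the stack untouched (an assumption). */
static void model_rfid(struct link_stack *ls)
{
	(void)ls;
}

/* Old scheme: bl into the replay code, rfid straight back to the caller. */
static size_t replay_old(struct link_stack *ls)
{
	model_bl(ls, 0x1000);
	model_rfid(ls);
	return ls->depth;
}

/* New scheme: rfid to the return stub, then blr from the stub. */
static size_t replay_new(struct link_stack *ls)
{
	model_bl(ls, 0x1000);
	model_rfid(ls);
	model_blr(ls);
	return ls->depth;
}
```

Starting from an empty stack, the old scheme leaves one stale entry behind per replay, while the new scheme returns the stack to its starting depth.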



[PATCH 04/13] powerpc/64s: msgclr when handling doorbell exceptions

2017-06-13 Thread Nicholas Piggin
msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakeup direct patch gives:

                                original  wakeup direct  msgclr
Different threads, same core:   315k/s    264k/s         345k/s
Different cores:                235k/s    242k/s         242k/s

Net speedup is +10% for same core, and +3% for different core.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/dbell.h  | 13 +
 arch/powerpc/include/asm/ppc-opcode.h |  3 +++
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/exceptions-64s.S  | 23 +--
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index f70cbfe0ec04..9f2ae0d25e15 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -56,6 +56,19 @@ static inline void ppc_msgsync(void)
: : "i" (CPU_FTR_HVMODE|CPU_FTR_ARCH_300));
 }
 
+static inline void _ppc_msgclr(u32 msg)
+{
+   __asm__ __volatile__ (ASM_FTR_IFSET(PPC_MSGCLR(%1), PPC_MSGCLRP(%1), %0)
+   : : "i" (CPU_FTR_HVMODE), "r" (msg));
+}
+
+static inline void ppc_msgclr(enum ppc_dbell type)
+{
+   u32 msg = PPC_DBELL_TYPE(type);
+
+   _ppc_msgclr(msg);
+}
+
 #else /* CONFIG_PPC_BOOK3S */
 
 #define PPC_DBELL_MSGTYPE  PPC_DBELL
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3a8d278e7421..3b29c54e51fa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -221,6 +221,7 @@
 #define PPC_INST_MSGCLR0x7c0001dc
 #define PPC_INST_MSGSYNC   0x7c0006ec
 #define PPC_INST_MSGSNDP   0x7c00011c
+#define PPC_INST_MSGCLRP   0x7c00015c
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP   0x6000
 #define PPC_INST_PASTE 0x7c00070c
@@ -409,6 +410,8 @@
___PPC_RB(b))
 #define PPC_MSGSNDP(b) stringify_in_c(.long PPC_INST_MSGSNDP | \
___PPC_RB(b))
+#define PPC_MSGCLRP(b) stringify_in_c(.long PPC_INST_MSGCLRP | \
+   ___PPC_RB(b))
 #define PPC_POPCNTB(a, s)  stringify_in_c(.long PPC_INST_POPCNTB | \
__PPC_RA(a) | __PPC_RS(s))
 #define PPC_POPCNTD(a, s)  stringify_in_c(.long PPC_INST_POPCNTD | \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index e15c178ba079..9624851ca276 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -746,6 +746,7 @@ int main(void)
 #endif
 
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+   DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
 
 #ifdef CONFIG_PPC_8xx
DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ae418b85c17c..a04ee0d7f88e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1552,6 +1552,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
b   1b
 
 /*
+ * When doorbell is triggered from system reset wakeup, the message is
+ * not cleared, so it would fire again when EE is enabled.
+ *
+ * When coming from local_irq_enable, there may be the same problem if
+ * we were hard disabled.
+ *
+ * Execute msgclr to clear pending exceptions before handling it.
+ */
+h_doorbell_common_msgclr:
+   LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+   PPC_MSGCLR(3)
+   b   h_doorbell_common
+
+doorbell_super_common_msgclr:
+   LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+   PPC_MSGCLRP(3)
+   b   doorbell_super_common
+
+/*
  * Called from arch_local_irq_enable when an interrupt needs
  * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
  * which kind of interrupt. MSR:EE is already off. We generate a
@@ -1576,13 +1595,13 @@ _GLOBAL(__replay_interrupt)
beq hardware_interrupt_common
 BEGIN_FTR_SECTION
cmpwi   r3,0xe80
-   beq h_doorbell_common
+   beq h_doorbell_common_msgclr
cmpwi   r3,0xea0
beq h_virt_irq_common
cmpwi   r3,0xe60
beq hmi_exception_common
 FTR_SECTION_ELSE
cmpwi   r3,0xa00
-   beq doorbell_super_common
+   beq doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
-- 
2.11.0
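
A small sketch of the two ideas in this patch: how the msgclr operand is formed, and why a replayed doorbell fires twice unless the latched exception is cleared first. The doorbell type value and every name below are assumptions chosen for illustration; only the (63-36) shift is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: server doorbell message type. */
#define DBELL_TYPE_SERVER 5u
/* The patch shifts the message type by (63-36). */
#define DBELL_TYPE_SHIFT  (63 - 36)

/* Build the register value handed to msgclr for a given type. */
static uint32_t dbell_msg(uint32_t type)
{
	return (type & 0xfu) << DBELL_TYPE_SHIFT;
}

/* Toy model: one latched doorbell exception per thread. */
struct dbell_model {
	bool pending;      /* doorbell exception latched in hardware */
	unsigned handled;  /* number of times the handler ran */
};

/* Replay from the system reset wakeup path, optionally with msgclr. */
static void replay_doorbell(struct dbell_model *m, bool msgclr)
{
	if (msgclr)
		m->pending = false;   /* msgclr drops the latched exception */
	m->handled++;                 /* handler runs from the replay */
}

/* Enabling EE delivers any exception that is still latched. */
static void enable_ee(struct dbell_model *m)
{
	if (m->pending) {
		m->pending = false;   /* taking the interrupt clears it */
		m->handled++;
	}
}
```

Without msgclr the model runs the handler twice for a single doorbell, which matches the needless second interrupt the commit message mentions.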



[PATCH 03/13] powerpc/64s: idle process interrupts from system reset wakeup

2017-06-13 Thread Nicholas Piggin
When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

                                original  wakeup direct
Different threads, same core:   315k/s    264k/s
Different cores:                235k/s    242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h |  2 ++
 arch/powerpc/kernel/irq.c | 29 +
 arch/powerpc/platforms/powernv/idle.c | 10 --
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index f06112cf8734..8366bdc69988 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
 #ifndef __ASSEMBLY__
 
 extern void __replay_interrupt(unsigned int vector);
+extern void __replay_wakeup_interrupt(unsigned long srr1);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void performance_monitor_exception(struct pt_regs *regs);
@@ -130,6 +131,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
 
 extern bool prep_irq_for_idle(void);
 extern bool prep_irq_for_idle_irqsoff(void);
+extern void irq_set_pending_from_srr1(unsigned long srr1);
 
 #define fini_irq_for_idle_irqsoff() trace_hardirqs_off();
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index be32cec28107..76224869059d 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -348,6 +348,7 @@ bool prep_irq_for_idle(void)
return true;
 }
 
+#ifdef CONFIG_PPC_BOOK3S
 /*
  * This is for idle sequences that return with IRQs off, but the
  * idle state itself wakes on interrupt. Tell the irq tracer that
@@ -379,6 +380,34 @@ bool prep_irq_for_idle_irqsoff(void)
 }
 
 /*
+ * Take the SRR1 wakeup reason, index into this table to find the
+ * appropriate irq_happened bit.
+ */
+static const u8 srr1_to_lazyirq[0x10] = {
+   0, 0, 0,
+   PACA_IRQ_DBELL,
+   0,
+   PACA_IRQ_DBELL,
+   PACA_IRQ_DEC,
+   0,
+   PACA_IRQ_EE,
+   PACA_IRQ_EE,
+   PACA_IRQ_HMI,
+   0, 0, 0, 0, 0 };
+
+void irq_set_pending_from_srr1(unsigned long srr1)
+{
+   unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18;
+
+   /*
+* The 0 index (SRR1[42:45]=b) must always evaluate to 0,
+* so this can be called unconditionally with srr1 wake reason.
+*/
+   local_paca->irq_happened |= srr1_to_lazyirq[idx];
+}
+#endif /* CONFIG_PPC_BOOK3S */
+
+/*
  * Force a replay of the external interrupt handler on this CPU.
  */
 void force_external_irq_replay(void)
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index f188d84d9c59..1028df82cd2f 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -302,7 +302,10 @@ static unsigned long __power7_idle_type(unsigned long type)
 
 void power7_idle_type(unsigned long type)
 {
-   __power7_idle_type(type);
+   unsigned long srr1;
+
+   srr1 = __power7_idle_type(type);
+   irq_set_pending_from_srr1(srr1);
 }
 
 void power7_idle(void)
@@ -337,7 +340,10 @@ static unsigned long __power9_idle_type(unsigned long 
stop_psscr_val,
 void power9_idle_type(unsigned long stop_psscr_val,
  unsigned long stop_psscr_mask)
 {
-   __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+   unsigned long srr1;
+
+   srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+   irq_set_pending_from_srr1(srr1);
 }
 
 /*
-- 
2.11.0
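
A standalone sketch of the table lookup in irq_set_pending_from_srr1(), for anyone who wants to check the index arithmetic outside the kernel. The PACA_IRQ_* values and the wake mask are written from memory of the headers and should be treated as assumptions; the table layout and the shift by 18 come from the patch.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define PACA_IRQ_DBELL   0x02
#define PACA_IRQ_EE      0x04
#define PACA_IRQ_DEC     0x08
#define PACA_IRQ_HMI     0x20
#define SRR1_WAKEMASK_P8 0x003c0000UL
#define SRR1_WAKESHIFT   18

/* Same layout as the table in the patch, with explicit indices. */
static const uint8_t srr1_to_lazyirq[0x10] = {
	[3]  = PACA_IRQ_DBELL,
	[5]  = PACA_IRQ_DBELL,
	[6]  = PACA_IRQ_DEC,
	[8]  = PACA_IRQ_EE,
	[9]  = PACA_IRQ_EE,
	[10] = PACA_IRQ_HMI,
};

/* Map an SRR1 value to the lazy-irq bit that should be marked pending. */
static uint8_t lazyirq_from_srr1(unsigned long srr1)
{
	unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> SRR1_WAKESHIFT;

	return srr1_to_lazyirq[idx];
}
```

Because index 0 maps to 0, an SRR1 value with no wake reason sets no bit, which is why the kernel function can be called unconditionally.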



[PATCH 02/13] powerpc/64s: idle hotplug lazy-irq simplification

2017-06-13 Thread Nicholas Piggin
Rather than concern ourselves with any soft-mask logic in the CPU
hotplug handler, just hard disable interrupts. This ensures there
are no lazy-irqs pending, which means we can call directly to idle
instruction in order to sleep.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/idle.c | 23 +++
 arch/powerpc/platforms/powernv/smp.c  | 29 ++---
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index f875879ff1eb..f188d84d9c59 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -352,25 +352,31 @@ void power9_idle(void)
 /*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
  * available platform idle state on a CPU-Offline.
+ * interrupts hard disabled and no lazy irq pending.
  */
 unsigned long pnv_cpu_offline(unsigned int cpu)
 {
unsigned long srr1;
-
u32 idle_states = pnv_get_supported_cpuidle_states();
 
+   ppc64_runlatch_off();
+
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-   srr1 = __power9_idle_type(pnv_deepest_stop_psscr_val,
-   pnv_deepest_stop_psscr_mask);
+   unsigned long psscr;
+
+   psscr = mfspr(SPRN_PSSCR);
+   psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
+   pnv_deepest_stop_psscr_val;
+   srr1 = power9_idle_stop(psscr);
+
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
-   srr1 = __power7_idle_type(PNV_THREAD_WINKLE);
+   srr1 = power7_idle_insn(PNV_THREAD_WINKLE);
} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
   (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-   srr1 = __power7_idle_type(PNV_THREAD_SLEEP);
+   srr1 = power7_idle_insn(PNV_THREAD_SLEEP);
} else if (idle_states & OPAL_PM_NAP_ENABLED) {
-   srr1 = __power7_idle_type(PNV_THREAD_NAP);
+   srr1 = power7_idle_insn(PNV_THREAD_NAP);
} else {
-   ppc64_runlatch_off();
/* This is the fallback method. We emulate snooze */
while (!generic_check_cpu_restart(cpu)) {
HMT_low();
@@ -378,9 +384,10 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
}
srr1 = 0;
HMT_medium();
-   ppc64_runlatch_on();
}
 
+   ppc64_runlatch_on();
+
return srr1;
 }
 #endif
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index f8752795decf..c04c87adad94 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -144,7 +144,14 @@ static void pnv_smp_cpu_kill_self(void)
unsigned long srr1, wmask;
 
/* Standard hot unplug procedure */
-   local_irq_disable();
+   /*
+* This hard disables local interrupts, ensuring we have no lazy
+* irqs pending.
+*/
+   WARN_ON(irqs_disabled());
+   hard_irq_disable();
+   WARN_ON(lazy_irq_pending());
+
idle_task_exit();
current->active_mm = NULL; /* for sanity */
cpu = smp_processor_id();
@@ -162,16 +169,6 @@ static void pnv_smp_cpu_kill_self(void)
 */
mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
 
-   /*
-* Hard-disable interrupts, and then clear irq_happened flags
-* that we can safely ignore while off-line, since they
-* are for things for which we do no processing when off-line
-* (or in the case of HMI, all the processing we need to do
-* is done in lower-level real-mode code).
-*/
-   hard_irq_disable();
-   local_paca->irq_happened &= ~(PACA_IRQ_DEC | PACA_IRQ_HMI);
-
while (!generic_check_cpu_restart(cpu)) {
/*
 * Clear IPI flag, since we don't handle IPIs while
@@ -184,6 +181,8 @@ static void pnv_smp_cpu_kill_self(void)
 
srr1 = pnv_cpu_offline(cpu);
 
+   WARN_ON(lazy_irq_pending());
+
/*
 * If the SRR1 value indicates that we woke up due to
 * an external interrupt, then clear the interrupt.
@@ -196,8 +195,7 @@ static void pnv_smp_cpu_kill_self(void)
 * contains 0.
 */
if (((srr1 & wmask) == SRR1_WAKEEE) ||
-   ((srr1 & wmask) == SRR1_WAKEHVI) ||
-   (local_paca->irq_happened & PACA_IRQ_EE)) {
+   ((srr1 & wmask) == SRR1_WAKEHVI)) {
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
if (xive_enabled())
xive_flush_interrupt();
@@ -209,14 +207,15 @@ static void pnv_smp_cpu_kill_self(void)
 

[PATCH 01/13] powerpc/64s: idle move soft interrupt mask logic into C code

2017-06-13 Thread Nicholas Piggin
This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h|  3 ++
 arch/powerpc/include/asm/machdep.h   |  1 +
 arch/powerpc/include/asm/processor.h | 10 ++--
 arch/powerpc/kernel/idle_book3s.S| 82 ++--
 arch/powerpc/kernel/irq.c| 33 -
 arch/powerpc/platforms/powernv/idle.c| 71 ---
 arch/powerpc/platforms/powernv/smp.c |  2 -
 arch/powerpc/platforms/powernv/subcore.c |  3 +-
 drivers/cpuidle/cpuidle-powernv.c| 12 ++---
 9 files changed, 128 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index eba60416536e..f06112cf8734 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -129,6 +129,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
 }
 
 extern bool prep_irq_for_idle(void);
+extern bool prep_irq_for_idle_irqsoff(void);
+
+#define fini_irq_for_idle_irqsoff() trace_hardirqs_off();
 
 extern void force_external_irq_replay(void);
 
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index f90b22c722e1..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -226,6 +226,7 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void power7_idle(void);
+extern void power9_idle(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index a2123f291ab0..c49165a7439c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -481,11 +481,11 @@ extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
-extern unsigned long power7_nap(int check_irq);
-extern unsigned long power7_sleep(void);
-extern unsigned long power7_winkle(void);
-extern unsigned long power9_idle_stop(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask);
+extern unsigned long power7_idle_insn(unsigned long type); /* 
PNV_THREAD_NAP/etc*/
+extern void power7_idle_type(unsigned long type);
+extern unsigned long power9_idle_stop(unsigned long psscr_val);
+extern void power9_idle_type(unsigned long stop_psscr_val,
+ unsigned long stop_psscr_mask);
 
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 98a6d07ecb5c..35cf5bb7daed 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -109,13 +109,9 @@ core_idle_lock_held:
 /*
  * Pass requested state in r3:
  * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
- *- Requested STOP state in POWER9
+ *- Requested PSSCR value in POWER9
  *
- * To check IRQ_HAPPENED in r4
- * 0 - don't check
- * 1 - check
- *
- * Address to 'rfid' to in r5
+ * Address of idle handler to 'rfid' to in r4
  */
 pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -131,30 +127,7 @@ pnv_powersave_common:
std r0,_LINK(r1)
std r0,_NIP(r1)
 
-   /* Hard disable interrupts */
-   mfmsr   r9
-   rldicl  r9,r9,48,1
-   rotldi  r9,r9,16
-   mtmsrd  r9,1/* hard-disable interrupts */
-
-   /* Check if something happened while soft-disabled */
-   lbz r0,PACAIRQHAPPENED(r13)
-   andi.   r0,r0,~PACA_IRQ_HARD_DIS@l
-   beq 1f
-   cmpwi   cr0,r4,0
-   beq 1f
-   addir1,r1,INT_FRAME_SIZE
-   ld  r0,16(r1)
-   li  r3,0/* Return 0 (no nap) */
-   mtlrr0
-   blr
-
-1: /* We mark irqs hard disabled as this is the state we'll
-* be in when returning and we need to tell arch_local_irq_restore()
-* about it
-*/
-   li  r0,PACA_IRQ_HARD_DIS
-   stb r0,PACAIRQHAPPENED(r13)
+   mfmsr   r9
 
/* We haven't lost state ... yet */
li  r0,0
@@ -163,8 +136,8 @@ pnv_powersave_common:
/* Continue saving state */
SAVE_GPR(2, r1)
SAVE_NVGPRS(r1)
-   mfcrr4
-   std r4,_CCR(r1)
+   mfcrr5
+   std r5,_CCR(r1)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
@@ -178,7 +151,7 @@ pnv_powersave_common:
li  r6, MSR_RI
andcr6, r9, r6
mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r5
+

[PATCH 00/13 v3] idle performance improvements

2017-06-13 Thread Nicholas Piggin
Since last time, I accounted for the various comments
in reviews, most importantly fixed the miscalculation
of SRR1 bit for the wakeup-interrupt. Verified it does
the right thing and replays the right wakeup interrupt
(e.g., decrementer) from __replay_interrupt stepping
through the instructions in the simulator.

I've found that performance testing is a little difficult
because the ping-pong test cases must sometimes get into
synchronization and run concurrently without sleeping.
Looking at context switch rates in vmstat, I run the
context-switch ping-pong test with snooze idle disabled
on a POWER8, and the numbers before and after this series
are:

          different thread   different core
vanilla   600K/s             470K/s
patched   780K/s             550K/s

(these are 2x what's reported by context_switch selftest
because each step switches to and from idle thread)

It's still not a perfect measurement because if there is
some unwanted concurrency happening then you can get
different amount of userspace work per context switch,
but there seems to be a decent speedup here.

Thanks,
Nick

Nicholas Piggin (13):
  powerpc/64s: idle move soft interrupt mask logic into C code
  powerpc/64s: idle hotplug lazy-irq simplification
  powerpc/64s: idle process interrupts from system reset wakeup
  powerpc/64s: msgclr when handling doorbell exceptions
  powerpc/64s: interrupt replay balance the return branch predictor
  powerpc/64s: idle branch to handler with virtual mode offset
  powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  powerpc/64s: idle hmi wakeup is unlikely
  powerpc/64s: cpuidle set polling before enabling irqs
  powerpc/64s: cpuidle read mostly for common globals
  powerpc/64s: cpuidle no memory barrier after break from idle
  powerpc/64: runlatch CTRL[RUN] set optimisation
  powerpc/64s: idle runlatch switch is done with MSR[EE]=0

 arch/powerpc/include/asm/dbell.h |  13 +++
 arch/powerpc/include/asm/exception-64s.h |  13 +++
 arch/powerpc/include/asm/hw_irq.h|   5 ++
 arch/powerpc/include/asm/machdep.h   |   1 +
 arch/powerpc/include/asm/ppc-opcode.h|   3 +
 arch/powerpc/include/asm/processor.h |  10 +--
 arch/powerpc/kernel/asm-offsets.c|   1 +
 arch/powerpc/kernel/exceptions-64s.S |  38 +++--
 arch/powerpc/kernel/idle_book3s.S| 135 +--
 arch/powerpc/kernel/irq.c|  62 +-
 arch/powerpc/kernel/process.c|  12 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   8 +-
 arch/powerpc/platforms/powernv/idle.c|  88 ++--
 arch/powerpc/platforms/powernv/smp.c |  31 ---
 arch/powerpc/platforms/powernv/subcore.c |   3 +-
 drivers/cpuidle/cpuidle-powernv.c|  37 +
 drivers/cpuidle/cpuidle-pseries.c|  22 +++--
 17 files changed, 315 insertions(+), 167 deletions(-)

-- 
2.11.0
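
For reference, the relative change implied by the vmstat numbers in the cover letter can be worked out as below. The helper name is made up for this note; the inputs are the figures from the table.

```c
/* Relative change from 'before' to 'after', in percent. */
static double speedup_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the table's values this gives roughly +30% for different threads (600K/s to 780K/s) and roughly +17% for different cores (470K/s to 550K/s), subject to the measurement caveats described in the letter.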



Re: [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle

2017-06-13 Thread Nicholas Piggin
On Mon, 12 Jun 2017 23:18:44 +0530
Vaidyanathan Srinivasan  wrote:

> * Nicholas Piggin  [2017-06-12 09:58:33]:
> 
> > A memory barrier is not required after the task wakes up,
> > only if we clear the polling flag before waking. The case
> > where we have work to do is the important one, so optimise
> > for it.
> > 
> > Signed-off-by: Nicholas Piggin   
> 
> Reviewed-by: Vaidyanathan Srinivasan 
> 
> 
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 11 +--
> >  drivers/cpuidle/cpuidle-pseries.c | 11 +--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> > b/drivers/cpuidle/cpuidle-powernv.c
> > index 9d03326ac05e..37b0698b7193 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
> > ppc64_runlatch_off();
> > HMT_very_low();
> > while (!need_resched()) {
> > -   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
> > +   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> > +   /*
> > +* Task has not woken up but we are exiting the polling
> > +* loop anyway. Require a barrier after polling is
> > +* cleared to order subsequent test of need_resched().
> > +*/
> > +   clear_thread_flag(TIF_POLLING_NRFLAG);
> > +   smp_mb();
> > break;
> > +   }
> > }
> > 
> > HMT_medium();
> > ppc64_runlatch_on();
> > clear_thread_flag(TIF_POLLING_NRFLAG);
> > -   smp_mb();  
> 
> If we reach here without executing if(snooze_timeout) with the
> barrier+break, that means we have seen the need_resched flag and hence
> we can avoid the barrier.  Clearing of the polling flag can be seen
> (take effect) later as we will exit the idle loop and re-enter anyway
> at this point.  The caller also will check for need_resched and exit
> the idle loop.
> 
> Actually do_idle() has a __current_set_polling() and
> __current_clr_polling() in the default idle loop.  Do we really need
> to set/clear the TIF_POLLING_NRFLAG again in cpuidle driver?

We do because cpuidle drivers are entered with POLLING clear, don't
they?

It would be nice to have the entire scheduler idle including cpuidle
entry run with POLLING set. It would then be up to interrupt-based
cpuidle drivers to clear POLLING before sleeping. I think that should
remove... maybe 4 atomic ops from the most performance-critical idle
states (the polling ones).

That's a bigger change though and will touch many archs and core code.

Thanks,
Nick
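
A userspace sketch of the ordering discussed in this thread, using C11 atomics in place of the kernel primitives. The two flags stand in for TIF_POLLING_NRFLAG and TIF_NEED_RESCHED, the spin count stands in for the timebase check, and the function name is hypothetical; it only illustrates where the barrier is kept and where it is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the thread flags. */
static atomic_bool polling;
static atomic_bool need_resched_flag;

/*
 * Returns true if the loop exited because work arrived,
 * false if it gave up after 'max_spins' iterations.
 */
static bool snooze_loop_model(unsigned long max_spins)
{
	unsigned long spins = 0;

	atomic_store_explicit(&polling, true, memory_order_relaxed);

	while (!atomic_load_explicit(&need_resched_flag,
				     memory_order_relaxed)) {
		if (++spins > max_spins) {
			/*
			 * Timed out with no work seen: clear polling, then
			 * order the clear before any later test of the
			 * reschedule flag.
			 */
			atomic_store_explicit(&polling, false,
					      memory_order_relaxed);
			atomic_thread_fence(memory_order_seq_cst);
			return false;
		}
	}

	/* Work was already observed, so no barrier follows the clear. */
	atomic_store_explicit(&polling, false, memory_order_relaxed);
	return true;
}
```

The common path, where the reschedule flag is seen inside the loop, pays for no fence; only the timeout path does.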


Re: [PATCH 01/44] firmware/ivc: use dma_mapping_error

2017-06-13 Thread Thierry Reding
On Thu, Jun 08, 2017 at 03:25:26PM +0200, Christoph Hellwig wrote:
> DMA_ERROR_CODE is not supposed to be used by drivers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/firmware/tegra/ivc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Thierry Reding 



Re: [PATCH v2] perf: libdw support for powerpc [ping]

2017-06-13 Thread Mark Wielaard
Hi Ravi,

On Mon, 2017-06-12 at 17:28 +0530, Ravi Bangoria wrote:
> So, I tested this patch along with Mark's patch[1] on elfutils and looks
> like it's not working. Steps on what I did:
> 
> After applying Mark's patch on upstream elfutils:
> 
>   $ aclocal
>   $ autoheader
>   $ autoconf
>   $ automake --add-missing
>   $ ./configure
>   $ make
>   $ make install DESTDIR=/home/ravi/elfutils-git
> 
> After applying your patch on upstream perf:
> 
>   $ make
>   $ ./perf record --call-graph=dwarf ls
>   $ LD_LIBRARY_PATH=/home/ravi/elfutils-git/usr/local/lib:\
> /home/ravi/elfutils-git/usr/local/lib/elfutils/:$LD_LIBRARY_PATH \
> ./perf script
> 
> ls 44159  1800.878468: 191408 cycles:u:
> 
> ls 44159  1800.878673: 419356 cycles:u:
>8a97c hpte_need_flush 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>835f4 flush_hash_page 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>8acec hpte_need_flush 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>   3468f4 ptep_clear_flush 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>   328b10 wp_page_copy 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>   32ebe4 do_wp_page 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>   33434c __handle_mm_fault 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>   335040 handle_mm_fault 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>7bf94 do_page_fault 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
>1a4f8 handle_page_fault 
> (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux)
> 
> ls 44159  1800.878961: 430876 cycles:u:
> 
> ls 44159  1800.879195: 423785 cycles:u:
> 
> ls 44159  1800.879360: 427359 cycles:u:
> 
> Here I don't see userspace callchain getting unwound. Please let me know
> if I'm doing anything wrong.

I see the same on very short runs. But when doing a slightly longer run,
even just using ls -lahR, which does some more work, then I do see user
backtraces. They are still missing for some of the early samples though.
It is as if there is a stack/memory address mismatch when the probe is
"too early" in ld.so.

Could you do a test run on some program that does some more work to see
if you never get any user stack traces, or if they are only missing for
some specific probes?

Thanks,

Mark


Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-13 Thread Nicholas Piggin
On Tue, 13 Jun 2017 20:04:27 +1000
Michael Ellerman  wrote:

> Vaidyanathan Srinivasan  writes:
> > * Nicholas Piggin  [2017-06-12 09:58:34]:
> >  
> >> The CTRL register is read-only except bit 63 which is the run latch
> >> control. This means it can be updated with a mtspr rather than
> >> mfspr/mtspr.
> >> 
> >> Signed-off-by: Nicholas Piggin   
> >  
> >> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> >> index baae104b16c7..a44ea034c226 100644
> >> --- a/arch/powerpc/kernel/process.c
> >> +++ b/arch/powerpc/kernel/process.c
> >> @@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
> >>  void notrace __ppc64_runlatch_off(void)
> >>  {
> >>struct thread_info *ti = current_thread_info();
> >> -  unsigned long ctrl;
> >> 
> >>ti->local_flags &= ~_TLF_RUNLATCH;
> >> -
> >> -  ctrl = mfspr(SPRN_CTRLF);
> >> -  ctrl &= ~CTRL_RUNLATCH;
> >> -  mtspr(SPRN_CTRLT, ctrl);
> >> +  mtspr(SPRN_CTRLT, 0);  
> >
> > Good idea.  Writing to CTRL register can change only the RUN field.
> > Was this any different in older generations?  
> 
> No AFAICS back to 2.02.
> 
> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> > routine.  
> 
> Doing the read/modify write is forward compatible vs a new writable
> field, whereas writing the whole register with a known value is not.

Can we call that an incompatible arch change and not worry about
it? ISA says we may expect TS (read-only) field to expand, but I
guess they could shoehorn something else in there.
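
To make the trade-off concrete, here is a toy model of a register in which only the RUN bit is writable, comparing the old read-modify-write with the new constant write. The register model and function names are hypothetical, and the RUN bit position is an assumption for illustration.

```c
#include <stdint.h>

/* Assumed: RUN is the least significant bit (IBM bit 63). */
#define CTRL_RUNLATCH 0x1ULL

/* Toy register: only RUN is writable, all other bits are read-only. */
struct ctrl_model {
	uint64_t value;
};

static uint64_t model_mfspr_ctrl(const struct ctrl_model *c)
{
	return c->value;
}

static void model_mtspr_ctrl(struct ctrl_model *c, uint64_t v)
{
	/* Writes to read-only bits are dropped by the model. */
	c->value = (c->value & ~CTRL_RUNLATCH) | (v & CTRL_RUNLATCH);
}

/* Old sequence: read, clear RUN, write back. */
static void runlatch_off_rmw(struct ctrl_model *c)
{
	uint64_t ctrl = model_mfspr_ctrl(c);

	ctrl &= ~CTRL_RUNLATCH;
	model_mtspr_ctrl(c, ctrl);
}

/* New sequence: write a constant, no read needed. */
static void runlatch_off_direct(struct ctrl_model *c)
{
	model_mtspr_ctrl(c, 0);
}
```

Under this model both sequences leave the register in the same state. They would only differ if a future revision made another bit writable, which is the forward-compatibility concern raised above.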


Re: [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor

2017-06-13 Thread Nicholas Piggin
On Tue, 13 Jun 2017 19:51:19 +1000
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > The __replay_interrupt code is branched to with bl, but the caller is
> > returned to directly with rfid from the interrupt.
> >
> > Instead return to a return stub that returns to the caller with blr,
> > which should do better with the return predictor.
> >
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index a04ee0d7f88e..d55201625ea3 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
> >  * we don't give a damn about, so we don't bother storing them.
> >  */
> > mfmsr   r12
> > -   mflr    r11
> > +   LOAD_REG_ADDR(r11, __replay_interrupt_return)  
> 
> Can you make it a local label, to make it clear nothing outside the file
> returns to there, and to not clutter the symbol map?

I can do that. Interrupt returns will now get significantly
attributed to this guy in profiles (and you can see it on some
sleep/wake workloads). __replay_interrupt is probably better
than arch_local_irq_restore, I guess.

Thanks,
Nick


Re: [PATCH] powerpc/configs: fix default values for NF_CT_PROTO_*

2017-06-13 Thread Michael Ellerman
Davide Caratti  writes:

> NF_CT_PROTO_{SCTP,UDPLITE,DCCP} can't be set to 'm' anymore, since they
> have been redefined as 'bool': fix defconfig for linkstation, mvme5100 and
> ppc6xx platforms accordingly.

Since when? ie. which commit changed the symbols to bool from tristate?

cheers


Re: [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths

2017-06-13 Thread Nicholas Piggin
On Tue, 13 Jun 2017 15:55:53 +0530
Gautham R Shenoy  wrote:

> Hi Nick,
> 
> On Mon, Jun 12, 2017 at 09:58:29AM +1000, Nicholas Piggin wrote:
> > Idle code now always runs at the 0xc... effective address whether
> > in real or virtual mode. This means rfid can be ditched, along
> > with a lot of SRR manipulations.
> > 
> > In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
> > MSR states as required.
> > 
> > This also balances the return prediction for the idle call, by
> > doing blr rather than rfid to return to the idle caller.
> > 
> > On POWER9, 2-process context switch on different cores, with snooze
> > disabled, increases performance by 2%.
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S|  1 +
> >  arch/powerpc/kernel/idle_book3s.S   | 57 
> > +++--
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 -
> >  3 files changed, 33 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> > b/arch/powerpc/kernel/exceptions-64s.S
> > index fec7c933d095..c3d0aef089a7 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -148,12 +147,8 @@ pnv_powersave_common:
> >  * the MMU context to the guest.
> >  */
> > LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
> > -   li  r6, MSR_RI
> > > -   andc    r6, r9, r6
> > -   mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
> > -   mtspr   SPRN_SRR0, r4
> > -   mtspr   SPRN_SRR1, r7
> > -   rfid
> > +   mtmsrd  r7,0
> > +   bctr  
> 
> So at this point we need to transition from virtual to real mode as
> the comment in pnv_enter_arch207_idle_mode expects us to. Which is
> being performed by mtmsrd here. Then we jump to the function via
> bctr. So, in this patch we are using two instructions to modify the
> MSR and the PC, while earlier the rfid would update these atomically.
> 
> Does forgoing atomicity have any risk? I am asking this because
> historically we have modified IR/DR bits in the MSR via rfid
> mechanism.

I believe it's not a problem. Actually the ISA has a note about using
it in this way, p.1181 of ISA v3.0B, the programming note suggests
using mtmsrd rather than rfid to enable IR.

I've tested on POWER8 and 9 and not had any problems with it. Interesting
question though.

One thing I wonder about is that ERAT installs real mode entries, so
using mtmsrd to transition from real to virtual mode will require 2
I-ERAT entries for these nearby instruction addresses. Then if you did
a branch to a distant address, that would require another I-ERAT. On
the other hand if you do an rfid to switch MSR and branch to distant
address at the same time, it should only require 2 I-ERAT entries. So
you may see better microbenchmark performance of the first case, but
the latter may end up being slower on real work.

I've decided that probably the kernel is compact enough that we aren't
likely to cause more ERAT footprint by doing this. It would be
interesting to do a proper analysis of this, but I haven't got around
to it yet.

Thanks,
Nick


Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-13 Thread Michael Ellerman
Michael Bringmann  writes:

> Here is the information from 2 different kernels.  I have not been able to
> retrieve the information matching yesterday's attachments, yet, as those
> dumps were acquired in April.
>  
> Attached please find 2 dumps of similar material from kernels running with my
> current patches (Linux 4.4, Linux 4.12).

OK thanks.

I'd actually like to see the dmesg output from a kernel *without* your
patches.

Looking at the device tree properties:

ltcalpine2-lp9:/proc/device-tree/ibm,dynamic-reconfiguration-memory # lsprop ibm,associativity-lookup-arrays
ibm,associativity-lookup-arrays
 0004 = 4 arrays
 0004 = of 4 entries each
    
   0001 0001
  0003 0006 0006
  0003 0007 0007


Which does tell us that nodes 0, 1, 6 and 7 exist.

So your idea of looking at that and setting any node found in there
online should work.
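
As a rough illustration of that idea, here is a minimal userspace sketch
(not the kernel implementation) that walks a property laid out as above:
one cell giving the number of arrays, one cell giving the entries per
array, then the arrays themselves. The function name, the `depth`
parameter (which entry of each array is treated as the node id), the
64-node limit and the host-endian input are all assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64	/* assumption: node ids fit in a 64-bit mask */

/*
 * Return a bitmask of the node ids found at index `depth` of every
 * associativity array. `cells` holds the property already converted
 * to host byte order: cells[0] = number of arrays, cells[1] = entries
 * per array, followed by the arrays back to back.
 * Returns 0 if the property is truncated or `depth` is out of range.
 */
uint64_t nodes_in_lookup_arrays(const uint32_t *cells, size_t ncells,
				uint32_t depth)
{
	uint64_t mask = 0;
	uint32_t n_arrays, array_sz, i;

	assert(cells != NULL);

	if (ncells < 2)
		return 0;

	n_arrays = cells[0];
	array_sz = cells[1];

	/* depth < array_sz also guarantees array_sz is non-zero. */
	if (depth >= array_sz)
		return 0;

	/* Reject a property too short to hold all declared arrays. */
	if ((ncells - 2) / array_sz < n_arrays)
		return 0;

	for (i = 0; i < n_arrays; i++) {
		uint32_t nid = cells[2 + (size_t)i * array_sz + depth];

		if (nid < MAX_NODES)
			mask |= (uint64_t)1 << nid;
	}

	return mask;
}
```

With a made-up property of four 4-entry arrays whose last entries are
0, 1, 6 and 7, calling this with depth 3 gives the mask 0xc3, and each
set bit would be a candidate node to mark online.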

My only worry is that behaviour appears to be completely undocumented in
PAPR, ie. PAPR explicitly says that property only needs to contain
values for LMBs present at boot.

But possibly we can talk to the PowerVM/PAPR guys and have that changed
so that it becomes something we can rely on.

cheers


Re: [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths

2017-06-13 Thread Gautham R Shenoy
Hi Nick,

On Mon, Jun 12, 2017 at 09:58:29AM +1000, Nicholas Piggin wrote:
> Idle code now always runs at the 0xc... effective address whether
> in real or virtual mode. This means rfid can be ditched, along
> with a lot of SRR manipulations.
> 
> In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
> MSR states as required.
> 
> This also balances the return prediction for the idle call, by
> doing blr rather than rfid to return to the idle caller.
> 
> On POWER9, 2-process context switch on different cores, with snooze
> disabled, increases performance by 2%.
> ---
>  arch/powerpc/kernel/exceptions-64s.S|  1 +
>  arch/powerpc/kernel/idle_book3s.S   | 57 
> +++--
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 -
>  3 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index fec7c933d095..c3d0aef089a7 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -148,12 +147,8 @@ pnv_powersave_common:
>* the MMU context to the guest.
>*/
>   LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
> - li  r6, MSR_RI
> - andc    r6, r9, r6
> - mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
> - mtspr   SPRN_SRR0, r4
> - mtspr   SPRN_SRR1, r7
> - rfid
> + mtmsrd  r7,0
> + bctr

So at this point we need to transition from virtual to real mode as
the comment in pnv_enter_arch207_idle_mode expects us to. Which is
being performed by mtmsrd here. Then we jump to the function via
bctr. So, in this patch we are using two instructions to modify the
MSR and the PC, while earlier the rfid would update these atomically.

Does forgoing atomicity have any risk? I am asking this because
historically we have modified IR/DR bits in the MSR via rfid
mechanism.
--
Thanks and Regards
gautham.



Re: [PATCH] powerpc: dts: use #include "..." to include local DT

2017-06-13 Thread Michael Ellerman
Masahiro Yamada  writes:

> Hi
>
> (+Anatolij Gustschin )
>
>
> Ping.
> I am not 100% sure who is responsible for this,
> but somebody, could take a look at this patch, please?

Have you tested it actually works?

It sounds reasonable, and if it behaves as you describe there is no
change in behaviour, right?

cheers

> 2017-05-24 14:12 GMT+09:00 Masahiro Yamada :
>> Most of DT files in PowerPC use #include "..." to make pre-processor
>> include DT in the same directory, but we have 3 exceptional files
>> that use #include <...> for that.
>>
>> Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from
>> dtc_cpp_flags.
>>
>> Signed-off-by: Masahiro Yamada 
>> ---
>>
>>  arch/powerpc/boot/dts/ac14xx.dts | 2 +-
>>  arch/powerpc/boot/dts/mpc5121ads.dts | 2 +-
>>  arch/powerpc/boot/dts/pdm360ng.dts   | 2 +-
>>  3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/boot/dts/ac14xx.dts 
>> b/arch/powerpc/boot/dts/ac14xx.dts
>> index 27fcabc2f857..83bcfd865167 100644
>> --- a/arch/powerpc/boot/dts/ac14xx.dts
>> +++ b/arch/powerpc/boot/dts/ac14xx.dts
>> @@ -10,7 +10,7 @@
>>   */
>>
>>
>> -#include 
>> +#include "mpc5121.dtsi"
>>
>>  / {
>> model = "ac14xx";
>> diff --git a/arch/powerpc/boot/dts/mpc5121ads.dts 
>> b/arch/powerpc/boot/dts/mpc5121ads.dts
>> index 75888ce2c792..73c30621429b 100644
>> --- a/arch/powerpc/boot/dts/mpc5121ads.dts
>> +++ b/arch/powerpc/boot/dts/mpc5121ads.dts
>> @@ -9,7 +9,7 @@
>>   * option) any later version.
>>   */
>>
>> -#include 
>> +#include "mpc5121.dtsi"
>>
>>  / {
>> model = "mpc5121ads";
>> diff --git a/arch/powerpc/boot/dts/pdm360ng.dts 
>> b/arch/powerpc/boot/dts/pdm360ng.dts
>> index 0cec7244abe7..445b88114009 100644
>> --- a/arch/powerpc/boot/dts/pdm360ng.dts
>> +++ b/arch/powerpc/boot/dts/pdm360ng.dts
>> @@ -13,7 +13,7 @@
>>   * option) any later version.
>>   */
>>
>> -#include 
>> +#include "mpc5121.dtsi"
>>
>>  / {
>> model = "pdm360ng";
>> --
>> 2.7.4
>>
>
>
>
> -- 
> Best Regards
> Masahiro Yamada


Re: [PATCH v2 0/6] Appended signatures support for IMA appraisal

2017-06-13 Thread Michael Ellerman
Thiago Jung Bauermann  writes:

> Michael Ellerman  writes:
>
>> Thiago Jung Bauermann  writes:
>>
>>> On the OpenPOWER platform, secure boot and trusted boot are being
>>> implemented using IMA for taking measurements and verifying signatures.
>>
>> I still want you to implement arch_kexec_kernel_verify_sig() as well :)
>
> Yes, I will implement it! We are still working on loading the public
> keys for kernel signing from the firmware into a kernel keyring, so
> there's not much point in implementing arch_kexec_kernel_verify_sig
> without having that first.

OK. What's the ETA on those patches?

cheers


Re: [PATCH 11/16] powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type

2017-06-13 Thread Michael Ellerman
Greg Kroah-Hartman  writes:

> On Fri, Jun 09, 2017 at 09:23:10PM +1000, Michael Ellerman wrote:
>> Greg Kroah-Hartman  writes:
>> 
>> > On Fri, Jun 09, 2017 at 08:53:22AM +1000, Michael Ellerman wrote:
>> >> Greg Kroah-Hartman  writes:
>> >> 
>> >> > On Thu, Jun 08, 2017 at 11:12:10PM +1000, Michael Ellerman wrote:
>> >> >> Greg Kroah-Hartman  writes:
>> >> >> 
>> >> >> > The dev_attrs field has long been "depreciated" and is finally being
>> >> >> > removed, so move the driver to use the "correct" dev_groups field
>> >> >> > instead for struct bus_type.
>> >> >> >
>> >> >> > Cc: Benjamin Herrenschmidt 
>> >> >> > Cc: Paul Mackerras 
>> >> >> > Cc: Michael Ellerman 
>> >> >> > Cc: Vineet Gupta 
>> >> >> > Cc: Bart Van Assche 
>> >> >> > Cc: Robin Murphy 
>> >> >> > Cc: Joerg Roedel 
>> >> >> > Cc: Johan Hovold 
>> >> >> > Cc: Alexey Kardashevskiy 
>> >> >> > Cc: Krzysztof Kozlowski 
>> >> >> > Cc: 
>> >> >> > Signed-off-by: Greg Kroah-Hartman 
>> >> >> > ---
>> >> >> >  arch/powerpc/platforms/pseries/vio.c | 37 
>> >> >> > +---
>> >> >> >  1 file changed, 22 insertions(+), 15 deletions(-)
>> >> >> 
>> >> >> This one needed a bit more work to get building, the incremental diff
>> >> >> is below. We need a forward declaration of name, devspec and modalias,
>> >> >> which is a bit weird, but that's how the code is currently structured.
>> >> >> And there's dev and bus attributes with the same name, so that needed
>> >> >> an added "bus".
>> >> >> 
>> >> >> I booted v2 of patch 10 and this one and everything looks identical to
>> >> >> upstream.
>> >> >
>> >> > Ah, many thanks, this was on my todo list to fix up today.
>> >> >
>> >> > But you renamed the sysfs files when you added "bus" to the function
>> >> > names, are you sure you want to do that?  I don't mind, but if you
>> >> > happen to have userspace tools that look at those files, they just broke
>> >> > :(
>> >> 
>> >> Ugh crap, no that won't work.
>> >> 
>> >> I didn't see it when I tested because my machine doesn't have the CMO
>> >> feature enabled.
>> >> 
>> >> I guess we have to open code some of the BUS_ATTR_RO() etc. so we can
>> >> avoid the name clash.
>> >
>> > Or split it into multiple files, I've solved this that way in the past.
>> > You shouldn't have to "open code" BUS_ATTR_RO().
>> 
>> It just requires one use of __ATTR(), which seems simpler than splitting
>> the file in two.
>> 
>> Here's a new incremental diff against your patch.
>> 
>> I confirmed none of the cmo names changed, result after is:
>> 
>> ./devices/vio/cmo_desired
>> ./devices/vio/cmo_allocated
>> ./devices/vio/cmo_entitled
>> ./devices/vio/cmo_allocs_failed
>> ./devices/vio/7100/cmo_desired
>> ./devices/vio/7100/cmo_allocated
>> ./devices/vio/7100/cmo_entitled
>> ./devices/vio/7100/cmo_allocs_failed
>> ./devices/vio/3000/cmo_desired
>> ./devices/vio/3000/cmo_allocated
>> ./devices/vio/3000/cmo_entitled
>> ./devices/vio/3000/cmo_allocs_failed
>> ./devices/vio/2000/cmo_desired
>> ./devices/vio/2000/cmo_allocated
>> ./devices/vio/2000/cmo_entitled
>> ./devices/vio/2000/cmo_allocs_failed
>> ./bus/vio/cmo_high
>> ./bus/vio/cmo_spare
>> ./bus/vio/cmo_reserve_size
>> ./bus/vio/cmo_desired
>> ./bus/vio/cmo_entitled
>> ./bus/vio/cmo_excess_free
>> ./bus/vio/cmo_excess_size
>> ./bus/vio/cmo_min
>> ./bus/vio/cmo_curr
>
> Thanks for this, it seems to have passed all of the 0-day testing.  I'll
> go apply it to my "real" tree now, thanks again for the help.

No worries. It'll get some more build & boot testing from my CI once it's
in linux-next.

cheers


Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation

2017-06-13 Thread Michael Ellerman
Vaidyanathan Srinivasan  writes:
> * Nicholas Piggin  [2017-06-12 09:58:34]:
>
>> The CTRL register is read-only except bit 63 which is the run latch
>> control. This means it can be updated with a mtspr rather than
>> mfspr/mtspr.
>> 
>> Signed-off-by: Nicholas Piggin 
>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index baae104b16c7..a44ea034c226 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
>>  void notrace __ppc64_runlatch_off(void)
>>  {
>>  struct thread_info *ti = current_thread_info();
>> -unsigned long ctrl;
>> 
>>  ti->local_flags &= ~_TLF_RUNLATCH;
>> -
>> -ctrl = mfspr(SPRN_CTRLF);
>> -ctrl &= ~CTRL_RUNLATCH;
>> -mtspr(SPRN_CTRLT, ctrl);
>> +mtspr(SPRN_CTRLT, 0);
>
> Good idea.  Writing to CTRL register can change only the RUN field.
> Was this any different in older generations?

No AFAICS back to 2.02.

> Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> routine.

Doing the read/modify write is forward compatible vs a new writable
field, whereas writing the whole register with a known value is not.
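
To make the trade-off concrete, here is a small userspace sketch. The
register is simulated by a plain variable, fake_mfspr()/fake_mtspr()
stand in for the real accessors, and CTRL_FUTURE_FIELD is a purely
hypothetical writable bit that no current ISA version defines.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 in IBM bit numbering is the least significant bit. */
#define CTRL_RUNLATCH		UINT64_C(0x1)
/* HYPOTHETICAL: a writable field some future ISA revision might add. */
#define CTRL_FUTURE_FIELD	UINT64_C(0x2)

/* Stand-in for the hardware register. */
uint64_t fake_ctrl;

uint64_t fake_mfspr(void)
{
	return fake_ctrl;
}

void fake_mtspr(uint64_t val)
{
	fake_ctrl = val;
}

/* Read/modify/write: clears only the run latch, keeps other bits. */
void runlatch_off_rmw(void)
{
	uint64_t ctrl = fake_mfspr();

	ctrl &= ~CTRL_RUNLATCH;
	fake_mtspr(ctrl);

	assert((fake_ctrl & CTRL_RUNLATCH) == 0);
}

/* Direct write: one instruction fewer, but clobbers any other field. */
void runlatch_off_direct(void)
{
	fake_mtspr(0);
}
```

If the register starts as CTRL_RUNLATCH | CTRL_FUTURE_FIELD, the
read/modify/write variant leaves CTRL_FUTURE_FIELD set, while the direct
write leaves the register at 0. That only matters if such a field is
ever added.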

cheers


Re: [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor

2017-06-13 Thread Michael Ellerman
Nicholas Piggin  writes:

> The __replay_interrupt code is branched to with bl, but the caller is
> returned to directly with rfid from the interrupt.
>
> Instead return to a return stub that returns to the caller with blr,
> which should do better with the return predictor.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index a04ee0d7f88e..d55201625ea3 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
>* we don't give a damn about, so we don't bother storing them.
>*/
>   mfmsr   r12
> - mflr    r11
> + LOAD_REG_ADDR(r11, __replay_interrupt_return)

Can you make it a local label, to make it clear nothing outside the file
returns to there, and to not clutter the symbol map?

cheers


Re: [kbuild-all] [PATCH] include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH

2017-06-13 Thread Ye Xiaolong
On 06/08, Alexey Kardashevskiy wrote:
>On 08/06/17 15:35, Alexey Kardashevskiy wrote:
>> Hi,
>> 
>> How did you manage to have CONFIG_EEH=y and CONFIG_VFIO_SPAPR_EEH=n? "make
>> oldconfig" fixes this to CONFIG_VFIO_SPAPR_EEH=y.
>
>
>Also, the attached config has "CONFIG_VFIO_SPAPR_EEH=m" and cannot produce
>the error below, what am I missing here?

Sorry for the late reply. I can reproduce the error below by following these
steps with the config attached to the original report:

   wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
   chmod +x ~/bin/make.cross
   # save the attached .config to linux build tree
   make.cross ARCH=powerpc 

What are your steps?

Thanks,
Xiaolong
>
>
>
>> 
>> 
>> 
>> On 08/06/17 02:31, kbuild test robot wrote:
>>> Hi Murilo,
>>>
>>> [auto build test ERROR on linus/master]
>>> [also build test ERROR on v4.12-rc4 next-20170607]
>>> [if your patch is applied to the wrong git tree, please drop us a note to 
>>> help improve the system]
>>>
>>> url:
>>> https://github.com/0day-ci/linux/commits/Murilo-Opsfelder-Araujo/include-linux-vfio-h-Guard-powerpc-specific-functions-with-CONFIG_VFIO_SPAPR_EEH/20170607-000643
>>> config: powerpc-allmodconfig (attached as .config)
>>> compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
>>> reproduce:
>>> wget 
>>> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
>>> ~/bin/make.cross
>>> chmod +x ~/bin/make.cross
>>> # save the attached .config to linux build tree
>>> make.cross ARCH=powerpc 
>>>
>>> All errors (new ones prefixed by >>):
>>>
> drivers/vfio/vfio_spapr_eeh.c:22:6: error: redefinition of 
> 'vfio_spapr_pci_eeh_open'
>>> void vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
>>>  ^~~
>>>In file included from drivers/vfio/vfio_spapr_eeh.c:14:0:
>>>include/linux/vfio.h:160:20: note: previous definition of 
>>> 'vfio_spapr_pci_eeh_open' was here
>>> static inline void vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
>>>^~~
> drivers/vfio/vfio_spapr_eeh.c:28:6: error: redefinition of 
> 'vfio_spapr_pci_eeh_release'
>>> void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
>>>  ^~
>>>In file included from drivers/vfio/vfio_spapr_eeh.c:14:0:
>>>include/linux/vfio.h:164:20: note: previous definition of 
>>> 'vfio_spapr_pci_eeh_release' was here
>>> static inline void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
>>>^~
> drivers/vfio/vfio_spapr_eeh.c:34:6: error: redefinition of 
> 'vfio_spapr_iommu_eeh_ioctl'
>>> long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>>>  ^~
>>>In file included from drivers/vfio/vfio_spapr_eeh.c:14:0:
>>>include/linux/vfio.h:168:20: note: previous definition of 
>>> 'vfio_spapr_iommu_eeh_ioctl' was here
>>> static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>>>^~
>>>
>>> vim +/vfio_spapr_pci_eeh_open +22 drivers/vfio/vfio_spapr_eeh.c
>>>
>>> 1b69be5e Gavin Shan   2014-06-10  16  
>>> 89a2edd6 Alexey Kardashevskiy 2014-08-08  17  #define DRIVER_VERSION
>>> "0.1"
>>> 89a2edd6 Alexey Kardashevskiy 2014-08-08  18  #define DRIVER_AUTHOR "Gavin 
>>> Shan, IBM Corporation"
>>> 89a2edd6 Alexey Kardashevskiy 2014-08-08  19  #define DRIVER_DESC   "VFIO 
>>> IOMMU SPAPR EEH"
>>> 89a2edd6 Alexey Kardashevskiy 2014-08-08  20  
>>> 1b69be5e Gavin Shan   2014-06-10  21  /* We might build address 
>>> mapping here for "fast" path later */
>>> 9b936c96 Alexey Kardashevskiy 2014-08-08 @22  void 
>>> vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
>>> 1b69be5e Gavin Shan   2014-06-10  23  {
>>> 9b936c96 Alexey Kardashevskiy 2014-08-08  24eeh_dev_open(pdev);
>>> 1b69be5e Gavin Shan   2014-06-10  25  }
>>> 92d18a68 Gavin Shan   2014-08-08  26  
>>> EXPORT_SYMBOL_GPL(vfio_spapr_pci_eeh_open);
>>> 1b69be5e Gavin Shan   2014-06-10  27  
>>> 1b69be5e Gavin Shan   2014-06-10 @28  void 
>>> vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
>>> 1b69be5e Gavin Shan   2014-06-10  29  {
>>> 1b69be5e Gavin Shan   2014-06-10  30eeh_dev_release(pdev);
>>> 1b69be5e Gavin Shan   2014-06-10  31  }
>>> 92d18a68 Gavin Shan   2014-08-08  32  
>>> EXPORT_SYMBOL_GPL(vfio_spapr_pci_eeh_release);
>>> 1b69be5e Gavin Shan   2014-06-10  33  
>>> 1b69be5e Gavin Shan   2014-06-10 @34  long 
>>> vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>>> 1b69be5e Gavin Shan   2014-06-10  35
>>> unsigned int cmd, unsigned long arg)
>>> 1b69be5e Gavin Shan   2014-06-10  36  {
>>> 1b69be5e Gavin Shan   2014-06-10  37struct eeh_pe *pe;
>>>
>>> :: T

[PATCH v3] net: phy: Make phy_ethtool_ksettings_get return void

2017-06-13 Thread Yuval Shaia
Make the return value void since the function never returns a meaningful value.
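
The caller-side pattern this leads to can be sketched in plain C as
follows. The struct definitions and function names are reduced,
hypothetical stand-ins for illustration only, not the kernel's
phy_device, ethtool_link_ksettings or phy_ethtool_ksettings_get().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct phy_device {
	int speed;
	int duplex;
};

struct link_ksettings {
	int speed;
	int duplex;
};

/* Copying cached PHY state cannot fail, so there is nothing to return. */
void phy_get_settings(const struct phy_device *phydev,
		      struct link_ksettings *cmd)
{
	assert(phydev != NULL && cmd != NULL);

	cmd->speed = phydev->speed;
	cmd->duplex = phydev->duplex;
}

/*
 * Driver callback: it still has to return int to its own caller, so it
 * reports the one real failure (no PHY attached) itself and returns 0
 * after calling the void helper.
 */
int drv_get_link_ksettings(const struct phy_device *phydev,
			   struct link_ksettings *cmd)
{
	if (!phydev)
		return -ENODEV;

	phy_get_settings(phydev, cmd);

	return 0;
}
```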

Signed-off-by: Yuval Shaia 
Acked-by: Sergei Shtylyov 
---
v0 -> v1:
* These files were missing in v0
* drivers/net/ethernet/renesas/ravb_main.c
* drivers/net/ethernet/renesas/sh_eth.c
* drivers/net/ethernet/ti/netcp_ethss.c
* Add Acked-by: Sergei Shtylyov

v1 -> v2:
* Adjust to net-next tree

v2 -> v3:
* These files were missing in v1
* drivers/net/ethernet/apm/xgene-v2/ethtool.c
* drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
* drivers/net/ethernet/broadcom/b44.c
* drivers/net/ethernet/broadcom/bcm63xx_enet.c
* drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
* drivers/net/ethernet/mediatek/mtk_eth_soc.c
* drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
* drivers/net/ethernet/ti/cpsw.c
* drivers/staging/netlogic/xlr_net.c
---
 drivers/net/ethernet/apm/xgene-v2/ethtool.c  |  4 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c  |  8 ++--
 drivers/net/ethernet/broadcom/b44.c  |  4 +++-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c |  5 -
 drivers/net/ethernet/broadcom/genet/bcmgenet.c   |  4 +++-
 drivers/net/ethernet/broadcom/tg3.c  |  4 +++-
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c   |  6 ++
 drivers/net/ethernet/freescale/ucc_geth_ethtool.c|  4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |  2 +-
 drivers/net/ethernet/marvell/mv643xx_eth.c   |  5 ++---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c  |  4 +++-
 drivers/net/ethernet/renesas/ravb_main.c | 14 +++---
 drivers/net/ethernet/renesas/sh_eth.c|  5 ++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c |  5 ++---
 drivers/net/ethernet/ti/cpsw.c   |  8 
 drivers/net/ethernet/ti/netcp_ethss.c|  8 +++-
 drivers/net/phy/phy.c| 10 +-
 drivers/net/usb/lan78xx.c|  2 +-
 drivers/staging/netlogic/xlr_net.c   |  5 -
 include/linux/phy.h  |  4 ++--
 net/dsa/slave.c  |  9 +
 21 files changed, 68 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene-v2/ethtool.c b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
index be4..d31ad82 100644
--- a/drivers/net/ethernet/apm/xgene-v2/ethtool.c
+++ b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
@@ -157,7 +157,9 @@ static int xge_get_link_ksettings(struct net_device *ndev,
if (!phydev)
return -ENODEV;
 
-   return phy_ethtool_ksettings_get(phydev, cmd);
+   phy_ethtool_ksettings_get(phydev, cmd);
+
+   return 0;
 }
 
 static int xge_set_link_ksettings(struct net_device *ndev,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 559963b..4f50f11 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -131,13 +131,17 @@ static int xgene_get_link_ksettings(struct net_device *ndev,
if (phydev == NULL)
return -ENODEV;
 
-   return phy_ethtool_ksettings_get(phydev, cmd);
+   phy_ethtool_ksettings_get(phydev, cmd);
+
+   return 0;
} else if (pdata->phy_mode == PHY_INTERFACE_MODE_SGMII) {
if (pdata->mdio_driver) {
if (!phydev)
return -ENODEV;
 
-   return phy_ethtool_ksettings_get(phydev, cmd);
+   phy_ethtool_ksettings_get(phydev, cmd);
+
+   return 0;
}
 
supported = SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg |
diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 5b95bb4..f411936 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1836,7 +1836,9 @@ static int b44_get_link_ksettings(struct net_device *dev,
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
BUG_ON(!dev->phydev);
-   return phy_ethtool_ksettings_get(dev->phydev, cmd);
+   phy_ethtool_ksettings_get(dev->phydev, cmd);
+
+   return 0;
}
 
supported = (SUPPORTED_Autoneg);
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 50d88d3..ea3c906 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1453,7 +1453,10 @@ static int bcm_enet_get_link_ksettings(struct net_device *dev,
if (priv->has_phy