Re: [PATCH] powerpc/64: mark start_here_multiplatform as __ref

2019-06-30 Thread Michael Ellerman
On Fri, 2019-05-10 at 06:31:28 UTC, Christophe Leroy wrote:
> Otherwise, the following warning is encountered:
> 
> WARNING: vmlinux.o(.text+0x3dc6): Section mismatch in reference from the 
> variable start_here_multiplatform to the function .init.text:.early_setup()
> The function start_here_multiplatform() references
> the function __init .early_setup().
> This is often because start_here_multiplatform lacks a __init
> annotation or the annotation of .early_setup is wrong.
> 
> Fixes: 56c46bba9bbf ("powerpc/64: Fix booting large kernels with 
> STRICT_KERNEL_RWX")
> Cc: Russell Currey 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9c4e4c90ec24652921e31e9551fcaedc26eec86d

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.2-7 tag

2019-06-29 Thread Michael Ellerman

Hi Linus,

Please pull one more powerpc fix for 5.2:

The following changes since commit 50087112592016a3fc10b394a55f1f1a1bde6908:

  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries 
(2019-06-20 22:11:25 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.2-7

for you to fetch changes up to e13e7cd4c0c1cc9984d9b6a8663e10d76b53f2aa:

  powerpc/64s/exception: Fix machine check early corrupting AMR (2019-06-25 
21:04:27 +1000)

----------------------------------------------------------------
powerpc fixes for 5.2 #7

One fix for a regression in my commit adding KUAP (Kernel User Access
Prevention) on Radix, which incorrectly touched the AMR in the early machine
check handler.

Thanks to:
  Nicholas Piggin.

----------------------------------------------------------------
Nicholas Piggin (1):
  powerpc/64s/exception: Fix machine check early corrupting AMR


 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Re: power9 NUMA crash while reading debugfs imc_cmd

2019-06-29 Thread Michael Ellerman
Qian Cai  writes:
> On Fri, 2019-06-28 at 17:19 +0530, Anju T Sudhakar wrote:
>> On 6/28/19 9:04 AM, Qian Cai wrote:
>> > 
>> > > On Jun 27, 2019, at 11:12 PM, Michael Ellerman  
>> > > wrote:
>> > > 
>> > > Qian Cai  writes:
>> > > > Read of debugfs imc_cmd file for a memory-less node will trigger a crash
>> > > > below on this power9 machine which has the following NUMA layout.
>> > > 
>> > > What type of machine is it?
>> > 
>> > description: PowerNV
>> > product: 8335-GTH (ibm,witherspoon)
>> > vendor: IBM
>> > width: 64 bits
>> > capabilities: smp powernv opal
>> 
>> 
>> Hi Qian Cai,
>> 
>> Could you please try with this patch: 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192803.html
>> 
>> and see if the issue is resolved?
>
> It works fine.
>
> It just feels a bit silly that a node without CPUs or memory is still online
> by default during boot in the first place on powerpc, but that is probably a
> different issue. For example,

Those are there to represent the memory on your attached GPUs. It's not
onlined by default.

I don't really love that they show up like that, but I think that's
working as expected.

cheers

> # numactl -H
> available: 6 nodes (0,8,252-255)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25
> 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
> 52
> 53 54 55 56 57 58 59 60 61 62 63
> node 0 size: 126801 MB
> node 0 free: 123199 MB
> node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
> 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
> 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
> node 8 size: 130811 MB
> node 8 free: 128436 MB
> node 252 cpus:
> node 252 size: 0 MB
> node 252 free: 0 MB
> node 253 cpus:
> node 253 size: 0 MB
> node 253 free: 0 MB
> node 254 cpus:
> node 254 size: 0 MB
> node 254 free: 0 MB
> node 255 cpus:
> node 255 size: 0 MB
> node 255 free: 0 MB
> node distances:
> node   0   8  252  253  254  255 
>   0:  10  40  80  80  80  80 
>   8:  40  10  80  80  80  80 
>  252:  80  80  10  80  80  80 
>  253:  80  80  80  10  80  80 
>  254:  80  80  80  80  10  80 
>  255:  80  80  80  80  80  10 
>
> # cat /sys/devices/system/node/online 
> 0,8,252-255


Re: [PATCH] selftests/powerpc: ppc_asm.h: typo in the header guard

2019-06-27 Thread Michael Ellerman
Denis Efremov  writes:
> The guard macro __PPC_ASM_H in the header ppc_asm.h
> doesn't match the #ifndef macro _PPC_ASM_H. The patch
> makes them the same.
>
> Signed-off-by: Denis Efremov 
> ---
>  tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I'll merge this.

Please include linuxppc-dev on powerpc selftest patches.

cheers

> diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h 
> b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> index d2c0a911f55e..2b488b78c4f2 100644
> --- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> +++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> @@ -1,6 +1,6 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef _PPC_ASM_H
> -#define __PPC_ASM_H
> +#define _PPC_ASM_H
>  #include 
>  
>  #ifndef r1
> -- 
> 2.21.0


Re: power9 NUMA crash while reading debugfs imc_cmd

2019-06-27 Thread Michael Ellerman
Qian Cai  writes:
> Read of debugfs imc_cmd file for a memory-less node will trigger a crash below
> on this power9 machine which has the following NUMA layout.

What type of machine is it?

cheers

> I don't understand why I only saw it recently on linux-next where it
> was tested every day. I can reproduce it back to 4.20, while 4.18 seems
> to work fine.
>
> # cat /sys/kernel/debug/powerpc/imc/imc_cmd_252 (On a 4.18-based kernel)
> 0x
>
> # numactl -H
> available: 6 nodes (0,8,252-255)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25
> 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
> 52
> 53 54 55 56 57 58 59 60 61 62 63
> node 0 size: 130210 MB
> node 0 free: 128406 MB
> node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
> 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
> 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
> node 8 size: 130784 MB
> node 8 free: 130051 MB
> node 252 cpus:
> node 252 size: 0 MB
> node 252 free: 0 MB
> node 253 cpus:
> node 253 size: 0 MB
> node 253 free: 0 MB
> node 254 cpus:
> node 254 size: 0 MB
> node 254 free: 0 MB
> node 255 cpus:
> node 255 size: 0 MB
> node 255 free: 0 MB
> node distances:
> node   0   8  252  253  254  255 
>   0:  10  40  80  80  80  80 
>   8:  40  10  80  80  80  80 
>  252:  80  80  10  80  80  80 
>  253:  80  80  80  10  80  80 
>  254:  80  80  80  80  10  80 
>  255:  80  80  80  80  80  10
>
> # cat /sys/kernel/debug/powerpc/imc/imc_cmd_252
>
> [ 1139.415461][ T5301] Faulting instruction address: 0xc00d0d58
> [ 1139.415492][ T5301] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1139.415509][ T5301] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=256
> DEBUG_PAGEALLOC NUMA PowerNV
> [ 1139.415542][ T5301] Modules linked in: i2c_opal i2c_core ip_tables x_tables
> xfs sd_mod bnx2x mdio ahci libahci tg3 libphy libata firmware_class dm_mirror
> dm_region_hash dm_log dm_mod
> [ 1139.415595][ T5301] CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-
> 20190627+ #19
> [ 1139.415634][ T5301] NIP:  c00d0d58 LR: c049aa18 CTR:
> c00d0d50
> [ 1139.415675][ T5301] REGS: c00020194548f9e0 TRAP: 0300   Not tainted  
> (5.2.0-
> rc6-next-20190627+)
> [ 1139.415705][ T5301] MSR:  90009033   CR:
> 28022822  XER: 
> [ 1139.415777][ T5301] CFAR: c049aa14 DAR: 0003fc08 DSISR:
> 4000 IRQMASK: 0 
> [ 1139.415777][ T5301] GPR00: c049aa18 c00020194548fc70 
> c16f8b00
> 0003fc08 
> [ 1139.415777][ T5301] GPR04: c00020194548fcd0  
> 14884e73
> 00011eaa 
> [ 1139.415777][ T5301] GPR08: 7eea5a52 c00d0d50 
> 
>  
> [ 1139.415777][ T5301] GPR12: c00d0d50 c000201fff7f8c00 
> 
>  
> [ 1139.415777][ T5301] GPR16: 000d 7fffeb0c3368 
> 
>  
> [ 1139.415777][ T5301] GPR20:   
> 
> 0002 
> [ 1139.415777][ T5301] GPR24:   
> 0002
> 00010ec9 
> [ 1139.415777][ T5301] GPR28: c00020194548fdf0 c00020049a584ef8 
> 
> c00020049a584ea8 
> [ 1139.416116][ T5301] NIP [c00d0d58] imc_mem_get+0x8/0x20
> [ 1139.416143][ T5301] LR [c049aa18] simple_attr_read+0x118/0x170
> [ 1139.416158][ T5301] Call Trace:
> [ 1139.416182][ T5301] [c00020194548fc70] [c049a970]
> simple_attr_read+0x70/0x170 (unreliable)
> [ 1139.416255][ T5301] [c00020194548fd10] [c054385c]
> debugfs_attr_read+0x6c/0xb0
> [ 1139.416305][ T5301] [c00020194548fd60] [c0454c1c]
> __vfs_read+0x3c/0x70
> [ 1139.416363][ T5301] [c00020194548fd80] [c0454d0c] 
> vfs_read+0xbc/0x1a0
> [ 1139.416392][ T5301] [c00020194548fdd0] [c045519c]
> ksys_read+0x7c/0x140
> [ 1139.416434][ T5301] [c00020194548fe20] [c000b108]
> system_call+0x5c/0x70
> [ 1139.416473][ T5301] Instruction dump:
> [ 1139.416511][ T5301] 4e800020 6000 7c0802a6 6000 7c801d28 3860
> 4e800020 6000 
> [ 1139.416572][ T5301] 6000 6000 7c0802a6 6000 <7d201c28> 3860
> f924 4e800020 
> [ 1139.416636][ T5301] ---[ end trace c44d1fb4ace04784 ]---
> [ 1139.520686][ T5301] 
> [ 1140.520820][ T5301] Kernel panic - not syncing: Fatal exception


Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc

2019-06-27 Thread Michael Ellerman
Satheesh Rajendran  writes:
> On Thu, Jun 27, 2019 at 12:08:01AM +0530, Naveen N. Rao wrote:
>> The recent change enabling HAVE_C_RECORDMCOUNT on powerpc started
>> showing the following issue:
>> 
>>   # modprobe kprobe_example
>>ftrace-powerpc: Not expected bl: opcode is 3c4c0001
>>WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 
>> ftrace_bug+0x90/0x318
>>Modules linked in:
>>CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 
>> #2
>>NIP:  c0264318 LR: c025d694 CTR: c0f5cd30
>>REGS: c1f2b7b0 TRAP: 0700   Not tainted  
>> (5.2.0-rc6-00678-g1c329100b942)
>>MSR:  90010282b033   CR: 
>> 28228222  XER: 
>>CFAR: c02642fc IRQMASK: 0
>>
>>NIP [c0264318] ftrace_bug+0x90/0x318
>>LR [c025d694] ftrace_process_locs+0x4f4/0x5e0
>>Call Trace:
>>[c1f2ba40] [0004] 0x4 (unreliable)
>>[c1f2bad0] [c025d694] ftrace_process_locs+0x4f4/0x5e0
>>[c1f2bb90] [c020ff10] load_module+0x25b0/0x30c0
>>[c1f2bd00] [c0210cb0] sys_finit_module+0xc0/0x130
>>[c1f2be20] [c000bda4] system_call+0x5c/0x70
>>Instruction dump:
>>419e0018 2f83 419e00bc 2f83ffea 409e00cc 481c 0fe0 3c62ff96
>>3901 3940 386386d0 48c4 <0fe0> 3ce20003 3901 3c62ff96
>>---[ end trace 4c438d5cebf78381 ]---
>>ftrace failed to modify
>>[] 0xc008012a0008
>> actual:   01:00:4c:3c
>>Initializing ftrace call sites
>>ftrace record flags: 200
>> (0)
>> expected tramp: c006af4c
>> 
>> Looking at the relocation records in __mcount_loc showed a few spurious
>> entries:
>>   RELOCATION RECORDS FOR [__mcount_loc]:
>>   OFFSET   TYPE  VALUE
>>    R_PPC64_ADDR64.text.unlikely+0x0008
>>   0008 R_PPC64_ADDR64.text.unlikely+0x0014
>>   0010 R_PPC64_ADDR64.text.unlikely+0x0060
>>   0018 R_PPC64_ADDR64.text.unlikely+0x00b4
>>   0020 R_PPC64_ADDR64.init.text+0x0008
>>   0028 R_PPC64_ADDR64.init.text+0x0014
>> 
>> The first entry in each section is incorrect. Looking at the relocation
>> records, the spurious entries correspond to the R_PPC64_ENTRY records:
>>   RELOCATION RECORDS FOR [.text.unlikely]:
>>   OFFSET   TYPE  VALUE
>>    R_PPC64_REL64 .TOC.-0x0008
>>   0008 R_PPC64_ENTRY *ABS*
>>   0014 R_PPC64_REL24 _mcount
>>   
>> 
>> The problem is that we are not validating the return value from
>> get_mcountsym() in sift_rel_mcount(). With this entry, mcountsym is 0,
>> but Elf_r_sym(relp) also ends up being 0. Fix this by ensuring mcountsym
>> is valid before processing the entry.
>> 
>> Fixes: c7d64b560ce80 ("powerpc/ftrace: Enable C Version of recordmcount")
>> Signed-off-by: Naveen N. Rao 
>> ---
>>  scripts/recordmcount.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
> Hi,
>
> linuxppc/merge branch latest commit 850f6274c5b5 was failing to boot IBM
> Power8 Box, and the failure got resolved with this patch.
> https://github.com/linuxppc/issues/issues/254
>
> # git log -2 --oneline
> 0e0f55b31ea8 (HEAD -> issue_254) recordmcount: Fix spurious mcount entries on 
> powerpc
> 850f6274c5b5 (origin/merge, merge) Automatic merge of branches 'master', 
> 'next' and 'fixes' into merge
> # uname -r
> 5.2.0-rc6-00123-g0e0f55b31ea8
>  
> Tested-by: Satheesh Rajendran 

Thanks. I've pulled the broken commit out of merge for now.

The fix and the original commit will show up in merge later today
probably.

cheers


Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc

2019-06-27 Thread Michael Ellerman
Steven Rostedt  writes:
> On Thu, 27 Jun 2019 15:55:47 +1000
> Michael Ellerman  wrote:
>
>> Steve are you OK if I merge this via the powerpc tree? I'll reword the
>> commit message so that it makes sense coming prior to the commit
>> mentioned above.
>
> Yes, please add:
>
> Acked-by: Steven Rostedt (VMware) 

Thanks.

cheers


Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc

2019-06-26 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> The recent change enabling HAVE_C_RECORDMCOUNT on powerpc started
> showing the following issue:
>
>   # modprobe kprobe_example
>ftrace-powerpc: Not expected bl: opcode is 3c4c0001
>WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 
> ftrace_bug+0x90/0x318
>Modules linked in:
>CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2
>NIP:  c0264318 LR: c025d694 CTR: c0f5cd30
>REGS: c1f2b7b0 TRAP: 0700   Not tainted  
> (5.2.0-rc6-00678-g1c329100b942)
>MSR:  90010282b033   CR: 
> 28228222  XER: 
>CFAR: c02642fc IRQMASK: 0
>
>NIP [c0264318] ftrace_bug+0x90/0x318
>LR [c025d694] ftrace_process_locs+0x4f4/0x5e0
>Call Trace:
>[c1f2ba40] [0004] 0x4 (unreliable)
>[c1f2bad0] [c025d694] ftrace_process_locs+0x4f4/0x5e0
>[c1f2bb90] [c020ff10] load_module+0x25b0/0x30c0
>[c1f2bd00] [c0210cb0] sys_finit_module+0xc0/0x130
>[c1f2be20] [c000bda4] system_call+0x5c/0x70
>Instruction dump:
>419e0018 2f83 419e00bc 2f83ffea 409e00cc 481c 0fe0 3c62ff96
>3901 3940 386386d0 48c4 <0fe0> 3ce20003 3901 3c62ff96
>---[ end trace 4c438d5cebf78381 ]---
>ftrace failed to modify
>[] 0xc008012a0008
> actual:   01:00:4c:3c
>Initializing ftrace call sites
>ftrace record flags: 200
> (0)
> expected tramp: c006af4c

Aha, thanks. I saw that on one of my test boxes but hadn't pinned it
down to this commit.

> Fixes: c7d64b560ce80 ("powerpc/ftrace: Enable C Version of recordmcount")

That commit is the tip of my next, so I'll drop it for now and merge
them in the other order so there's no breakage.

Steve are you OK if I merge this via the powerpc tree? I'll reword the
commit message so that it makes sense coming prior to the commit
mentioned above.

cheers

> Signed-off-by: Naveen N. Rao 
> ---
>  scripts/recordmcount.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 13c5e6c8829c..47fca2c69a73 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -325,7 +325,8 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
>   if (!mcountsym)
>   mcountsym = get_mcountsym(sym0, relp, str0);
>  
> - if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
> + if (mcountsym && mcountsym == Elf_r_sym(relp) &&
> + !is_fake_mcount(relp)) {
>   uint_t const addend =
>   _w(_w(relp->r_offset) - recval + mcount_adjust);
>   mrelp->r_offset = _w(offbase
> -- 
> 2.22.0


Re: [PATCH] powerpc/64s/radix: Define arch_ioremap_p4d_supported()

2019-06-26 Thread Michael Ellerman
Anshuman Khandual  writes:
> Recent core ioremap changes require HAVE_ARCH_HUGE_VMAP subscribing archs
> provide arch_ioremap_p4d_supported() failing which will result in a build
> failure like the following.
>
> ld: lib/ioremap.o: in function `.ioremap_huge_init':
> ioremap.c:(.init.text+0x3c): undefined reference to
> `.arch_ioremap_p4d_supported'
>
> This defines a stub implementation for arch_ioremap_p4d_supported() keeping
> it disabled for now to fix the build problem.

The easiest option is for this to be folded into your patch that creates
the requirement for arch_ioremap_p4d_supported().

Andrew might do that for you, or you could send a v2.

This looks fine from a powerpc POV:

Acked-by: Michael Ellerman 

cheers

> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: "Aneesh Kumar K.V" 
> Cc: Nicholas Piggin 
> Cc: Andrew Morton 
> Cc: Stephen Rothwell 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-n...@vger.kernel.org
>
> Signed-off-by: Anshuman Khandual 
> ---
> This has been just build tested and fixes the problem reported earlier.
>
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 8904aa1..c81da88 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1124,6 +1124,11 @@ void radix__ptep_modify_prot_commit(struct 
> vm_area_struct *vma,
>   set_pte_at(mm, addr, ptep, pte);
>  }
>  
> +int __init arch_ioremap_p4d_supported(void)
> +{
> + return 0;
> +}
> +
>  int __init arch_ioremap_pud_supported(void)
>  {
>   /* HPT does not cope with large pages in the vmalloc area */
> -- 
> 2.7.4


CVE-2019-12817: Linux kernel: powerpc: Unrelated processes may be able to read/write to each other's virtual memory

2019-06-24 Thread Michael Ellerman
The Linux kernel for powerpc since 4.17 has a bug where unrelated processes may
be able to read/write to each other's virtual memory under certain conditions.

This bug only affects machines using 64-bit CPUs with the hash page table MMU,
see below for more detail on affected CPUs.

To trigger the bug a process must allocate memory above 512TB. That only happens
if userspace explicitly requests it with mmap(). That process must then fork();
at this point the child incorrectly inherits the "context id" of the parent
associated with the mapping above 512TB. It may then be possible for the
parent/child to write to each other's mappings above 512TB, which should not be
possible, and constitutes memory corruption.

If instead the child process exits, all its context ids are freed, including the
context id that is still in use by the parent for the mapping above 512TB. That
id can then be reallocated to a third process, that process can then read/write
to the parent's mapping above 512TB. Additionally if the freed id is used for
the third process's primary context id, then the parent is able to read/write to
the third process's mappings *below* 512TB.

If the parent and child both exit before another process is allocated the freed
context id, the kernel will notice the double free of the id and print a warning
such as:

  ida_free called for id=103 which is not allocated.
  WARNING: CPU: 8 PID: 7293 at lib/idr.c:520 ida_free_rc+0x1b4/0x1d0

The bug was introduced in commit:
  f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in SLB 
miss")

Which was originally merged in v4.17.

Only machines using the hash page table (HPT) MMU are affected, eg. PowerPC 970
(G5), PA6T, Power5/6/7/8/9. By default Power9 bare metal machines (powernv) use
the Radix MMU and are not affected, unless the machine has been explicitly
booted in HPT mode (using disable_radix on the kernel command line). KVM guests
on Power9 may be affected if the host or guest is configured to use the HPT MMU.
LPARs under PowerVM on Power9 are affected as they always use the HPT MMU.
Kernels built with PAGE_SIZE=4K are not affected.

The upstream fix is here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca72d88378b2f2444d3ec145dd442d449d3fefbc

There's also a kernel selftest to verify the fix:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16391bfc862342f285195013b73c1394fab28b97

Or a similar standalone version is included below.

cheers


cat > test.c << EOF
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE MAP_FIXED   // "Should be safe" above 512TB
#endif

int main(void)
{
int p2c[2], c2p[2], rc, status, c, *p;
unsigned long page_size;
pid_t pid;

page_size = sysconf(_SC_PAGESIZE);
if (page_size != 65536) {
printf("Unsupported page size - not affected\n");
return 1;
}

// Create a mapping at 512TB to allocate an extended_id
p = mmap((void *)(512ul << 40), page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
printf("Error: couldn't mmap(), confirm kernel has 4TB 
support\n");
return 1;
}

printf("parent writing %p = 1\n", p);
*p = 1;

assert(pipe(p2c) != -1 && pipe(c2p) != -1);

pid = fork();
if (pid == 0) {
close(p2c[1]);
close(c2p[0]);
assert(read(p2c[0], &c, 1) == 1);

pid = getpid();
printf("child writing  %p = %d\n", p, pid);
*p = pid;

assert(write(c2p[1], &c, 1) == 1);
assert(read(p2c[0], &c, 1) == 1);
exit(0);
}
close(p2c[0]);
close(c2p[1]);

c = 0;
assert(write(p2c[1], &c, 1) == 1);
assert(read(c2p[0], &c, 1) == 1);

// Prevent compiler optimisation
asm volatile("" : : : "memory");

rc = 0;
printf("parent reading %p = %d\n", p, *p);
if (*p != 1) {
printf("Error: BUG! parent saw child's write! *p = %d\n", *p);
rc = 1;
}

assert(write(p2c[1], &c, 1) == 1);
assert(waitpid(pid, &status, 0) != -1);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

if (rc == 0)
printf("success: test completed OK\n");

return rc;
}
EOF




Re: [PATCH] powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac

2019-06-23 Thread Michael Ellerman
On Thu, 2019-06-13 at 08:24:46 UTC, Christoph Hellwig wrote:
> With the strict dma mask checking introduced with the switch to
> the generic DMA direct code common wifi chips on 32-bit powerbooks
> stopped working.  Add a 30-bit ZONE_DMA to the 32-bit pmac builds
> to allow them to reliably allocate dma coherent memory.
> 
> Fixes: 65a21b71f948 ("powerpc/dma: remove dma_nommu_dma_supported")
> Reported-by: Aaro Koskinen 
> Signed-off-by: Christoph Hellwig 
> Tested-by: Larry Finger 
> Acked-by: Larry Finger 
> Tested-by: Aaro Koskinen 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/9739ab7eda459f0669ec9807e0d9be5020bab88c

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.2-5 tag

2019-06-22 Thread Michael Ellerman

Hi Linus,

Please pull some more powerpc fixes for 5.2.

This is a frustratingly large batch at rc5. Some of these were sent earlier but
were missed by me due to being distracted by other things, and some took a while
to track down due to needing manual bisection on old hardware. But still we
clearly need to improve our testing of KVM, and of 32-bit, so that we catch
these earlier.

cheers


The following changes since commit c21f5a9ed85ca3e914ca11f421677ae9ae0d04b0:

  powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX (2019-06-07 
19:00:14 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.2-5

for you to fetch changes up to 50087112592016a3fc10b394a55f1f1a1bde6908:

  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries 
(2019-06-20 22:11:25 +1000)

----------------------------------------------------------------
powerpc fixes for 5.2 #5

Seven fixes, all for bugs introduced this cycle.

The commit to add KASAN support broke booting on 32-bit SMP machines, due to a
refactoring that moved some setup out of the secondary CPU path.

A fix for another 32-bit SMP bug introduced by the fast syscall entry
implementation for 32-bit BOOKE. And a build fix for the same commit.

Our change to allow the DAWR to be force enabled on Power9 introduced a bug in
KVM, where we clobber r3 leading to a host crash.

The same commit also exposed a previously unreachable bug in the nested KVM
handling of DAWR, which could lead to an oops in a nested host.

One of the DMA reworks broke the b43legacy WiFi driver on some people's
powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.

A fix for TLB flushing in KVM introduced a new bug, as it neglected to also
flush the ERAT, this could lead to memory corruption in the guest.

Thanks to:
  Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael
  Neuling, Suraj Jitindar Singh.

----------------------------------------------------------------
Christoph Hellwig (1):
  powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac

Christophe Leroy (3):
  powerpc/32s: fix initial setup of segment registers on secondary CPU
  powerpc/booke: fix fast syscall entry on SMP
  powerpc/32: fix build failure on book3e with KVM

Michael Neuling (1):
  KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()

Suraj Jitindar Singh (2):
  KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real 
mode
  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries


 arch/powerpc/include/asm/page.h |  7 +++
 arch/powerpc/kernel/head_32.S   |  1 +
 arch/powerpc/kernel/head_booke.h| 10 +-
 arch/powerpc/kernel/head_fsl_booke.S|  2 +-
 arch/powerpc/kvm/book3s_hv_builtin.c|  1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 15 +--
 arch/powerpc/mm/mem.c   |  3 ++-
 arch/powerpc/platforms/powermac/Kconfig |  1 +
 8 files changed, 31 insertions(+), 9 deletions(-)


Re: [PATCH] powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac

2019-06-20 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:
> On Wed, 2019-06-19 at 22:32 +1000, Michael Ellerman wrote:
>> Christoph Hellwig  writes:
>> > Any chance this could get picked up to fix the regression?
>> 
>> Was hoping Ben would Ack it. He's still powermac maintainer :)
>> 
>> I guess he OK'ed it in the other thread, will add it to my queue.
>
> Yeah ack. If I had written it myself, I would have made the DMA bits a
> variable and only set it down to 30 if I see that device in the DT
> early on, but I can't be bothered now, if it works, ship it :-)

OK, we can do that next release if someone's motivated.

> Note: The patch affects all ppc32, though I don't think it will cause
> any significant issue on those who don't need it.

Yeah. We could always hide it behind CONFIG_PPC_PMAC if it becomes a problem.

cheers


Re: [PATCH] powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac

2019-06-19 Thread Michael Ellerman
Christoph Hellwig  writes:
> Any chance this could get picked up to fix the regression?

Was hoping Ben would Ack it. He's still powermac maintainer :)

I guess he OK'ed it in the other thread, will add it to my queue.

cheers

> On Thu, Jun 13, 2019 at 10:24:46AM +0200, Christoph Hellwig wrote:
>> With the strict dma mask checking introduced with the switch to
>> the generic DMA direct code common wifi chips on 32-bit powerbooks
>> stopped working.  Add a 30-bit ZONE_DMA to the 32-bit pmac builds
>> to allow them to reliably allocate dma coherent memory.
>> 
>> Fixes: 65a21b71f948 ("powerpc/dma: remove dma_nommu_dma_supported")
>> Reported-by: Aaro Koskinen 
>> Signed-off-by: Christoph Hellwig 
>> ---
>>  arch/powerpc/include/asm/page.h | 7 +++
>>  arch/powerpc/mm/mem.c   | 3 ++-
>>  arch/powerpc/platforms/powermac/Kconfig | 1 +
>>  3 files changed, 10 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/page.h 
>> b/arch/powerpc/include/asm/page.h
>> index b8286a2013b4..0d52f57fca04 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -319,6 +319,13 @@ struct vm_area_struct;
>>  #endif /* __ASSEMBLY__ */
>>  #include 
>>  
>> +/*
>> + * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks.
>> + */
>> +#ifdef CONFIG_PPC32
>> +#define ARCH_ZONE_DMA_BITS 30
>> +#else
>>  #define ARCH_ZONE_DMA_BITS 31
>> +#endif
>>  
>>  #endif /* _ASM_POWERPC_PAGE_H */
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>> index cba29131bccc..2540d3b2588c 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -248,7 +248,8 @@ void __init paging_init(void)
>> (long int)((top_of_ram - total_ram) >> 20));
>>  
>>  #ifdef CONFIG_ZONE_DMA
>> -max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffUL >> PAGE_SHIFT);
>> +max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
>> +((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);
>>  #endif
>>  max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>>  #ifdef CONFIG_HIGHMEM
>> diff --git a/arch/powerpc/platforms/powermac/Kconfig 
>> b/arch/powerpc/platforms/powermac/Kconfig
>> index f834a19ed772..c02d8c503b29 100644
>> --- a/arch/powerpc/platforms/powermac/Kconfig
>> +++ b/arch/powerpc/platforms/powermac/Kconfig
>> @@ -7,6 +7,7 @@ config PPC_PMAC
>>  select PPC_INDIRECT_PCI if PPC32
>>  select PPC_MPC106 if PPC32
>>  select PPC_NATIVE
>> +select ZONE_DMA if PPC32
>>  default y
>>  
>>  config PPC_PMAC64
>> -- 
>> 2.20.1
> ---end quoted text---


Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-18 Thread Michael Ellerman
Hi Naveen,

Sorry I meant to reply to this earlier .. :/

"Naveen N. Rao"  writes:
> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
> enable function tracing and profiling. So far, with dynamic ftrace, we
> used to only patch out the branch to _mcount(). However, mflr is
> executed by the branch unit that can only execute one per cycle on
> POWER9 and shared with branches, so it would be nice to avoid it where
> possible.
>
> We cannot simply nop out the mflr either. When enabling function
> tracing, there can be a race if tracing is enabled when some thread was
> interrupted after executing a nop'ed out mflr. In this case, the thread
> would execute the now-patched-in branch to _mcount() without having
> executed the preceding mflr.
>
> To solve this, we now enable function tracing in 2 steps: patch in the
> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
> threads make progress, and then patch in the branch to _mcount(). We
> override ftrace_replace_code() with a powerpc64 variant for this
> purpose.

According to the ISA we're not allowed to patch mflr at runtime. See the
section on "CMODX".

I'm also not convinced the ordering between the two patches is
guaranteed by the ISA, given that there's possibly no isync on the other
CPU.

But I haven't had time to dig into it sorry, hopefully later in the
week?

cheers

> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index 517662a56bdc..5e2b29808af1 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
>  {
>   unsigned long entry, ptr, tramp;
>   unsigned long ip = rec->ip;
> - unsigned int op, pop;
> + unsigned int op;
>  
>   /* read where this goes */
>   if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
> @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
>  
>  #ifdef CONFIG_MPROFILE_KERNEL
>   /* When using -mkernel_profile there is no load to jump over */
> - pop = PPC_INST_NOP;
> -
>   if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
>   pr_err("Fetching instruction at %lx failed.\n", ip - 4);
>   return -EFAULT;
> @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
>  
>   /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
>   if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
> - pr_err("Unexpected instruction %08x around bl _mcount\n", op);
> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
>   return -EINVAL;
>   }
> -#else
> - /*
> -  * Our original call site looks like:
> -  *
> -  * bl 
> -  * ld r2,XX(r1)
> -  *
> -  * Milton Miller pointed out that we can not simply nop the branch.
> -  * If a task was preempted when calling a trace function, the nops
> -  * will remove the way to restore the TOC in r2 and the r2 TOC will
> -  * get corrupted.
> -  *
> -  * Use a b +8 to jump over the load.
> -  */
>  
> - pop = PPC_INST_BRANCH | 8;  /* b +8 */
> + /* We should patch out the bl to _mcount first */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
>  
> + /* then, nop out the preceding 'mflr r0' as an optimization */
> + if (op == PPC_INST_MFLR &&
> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
> +#else
>   /*
>* Check what is in the next instruction. We can see ld r2,40(r1), but
>* on first pass after boot we will see mflr r0.
> @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
>   pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
>   return -EINVAL;
>   }
> -#endif /* CONFIG_MPROFILE_KERNEL */
>  
> - if (patch_instruction((unsigned int *)ip, pop)) {
> + /*
> +  * Our original call site looks like:
> +  *
> +  * bl 
> +  * ld r2,XX(r1)
> +  *
> +  * Milton Miller pointed out that we can not simply nop the branch.
> +  * If a task was preempted when calling a trace function, the nops
> +  * will remove the way to restore the TOC in r2 and the r2 TOC will
> +  * get corrupted.
> +  *
> +  * Use a b +8 to jump over the load.
> +  */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
>   pr_err("Patching NOP failed.\n");
>   return -EPERM;
>   }
> +#endif /* CONFIG_MPROFILE_KERNEL */
>  
>   return 0;
>  }
> @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace 
> *rec, unsigned long addr)
>   return -EPERM;
>   }
>  
> +#ifdef CONFIG_MPROFILE_KERNEL
> + /* Nop out the preceding 'mflr r0' as an optim

Re: [PATCH] powerpc/32s: fix initial setup of segment registers on secondary CPU

2019-06-18 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 11/06/2019 à 17:47, Christophe Leroy a écrit :
>> The patch referenced below moved the loading of segment registers
>> out of load_up_mmu() in order to do it earlier in the boot sequence.
>> However, the secondary CPU still needs it to be done when loading up
>> the MMU.
>> 
>> Reported-by: Erhard F. 
>> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for 
>> KASAN")
>
> Cc: sta...@vger.kernel.org

Sorry patchwork didn't pick that up and I missed it. The AUTOSEL bot
will probably pick it up anyway though.

cheers

>> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
>> index 1d5f1bd0dacd..f255e22184b4 100644
>> --- a/arch/powerpc/kernel/head_32.S
>> +++ b/arch/powerpc/kernel/head_32.S
>> @@ -752,6 +752,7 @@ __secondary_start:
>>  stw r0,0(r3)
>>   
>>  /* load up the MMU */
>> +bl  load_segment_registers
>>  bl  load_up_mmu
>>   
>>  /* ptr to phys current thread */
>> 


Re: [PATCH v2] perf ioctl: Add check for the sample_period value

2019-06-18 Thread Michael Ellerman
Ravi Bangoria  writes:
> Peter / mpe,
>
> Is the v2 looks good? If so, can anyone of you please pick this up.

I usually wouldn't take it, it's generic perf code. Unless
peter/ingo/acme tell me otherwise.

It's sort of a bug fix for 0819b2e30ccb, should it have a fixes and/or
stable tag?

  Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
  Cc: sta...@vger.kernel.org # v3.15+

cheers

> On 6/4/19 9:59 AM, Ravi Bangoria wrote:
>> perf_event_open() limits the sample_period to 63 bits. See
>> commit 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period
>> to 63 bits"). Make ioctl() consistent with it.
>> 
>> Also on powerpc, a negative sample_period could cause recursive
>> PMIs leading to a hang (reported when running perf-fuzzer).
>> 
>> Signed-off-by: Ravi Bangoria 
>> ---
>>  kernel/events/core.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index abbd4b3b96c2..e44c90378940 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, 
>> u64 __user *arg)
>>  if (perf_event_check_period(event, value))
>>  return -EINVAL;
>>  
>> +if (!event->attr.freq && (value & (1ULL << 63)))
>> +return -EINVAL;
>> +
>>  event_function_call(event, __perf_event_period, &value);
>>  
>>  return 0;
>> 


Re: [PATCH] selftests/powerpc: Add missing newline at end of file

2019-06-18 Thread Michael Ellerman
Geert Uytterhoeven  writes:
> "git diff" says:
>
> \ No newline at end of file
>
> after modifying the file.

Is that a problem?

Just curious because it was presumably me that broke it :)

cheers

> diff --git a/tools/testing/selftests/powerpc/mm/.gitignore 
> b/tools/testing/selftests/powerpc/mm/.gitignore
> index ba919308fe3052f3..16861ab840f57e90 100644
> --- a/tools/testing/selftests/powerpc/mm/.gitignore
> +++ b/tools/testing/selftests/powerpc/mm/.gitignore
> @@ -3,4 +3,4 @@ subpage_prot
>  tempfile
>  prot_sao
>  segv_errors
> -wild_bctr
> \ No newline at end of file
> +wild_bctr
> -- 
> 2.17.1


Re: [PATCH v1 1/6] mm: Section numbers use the type "unsigned long"

2019-06-18 Thread Michael Ellerman
Andrew Morton  writes:
> On Sat, 15 Jun 2019 10:06:54 +0200 Christophe Leroy  
> wrote:
>> Le 14/06/2019 à 21:00, Andrew Morton a écrit :
>> > On Fri, 14 Jun 2019 12:01:09 +0200 David Hildenbrand  
>> > wrote:
>> > 
>> >> We are using a mixture of "int" and "unsigned long". Let's make this
>> >> consistent by using "unsigned long" everywhere. We'll do the same with
>> >> memory block ids next.
>> >>
>> >> ...
>> >>
>> >> - int i, ret, section_count = 0;
>> >> + unsigned long i;
>> >>
>> >> ...
>> >>
>> >> - unsigned int i;
>> >> + unsigned long i;
>> > 
>> > Maybe I did too much fortran back in the day, but I think the
>> > expectation is that a variable called "i" has type "int".
...
>> Codying style says the following, which makes full sense in my opinion:
>> 
>> LOCAL variable names should be short, and to the point.  If you have
>> some random integer loop counter, it should probably be called ``i``.
>> Calling it ``loop_counter`` is non-productive, if there is no chance of it
>> being mis-understood.
>
> Well.  It did say "integer".  Calling an unsigned long `i' is flat out
> misleading.

I always thought `i` was for loop `index` not `integer`.

But I've never written any Fortran :)

cheers


Re: [PATCH] powerpc/32: fix build failure on book3e with KVM

2019-06-16 Thread Michael Ellerman
On Thu, 2019-05-23 at 08:39:27 UTC, Christophe Leroy wrote:
> Build failure was introduced by the commit identified below,
> due to a missed macro expansion leading to a wrong called-function name.
> 
> arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
> arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to 
> `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
> Makefile:1052: recipe for target 'vmlinux' failed
> 
> The called function should be kvmppc_handler_8_0x01B(). This patch fixes it.
> 
> Reported-by: Paul Mackerras 
> Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/82f6e266f8123d7938713c0e10c03aa655b3e68a

cheers


Re: [PATCH] powerpc/32s: fix initial setup of segment registers on secondary CPU

2019-06-16 Thread Michael Ellerman
On Tue, 2019-06-11 at 15:47:20 UTC, Christophe Leroy wrote:
> The patch referenced below moved the loading of segment registers
> out of load_up_mmu() in order to do it earlier in the boot sequence.
> However, the secondary CPU still needs it to be done when loading up
> the MMU.
> 
> Reported-by: Erhard F. 
> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for 
> KASAN")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b7f8b440f3001cc1775c028f0a783786113c2ae3

cheers


Re: [PATCH] powerpc/booke: fix fast syscall entry on SMP

2019-06-16 Thread Michael Ellerman
On Thu, 2019-06-13 at 13:52:30 UTC, Christophe Leroy wrote:
> Use r10 instead of r9 to calculate CPU offset as r9 contains
> the value from SRR1 which is used later.
> 
> Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/e8732ffa2e096d433c3f2349b871d43ed0d39f5c

cheers


Re: [PATCH] ocxl: do not use C++ style comments in uapi header

2019-06-15 Thread Michael Ellerman
On Tue, 2019-06-04 at 11:16:32 UTC, Masahiro Yamada wrote:
> Linux kernel tolerates C++ style comments these days. Actually, the
> SPDX License tags for .c files start with //.
> 
> On the other hand, uapi headers are written in more strict C, where
> the C++ comment style is forbidden.
> 
> Signed-off-by: Masahiro Yamada 
> Acked-by: Frederic Barrat 
> Acked-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2305ff225c0b1691ec2e93f3d6990e13

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.2-4 tag

2019-06-15 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull some more powerpc fixes for 5.2:

The following changes since commit cd6c84d8f0cdc911df435bb075ba22ce3c605b07:

  Linux 5.2-rc2 (2019-05-26 16:49:19 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.2-4

for you to fetch changes up to c21f5a9ed85ca3e914ca11f421677ae9ae0d04b0:

  powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX (2019-06-07 
19:00:14 +1000)

- --
powerpc fixes for 5.2 #4

One fix for a regression introduced by our 32-bit KASAN support, which broke
booting on machines with "bootx" early debugging enabled.

A fix for a bug which broke kexec on 32-bit, introduced by changes to the 32-bit
STRICT_KERNEL_RWX support in v5.1.

Finally two fixes going to stable for our THP split/collapse handling,
discovered by Nick. The first fixes random crashes and/or corruption in guests
under sufficient load.

Thanks to:
  Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu Malaterre.

- --
Christophe Leroy (2):
  powerpc: Fix kexec failure on book3s/32
  powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX

Nicholas Piggin (2):
  powerpc/64s: Fix THP PMD collapse serialisation
  powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()


 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 
 arch/powerpc/include/asm/btext.h |  4 
 arch/powerpc/include/asm/kexec.h |  3 +++
 arch/powerpc/kernel/machine_kexec_32.c   |  4 +++-
 arch/powerpc/kernel/prom_init.c  |  1 +
 arch/powerpc/kernel/prom_init_check.sh   |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   |  3 +++
 arch/powerpc/mm/pgtable.c| 16 +--
 8 files changed, 59 insertions(+), 4 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJdBO4+AAoJEFHr6jzI4aWAsAwQAK+CK0jvw2pgZMk8/RwuPihJ
gr6pvRaiUuyyiCpWxpzHslZx0WYSg84EYaog4e3fRss6MZeTd4CxxJqAIIny2XTK
3Z6EI3GQGtA8U/+GY+whaQ5+ILdJotbPNRci+yGwc3HNZwT/4RScbmJz7E84MZv+
9gyXrKUio0RdtdZmMHtkrCbpg24QYf1+168gUlJ8H5XGy5NVXVhXwxbYcFeN4zIY
JI+exlBZwtYBJQMtR0FCvjybKk7kRdQzrrUEFM/ZmzsXQryUR7tLrwqAeLvcDc6x
CY9/fn2q7BcFRiOxeZ3AGG89NRTGdOOC1cNJ+Wqn8bIxzP/yFwTEr+lcbdpooCAs
MYyR0yoiI8Aty55lH0uTYQDbXWBZigvKDjLJzn3KN91NKnb3Yw37y5fM5e1ZYQez
bJmbcJJpQzv0YVxXpxd27QeLQtJe6B8D5y0HkpRzYifma5ItAzc1VGzp66jLRFT+
m4LmzD3TjQ61LWyxxDBjAWCHUKW7+cu++sFw0LOA2Wib5DjLjhQAu9qXN1sR5704
FXji4jULMajLMhqqMxjwTEatS46THyz2rqOtJ5/eRWOHBMBS8rHTbHRtFF20mL7x
tHtDmKCfFs2HwHOndtaWduBjiGVVwOo84o2jY0EvfaQ5nscf2XE9acVo6czpJacn
NnIsVZZ6RU/y4Q/f55T4
=Viyv
-----END PGP SIGNATURE-----


Re: [PATCH] powerpc: Enable kernel XZ compression option on PPC_85xx

2019-06-13 Thread Michael Ellerman
Daniel Axtens  writes:
> Pawel Dembicki  writes:
>
>> Enable kernel XZ compression option on PPC_85xx. Tested with
>> simpleImage on TP-Link TL-WDR4900 (Freescale P1014 processor).
>>
>> Suggested-by: Christian Lamparter 
>> Signed-off-by: Pawel Dembicki 
>> ---
>>  arch/powerpc/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 8c1c636308c8..daf4cb968922 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -196,7 +196,7 @@ config PPC
>>  select HAVE_IOREMAP_PROT
>>  select HAVE_IRQ_EXIT_ON_IRQ_STACK
>>  select HAVE_KERNEL_GZIP
>> -select HAVE_KERNEL_XZ   if PPC_BOOK3S || 44x
>> +select HAVE_KERNEL_XZ   if PPC_BOOK3S || 44x || PPC_85xx
>
> (I'm not super well versed in the compression stuff, so apologies if
> this is a dumb question.) If it's this simple, is there any reason we
> can't turn it on generally, or convert it to a blacklist of platforms
> known not to work?

For some platforms enabling XZ requires that your u-boot has XZ support,
and I'm not very clear on when that support landed in u-boot and what
boards have it. And there are boards out there with old/custom u-boots
that effectively can't be updated.

But as a server guy I don't really know the details of all that very
well. So if someone tells me that we should enable XZ for everything, or
as you say just black list some platforms, then that's fine by me.

cheers


Re: [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX

2019-06-11 Thread Michael Ellerman
On Mon, 2019-06-03 at 13:00:51 UTC, Christophe Leroy wrote:
> When booting through OF, setup_disp_bat() does nothing because
> disp_BAT are not set. By chance, it used to work because BOOTX
> buffer is mapped 1:1 at address 0x8100 by the bootloader, and
> btext_setup_display() sets virt addr same as phys addr.
> 
> But since commit 215b823707ce ("powerpc/32s: set up an early static
> hash table for KASAN."), a temporary page table overrides the
> bootloader mapping.
> 
> This 0x8100 is also problematic with the newly implemented
> Kernel Userspace Access Protection (KUAP) because it is within user
> address space.
> 
> This patch fixes those issues by properly setting disp_BAT through
> a call to btext_prepare_BAT(), allowing setup_disp_bat() to
> properly setup BAT3 for early bootx screen buffer access.
> 
> Reported-by: Mathieu Malaterre 
> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for 
> KASAN.")
> Signed-off-by: Christophe Leroy 
> Tested-by: Mathieu Malaterre 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c21f5a9ed85ca3e914ca11f421677ae9

cheers


Re: [PATCH v3] powerpc: fix kexec failure on book3s/32

2019-06-11 Thread Michael Ellerman
On Mon, 2019-06-03 at 08:20:28 UTC, Christophe Leroy wrote:
> In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
> Therefore, although __mapin_ram_chunk() was already mapping kernel
> text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
> memory was executable. Part of the memory (first 512kbytes) was
> mapped with BATs instead of page table, but it was also entirely
> mapped as executable.
> 
> In commit 385e89d5b20f ("powerpc/mm: add exec protection on
> powerpc 603"), we started adding exec protection to some 6xx, namely
> the 603, for pages mapped via pagetables.
> 
> Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
> STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
> memory, so that really only the kernel text could be executed.
> 
> The problem here is that kexec is based on copying some code into
> upper part of memory then executing it from there in order to install
> a fresh new kernel at its definitive location.
> 
> However, the code is position-independent, and the first part of it is
> just there to deactivate the MMU and jump to the second part. So it
> is possible to run this first part in place instead of running the
> copy. Once the MMU is off, there is no protection anymore and the
> second part of the code will just run as before.
> 
> Reported-by: Aaro Koskinen 
> Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
> Tested-by: Aaro Koskinen 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6c284228eb356a1ec62a704b4d232971

cheers


Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-06 Thread Michael Ellerman
Ravi Bangoria  writes:

> Powerpc hw triggers watchpoint before executing the instruction.
> To make trigger-after-execute behavior, kernel emulates the
> instruction. If the instruction is 'load something into non-
> volatile register', exception handler should restore emulated
> register state while returning back, otherwise there will be
> register state corruption. For example, adding a watchpoint on a list
> can corrupt the list:
>
>   # cat /proc/kallsyms | grep kthread_create_list
>   c121c8b8 d kthread_create_list
>
> Add watchpoint on kthread_create_list->next:
>
>   # perf record -e mem:0xc121c8c0
>
> Run some workload such that new kthread gets invoked. Ex, I
> just logged out from console:
>
>   list_add corruption. next->prev should be prev (c1214e00), \
>   but was c121c8b8. (next=c121c8b8).
>   WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>   CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>   ...
>   NIP __list_add_valid+0xb4/0xc0
>   LR __list_add_valid+0xb0/0xc0
>   Call Trace:
>   __list_add_valid+0xb0/0xc0 (unreliable)
>   __kthread_create_on_node+0xe0/0x260
>   kthread_create_on_node+0x34/0x50
>   create_worker+0xe8/0x260
>   worker_thread+0x444/0x560
>   kthread+0x160/0x1a0
>   ret_from_kernel_thread+0x5c/0x70

This all depends on what code the compiler generates for the list
access. Can you include a disassembly of the relevant code in your
kernel so we have an example of the bad case.

> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 9481a11..96de0d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1753,7 +1753,7 @@ handle_dabr_fault:
>   ld  r5,_DSISR(r1)
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  do_break
> -12:  b   ret_from_except_lite
> +12:  b   ret_from_except

This probably warrants a comment explaining why we can't use the (badly
named) "lite" version.

cheers


Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Michael Ellerman
Anshuman Khandual  writes:
> Similar notify_page_fault() definitions are being used by architectures
> duplicating much of the same code. This attempts to unify them into a
> single implementation, generalize it and then move it to a common place.
> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
> need not be wrapped again within CONFIG_KPROBES. The trap number argument can
> now contain up to an 'unsigned int', accommodating all possible platforms.
...
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 58f69fa..1bc3b18 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -30,28 +30,6 @@
>  
>  #ifdef CONFIG_MMU
>  
> -#ifdef CONFIG_KPROBES
> -static inline int notify_page_fault(struct pt_regs *regs, unsigned int fsr)
> -{
> - int ret = 0;
> -
> - if (!user_mode(regs)) {
> - /* kprobe_running() needs smp_processor_id() */
> - preempt_disable();
> - if (kprobe_running() && kprobe_fault_handler(regs, fsr))
> - ret = 1;
> - preempt_enable();
> - }
> -
> - return ret;
> -}
> -#else

You've changed several of the architectures from something like above,
where it disables preemption around the call into the below:

> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
> +{
> + int ret = 0;
> +
> + /*
> +  * To be potentially processing a kprobe fault and to be allowed
> +  * to call kprobe_running(), we have to be non-preemptible.
> +  */
> + if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
> + if (kprobe_running() && kprobe_fault_handler(regs, trap))
> + ret = 1;
> + }
> + return ret;
> +}

Which skips everything if we're preemptible. Is that an equivalent
change? If so can you please explain why in more detail.

Also why not have it return bool?

cheers


Re: [PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-05 Thread Michael Ellerman
"Martin K. Petersen"  writes:
> Nathan,
>
>> clang warns:
>>
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>> case IBMVSCSI_HOST_ACTION_NONE:
>>  ^
>
> Applied to 5.3/scsi-queue, thanks!

Thanks all.

cheers


Re: [PATCH] perf: Fix oops when kthread execs user process

2019-06-03 Thread Michael Ellerman
Will Deacon  writes:
> On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote:
>> On 5/30/19 2:08 PM, Ravi Bangoria wrote:
>> >> ---
>> >> Subject: perf: Fix perf_sample_regs_user()
>> >> From: Peter Zijlstra 
>> >> Date: Wed May 29 14:37:24 CEST 2019
>> >>
>> >> perf_sample_regs_user() uses 'current->mm' to test for the presence of
>> >> userspace, but this is insufficient, consider use_mm().
>> >>
>> >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
>> >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace
>> >> for the first time.
>> > 
>> > This looks correct. I'll give it a try.
>> > 
>> >>
>> >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread 
>> >> execs user process")
>> >>
>> >> Reported-by: Ravi Bangoria 
>> >> Reported-by: Young Xiao <92siuy...@gmail.com>
>> >> Cc: Ravi Bangoria 
>> >> Cc: Naveen N. Rao 
>> >> Cc: Michael Ellerman 
>> >> Cc: Jiri Olsa 
>> >> Cc: Frederic Weisbecker 
>> >> Cc: Stephane Eranian 
>> >> Cc: Arnaldo Carvalho de Melo 
>> >> Acked-by: Will Deacon 
>> >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers 
>> >> dump to sample")
>> >> Signed-off-by: Peter Zijlstra (Intel) 
>> >> ---
>> >>  kernel/events/core.c |2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> --- a/kernel/events/core.c
>> >> +++ b/kernel/events/core.c
>> >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct
>> >>   if (user_mode(regs)) {
>> >>   regs_user->abi = perf_reg_abi(current);
>> >>   regs_user->regs = regs;
>> >> - } else if (current->mm) {
>> >> + } else if (!(current->flags & PF_KTHREAD)) {
>> 
>> With this patch applied and my patch reverted, I still see it crashing
>> because current->flags does not have PF_KTHREAD set. Sample trace with
>> v5.0 kernel:
>> 
>> 
>>BUG: Kernel NULL pointer dereference at 0x
>>Faulting instruction address: 0xc00f1a6c
>>Oops: Kernel access of bad area, sig: 11 [#1]
>>LE SMP NR_CPUS=2048 NUMA pSeries
>>CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ 
>> #7
>>NIP:  c00f1a6c LR: c02acc7c CTR: c02a8f90
>>REGS: c001e80469a0 TRAP: 0300   Not tainted  (5.0.0+)
>>MSR:  80001033   CR: 48022448  XER: 2000
>>CFAR: c000deb4 DAR:  DSISR: 4000 IRQMASK: 1 
>>GPR00: c02acc7c c001e8046c30 c1607500 
>>  
>>GPR04:    
>> c0128618 
>>GPR08: 07ff   
>> c001cd40 
>>GPR12: c0446fd8 c0003ffdf080  
>> 0007 
>>GPR16: c001edd74988 c001edd60400 7fff89801130 
>> 0005e1b0 
>>GPR20: c001edb77a08 c001e8047208 c001f03d9800 
>> c001e8046e00 
>>GPR24: b1af c1126938 c001f03d9b28 
>> 0001 
>>GPR28: c001e8046d30   
>>  
>>NIP [c00f1a6c] perf_reg_value+0x5c/0xc0
>>LR [c02acc7c] perf_output_sample_regs+0x6c/0xd0
>>Call Trace:
>>[c001e8046c30] [c02acc7c] perf_output_sample_regs+0x6c/0xd0 
>> (unreliable)
>>[c001e8046c80] [c02b9cd0] perf_output_sample+0x620/0x8c0
>>[c001e8046d10] [c02ba6b4] perf_event_output_forward+0x64/0x90
>>[c001e8046d80] [c02b2908] __perf_event_overflow+0x88/0x1e0
>>[c001e8046dd0] [c00f3d18] record_and_restart+0x288/0x670
>>[c001e8047180] [c00f4c18] perf_event_interrupt+0x2b8/0x4b0
>>[c001e8047280] [c002b380] 
>> performance_monitor_exception+0x50/0x70
>>[c001e80472a0] [c0009ca0] 
>> performance_monitor_common+0x110/0x120
>>--- interrupt: f01 at slice_scan_available+0x20/0xc0
>>LR = slice_find_area+0x174/0x210
>>[c001e8047630] [c0083ea0] slice_get_unmapped_area+0x3d0/0x7f0
>>[c001e8047ae0] 

Re: [PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o

2019-06-03 Thread Michael Ellerman
On Mon, 2019-05-13 at 10:00:14 UTC, Christophe Leroy wrote:
> quad.o is only for PPC64, and already included in obj64-y,
> so it doesn't have to be in obj-y
> 
> Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to 
> handle alignment faults")
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8e0d0fddf87f26aed576983f752329d

cheers


Re: [PATCH -next] powerpc/book3s64: Make some symbols static

2019-06-03 Thread Michael Ellerman
On Sat, 2019-05-04 at 10:24:27 UTC, YueHaibing wrote:
> Fix sparse warnings:
> 
> arch/powerpc/mm/book3s64/radix_pgtable.c:326:13: warning: symbol 
> 'radix_init_pgtable' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_native.c:48:1: warning: symbol 
> 'native_tlbie_lock' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_utils.c:988:24: warning: symbol 
> 'init_hash_mm_context' was not declared. Should it be static?
> 
> Signed-off-by: YueHaibing 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d667edc01bedcd23988ef69f851e7cc2

cheers


Re: [PATCH] dlpar: Fix a missing-check bug in dlpar_parse_cc_property()

2019-06-03 Thread Michael Ellerman
On Sun, 2019-05-26 at 02:42:40 UTC, Gen Zhang wrote:
> In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
> kstrdup() may return NULL, so it should be checked and handle error.
> And prop should be freed if 'prop->name' is NULL.
> 
> Signed-off-by: Gen Zhang 
> Acked-by: Nathan Lynch 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/efa9ace68e487ddd29c2b4d6dd232421

cheers


Re: [PATCH -next] misc: ocxl: Make ocxl_remove static

2019-06-03 Thread Michael Ellerman
On Sat, 2019-05-04 at 10:27:20 UTC, YueHaibing wrote:
> Fix sparse warning:
> 
> drivers/misc/ocxl/pci.c:44:6: warning:
>  symbol 'ocxl_remove' was not declared. Should it be static?
> 
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 
> Acked-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/00b0cdbbc87fb4b531a0d75f4004ed3d

cheers


Re: [PATCH] powerpc/powernv/npu: Fix reference leak

2019-06-03 Thread Michael Ellerman
On Fri, 2019-04-19 at 15:34:13 UTC, Greg Kurz wrote:
> Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
> has the effect of incrementing the reference count of the PCI device, as
> explained in drivers/pci/search.c:
> 
>  * Given a PCI domain, bus, and slot/function number, the desired PCI
>  * device is located in the list of PCI devices. If the device is
>  * found, its reference count is increased and this function returns a
>  * pointer to its data structure.  The caller must decrement the
>  * reference count by calling pci_dev_put().  If no device is found,
>  * %NULL is returned.
> 
> Nothing was done to call pci_dev_put() and the reference count of GPU and
> NPU PCI devices rockets up.
> 
> A natural way to fix this would be to teach the callers about the change,
> so that they call pci_dev_put() when done with the pointer. This turns
> out to be quite intrusive, as it affects many paths in npu-dma.c,
> pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
> some affected code got moved around since then: it would be problematic
> to backport the fix to stable releases.
> 
> All that code never cared for reference counting anyway. Call pci_dev_put()
> from get_pci_dev() to revert to the previous behavior.
> 
> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from 
> pci_dn")
> Cc: sta...@vger.kernel.org # v4.16
> Signed-off-by: Greg Kurz 
> Reviewed-by: Alexey Kardashevskiy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/02c5f5394918b9b47ff4357b1b183357

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.2-3 tag

2019-06-02 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull some more powerpc fixes for 5.2:

The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:

  Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.2-3

for you to fetch changes up to 8b909e3548706cbebc0a676067b81aadda57f47e:

  powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() 
(2019-05-23 14:00:32 +1000)

- --
powerpc fixes for 5.2 #3

A minor fix to our IMC PMU code to print a less confusing error message when the
driver can't initialise properly.

A fix for a bug where a user requesting an unsupported branch sampling filter
can corrupt PMU state, preventing the PMU from counting properly.

And finally a fix for a bug in our support for kexec_file_load(), which
prevented loading a kernel and initramfs. Most versions of kexec don't yet use
kexec_file_load().

Thanks to:
  Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi Bangoria, Thiago Jung
  Bauermann.

- --
Anju T Sudhakar (1):
  powerpc/powernv: Return for invalid IMC domain

Ravi Bangoria (1):
  powerpc/perf: Fix MMCRA corruption by bhrb_filter

Thiago Jung Bauermann (1):
  powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load()


 arch/powerpc/kernel/kexec_elf_64.c| 6 +-
 arch/powerpc/perf/core-book3s.c   | 6 --
 arch/powerpc/perf/power8-pmu.c| 3 +++
 arch/powerpc/perf/power9-pmu.c| 3 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 4 
 5 files changed, 19 insertions(+), 3 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJc860sAAoJEFHr6jzI4aWAgk0QAJ7e67M/DrigLDIi5LdnwDQQ
AtQW+QzeoBHWiSgWfibqv5NjC9XCdOtvbOkD44TAlF99YMe5k8wShLLwiPSCIYEu
7r83+NHPp7jpeoO8fmE4dTJsmp4Ez+cJfOKpAF6h2w+1yJ5gL2AP5wNLUBi6Cliw
lUIRb73JgWj2hwu0HMNAxbE+mlyIpi8fXRk8TeUXVB+IEInOQxU0x/RkxqN4cCtG
f0hzAnZPywdDvRBuU6roPU3zrII7nVgrLUPXjgin/v58sdqR7zFnWnsm+ou0jkuy
K5zMcCuqZ6lrYjoak+OiqOt8CcalBtqju9ZANQkDIe5hMhXn4Maex1YbFE0i1UYm
Ljbm6Dp4dSTxQcx7GV1xMzHGNHEMQFKSABX+jF9l/KOVl4aVZAEz5F6DNKZO4lo+
EVX9HPBb6ZPyvwntLei8zn9C9LiSVWP5zsAW2zFam4isi498Ca1YpAyoSd58NrRn
WXVcDwMIp9c8uiQllbICNWdHzJhJWhUu/lW2idKFy05zG5+g6dg80StVpYlaEZwK
jggKwkD1H9VWrOZjoIHceOWPUxfjJ6wrvkovGqQqx6l9CkpEYfn1EG0p4s7CwXZ/
wEq0wVsfVaHPkEDSHSwg8mI9ZaEwoY6WMXE63WbVPNMWN26yw5vLJJx4o91Gk/wq
1F4UgNfF5XuQu8m5NWyU
=qxoS
-----END PGP SIGNATURE-----


Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-02 Thread Michael Ellerman
Hi Nathan,

Nathan Chancellor  writes:
> clang warns:
>
> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
> case IBMVSCSI_HOST_ACTION_NONE:
>  ^
> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
> here
> if (rc) {
> ^~
>
> Initialize rc to zero so that the atomic_set and dev_err statement don't
> trigger for the cases that just break.
>
> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
> action states")
> Link: https://github.com/ClangBuiltLinux/linux/issues/502
> Signed-off-by: Nathan Chancellor 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 727c31dc11a0..6714d8043e62 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct 
> vio_dev *vdev)
>  static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>  {
>   unsigned long flags;
> - int rc;
> + int rc = 0;
>   char *action = "reset";
>  
>   spin_lock_irqsave(hostdata->host->host_lock, flags);

It's always preferable IMHO to keep any initialisation as localised as
possible, so that the compiler can continue to warn about uninitialised
usages elsewhere. In this case that would mean doing the rc = 0 in the
switch, something like:

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 727c31dc11a0..7ee5755cf636 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2123,9 +2123,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
 
spin_lock_irqsave(hostdata->host->host_lock, flags);
switch (hostdata->action) {
-   case IBMVSCSI_HOST_ACTION_NONE:
-   case IBMVSCSI_HOST_ACTION_UNBLOCK:
-   break;
case IBMVSCSI_HOST_ACTION_RESET:
spin_unlock_irqrestore(hostdata->host->host_lock, flags);
rc = ibmvscsi_reset_crq_queue(&hostdata->queue, hostdata);
@@ -2142,7 +2139,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
if (!rc)
rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 
0);
break;
+   case IBMVSCSI_HOST_ACTION_NONE:
+   case IBMVSCSI_HOST_ACTION_UNBLOCK:
default:
+   rc = 0;
break;
}


But then that makes me wonder if that's actually correct?

If we get an action that we don't recognise should we just throw it away
like that? (by doing hostdata->action = IBMVSCSI_HOST_ACTION_NONE). Tyrel?

cheers


Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman
Will Deacon  writes:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
>> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> > When a kthread calls call_usermodehelper() the steps are:
>> >   1. allocate current->mm
>> >   2. load_elf_binary()
>> >   3. populate current->thread.regs
>> > 
>> > While doing this, interrupts are not disabled. If there is a perf
>> > interrupt in the middle of this process (i.e. step 1 has completed
>> > but not yet reached step 3) and perf tries to read userspace
>> > regs, the kernel oopses.
>
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

Right, we've done that since roughly forever in copy_thread():

int copy_thread(unsigned long clone_flags, unsigned long usp,
unsigned long kthread_arg, struct task_struct *p)
{
...
/* Copy registers */
sp -= sizeof(struct pt_regs);
childregs = (struct pt_regs *) sp;
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gpr[1] = sp + sizeof(struct pt_regs);
...
p->thread.regs = NULL;  /* no user register state */

See commit from 2002:
  
https://github.com/mpe/linux-fullhistory/commit/c0a96c0918d21d8a99270e94d9c4a4a322d04581#diff-edb76bfcc84905163f34d24d2aad3f3aR187

> From the initial report [1], it doesn't look like the mm isn't initialised,
> but rather that we're dereferencing a NULL pt_regs pointer somehow for the
> current task (see previous comment). I don't see how that can happen on
> arm64, given that we put the pt_regs on the kernel stack which is allocated
> during fork.

We have the regs on the stack too (see above), but we're explicitly
NULL'ing the link from task->thread.

Looks like on arm64 and x86 there is no link from task->thread, instead
you get from task to pt_regs via task_stack_page().

That actually seems potentially fishy given the comment on
task_stack_page() about the stack going away for exiting tasks. We
should probably be NULL'ing the regs pointer in free_thread_stack() or
similar. Though that race mustn't be happening because other arches
would see it.

Or are we just wrong and kthreads should have non-NULL regs? I can't
find another arch that does the same as us.

cheers


Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman
Peter Zijlstra  writes:
> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> When a kthread calls call_usermodehelper() the steps are:
>>   1. allocate current->mm
>>   2. load_elf_binary()
>>   3. populate current->thread.regs
>> 
>> While doing this, interrupts are not disabled. If there is a perf
>> interrupt in the middle of this process (i.e. step 1 has completed
>> but not yet reached step 3) and perf tries to read userspace
>> regs, the kernel oopses.
>> 
>> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
>> pt_regs are not set.
>> 
>> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
>> user process") for details.
>
> Why the hell do we set current->mm before it is complete? Note that
> normally exec() builds the new mm before attaching it, see exec_mmap()
> in flush_old_exec().
>
> Also, why did those PPC folks 'fix' this in isolation? And why didn't
> you Cc them?

We just assumed it was our bug, 'cause we have plenty of those :)

cheers


Re: [PATCH v2] powerpc/32: sstep: Move variable `rc` within CONFIG_PPC64 sentinels

2019-05-28 Thread Michael Ellerman
Mathieu Malaterre  writes:

> Fix warnings treated as errors with W=1:
>
>   arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
> [-Werror=unused-but-set-variable]
>
> Suggested-by: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: as suggested, prefer the CONFIG_PPC64 sentinel instead of the unused keyword

I'd rather avoid adding more ifdefs if we can.

I think this works?

cheers

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3d33fb509ef4..600b036ddfda 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long v2)
 int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  unsigned int instr)
 {
-   unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int opcode, ra, rb, rd, spr, u;
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
@@ -1292,7 +1292,6 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
rd = (instr >> 21) & 0x1f;
ra = (instr >> 16) & 0x1f;
rb = (instr >> 11) & 0x1f;
-   rc = (instr >> 6) & 0x1f;
 
switch (opcode) {
 #ifdef __powerpc64__
@@ -1307,10 +1306,14 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
return 1;
 
 #ifdef __powerpc64__
-   case 4:
+   case 4: {
+   unsigned int rc;
+
if (!cpu_has_feature(CPU_FTR_ARCH_300))
return -1;
 
+   rc = (instr >> 6) & 0x1f;
+
switch (instr & 0x3f) {
case 48:/* maddhd */
asm volatile(PPC_MADDHD(%0, %1, %2, %3) :
@@ -1336,6 +1339,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
 * primary opcode which do not have emulation support yet.
 */
return -1;
+   }
 #endif
 
case 7: /* mulli */


Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-28 Thread Michael Ellerman
Ravi Bangoria  writes:
> On 5/13/19 2:26 PM, Peter Zijlstra wrote:
>> On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote:
>>> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote:
 Add a check for the sample_period value sent from userspace. A negative
 value does not make sense. And in powerpc arch code this could cause
 a recursive PMI leading to a hang (reported when running perf-fuzzer).

 Signed-off-by: Ravi Bangoria 
 ---
  kernel/events/core.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/kernel/events/core.c b/kernel/events/core.c
 index abbd4b3b96c2..e44c90378940 100644
 --- a/kernel/events/core.c
 +++ b/kernel/events/core.c
 @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event 
 *event, u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
  
 +  if (!event->attr.freq && (value & (1ULL << 63)))
 +  return -EINVAL;
>>>
>>> Well, perf_event_attr::sample_period is __u64. Would not be the site
>>> using it as signed be the one in error?
>> 
>> You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes
>> it consistent and is fine.
>> 
>
> Yeah, I was about to reply :)

I've taken patch 2. You should probably do a v2 of patch 1 with an
updated change log that explains things fully?

cheers


Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE

2019-05-27 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 23/05/2019 à 09:00, Christophe Leroy a écrit :
>
> [...]
>
>>> arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
>>> arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to 
>>> `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
>>> Makefile:1052: recipe for target 'vmlinux' failed
>>>
 +.macro SYSCALL_ENTRY trapno intno
 +    mfspr    r10, SPRN_SPRG_THREAD
 +#ifdef CONFIG_KVM_BOOKE_HV
 +BEGIN_FTR_SECTION
 +    mtspr    SPRN_SPRG_WSCRATCH0, r10
 +    stw    r11, THREAD_NORMSAVE(0)(r10)
 +    stw    r13, THREAD_NORMSAVE(2)(r10)
 +    mfcr    r13    /* save CR in r13 for now   */
 +    mfspr    r11, SPRN_SRR1
 +    mtocrf    0x80, r11    /* check MSR[GS] without clobbering reg */
 +    bf    3, 1975f
 +    b    kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1
>>>
>>> It seems to me that the "_SPRN_SRR1" on the end of this line
>>> isn't meant to be there...  However, it still fails to link with that
>>> removed.
>
> It looks like I missed the macro expansion.
>
> The called function should be kvmppc_handler_8_0x01B
>
> Seems like kisskb doesn't build any config like this.

I thought we did, ie:

http://kisskb.ellerman.id.au/kisskb/buildresult/13817941/

But clearly something is missing to trigger the bug.

cheers


Re: [PATCH v3 3/3] tests: add close_range() tests

2019-05-27 Thread Michael Ellerman
Christian Brauner  writes:
> This adds basic tests for the new close_range() syscall.
> - test that no invalid flags can be passed
> - test that a range of file descriptors is correctly closed
> - test that a range of file descriptors is correctly closed if there
>   are already closed file descriptors in the range
> - test that max_fd is correctly capped to the current fdtable maximum
>
> Signed-off-by: Christian Brauner 
> Cc: Arnd Bergmann 
> Cc: Jann Horn 
> Cc: David Howells 
> Cc: Dmitry V. Levin 
> Cc: Oleg Nesterov 
> Cc: Linus Torvalds 
> Cc: Florian Weimer 
> Cc: Shuah Khan 
> Cc: linux-...@vger.kernel.org
> Cc: linux-kselft...@vger.kernel.org
> ---
> v1: unchanged
> v2:
> - Christian Brauner :
>   - verify that close_range() correctly closes a single file descriptor
> v3:
> - Christian Brauner :
>   - add missing Cc for Shuah
>   - add missing Cc for linux-kselftest

Sorry I replied to v2, but I think my comments still apply:

https://lore.kernel.org/lkml/8736kzqpdm@concordia.ellerman.id.au/

cheers


Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Michael Ellerman
"Rafael J. Wysocki"  writes:
> On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
>> The declaration for pfn_is_nosave is only available in
>> kernel/power/power.h. Since this function can be overridden per arch,
>> expose it globally. Having a prototype will make sure to avoid warnings
>> (sometimes treated as errors with W=1) such as:
>> 
>>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 
>> 'pfn_is_nosave' [-Werror=missing-prototypes]
>> 
>> This moves the declaration into a globally visible header file and adds
>> the missing include to avoid a warning on powerpc. It also removes the
>> duplicated prototypes since they are not required anymore.
>> 
>> Cc: Christophe Leroy 
>> Signed-off-by: Mathieu Malaterre 
>> ---
>> v2: As suggested by Christophe, remove the duplicated prototypes
>> 
>>  arch/powerpc/kernel/suspend.c | 1 +
>>  arch/s390/kernel/entry.h  | 1 -
>>  include/linux/suspend.h   | 1 +
>>  kernel/power/power.h  | 2 --
>>  4 files changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/power/power.h b/kernel/power/power.h
>> index 9e58bdc8a562..44bee462ff57 100644
>> --- a/kernel/power/power.h
>> +++ b/kernel/power/power.h
>> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>>  static inline void hibernate_image_size_init(void) {}
>>  #endif /* !CONFIG_HIBERNATION */
>>  
>> -extern int pfn_is_nosave(unsigned long);
>> -
>>  #define power_attr(_name) \
>>  static struct kobj_attribute _name##_attr = {   \
>>  .attr   = { \
>> 
>
> With an ACK from the powerpc maintainers, I could apply this one.

Sent.

cheers


Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Michael Ellerman
Mathieu Malaterre  writes:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be overridden per arch,
> expose it globally. Having a prototype will make sure to avoid warnings
> (sometimes treated as errors with W=1) such as:
>
>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 
> 'pfn_is_nosave' [-Werror=missing-prototypes]
>
> This moves the declaration into a globally visible header file and adds
> the missing include to avoid a warning on powerpc. It also removes the
> duplicated prototypes since they are not required anymore.
>
> Cc: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: As suggested by Christophe, remove the duplicated prototypes
>
>  arch/powerpc/kernel/suspend.c | 1 +
>  arch/s390/kernel/entry.h  | 1 -
>  include/linux/suspend.h   | 1 +
>  kernel/power/power.h  | 2 --
>  4 files changed, 2 insertions(+), 3 deletions(-)

Looks fine to me.

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-24 Thread Michael Ellerman
On Sat, 2019-05-11 at 02:42:17 UTC, Ravi Bangoria wrote:
> Consider a scenario where user creates two events:
> 
>   1st event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   This sets cpuhw->bhrb_filter to 0 and returns valid fd.
> 
>   2nd event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   It overrides cpuhw->bhrb_filter to -1 and returns with an error.
> 
> Now if power_pmu_enable() gets called by any path other than
> power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.
> 
> Signed-off-by: Ravi Bangoria 
> Reviewed-by: Madhavan Srinivasan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/3202e35ec1c8fc19cea24253ff83edf7

cheers


Re: [PATCH 10/18] locking/atomic: powerpc: use s64 for atomic64

2019-05-23 Thread Michael Ellerman
Mark Rutland  writes:
> As a step towards making the atomic64 API use consistent types treewide,
> let's have the powerpc atomic64 implementation use s64 as the underlying
> type for atomic64_t, rather than long, matching the generated headers.
>
> As atomic64_read() depends on the generic defintion of atomic64_t, this
> still returns long on 64-bit. This will be converted in a subsequent
> patch.
>
> Otherwise, there should be no functional change as a result of this
> patch.
>
> Signed-off-by: Mark Rutland 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Peter Zijlstra 
> Cc: Will Deacon 
> ---
>  arch/powerpc/include/asm/atomic.h | 44 
> +++
>  1 file changed, 22 insertions(+), 22 deletions(-)

Conversion looks good to me.

Reviewed-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/powerpc/include/asm/atomic.h 
> b/arch/powerpc/include/asm/atomic.h
> index 52eafaf74054..31c231ea56b7 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -297,24 +297,24 @@ static __inline__ int atomic_dec_if_positive(atomic_t 
> *v)
>  
>  #define ATOMIC64_INIT(i) { (i) }
>  
> -static __inline__ long atomic64_read(const atomic64_t *v)
> +static __inline__ s64 atomic64_read(const atomic64_t *v)
>  {
> - long t;
> + s64 t;
>  
>   __asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
>  
>   return t;
>  }
>  
> -static __inline__ void atomic64_set(atomic64_t *v, long i)
> +static __inline__ void atomic64_set(atomic64_t *v, s64 i)
>  {
>   __asm__ __volatile__("std%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
>  }
>  
>  #define ATOMIC64_OP(op, asm_op)  
> \
> -static __inline__ void atomic64_##op(long a, atomic64_t *v)  \
> +static __inline__ void atomic64_##op(s64 a, atomic64_t *v)   \
>  {\
> - long t; \
> + s64 t;  \
>   \
>   __asm__ __volatile__(   \
>  "1:  ldarx   %0,0,%3 # atomic64_" #op "\n"   \
> @@ -327,10 +327,10 @@ static __inline__ void atomic64_##op(long a, atomic64_t 
> *v) \
>  }
>  
>  #define ATOMIC64_OP_RETURN_RELAXED(op, asm_op)   
> \
> -static inline long   \
> -atomic64_##op##_return_relaxed(long a, atomic64_t *v)
> \
> +static inline s64\
> +atomic64_##op##_return_relaxed(s64 a, atomic64_t *v) \
>  {\
> - long t; \
> + s64 t;  \
>   \
>   __asm__ __volatile__(   \
>  "1:  ldarx   %0,0,%3 # atomic64_" #op "_return_relaxed\n"\
> @@ -345,10 +345,10 @@ atomic64_##op##_return_relaxed(long a, atomic64_t *v)   
> \
>  }
>  
>  #define ATOMIC64_FETCH_OP_RELAXED(op, asm_op)
> \
> -static inline long   \
> -atomic64_fetch_##op##_relaxed(long a, atomic64_t *v) \
> +static inline s64\
> +atomic64_fetch_##op##_relaxed(s64 a, atomic64_t *v)  \
>  {\
> - long res, t;\
> + s64 res, t; \
>   \
>   __asm__ __volatile__(   \
>  "1:  ldarx   %0,0,%4 # atomic64_fetch_" #op "_relaxed\n" \
> @@ -396,7 +396,7 @@ ATOMIC64_OPS(xor, xor)
>  
>  static __inline__ void atomic64_inc(atomic64_t *v)
>  {
> - long t;
> + s64 t;
>  
>   __asm__ __volatile__(
>  "1:  ldarx   %0,0,%2 # atomic64_inc\n\
> @@ -409,9 +409,9 @@ static __inline__ void atomic64_inc(atomic64_t *v)
>  }
>  #define atomic64_inc atomic64_inc

Re: [PATCH] misc: remove redundant 'default n' from Kconfig-s

2019-05-21 Thread Michael Ellerman
Bartlomiej Zolnierkiewicz  writes:
> 'default n' is the default value for any bool or tristate Kconfig
> setting so there is no need to write it explicitly.
>
> Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
> is not set' for visible symbols") the Kconfig behavior is the same
> regardless of 'default n' being present or not:
>
> ...
> One side effect of (and the main motivation for) this change is making
> the following two definitions behave exactly the same:
> 
> config FOO
> bool
> 
> config FOO
> bool
> default n
> 
> With this change, neither of these will generate a
> '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
> That might make it clearer to people that a bare 'default n' is
> redundant.
> ...
>
> Signed-off-by: Bartlomiej Zolnierkiewicz 
...
> Index: b/drivers/misc/ocxl/Kconfig
> ===
> --- a/drivers/misc/ocxl/Kconfig
> +++ b/drivers/misc/ocxl/Kconfig
> @@ -4,7 +4,6 @@
>  
>  config OCXL_BASE
>   bool
> - default n
>   select PPC_COPRO_BASE
>  
>  config OCXL

Acked-by: Michael Ellerman  (ocxl)

cheers


Re: [PATCH] powerpc/mm: mark more tlb functions as __always_inline

2019-05-21 Thread Michael Ellerman
Masahiro Yamada  writes:
> On Tue, May 21, 2019 at 3:54 PM Christophe Leroy
>  wrote:
>> Le 21/05/2019 à 08:16, Masahiro Yamada a écrit :
>> > With CONFIG_OPTIMIZE_INLINING enabled, Laura Abbott reported an error
>> > with gcc 9.1.1:
>> >
>> >arch/powerpc/mm/book3s64/radix_tlb.c: In function '_tlbiel_pid':
>> >arch/powerpc/mm/book3s64/radix_tlb.c:104:2: warning: asm operand 3 
>> > probably doesn't match constraints
>> >  104 |  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
>> >  |  ^~~
>> >arch/powerpc/mm/book3s64/radix_tlb.c:104:2: error: impossible 
>> > constraint in 'asm'
>> >
>> > Fixing _tlbiel_pid() is enough to address the warning above, but I
>> > inlined more functions to fix all potential issues.
>> >
>> > To meet the 'i' (immediate) constraint for the asm operands, functions
>> > propagating 'ric' must always be inlined.
>> >
>> > Fixes: 9012d011660e ("compiler: allow all arches to enable 
>> > CONFIG_OPTIMIZE_INLINING")
>> > Reported-by: Laura Abbott 
>> > Signed-off-by: Masahiro Yamada 
>> > ---
>> >
>> >   arch/powerpc/mm/book3s64/hash_native.c |  8 +++--
>> >   arch/powerpc/mm/book3s64/radix_tlb.c   | 44 +++---
>> >   2 files changed, 30 insertions(+), 22 deletions(-)
>> >
>> > diff --git a/arch/powerpc/mm/book3s64/hash_native.c 
>> > b/arch/powerpc/mm/book3s64/hash_native.c
>> > index aaa28fd918fe..bc2c35c0d2b1 100644
>> > --- a/arch/powerpc/mm/book3s64/hash_native.c
>> > +++ b/arch/powerpc/mm/book3s64/hash_native.c
>> > @@ -60,9 +60,11 @@ static inline void tlbiel_hash_set_isa206(unsigned int 
>> > set, unsigned int is)
>> >* tlbiel instruction for hash, set invalidation
>> >* i.e., r=1 and is=01 or is=10 or is=11
>> >*/
>> > -static inline void tlbiel_hash_set_isa300(unsigned int set, unsigned int 
>> > is,
>> > - unsigned int pid,
>> > - unsigned int ric, unsigned int prs)
>> > +static __always_inline void tlbiel_hash_set_isa300(unsigned int set,
>> > +unsigned int is,
>> > +unsigned int pid,
>> > +unsigned int ric,
>> > +unsigned int prs)
>>
>> Please don't split the line more than it is.
>>
>> powerpc accepts lines up to 90 chars, see arch/powerpc/tools/checkpatch.pl
>
> Ugh, I did not know this. Horrible.
>
> The Linux coding style should be global in the kernel tree.
> No subsystem should adopt its own coding style.

Well that ship sailed long ago.

But we don't have our own coding style, we just don't enforce 80 columns
rigidly, there are cases where a slightly longer line (up to ~90) is
preferable to a split line.

In a case like this with a long attribute and function name I think this
is probably the least worst option:

static __always_inline
void tlbiel_hash_set_isa300(unsigned int set, unsigned int is, unsigned int pid,
unsigned int ric, unsigned int prs)
{
...

cheers


Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-19 Thread Michael Ellerman
Bharata B Rao  writes:
> On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> Hello,
>> 
>> On a power9 host, performing memory hotunplug from a ppc64le guest results in
>> a kernel oops.
>> 
>> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> 
>> Recreation steps:
>> 
>> 1. Boot a guest with below mem configuration:
>>   33554432
>>   8388608
>>   4194304
>>   
>>     
>>   
>>     
>>   
>> 
>> 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> now
>> reboot guest -> once guest comes back try to unplug 8G memory
>> 
>> mem.xml used:
>> 
>> <memory model='dimm'>
>>   <target>
>>     <size unit='GiB'>8</size>
>>     <node>0</node>
>>   </target>
>> </memory>
>> 
>> Memory attach and detach commands used:
>>     virsh attach-device vm1 ./mem.xml --live
>>     virsh detach-device vm1 ./mem.xml --live
>> 
>> Trace seen inside guest after unplug, guest just hangs there forever:
>> 
>> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
>> pSeries
>> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
>> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
>> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
>> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq multipath crc32c_vpmsum
>> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
>> tainted 5.1.0-dirty #2
>> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR:
>> 8000
>> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
>> [   21.963422] MSR:  8282b033   CR:
>> 28002884  XER: 2004
>> [   21.963470] CFAR: c0c79304 IRQMASK: 0
>> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000
>> 00fff8c0
>> [   21.963470] GPR04: 0001 ffe30005 0005
>> 0020
>> [   21.963470] GPR08:  0001 c00a00fff8e0
>> c16d21a0
>> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0
>> c003ffe30100
>> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f
>> c16d21b0
>> [   21.963470] GPR20: c14de588 0001 c16d21b8
>> c00a00a0
>> [   21.963470] GPR24:   c00a00a0
>> c003ffe96000
>> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000
>> c00a00fff8c0
>> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
>> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
>> [   21.963873] Call Trace:
>> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0
>> (unreliable)
>> [   21.963933] [c003f88037b0] [] (null)
>> [   21.963969] [c003f88038c0] [c006f038]
>> vmemmap_free+0x218/0x2e0
>> [   21.964006] [c003f8803940] [c036f100]
>> sparse_remove_one_section+0xd0/0x138
>> [   21.964050] [c003f8803980] [c0383a50]
>> __remove_pages+0x410/0x560
>> [   21.964093] [c003f8803a90] [c0c784d8]
>> arch_remove_memory+0x68/0xdc
>> [   21.964136] [c003f8803ad0] [c0385d74]
>> __remove_memory+0xc4/0x110
>> [   21.964180] [c003f8803b10] [c00d44e4]
>> dlpar_remove_lmb+0x94/0x140
>> [   21.964223] [c003f8803b50] [c00d52b4]
>> dlpar_memory+0x464/0xd00
>> [   21.964259] [c003f8803be0] [c00cd5c0]
>> handle_dlpar_errorlog+0xc0/0x190
>> [   21.964303] [c003f8803c50] [c00cd6bc]
>> pseries_hp_work_fn+0x2c/0x60
>> [   21.964346] [c003f8803c80] [c013a4a0]
>> process_one_work+0x2b0/0x5a0
>> [   21.964388] [c003f8803d10] [c013a818]
>> worker_thread+0x88/0x610
>> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
>> [   21.964468] [c003f8803e20] [c000bdc4]
>> ret_from_kernel_thread+0x5c/0x78
>> [   21.964506] Instruction dump:
>> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14
>> 395f0020 813f0020
>> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac
>> 7d205028 3129
>> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> [   21.966349]
>> [   21.966383] Sending IPI to other CPUs
>> [   21.978335] IPI complete
>> [   21.981354] kexec: Starting switchover sequence.
>> I'm in purgatory
>
> git bisect points to
>
> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> Author: Nicholas Piggin 
> Date:   Fri Jul 27 21:48:17 2018 +1000
>
> powerpc/64s: Fix page table fragment refcount race vs speculative 
> references
>
> The page table fragment allocator uses the main page refcount racily
> with respect to speculative references. A customer observed a BUG due
>

Re: [PATCH] powerpc/32s: fix flush_hash_pages() on SMP

2019-05-18 Thread Michael Ellerman
On Thu, 2019-05-09 at 12:59:38 UTC, Christophe Leroy wrote:
> flush_hash_pages() runs with data translation off, so the current
> task_struct has to be accessed using its physical address.
> 
> Reported-by: Erhard F. 
> Fixes: f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename 
> TI_CPU")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/397d2300b08cdee052053e362018cdb6

cheers


Re: [PATCH] powerpc: Remove double free

2019-05-18 Thread Michael Ellerman
On Wed, 2019-05-15 at 09:07:50 UTC, "Tobin C. Harding" wrote:
> kfree() after kobject_put().  Who ever wrote this was on crack.
> 
> Fixes: 7e8039795a80 ("powerpc/cacheinfo: Fix kobject memleak")
> Signed-off-by: Tobin C. Harding 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/672eaf37db9f99fd794eed2c68a8b3b0

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.2-2 tag

2019-05-18 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Linus,

Please pull some powerpc fixes for 5.2:

The following changes since commit b970afcfcabd63cd3832e95db096439c177c3592:

  Merge tag 'powerpc-5.2-1' of 
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2019-05-10 
05:29:27 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.2-2

for you to fetch changes up to 672eaf37db9f99fd794eed2c68a8b3b05d484081:

  powerpc/cacheinfo: Remove double free (2019-05-17 23:28:00 +1000)

- --
powerpc fixes for 5.2 #2

One fix going back to stable, for a bug on 32-bit introduced when we added
support for THREAD_INFO_IN_TASK.

A fix for a typo in a recent rework of our hugetlb code that leads to crashes on
64-bit when using hugetlbfs with a 4K PAGE_SIZE.

Two fixes for our recent rework of the address layout on 64-bit hash CPUs, both
only triggered when userspace tries to access addresses outside the user or
kernel address ranges.

Finally a fix for a recently introduced double free in an error path in our
cacheinfo code.

Thanks to:
  Aneesh Kumar K.V, Christophe Leroy, Sachin Sant, Tobin C. Harding.

- --
Aneesh Kumar K.V (2):
  powerpc/mm: Drop VM_BUG_ON in get_region_id()
  powerpc/mm/hash: Fix get_region_id() for invalid addresses

Christophe Leroy (1):
  powerpc/32s: fix flush_hash_pages() on SMP

Michael Ellerman (1):
  powerpc/mm: Fix crashes with hugepages & 4K pages

Tobin C. Harding (1):
  powerpc/cacheinfo: Remove double free


 arch/powerpc/include/asm/book3s/64/hash.h | 6 --
 arch/powerpc/kernel/cacheinfo.c   | 1 -
 arch/powerpc/mm/book3s32/hash_low.S   | 3 ++-
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)
-BEGIN PGP SIGNATURE-

iQIcBAEBAgAGBQJc3+h8AAoJEFHr6jzI4aWA8wQQAK6ChAWbSy5Xb8E9q/txwxhV
tCjg+CEbs4a5WpVPnL7oZm/axUEJVRDk++f1MydxnB2C6TDk0rBM20Umeh548ALP
RWe65asxMHHN+TihSSLf1kN5M+/6gnkRysg++u54KH0RU1fMj+KG+NudcFABJPZ/
5x7Pr1Ffy3FtEqlFAA3vjxioHOFA9aWB1nmjiW8osf01R7KQXNoZk7IX3dZ8YQce
aALvCx/OXqxJtiAj8ynb422nkBgDtwR5haEKZ8gxeVLE1hPnIXaSZfRHYvk3GvYY
t0q37adNeLR2a4UNB1dMaTiNn+6qANJO9tC4ITSsH5KzxIFJ6GI6fMfp+XDxI/pY
7CLAzO5WO466ar/XnJN1Z6cJXfC/WT8h9delzErFm8omEE2wHd+XsxffM7ToATh8
/ZMEip+gmKRU08qm46gU8o823Qpp0A85f5d5LJHjVWRomXTKovTNQvveBCIEB7H4
KKiWPOdo/vTUKh68l6ROR6g96qQbDjnzDZ5yUBOmHE1XXGjJqURtlm8hkhUO6/ws
+awGShDim6nfpOwqlDrpnu9wB0W7HxUDU32pvl0bi5VSJyZckPzaPB2MoVuIf5Aa
8i0x231708Kk8htB4n6tD7gJtrTWEZG3dyRwYZzU1FKKzrbItbDFcwN3H9O+QU5H
Xgk5Sd3CAiEQMbDE+t5x
=Iuyr
-END PGP SIGNATURE-


Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-17 Thread Michael Ellerman
srikanth  writes:
> Hello,
>
> On a power9 host, performing memory hotunplug from a ppc64le guest results 
> in a kernel oops.

Thanks for the report.

Did this used to work in the past? If so what is the last version that
worked?

> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using 
> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>
> Recreation steps:
>
> 1. Boot a guest with below mem configuration:
>    33554432
>    8388608
>    4194304
>    
>      
>    
>      
>    
>
> 2. From host hotplug 8G memory -> verify memory hotadded succesfully -> 
> now reboot guest -> once guest comes back try to unplug 8G memory

I assume the reboot is required to trigger the bug? ie. if you unplug
without rebooting it doesn't crash?

> mem.xml used:
> 
> <memory model='dimm'>
>   <target>
>     <size unit='GiB'>8</size>
>     <node>0</node>
>   </target>
> </memory>
>
> Memory attach and detach commands used:
>      virsh attach-device vm1 ./mem.xml --live
>      virsh detach-device vm1 ./mem.xml --live
>
> Trace seen inside guest after unplug, guest just hangs there forever:
>
> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA 
> pSeries
> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse 
> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
> scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress 
> zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy 
> async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not 
> tainted 5.1.0-dirty #2
> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR: 
> 8000
> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> [   21.963422] MSR:  8282b033   
> CR: 28002884  XER: 2004
> [   21.963470] CFAR: c0c79304 IRQMASK: 0
> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000 
> 00fff8c0

Can you try not to word wrap these, it makes them much harder to read.

There's some instructions here on configuring Thunderbird:
  
https://www.kernel.org/doc/html/latest/process/email-clients.html#thunderbird-gui

> [   21.963470] GPR04: 0001 ffe30005 0005 
> 0020
> [   21.963470] GPR08:  0001 c00a00fff8e0 
> c16d21a0
> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 
> c003ffe30100
> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f 
> c16d21b0
> [   21.963470] GPR20: c14de588 0001 c16d21b8 
> c00a00a0
> [   21.963470] GPR24:   c00a00a0 
> c003ffe96000
> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 
> c00a00fff8c0
> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
> [   21.963873] Call Trace:
> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 
> (unreliable)
> [   21.963933] [c003f88037b0] [] (null)
> [   21.963969] [c003f88038c0] [c006f038] 
> vmemmap_free+0x218/0x2e0
> [   21.964006] [c003f8803940] [c036f100] 
> sparse_remove_one_section+0xd0/0x138
> [   21.964050] [c003f8803980] [c0383a50] 
> __remove_pages+0x410/0x560
> [   21.964093] [c003f8803a90] [c0c784d8] 
> arch_remove_memory+0x68/0xdc
> [   21.964136] [c003f8803ad0] [c0385d74] 
> __remove_memory+0xc4/0x110
> [   21.964180] [c003f8803b10] [c00d44e4] 
> dlpar_remove_lmb+0x94/0x140
> [   21.964223] [c003f8803b50] [c00d52b4] 
> dlpar_memory+0x464/0xd00
> [   21.964259] [c003f8803be0] [c00cd5c0] 
> handle_dlpar_errorlog+0xc0/0x190
> [   21.964303] [c003f8803c50] [c00cd6bc] 
> pseries_hp_work_fn+0x2c/0x60
> [   21.964346] [c003f8803c80] [c013a4a0] 
> process_one_work+0x2b0/0x5a0
> [   21.964388] [c003f8803d10] [c013a818] 
> worker_thread+0x88/0x610
> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
> [   21.964468] [c003f8803e20] [c000bdc4] 
> ret_from_kernel_thread+0x5c/0x78
> [   21.964506] Instruction dump:
> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 
> 395f0020 813f0020
> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac 
> 7d205028 3129
> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> [   21.966349]
> [   21.966383] Sending IPI to other CPUs
> [   21.978335] IPI complete
> [   21.981354] kexec: Starting switchover sequence.
> I'm in purgatory

It's not hung here, it's just not executing what we want it to :)

If you break into the qemu monitor 

Re: [PATCH] powerpc/powernv/npu: Fix reference leak

2019-05-14 Thread Michael Ellerman
Greg Kurz  writes:
> Michael,
>
> Any comments on this patch ? Should I repost with a shorter comment
> as suggested by Alexey ?

No the longer comment seems fine to me.

I'm not a big fan of the patch, it's basically a hack :)

But for a backportable fix I guess it is OK.

I would be happier though if we eventually fix up the code to do the
refcounting properly.

cheers

> On Mon, 29 Apr 2019 12:36:59 +0200
> Greg Kurz  wrote:
>> On Mon, 29 Apr 2019 16:01:29 +1000
>> Alexey Kardashevskiy  wrote:
>> 
>> > On 20/04/2019 01:34, Greg Kurz wrote:  
>> > > Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). 
>> > > This
>> > > has the effect of incrementing the reference count of the PCI device, as
>> > > explained in drivers/pci/search.c:
>> > > 
>> > >  * Given a PCI domain, bus, and slot/function number, the desired PCI
>> > >  * device is located in the list of PCI devices. If the device is
>> > >  * found, its reference count is increased and this function returns a
>> > >  * pointer to its data structure.  The caller must decrement the
>> > >  * reference count by calling pci_dev_put().  If no device is found,
>> > >  * %NULL is returned.
>> > > 
>> > > Nothing was done to call pci_dev_put() and the reference count of GPU and
>> > > NPU PCI devices rockets up.
>> > > 
>> > > A natural way to fix this would be to teach the callers about the change,
>> > > so that they call pci_dev_put() when done with the pointer. This turns
>> > > out to be quite intrusive, as it affects many paths in npu-dma.c,
>> > > pci-ioda.c and vfio_pci_nvlink2.c.
>> > 
>> > 
>> > afaict this referencing is only done to protect the current traverser
>> > and what you've done is actually a natural way (and the generic
>> > pci_get_dev_by_id() does exactly the same), although this looks a bit 
>> > weird.
>> >   
>> 
>> Not exactly the same: pci_get_dev_by_id() always increment the refcount
>> of the returned PCI device. The refcount is only decremented when this
>> device is passed to pci_get_dev_by_id() to continue searching.
>> 
>> That means that the users of the PCI device pointer returned by
>> pci_get_dev_by_id() or its exported variants pci_get_subsys(),
>> pci_get_device() and pci_get_class() do handle the refcount. They
>> all pass the pointer to pci_dev_put() or continue the search,
>> which calls pci_dev_put() internally.
>> 
>> Direct and indirect callers of get_pci_dev() don't care for the
>> refcount at all unless I'm missing something.
>> 
>> >   
>> > > Also, the issue appeared in 4.16 and
>> > > some affected code got moved around since then: it would be problematic
>> > > to backport the fix to stable releases.
>> > > 
>> > > All that code never cared for reference counting anyway. Call 
>> > > pci_dev_put()
>> > > from get_pci_dev() to revert to the previous behavior.
>> > > Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
>> > > from pci_dn")
>> > > Cc: sta...@vger.kernel.org # v4.16
>> > > Signed-off-by: Greg Kurz 
>> > > ---
>> > >  arch/powerpc/platforms/powernv/npu-dma.c |   15 ++-
>> > >  1 file changed, 14 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
>> > > b/arch/powerpc/platforms/powernv/npu-dma.c
>> > > index e713ade30087..d8f3647e8fb2 100644
>> > > --- a/arch/powerpc/platforms/powernv/npu-dma.c
>> > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>> > > @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock);
>> > >  static struct pci_dev *get_pci_dev(struct device_node *dn)
>> > >  {
>> > >  struct pci_dn *pdn = PCI_DN(dn);
>> > > +struct pci_dev *pdev;
>> > >  
>> > > -return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
>> > > +pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
>> > > pdn->busno, pdn->devfn);
>> > > +
>> > > +/*
>> > > + * pci_get_domain_bus_and_slot() increased the reference count 
>> > > of
>> > > + * the PCI device, but callers don't need that actually as the 
>> > > PE
>> > > + * already holds a reference to the device.
>> > 
>> > Imho this would be just enough.
>> > 
>> > Anyway,
>> > 
>> > Reviewed-by: Alexey Kardashevskiy 
>> >   
>> 
>> Thanks !
>> 
>> I now realize that I forgot to add the --cc option for stable on my stgit
>> command line :-\.
>> 
>> Cc'ing now.
>> 
>> > 
>> > How did you find it? :)
>> >   
>> 
>> While reading code to find some inspiration for OpenCAPI passthrough. :)
>> 
>> I saw the following in vfio_pci_ibm_npu2_init():
>> 
>>  if (!pnv_pci_get_gpu_dev(vdev->pdev))
>>  return -ENODEV;
>> 
>> and simply followed the function calls.
>> 
>> >   
>> > > Since callers aren't
>> > > + * aware of the reference count change, call pci_dev_put() now 
>> > > to
>> > > + * avoid leaks.
>> > > + */
>> > > +if (pdev)
>> > > +pci_dev_put(p

Re: [PATCH stable 4.9] powerpc/lib: fix code patching during early init on PPC32

2019-05-14 Thread Michael Ellerman
Christophe Leroy  writes:
> [Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]
>
> On powerpc32, patch_instruction() is called by apply_feature_fixups()
> which is called from early_init()
>
> There is the following note in front of early_init():
>  * Note that the kernel may be running at an address which is different
>  * from the address that it was linked at, so we must use RELOC/PTRRELOC
>  * to access static data (including strings).  -- paulus
>
> Therefore init_mem_is_free must be accessed with PTRRELOC()
>
> Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
> Signed-off-by: Christophe Leroy 
>
> ---
> Can't apply the upstream commit as such due to several of other unrelated 
> stuff
> like STRICT_KERNEL_RWX which are missing for instance.
> So instead, using same approach as for commit 
> 252eb55816a6f69ef9464cad303cdb3326cdc61d

Yeah this looks good to me.

Though should we keep the same subject as the upstream commit this is
sort of a backport of? That might make it simpler for people who are
trying to keep track of things?

ie. "powerpc/lib: fix book3s/32 boot failure due to code patching"

cheers

> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 14535ad4cdd1..c312955977ce 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -23,7 +23,7 @@ int patch_instruction(unsigned int *addr, unsigned int 
> instr)
>   int err;
>  
>   /* Make sure we aren't patching a freed init section */
> - if (init_mem_is_free && init_section_contains(addr, 4)) {
> + if (*PTRRELOC(&init_mem_is_free) && init_section_contains(addr, 4)) {
>   pr_debug("Skipping init section patching addr: 0x%px\n", addr);
>   return 0;
>   }
> -- 
> 2.13.3


Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt

2019-05-14 Thread Michael Ellerman
"Gautham R. Shenoy"  writes:
> From: "Gautham R. Shenoy" 
>
> Subject: Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition 
> in resize_hpt

ps. A "RESEND" implies the patch is unchanged and you're just resending
it because it was ignored.

In this case it should have just been "PATCH v2", with a note below the "---"
saying "v2: Rebased onto powerpc/next ..."

cheers

> During a memory hotplug operations involving resizing of the HPT, we
> invoke a stop_machine() to perform the resizing. In this code path, we
> end up recursively taking the cpu_hotplug_lock, first in
> memory_hotplug_begin() and then subsequently in stop_machine(). This
> causes the system to hang. With lockdep enabled we get the following
> error message before the hang.
>
>   swapper/0/1 is trying to acquire lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60
>
>   but task is already holding lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50
>
>   other info that might help us debug this:
>Possible unsafe locking scenario:
>
>  CPU0
>  
> lock(cpu_hotplug_lock.rw_sem);
> lock(cpu_hotplug_lock.rw_sem);
>
>*** DEADLOCK ***
>
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
>
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
>
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
>
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 
> ---
>
> Rebased this one against powerpc/next instead of linux/master.
>
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 -
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
>  
>  static int hpt_order_set(void *data, u64 val)
>  {
> + int ret;
> +
>   if (!mmu_hash_ops.resize_hpt)
>   return -ENODEV;
>  
> - return mmu_hash_ops.resize_hpt(val);
> + cpus_read_lock();
> + ret = mmu_hash_ops.resize_hpt(val);
> + cpus_read_unlock();
> +
> + return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, 
> "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..2fc9756 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>   return 0;
>  }
>  
> -/* Must be called in user context */
> +/*
> + * Must be called in user context. The caller should hold the
> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>   struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>  
>   t1 = ktime_get();
>  
> - rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +  &state, NULL);
>  
>   t2 = ktime_get();
>  
> -- 
> 1.9.4


Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt

2019-05-14 Thread Michael Ellerman
"Gautham R. Shenoy"  writes:
> From: "Gautham R. Shenoy" 
>
> During a memory hotplug operations involving resizing of the HPT, we
> invoke a stop_machine() to perform the resizing. In this code path, we
> end up recursively taking the cpu_hotplug_lock, first in
> memory_hotplug_begin() and then subsequently in stop_machine(). This
> causes the system to hang.

This implies we have never tested a memory hotplug that resized the HPT.
Is that really true? Or did something change?

> With lockdep enabled we get the following
> error message before the hang.
>
>   swapper/0/1 is trying to acquire lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60
>
>   but task is already holding lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50

Do we have the full stack trace?

>   other info that might help us debug this:
>Possible unsafe locking scenario:
>
>  CPU0
>  
> lock(cpu_hotplug_lock.rw_sem);
> lock(cpu_hotplug_lock.rw_sem);
>
>*** DEADLOCK ***
>
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
>
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
>
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
>
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 
> ---
>
> Rebased this one against powerpc/next instead of linux/master.
>
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 -
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
>  
>  static int hpt_order_set(void *data, u64 val)
>  {
> + int ret;
> +
>   if (!mmu_hash_ops.resize_hpt)
>   return -ENODEV;
>  
> - return mmu_hash_ops.resize_hpt(val);
> + cpus_read_lock();
> + ret = mmu_hash_ops.resize_hpt(val);
> + cpus_read_unlock();
> +
> + return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, 
> "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..2fc9756 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>   return 0;
>  }
>  
> -/* Must be called in user context */
> +/*
> + * Must be called in user context. The caller should hold the

I realise you're just copying that comment, but it seems wrong. "user
context" means userspace. I think it means "process context" doesn't it?

Also "should" should be "must" :)

> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>   struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>  
>   t1 = ktime_get();
>  
> - rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +  &state, NULL);
>  
>   t2 = ktime_get();

cheers


Re: [PATCH 2/2] powerpc/8xx: Add microcode patch to move SMC parameter RAM.

2019-05-13 Thread Michael Ellerman
Christophe Leroy  writes:

> Some SCC functions like the QMC require an extended parameter RAM.
> On modern 8xx (ie 866 and 885), SPI area can already be relocated,
> allowing the use of those functions on SCC2. But SCC3 and SCC4
> parameter RAM collide with SMC1 and SMC2 parameter RAMs.
>
> This patch adds microcode to allow the relocation of both SMC1 and
> SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0.
> Those offsets are by default for the CPM1 DSP1 and DSP2, but there
> is no kernel driver using them at the moment so this area can be
> reused.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/platforms/8xx/Kconfig  |   7 ++
>  arch/powerpc/platforms/8xx/micropatch.c | 109 
> +++-
>  2 files changed, 114 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/8xx/micropatch.c 
> b/arch/powerpc/platforms/8xx/micropatch.c
> index 33a9042fca80..dc4423daf7d4 100644
> --- a/arch/powerpc/platforms/8xx/micropatch.c
> +++ b/arch/powerpc/platforms/8xx/micropatch.c
> @@ -622,6 +622,86 @@ static uint patch_2f00[] __initdata = {
>  };
>  #endif
>  
> +/*
> + * SMC relocation patch arrays.
> + */
> +
> +#ifdef CONFIG_SMC_UCODE_PATCH
> +
> +static uint patch_2000[] __initdata = {
> + 0x3fff, 0x3ffd, 0x3ffb, 0x3ff9,
> + 0x5fefeff8, 0x5f91eff8, 0x3ff3, 0x3ff1,
> + 0x3a11e710, 0xedf0ccb9, 0xf318ed66, 0x7f0e5fe2,

Do we have any doc on what these values are?

I get that it's microcode but do we have any more detail than that?
What's the source etc?

cheers


Re: [PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-13 Thread Michael Ellerman
Borislav Petkov  writes:
> On Fri, May 10, 2019 at 04:13:20PM +0200, Borislav Petkov wrote:
>> On Fri, May 10, 2019 at 08:50:52PM +1000, Michael Ellerman wrote:
>> > Yeah that looks better to me. I didn't think about the case where EDAC
>> > core is modular.
>> > 
>> > Do you want me to send a new patch?
>> 
>> Nah, I'll fix it up.
>
> I've pushed it here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=edac-fix-for-5.2
>
> in case you wanna throw your build tests on it. My dingy cross-compiler
> can't do much really.

Looks good. I even booted it :)

cheers


Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status

2019-05-13 Thread Michael Ellerman
Yury Norov  writes:
> On Fri, May 10, 2019 at 01:32:22PM +1000, Michael Ellerman wrote:
>> Yury Norov  writes:
>> > On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
>> >> On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
>> >> > There is currently no easy and architecture-independent way to find the
>> >> > lowest unusable virtual address available to a process without
>> >> > brute-force calculation. This patch allows a user to easily retrieve
>> >> > this value via /proc//status.
>> >> > 
>> >> > Using this patch, any program that previously needed to waste cpu cycles
>> >> > recalculating a non-sensitive process-dependent value already known to
>> >> > the kernel can now be optimized to use this mechanism.
>> >> > 
>> >> > Signed-off-by: Joel Savitz 
>> >> > ---
>> >> >  Documentation/filesystems/proc.txt | 2 ++
>> >> >  fs/proc/task_mmu.c | 2 ++
>> >> >  2 files changed, 4 insertions(+)
>> >> > 
>> >> > diff --git a/Documentation/filesystems/proc.txt 
>> >> > b/Documentation/filesystems/proc.txt
>> >> > index 66cad5c86171..1c6a912e3975 100644
>> >> > --- a/Documentation/filesystems/proc.txt
>> >> > +++ b/Documentation/filesystems/proc.txt
>> >> > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
>> >> >VmLib:  1412 kB
>> >> >VmPTE:20 kb
>> >> >VmSwap:0 kB
>> >> > +  VmTaskSize:  137438953468 kB
>> >> >HugetlbPages:  0 kB
>> >> >CoreDumping:0
>> >> >THP_enabled:   1
>> >> > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
>> >> >   VmPTE   size of page table entries
>> >> >   VmSwap  amount of swap used by anonymous private 
>> >> > data
>> >> >   (shmem swap usage is not included)
>> >> > + VmTaskSize  lowest unusable address in process 
>> >> > virtual memory
>> >> 
>> >> Can we change this help text to "size of process' virtual address space 
>> >> memory" ?
>> >
>> > Agree. Or go in the other direction and make it VmEnd
>> 
>> Yeah I think VmEnd would be clearer to folks who aren't familiar with
>> the kernel's usage of the TASK_SIZE terminology.
>> 
>> >> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> >> > index 95ca1fe7283c..0af7081f7b19 100644
>> >> > --- a/fs/proc/task_mmu.c
>> >> > +++ b/fs/proc/task_mmu.c
>> >> > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct 
>> >> > *mm)
>> >> > seq_put_decimal_ull_width(m,
>> >> > " kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
>> >> > SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
>> >> > +   seq_put_decimal_ull_width(m,
>> >> > +   " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
>> >> > seq_puts(m, " kB\n");
>> >> > hugetlb_report_usage(m, mm);
>> >> >  }
>> >
>> > I'm OK with technical part, but I still have questions not answered
>> > (or wrongly answered) in v1 and v2. Below is the very detailed
>> > description of the concerns I have.
>> >
>> > 1. What is the exact reason for it? Original version tells about some
>> > test that takes so much time that you were able to drink a cup of
>> > coffee before it was done. The test as you said implements linear
>> > search to find the last page and so is of O(n). If it's only for some
>> > random test, I think the kernel can survive without it. Do you have a
>> > real example of useful programs that suffer without this information?
>> >
>> >
>> > 2. I have nothing against taking breaks and see nothing weird if
>> > ineffective algorithms take time. On my system (x86, Ubuntu) the last
>> > mapped region according to /proc//maps is:
>> > ff60-ff601000 r-xp  00:00 0 [vsyscall]
>> > So to find the required address, we have to inspect 2559 pages. With a
>> > binary search it would take 12 iterations at

Re: [PATCH RESEND] powerpc: add simd.h implementation specific to PowerPC

2019-05-13 Thread Michael Ellerman
Shawn Landden  writes:
> It is safe to do SIMD in an interrupt on PowerPC.

No it's not sorry :)

> Only disable when there is no SIMD available
> (and this is a static branch).
>
> Tested and works with the WireGuard (Zinc) patch I wrote that needs this.
> Also improves performance of the crypto subsystem that checks this.
>
> Re-sending because linuxppc-dev didn't seem to accept it.

It did but you were probably moderated as a non-subscriber? In future if
you just wait a while for the moderators to wake up it should come
through. Though having posted once and been approved I think you might
not get moderated at all in future (?).

> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203571
> Signed-off-by: Shawn Landden 
> ---
>  arch/powerpc/include/asm/simd.h | 15 +++
>  1 file changed, 15 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/simd.h
>
> diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
> new file mode 100644
> index 0..b3fecb95a
> --- /dev/null
> +++ b/arch/powerpc/include/asm/simd.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#include 
> +
> +/*
> + * may_use_simd - whether it is allowable at this time to issue SIMD
> + *instructions or access the SIMD register file
> + *
> + * As documented in Chapter 6.2.1 Machine Status Save/Restore Registers
> + * of Power ISA (2.07 and 3.0), all registers are saved/restored in an 
> interrupt.

I think the confusion here is that the ISA says:

  When various interrupts occur, the state of the machine is saved in the
  Machine Status Save/Restore registers (SRR0 and SRR1).

And you've read that to mean all "the state" of the machine, ie.
including GPRs, FPRs etc.

But unfortunately it's not that simple. All the hardware does is write
two 64-bit registers (SRR0 & SRR1) with the information required to be
able to return to the state the CPU was in prior to the interrupt. That
includes (obviously) the PC you were executing at, and what state the
CPU was in (ie. 64/32-bit, MMU on/off, FP on/off etc.). All the saving
of registers etc. is left up to software. It's the RISC way :)


I guess we need to work out why exactly may_use_simd() is returning
false in your kworker. 

cheers


Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess

2019-05-13 Thread Michael Ellerman
Dmitry Vyukov  writes:
> From: Arnd Bergmann 
> Date: Sat, May 11, 2019 at 2:51 AM
> To: Dmitry Vyukov
> Cc: Nick Kossifidis, Christoph Hellwig, Linus Torvalds, Andrew Morton,
> linux-arch, Linux Kernel Mailing List, linuxppc-dev
>
>> On Fri, May 10, 2019 at 6:53 AM Dmitry Vyukov  wrote:
>> > >
>> > > I think it's good to have a sanity check in-place for consistency.
>> >
>> >
>> > Hi,
>> >
>> > This broke our cross-builds from x86. I am using:
>> >
>> > $ powerpc64le-linux-gnu-gcc --version
>> > powerpc64le-linux-gnu-gcc (Debian 7.2.0-7) 7.2.0
>> >
>> > and it says that it's little-endian somehow:
>> >
>> > $ powerpc64le-linux-gnu-gcc -dM -E - < /dev/null | grep BYTE_ORDER
>> > #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> >
>> > Is it broke compiler? Or I always hold it wrong? Is there some
>> > additional flag I need to add?
>>
>> It looks like a bug in the kernel Makefiles to me. powerpc32 is always
>> big-endian,
>> powerpc64 used to be big-endian but is now usually little-endian. There are
>> often three separate toolchains that default to the respective user
>> space targets
>> (ppc32be, ppc64be, ppc64le), but generally you should be able to build
>> any of the
>> three kernel configurations with any of those compilers, and have the 
>> Makefile
>> pass the correct -m32/-m64/-mbig-endian/-mlittle-endian command line options
>> depending on the kernel configuration. It seems that this is not happening
>> here. I have not checked why, but if this is the problem, it should be
>> easy enough
>> to figure out.
>
>
> Thanks! This clears a lot.
> This may be a bug in our magic as we try to build kernel files outside
> of make with own flags (required to extract parts of kernel
> interfaces).
> So don't spend time looking for the Makefile bugs yet.

OK :)

We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all
fixed now. These days I build most of my kernels with a bi-endian 64-bit
toolchain, and switching endian without running `make clean` also works.

cheers


Re: [PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-10 Thread Michael Ellerman
Borislav Petkov  writes:

> On Thu, May 09, 2019 at 04:55:34PM +0200, Borislav Petkov wrote:
>> On Fri, May 10, 2019 at 12:52:05AM +1000, Michael Ellerman wrote:
>> > Thanks. It would be nice if you could send it as a fix for 5.2, it's the
>> > last thing blocking one of my allmodconfig builds. But if you don't
>> > think it qualifies as a fix that's fine too, it can wait.
>> 
>> Sure, no problem. Will do a pull request later.
>
> Hmm, so looking at this more, I was able to produce this config with my
> ancient cross-compiler:
>
> CONFIG_EDAC_SUPPORT=y
> CONFIG_EDAC=m
> CONFIG_EDAC_LEGACY_SYSFS=y
> CONFIG_EDAC_MPC85XX=y

Oh yeah good point.

> Now, mpc85xx_edac is built-in and edac_core.ko is a module
> (CONFIG_EDAC=m) and that should not work - i.e., builtin code calling
> module functions. But my cross-compiler is happily building this without
> complaint. Or maybe I'm missing something.

That's weird.

> In any case, I *think* the proper fix should be to do:
>
> config EDAC_MPC85XX
> bool "Freescale MPC83xx / MPC85xx"
> depends on FSL_SOC && EDAC=y
>
> so that you can't even produce the above invalid .config snippet.
>
> Hmmm?

Yeah that looks better to me. I didn't think about the case where EDAC
core is modular.

Do you want me to send a new patch?

cheers


RE: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-10 Thread Michael Ellerman
David Laight  writes:
> From: Michal Suchánek
>> Sent: 09 May 2019 14:38
> ...
>> > The problem is the combination of some new code called via printk(),
>> > check_pointer() which calls probe_kernel_read(). That then calls
>> > allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early
>> > (before we've patched features).
>> 
>> There is early_mmu_has_feature for this case. mmu_has_feature does not
>> work before patching so parts of kernel that can run before patching
>> must use the early_ variant which actually runs code reading the
>> feature bitmap to determine the answer.
>
> Does the early_ variant get patched so that it is reasonably
> efficient after the 'patching' is done?

No they don't get patched ever. The name is a bit misleading I guess.

> Or should there be a third version which gets patched across?

For a case like this it's entirely safe to just skip the code early in
boot, so if it was a static_key_false everything would just work.

Unfortunately the way the code is currently written we would have to
change all MMU features to static_key_false and that risks breaking
something else.

We have a long standing TODO to rework all our feature logic and unify
CPU/MMU/firmware/etc. features. Possibly as part of that we can come up
with a scheme where the default value is per-feature bit.

Having said all that, in this case the overhead of the test and branch
is small compared to the cost of writing to the SPR which controls user
access and then doing an isync, so it's all somewhat premature
optimisation.

cheers


Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status

2019-05-09 Thread Michael Ellerman
Yury Norov  writes:
> On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
>> On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
>> > There is currently no easy and architecture-independent way to find the
>> > lowest unusable virtual address available to a process without
>> > brute-force calculation. This patch allows a user to easily retrieve
>> > this value via /proc//status.
>> > 
>> > Using this patch, any program that previously needed to waste cpu cycles
>> > recalculating a non-sensitive process-dependent value already known to
>> > the kernel can now be optimized to use this mechanism.
>> > 
>> > Signed-off-by: Joel Savitz 
>> > ---
>> >  Documentation/filesystems/proc.txt | 2 ++
>> >  fs/proc/task_mmu.c | 2 ++
>> >  2 files changed, 4 insertions(+)
>> > 
>> > diff --git a/Documentation/filesystems/proc.txt 
>> > b/Documentation/filesystems/proc.txt
>> > index 66cad5c86171..1c6a912e3975 100644
>> > --- a/Documentation/filesystems/proc.txt
>> > +++ b/Documentation/filesystems/proc.txt
>> > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
>> >VmLib:  1412 kB
>> >VmPTE:20 kb
>> >VmSwap:0 kB
>> > +  VmTaskSize: 137438953468 kB
>> >HugetlbPages:  0 kB
>> >CoreDumping:0
>> >THP_enabled:  1
>> > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
>> >   VmPTE   size of page table entries
>> >   VmSwap  amount of swap used by anonymous private data
>> >   (shmem swap usage is not included)
>> > + VmTaskSize  lowest unusable address in process virtual 
>> > memory
>> 
>> Can we change this help text to "size of process' virtual address space 
>> memory" ?
>
> Agree. Or go in other direction and make it VmEnd

Yeah I think VmEnd would be clearer to folks who aren't familiar with
the kernel's usage of the TASK_SIZE terminology.

>> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> > index 95ca1fe7283c..0af7081f7b19 100644
>> > --- a/fs/proc/task_mmu.c
>> > +++ b/fs/proc/task_mmu.c
>> > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>> >seq_put_decimal_ull_width(m,
>> >" kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
>> >SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
>> > +  seq_put_decimal_ull_width(m,
>> > +  " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
>> >seq_puts(m, " kB\n");
>> >hugetlb_report_usage(m, mm);
>> >  }
>
> I'm OK with technical part, but I still have questions not answered
> (or wrongly answered) in v1 and v2. Below is the very detailed
> description of the concerns I have.
>
> 1. What is the exact reason for it? Original version tells about some
> test that takes so much time that you were able to drink a cup of
> coffee before it was done. The test as you said implements linear
> search to find the last page and so is of O(n). If it's only for some
> random test, I think the kernel can survive without it. Do you have a
> real example of useful programs that suffer without this information?
>
>
> 2. I have nothing against taking breaks and see nothing weird if
> ineffective algorithms take time. On my system (x86, Ubuntu) the last
> mapped region according to /proc//maps is:
> ff60-ff601000 r-xp  00:00 0 [vsyscall]
> So to find the required address, we have to inspect 2559 pages. With a
> binary search it would take 12 iterations at max. If my calculation is
> wrong or your environment is completely different - please elaborate.

I agree it should not be hard to calculate, but at the same time it's
trivial for the kernel to export the information so I don't see why the
kernel shouldn't.

> 3. As far as I can see, Linux currently does not support dynamic
> TASK_SIZE. It means that for any platform+ABI combination VmTaskSize
> will always be the same. So VmTaskSize would be an effectively useless waste
> of lines. In fact, TASK_SIZE is compile-time information and should
> be exposed to userspace in headers. In the discussion of v2, Rafael Aquini
> answered this concern by saying that TASK_SIZE is a runtime-resolved macro. It's
> true, but my main point is: GCC knows what type of binary it compiles
> and is able to select proper value. We are already doing similar things
> where appropriate. Refer for example to my arm64/ilp32 series:
>
> arch/arm64/include/uapi/asm/bitsperlong.h:
> -#define __BITS_PER_LONG 64
> +#if defined(__LP64__)
> +/* Assuming __LP64__ will be defined for native ELF64's and not for ILP32. */
> +#  define __BITS_PER_LONG 64
> +#elif defined(__ILP32__)
> +#  define __BITS_PER_LONG 32
> +#else
> +#  error "Neither LP64 nor ILP32: unsupported ABI in asm/bitsperlong.h"
> +#endif
>
> __LP64__ and __ILP32__ are symbols provided by GCC to distinguish
> between ABIs. So userspace is able to take proper __BITS_PER_LONG value
> at compile time, not at runtime. I think, you can do the same t

Re: [PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-09 Thread Michael Ellerman
Borislav Petkov  writes:
> On Mon, May 06, 2019 at 08:50:45AM +0200, Johannes Thumshirn wrote:
>> Acked-by: Johannes Thumshirn 
>
> Queued, thanks.

Thanks. It would be nice if you could send it as a fix for 5.2, it's the
last thing blocking one of my allmodconfig builds. But if you don't
think it qualifies as a fix that's fine too, it can wait.

cheers


Re: [PATCH] powerpc: restore current_thread_info()

2019-05-07 Thread Michael Ellerman
Yury Norov  writes:
> On Tue, May 07, 2019 at 11:58:56PM +0100, Al Viro wrote:
>> On Tue, May 07, 2019 at 03:51:21PM -0700, Yury Norov wrote:
>> > Commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
>> > removes the function current_thread_info(). It's wrong because the
>> > function is used in non-arch code and is part of API.
>> 
>> In include/linux/thread_info.h:
>> 
>> #ifdef CONFIG_THREAD_INFO_IN_TASK
>> /*
>>  * For CONFIG_THREAD_INFO_IN_TASK kernels we need <asm/current.h> for the
>>  * definition of current, but for !CONFIG_THREAD_INFO_IN_TASK kernels,
>>  * including <asm/current.h> can cause a circular dependency on some
>>  * platforms.
>>  */
>> #include <asm/current.h>
>> #define current_thread_info() ((struct thread_info *)current)
>> #endif
>
> Ah, sorry. Then it might be my rebase issue. I was confused because Christophe
> didn't remove the comment to current_thread_info(), so I decided he
> removed it erroneously.

Yeah you're right, that comment should have been removed too.

cheers


Re: [PATCH linux-next v10 5/7] powerpc: define syscall_get_error()

2019-05-06 Thread Michael Ellerman
"Dmitry V. Levin"  writes:

> syscall_get_error() is required to be implemented on this
> architecture in addition to already implemented syscall_get_nr(),
> syscall_get_arguments(), syscall_get_return_value(), and
> syscall_get_arch() functions in order to extend the generic
> ptrace API with PTRACE_GET_SYSCALL_INFO request.
>
> Cc: Michael Ellerman 
> Cc: Elvira Khabirova 
> Cc: Eugene Syromyatnikov 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Oleg Nesterov 
> Cc: Andy Lutomirski 
> Cc: linuxppc-...@lists.ozlabs.org
> Signed-off-by: Dmitry V. Levin 
> ---
>
> Michael, this patch is waiting for ACK since early December.

Sorry, the more I look at our seccomp/ptrace code the more problems I
find :/

This change looks OK to me, given it will only be called by your new
ptrace API.

Acked-by: Michael Ellerman 


> Notes:
> v10: unchanged
> v9: unchanged
> v8: unchanged
> v7: unchanged
> v6: unchanged
> v5: initial revision
> 
> This change has been tested with
> tools/testing/selftests/ptrace/get_syscall_info.c and strace,
> so it's correct from PTRACE_GET_SYSCALL_INFO point of view.
> 
> This casts doubt on commit v4.3-rc1~86^2~81, which changed
> syscall_set_return_value() in a way that doesn't quite match
> syscall_get_error(), but syscall_set_return_value() is out
> of scope of this series, so I'll just let you know my concerns.
 
Yeah I think you're right. My commit made it work for seccomp but only
on the basis that seccomp calls syscall_set_return_value() and then
immediately goes out via the syscall exit path. And only the combination
of those gets things into the same state that syscall_get_error()
expects.

But with the way the code is currently structured if
syscall_set_return_value() negated the error value, then the syscall
exit path would then store the wrong thing in pt_regs->result. So I
think it needs some more work rather than just reverting 1b1a3702a65c.

But I think fixing that can be orthogonal to this commit going in, as the
code does work as currently written; the in-between state that
syscall_set_return_value() creates via seccomp should not be visible to
ptrace.

cheers

> See also 
> https://lore.kernel.org/lkml/874lbbt3k6@concordia.ellerman.id.au/
> for more details on powerpc syscall_set_return_value() confusion.
>
>  arch/powerpc/include/asm/syscall.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/syscall.h 
> b/arch/powerpc/include/asm/syscall.h
> index a048fed0722f..bd9663137d57 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -38,6 +38,16 @@ static inline void syscall_rollback(struct task_struct 
> *task,
>   regs->gpr[3] = regs->orig_gpr3;
>  }
>  
> +static inline long syscall_get_error(struct task_struct *task,
> +  struct pt_regs *regs)
> +{
> + /*
> +  * If the system call failed,
> +  * regs->gpr[3] contains a positive ERRORCODE.
> +  */
> + return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> +}
> +
>  static inline long syscall_get_return_value(struct task_struct *task,
>   struct pt_regs *regs)
>  {
> -- 
> ldv


Re: [PATCH v2 03/15] powerpc/mm: convert Book3E 64 to pte_fragment

2019-05-06 Thread Michael Ellerman
Christophe Leroy  writes:

> Le 26/04/2019 à 17:58, Christophe Leroy a écrit :
>> Book3E 64 is the only subarch not using pte_fragment. In order
>> to allow refactorisation, this patch converts it to pte_fragment.
>> 
>> Reviewed-by: Aneesh Kumar K.V 
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/include/asm/mmu_context.h   |  6 -
>>   arch/powerpc/include/asm/nohash/64/mmu.h |  4 +++-
>>   arch/powerpc/include/asm/nohash/64/pgalloc.h | 33 
>> ++--
>>   arch/powerpc/mm/Makefile |  4 ++--
>>   arch/powerpc/mm/mmu_context.c|  2 +-
>>   5 files changed, 18 insertions(+), 31 deletions(-)
>> 
> [...]
>
>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>> index 3c1bd9fa23cd..138c772d58d1 100644
>> --- a/arch/powerpc/mm/Makefile
>> +++ b/arch/powerpc/mm/Makefile
>> @@ -9,6 +9,7 @@ CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
>>   
>>   obj-y  := fault.o mem.o pgtable.o mmap.o \
>> init_$(BITS).o pgtable_$(BITS).o \
>> +   pgtable-frag.o \
>> init-common.o mmu_context.o drmem.o
>>   obj-$(CONFIG_PPC_MMU_NOHASH)   += mmu_context_nohash.o tlb_nohash.o \
>> tlb_nohash_low.o
>> @@ -17,8 +18,7 @@ hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
>>   obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
>>   obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o 
>> slb.o \
>> $(hash64-y) mmu_context_book3s64.o \
>> -   pgtable-book3s64.o pgtable-frag.o
>> -obj-$(CONFIG_PPC32) += pgtable-frag.o
>> +   pgtable-book3s64.o
>
> Looks like the removal of pgtable-frag.o for CONFIG_PPC_BOOK3S_64 didn't 
> survive the merge.

Hmm, I don't remember having problems there but clearly something went
wrong.

> Will send a patch to fix that.

Thanks.

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.1-7 tag

2019-05-04 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull one more powerpc fix for 5.1:

The following changes since commit 7a3a4d763837d3aa654cd1059030950410c04d77:

  powerpc/mm_iommu: Allow pinning large regions (2019-04-17 21:36:51 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.1-7

for you to fetch changes up to 12f363511d47f86c49b7766c349989cb33fd61a8:

  powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX (2019-05-02 15:33:46 +1000)

- --
powerpc fixes for 5.1 #7

One regression fix.

Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes under
load on some machines depending on memory layout.

Thanks to:
  Christophe Leroy.

- --
Christophe Leroy (1):
  powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX


 arch/powerpc/mm/ppc_mmu_32.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJczaPFAAoJEFHr6jzI4aWAEsoP+wbm3WGTTSULdF3PbIGEtVbQ
CNrw/LpvNsmAn6210U2ag7Fwd6/hIprIy9wgwbgNiB2kDtMNy6srl1eMlC9npsV4
y43xeJQ0E2+u10eaBNsiwJEYtmNkJMuxCu31zGH/PZ4nTi4hdvaVwUETR725vYli
LICixZ2yr1eL948D3DzWpGigBmGhq1ajBsdXxn2sHxbeqefnFesdrjPR2e2GIj7E
cyHb+7vUATLUVc405yYCyHEU3/cly12LPcsreGe/tPWSJxw8I2BU36lCCXgby62w
E1KlSb4EzFx+lFujK6ICxaflFOtkP+0Xzajq8YU0qrItkGM8DA6yTy4vU99gN1KP
pgNwbaoMQCNJvzk0cIuMZ0RMGKrkRT4y2jW+MhHALMSkyv4HGKqT/N227PMq4U94
nVv41w868b8NnTrN9pLfSR+Gyr/Q7sF8zEVv0eIpbSciB/OTcg0yqAp7V2p0cTBG
pKM/c6glvkrbfEkoRItMpVU0PbckPFjXgTVqI/rvdiAVoEQvi1U4r8Fpu5I28j1d
wuryRhjnGXgkSjBlXkSK+tASWZHcKwhnD1y9KTHtKuLdxJqjDfyX0Dii+FqU5w42
aKeU4VrlrCGX2VnLj7ViH99wzkMogP9oZWZ5bhmOva/boxo1kJ9/vQkxaiXrhVjd
NLBrlVeLtaCjY52+eEJS
=k/X8
-----END PGP SIGNATURE-----


Re: [PATCH] powerpc/powernv/ioda2: Add __printf format/argument verification

2019-05-03 Thread Michael Ellerman
Joe Perches  writes:
> On Fri, 2019-05-03 at 16:59 +1000, Michael Ellerman wrote:
>> On Thu, 2017-03-30 at 10:19:25 UTC, Joe Perches wrote:
>> > Fix fallout too.
>> > 
>> > Signed-off-by: Joe Perches 
>> 
>> Applied to powerpc next, thanks.
>> 
>> https://git.kernel.org/powerpc/c/1e496391a8452101308a23b7395cdd49
>
> 2+ years later.

I was hoping for a new record.

cheers


Re: [PATCH] powerpc/book3e: drop BUG_ON() in map_kernel_page()

2019-05-03 Thread Michael Ellerman
On Thu, 2019-03-28 at 13:03:45 UTC, Christophe Leroy wrote:
> early_alloc_pgtable() never returns NULL as it panics on failure.
> 
> This patch drops the three BUG_ON() which check the non nullity
> of early_alloc_pgtable() returned value.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a1ac2a9c4f98482e49305ab5551b7b32

cheers


Re: [PATCH 08/14] powerpc: entry: Remove unneeded need_resched() loop

2019-05-03 Thread Michael Ellerman
On Mon, 2019-03-11 at 22:47:46 UTC, Valentin Schneider wrote:
> Since the enabling and disabling of IRQs within preempt_schedule_irq()
> is contained in a need_resched() loop, we don't need the outer arch
> code loop.
> 
> Signed-off-by: Valentin Schneider 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-...@lists.ozlabs.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/90437bffa5f9b1440ba03e023f4875d1

cheers


Re: [PATCH] powerpc/powernv/ioda2: Add __printf format/argument verification

2019-05-03 Thread Michael Ellerman
On Thu, 2017-03-30 at 10:19:25 UTC, Joe Perches wrote:
> Fix fallout too.
> 
> Signed-off-by: Joe Perches 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1e496391a8452101308a23b7395cdd49

cheers


[PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-02 Thread Michael Ellerman
The mpc85xx EDAC code can be configured as a module but then fails to
build because it uses two unexported symbols:

  ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
  ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!

We don't want to export those symbols just for this driver, so make
the driver only configurable as a built-in.

This seems to have been broken since at least commit c92132f59806
("edac/85xx: Add PCIe error interrupt edac support") (Nov 2013).

Signed-off-by: Michael Ellerman 
---
 drivers/edac/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 47eb4d13ed5f..6317519f9d88 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -263,7 +263,7 @@ config EDAC_PND2
  micro-server but may appear on others in the future.
 
 config EDAC_MPC85XX
-   tristate "Freescale MPC83xx / MPC85xx"
+   bool "Freescale MPC83xx / MPC85xx"
depends on FSL_SOC
help
  Support for error detection and correction on the Freescale
-- 
2.20.1



Re: [PATCH 037/114] arch: powerpc: Kconfig: pedantic formatting

2019-04-28 Thread Michael Ellerman
"Enrico Weigelt, metux IT consult"  writes:
> Formatting of Kconfig files doesn't look so pretty, so let the
> Great White Handkerchief come around and clean it up.
>
> Signed-off-by: Enrico Weigelt, metux IT consult 
> ---
>  arch/powerpc/Kconfig   | 28 ++--
>  arch/powerpc/kvm/Kconfig   |  6 +++---
>  arch/powerpc/platforms/40x/Kconfig |  7 +++
>  arch/powerpc/platforms/44x/Kconfig | 10 +-
>  arch/powerpc/platforms/85xx/Kconfig|  8 
>  arch/powerpc/platforms/86xx/Kconfig|  6 +++---
>  arch/powerpc/platforms/maple/Kconfig   |  2 +-
>  arch/powerpc/platforms/pseries/Kconfig | 18 +-
>  arch/powerpc/sysdev/xics/Kconfig   | 13 ++---
>  9 files changed, 48 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2d0be82..ea29d94 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -430,13 +430,13 @@ config MATH_EMULATION_HW_UNIMPLEMENTED
>  endchoice
>  
>  config PPC_TRANSACTIONAL_MEM
> -   bool "Transactional Memory support for POWERPC"
> -   depends on PPC_BOOK3S_64
> -   depends on SMP
> -   select ALTIVEC
> -   select VSX
> -   ---help---
> - Support user-mode Transactional Memory on POWERPC.
> + bool "Transactional Memory support for POWERPC"
> + depends on PPC_BOOK3S_64
> + depends on SMP
> + select ALTIVEC
> + select VSX
> + ---help---
> +   Support user-mode Transactional Memory on POWERPC.

If you're going to do this, can you also convert all the "---help---"
instances to just "help"? Thanks.

cheers


Re: [PATCH] x86/mpx: fix recursive munmap() corruption

2019-04-20 Thread Michael Ellerman
Thomas Gleixner  writes:
> On Mon, 1 Apr 2019, Dave Hansen wrote:
>> diff -puN mm/mmap.c~mpx-rss-pass-no-vma mm/mmap.c
>> --- a/mm/mmap.c~mpx-rss-pass-no-vma  2019-04-01 06:56:53.409411123 -0700
>> +++ b/mm/mmap.c  2019-04-01 06:56:53.423411123 -0700
>> @@ -2731,9 +2731,17 @@ int __do_munmap(struct mm_struct *mm, un
>>  return -EINVAL;
>>  
>>  len = PAGE_ALIGN(len);
>> +end = start + len;
>>  if (len == 0)
>>  return -EINVAL;
>>  
>> +/*
>> + * arch_unmap() might do unmaps itself.  It must be called
>> + * and finish any rbtree manipulation before this code
>> + * runs and also starts to manipulate the rbtree.
>> + */
>> +arch_unmap(mm, start, end);
>
> ...
>   
>> -static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct 
>> *vma,
>> -  unsigned long start, unsigned long end)
>> +static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
>> +  unsigned long end)
>
> While you fixed up the asm-generic thing, this breaks arch/um and
> arch/unicorn32. For those the fixup is trivial by removing the vma
> argument.
>
> But itt also breaks powerpc and there I'm not sure whether moving
> arch_unmap() to the beginning of __do_munmap() is safe. Michael???

I don't know for sure but I think it should be fine. That code is just
there to handle CRIU unmapping/remapping the VDSO. So that either needs
to happen while the process is stopped or it needs to handle races
anyway, so I don't see how the placement within the unmap path should
matter.

> Aside of that the powerpc variant looks suspicious:
>
> static inline void arch_unmap(struct mm_struct *mm,
>   unsigned long start, unsigned long end)
> {
>   if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> mm->context.vdso_base = 0;
> }
>
> Shouldn't that be: 
>
>   if (start >= mm->context.vdso_base && mm->context.vdso_base < end)
>
> Hmm?

Yeah looks pretty suspicious. I'll follow-up with Laurent who wrote it.
Thanks for spotting it!

cheers


Re: [PATCH 4.9 20/76] powerpc/64: Disable the speculation barrier from the command line

2019-04-17 Thread Michael Ellerman
Diana Madalina Craciun  writes:
> Hi,
>
> I have tested the patches on NXP platforms and they worked as expected.

Thanks Diana.

cheers

> On 4/15/2019 9:45 PM, Greg Kroah-Hartman wrote:
>> commit cf175dc315f90185128fb061dc05b6fbb211aa2f upstream.
>>
>> The speculation barrier can be disabled from the command line
>> with the parameter: "nospectre_v1".
>>
>> Signed-off-by: Diana Craciun 
>> Signed-off-by: Michael Ellerman 
>> Signed-off-by: Sasha Levin 
>> ---
>>  arch/powerpc/kernel/security.c | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
>> index bf298d0c475f..813e38ff81ce 100644
>> --- a/arch/powerpc/kernel/security.c
>> +++ b/arch/powerpc/kernel/security.c
>> @@ -17,6 +17,7 @@
>>  unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
>>
>>  bool barrier_nospec_enabled;
>> +static bool no_nospec;
>>
>>  static void enable_barrier_nospec(bool enable)
>>  {
>> @@ -43,9 +44,18 @@ void setup_barrier_nospec(void)
>> enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
>>  security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
>>
>> -   enable_barrier_nospec(enable);
>> +   if (!no_nospec)
>> +   enable_barrier_nospec(enable);
>>  }
>>
>> +static int __init handle_nospectre_v1(char *p)
>> +{
>> +   no_nospec = true;
>> +
>> +   return 0;
>> +}
>> +early_param("nospectre_v1", handle_nospectre_v1);
>> +
>>  #ifdef CONFIG_DEBUG_FS
>>  static int barrier_nospec_set(void *data, u64 val)
>>  {
>> --
>> 2.19.1
>>
>>
>>
>>


Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)

2019-04-08 Thread Michael Ellerman
Carlos O'Donell  writes:
> On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
>> Carlos O'Donell  writes:
>> 
>>> On 4/5/19 5:16 AM, Florian Weimer wrote:
 * Carlos O'Donell:
> It is valuable that it be a trap, particularly for constant pools because
> it means that a jump into the constant pool will trap.

 Sorry, I don't understand why this matters in this context.  Would you
 please elaborate?
>>>
>>> Sorry, I wasn't very clear.
>>>
>>> My point is only that any accidental jumps, either with off-by-one (like you
>>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>>> onwards to the handler.
>>>
>>> A process fault is achieved either by a trap, or an invalid instruction, or
>>> a privileged insn (like suggested for MIPS in this thread).
>> 
>> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
>> 
>> mtmsr is available both on 32 and 64 bits since their first implementations.
>> 
>> It's a privileged instruction and should never appear in userspace
>> code (causes SIGILL).

I'd much rather we use a trap with a specific immediate value. Otherwise
someone's going to waste time one day puzzling over why userspace is
doing mtmsr.

It would also complicate things if we ever wanted to emulate mtmsr.

If we want something that is a trap rather than a nop then use 0x0fe50553.

That's "compare the value in r5 with 0x553 and then trap unconditionally".

It shows up in objdump as:

    1000:	53 05 e5 0f 	twui    r5,1363


The immediate can be anything, I chose that value to mimic the x86 value
Mathieu mentioned.

There's no reason that instruction would ever be generated because the
immediate value serves no purpose. So it satisfies the "very unlikely
to appear" criterion AFAICS.

cheers


Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)

2019-04-01 Thread Michael Ellerman
Mathieu Desnoyers  writes:
> Hi Carlos,
>
> - On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codon...@redhat.com wrote:
...
>
> [...]
>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> [...]
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>> 
>> Why isn't this an opcode specific to power?
>
> On powerpc 32/64, the abort is placed in a __rseq_failure executable section:
>
> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)			\
> 		".pushsection __rseq_failure, \"ax\"\n\t"		\
> 		".long " __rseq_str(RSEQ_SIG) "\n\t"			\
> 		__rseq_str(label) ":\n\t"				\
> 		"b %l[" __rseq_str(abort_label) "]\n\t"			\
> 		".popsection\n\t"
>
> That section only contains snippets of those trampolines. Arguably, it would be
> good if disassemblers could find valid instructions there. Boqun Feng could perhaps
> shed some light on this signature choice? Now would be a good time to decide
> once and for all whether a valid instruction would be a better choice.

I'm a bit vague on what we're trying to do here.

But it seems like you want some sort of "eye catcher" prior to the branch?

That value is a valid instruction on current CPUs (rlwimi. r5,r24,6,1,9),
and even if it wasn't it could become one in future.

If you change it to 0x8053530, that is both a valid instruction and a
nop (a conditional trap immediate with no conditions set).

cheers


Re: linux-next: build failure after merge of the sound-asoc tree

2019-04-01 Thread Michael Ellerman
Mark Brown  writes:
> On Wed, Mar 27, 2019 at 03:29:55PM +1100, Michael Ellerman wrote:
>> Mark Brown  writes:
>
>> > Hrm, seems PowerPC is still not using the common clock API - is there
>> > any plan for that?  There are some ASoC PowerPC uses so it's going to be
>> > a bit of an issue as we expand our use of the clock API.
>
>> I don't know anything about the common clock API. What would it involve
>> for powerpc to use it?
>
> It's what's in drivers/clk - you'd have to provide clock drivers for all
> the clocks that are current supported by arch-specific code, make sure
> that those drivers can be instantiated and then remove the custom
> implementation of the clock API in arch/powerpc in favour of those.

OK. I realise we do have some support for the common clock API, but only
on certain sub-platforms (PPC_MPC512x, PPC_MPC52xx, PPC_E500MC).

On other platforms we have nothing at all AFAICS.

Seems Ben posted an RFC to support it in 2009, but nothing since:
  http://patchwork.ozlabs.org/patch/31551/


Anyway I think what you've done in next, make the code depend on
COMMON_CLOCK, is the best option. If anyone cares about that driver on
powerpc platforms that don't support COMMON_CLOCK they should speak up.

cheers


Re: linux-next: build failure after merge of the sound-asoc tree

2019-03-26 Thread Michael Ellerman
Mark Brown  writes:
> On Tue, Mar 26, 2019 at 01:33:49PM +1100, Stephen Rothwell wrote:
>
>> After merging the sound-asoc tree, today's linux-next build (powerpc
>> allyesconfig) failed like this:
>
>> sound/soc/codecs/tlv320aic32x4-clk.c: In function 'clk_aic32x4_pll_prepare':
>> include/linux/kernel.h:979:32: error: dereferencing pointer to incomplete 
>> type 'struct clk_hw'
>>   BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
>> ^~
>
> Hrm, seems PowerPC is still not using the common clock API - is there
> any plan for that?  There are some ASoC PowerPC uses so it's going to be
> a bit of an issue as we expand our use of the clock API.

I don't know anything about the common clock API. What would it involve
for powerpc to use it?

cheers


Re: [PATCH v2] powerpc/8xx: fix possible object reference leak

2019-03-25 Thread Michael Ellerman
Peng Hao  writes:

> From: Wen Yang 
>
> The call to of_find_compatible_node returns a node pointer with refcount
> incremented thus it must be explicitly decremented after the last
> usage.
> irq_domain_add_linear also calls of_node_get to increase refcount,
> so irq_domain will not be affected when it is released.
>
> Detected by coccinelle with the following warnings:
> ./arch/powerpc/platforms/8xx/pic.c:158:1-7: ERROR: missing of_node_put; 
> acquired a node pointer with refcount incremented on line 136, but without a 
> corresponding object release within this function.
>
> Fixes: a8db8cf0d894 ("irq_domain: Replace irq_alloc_host() with
> revmap-specific initializers")
> Signed-off-by: Wen Yang 
> Suggested-by: Christophe Leroy 
> Reviewed-by: Peng Hao 
> Cc: Vitaly Bordug 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/powerpc/platforms/8xx/pic.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/8xx/pic.c 
> b/arch/powerpc/platforms/8xx/pic.c
> index 8d5a25d..4453df6 100644
> --- a/arch/powerpc/platforms/8xx/pic.c
> +++ b/arch/powerpc/platforms/8xx/pic.c
> @@ -153,9 +153,7 @@ int mpc8xx_pic_init(void)
>   if (mpc8xx_pic_host == NULL) {
>   printk(KERN_ERR "MPC8xx PIC: failed to allocate irq host!\n");
>   ret = -ENOMEM;
> - goto out;
>   }
> - return 0;

It's fragile to rely on ret being equal to zero here. If the code
further up is changed, it's easy to miss that ret is expected to
be zero.

Better to set it to zero here explicitly; this is the success path, after
all, e.g.:

ret = 0;

>  out:
>   of_node_put(np);


cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-5.1-3 tag

2019-03-22 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull some powerpc fixes for 5.1:

The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.1-3

for you to fetch changes up to 92edf8df0ff2ae86cc632eeca0e651fd8431d40d:

  powerpc/security: Fix spectre_v2 reporting (2019-03-21 21:09:03 +1100)

- --
powerpc fixes for 5.1 #3

One fix for a boot failure on 32-bit, introduced during the merge window.

A fix for our handling of CLOCK_MONOTONIC in the 64-bit VDSO. Changing the wall
clock across the Y2038 boundary could cause CLOCK_MONOTONIC to jump forward and
backward.

Our spectre_v2 reporting was a bit confusing due to a bug I introduced. On some
systems it was reporting that the count cache was disabled and also that we were
flushing the count cache on context switch. Only the former is true, and given
that the count cache is disabled it doesn't make any sense to flush it. No one
reported it, so presumably the presence of any mitigation is all people check
for.

Finally a small build fix for zsmalloc on 32-bit.

Thanks to:
  Ben Hutchings, Christophe Leroy, Diana Craciun, Guenter Roeck, Michael Neuling.

- --
Ben Hutchings (1):
  powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations

Christophe Leroy (1):
  powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32

Michael Ellerman (2):
  powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038
  powerpc/security: Fix spectre_v2 reporting


 arch/powerpc/include/asm/mmu.h|  2 +-
 arch/powerpc/include/asm/vdso_datapage.h  |  8 
 arch/powerpc/kernel/cpu_setup_6xx.S   |  3 ---
 arch/powerpc/kernel/head_32.S |  6 ++
 arch/powerpc/kernel/security.c| 23 ---
 arch/powerpc/kernel/vdso64/gettimeofday.S |  4 ++--
 arch/powerpc/mm/hash_low_32.S |  8 
 7 files changed, 25 insertions(+), 29 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJclNsLAAoJEFHr6jzI4aWAQlAQAKgQgvPoMIBKObYsjQ6KK6iN
lgXCrU+WuI+7f6kMMeFWP8sOeT4nA+f7ejoXJDog96Z3oBG63sZuILTlifFetFbp
0ptJA+AC0DS0k//yv2pMweZVWq2jR7Jfr213inZxwJW6NzvI9m54QK5eUXv++dkk
Q0H/PhMxNTnP0HKBYRWKSkBSvhCZd6zex5hRZFkXVfDwe6fhpYSkObInGlt2rN4s
u3NJIZLS1zqYOyx/VPwkUCsePmdqdR0/qFBYT191iFce3lmdrKociFt9/mJKkqj6
DYVbJljxJtoZ0iIztHdStvHBpbC0kaaUHTNnKEjX2Q2xL7oitiOyOa5gT98Cs8Q1
ZHfNPidZhyhdRRwIgpDKECIE7xldhG/4icTg0a7LnufjpVrbc8idUU8Hm2oaGuvu
SlytOO0AAepPFqsTy/IeKo5cT1TNhjqcPe0twxx5nOHtaY0vhtZ9azn6B/2AFfIb
y/hX6rFoIOzBDlmdRS63EdtQr9negUilUvonCWkY5luo2ypvGcNVnxdrTQww5yPA
geahb+HRm+dJb22lVp6sONHXJZRfZijBi6jJPPhSRjAVmaibCUvxYiN9MvSFQT/U
iwsMtS4dpvX8WISijEfCuiZNBEjGSoUQEIwPWqRtgaqZbHYfBFZkwrb6yGF/YjmP
wVjdhrCcut+qz2NbbDlv
=PXK6
-----END PGP SIGNATURE-----


Re: [PATCH] printk: Add caller information to printk() output.

2019-03-21 Thread Michael Ellerman
Petr Mladek  writes:
> On Thu 2019-03-21 13:59:53, Michael Ellerman wrote:
>> Tetsuo Handa  writes:
>> ...
>> > From 91f85d2bd494df2f73c605d8b4747e8cc0a61ae2 Mon Sep 17 00:00:00 2001
>> > From: Tetsuo Handa 
>> > Date: Tue, 18 Dec 2018 05:53:04 +0900
>> > Subject: [PATCH] printk: Add caller information to printk() output.
>> >
>> > Sometimes we want to print a series of printk() messages to consoles
>> > without being disturbed by concurrent printk() from interrupts and/or
>> > other threads. But we can't enforce printk() callers to use their local
>> > buffers because we need to ask them to make too much changes. Also, even
>> > buffering up to one line inside printk() might cause failing to emit
>> > an important clue under critical situation.
>> >
>> > Therefore, instead of trying to help buffering, let's try to help
>> > reconstructing messages by saving caller information as of calling
>> > log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
>> > upon printing to consoles.
>> >
>> > Some examples for console output:
>> >
>> >   [    1.222773][    T1] x86: Booting SMP configuration:
>> >   [    2.779635][    T1] pci 0000:00:01.0: PCI bridge to [bus 01]
>> >   [    5.069193][  T268] Fusion MPT base driver 3.04.20
>> >   [    9.316504][    C2] random: fast init done
>> >   [   13.413336][ T3355] Initialized host personality
>> >
>> > Some examples for /dev/kmsg output:
>> >
>> >   6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
>> >   6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
>> >    SUBSYSTEM=pci
>> >    DEVICE=+pci:0000:00:01.0
>> >   6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
>> >   5,1526,9316504,-,caller=C2;random: fast init done
>> >   6,1575,13413336,-,caller=T3355;Initialized host personality
>> >
>> > Note that this patch changes max length of messages which can be printed
>> > by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
>> > based on an assumption that userspace won't try to write messages hitting
>> > that border line to /dev/kmsg interface.
>> 
>> Do you have any plans to update dmesg or other userspace tools to show
>> the caller information?
>
> dmesg already works via the syslog interface, try dmesg -S.

Oh nice, thanks.

> The current implementation does not pass the information into
> the /dev/kmsg interface. It has the following format:
>
> "<level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ...
> ];<message text>\n"
>
> It would be possible to add it as an additional value. To be honest
> I am not sure how they are handled by userspace tools.

OK, yeah I guess there's the potential for a backward compatibility can
of worms there.

I'm happy with dmesg -S :)

cheers


Re: [PATCH] printk: Add caller information to printk() output.

2019-03-20 Thread Michael Ellerman
Hi Tetsuo,

Thanks for implementing this, it's really helpful.

Tetsuo Handa  writes:
...
> From 91f85d2bd494df2f73c605d8b4747e8cc0a61ae2 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa 
> Date: Tue, 18 Dec 2018 05:53:04 +0900
> Subject: [PATCH] printk: Add caller information to printk() output.
>
> Sometimes we want to print a series of printk() messages to consoles
> without being disturbed by concurrent printk() from interrupts and/or
> other threads. But we can't force printk() callers to use their local
> buffers because we would need to ask them to make too many changes. Also,
> even buffering up to one line inside printk() might fail to emit
> an important clue in a critical situation.
>
> Therefore, instead of trying to help buffering, let's try to help
> reconstructing messages by saving caller information as of calling
> log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
> upon printing to consoles.
>
> Some examples for console output:
>
>   [    1.222773][    T1] x86: Booting SMP configuration:
>   [    2.779635][    T1] pci 0000:00:01.0: PCI bridge to [bus 01]
>   [    5.069193][  T268] Fusion MPT base driver 3.04.20
>   [    9.316504][    C2] random: fast init done
>   [   13.413336][ T3355] Initialized host personality
>
> Some examples for /dev/kmsg output:
>
>   6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
>   6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
>    SUBSYSTEM=pci
>    DEVICE=+pci:0000:00:01.0
>   6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
>   5,1526,9316504,-,caller=C2;random: fast init done
>   6,1575,13413336,-,caller=T3355;Initialized host personality
>
> Note that this patch changes max length of messages which can be printed
> by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
> based on an assumption that userspace won't try to write messages hitting
> that border line to /dev/kmsg interface.

Do you have any plans to update dmesg or other userspace tools to show
the caller information?

cheers


Re: powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038

2019-03-20 Thread Michael Ellerman
On Wed, 2019-03-13 at 13:14:38 UTC, Michael Ellerman wrote:
> Jakub Drnec reported:
>   Setting the realtime clock can sometimes make the monotonic clock go
>   back by over a hundred years. Decreasing the realtime clock across
>   the y2k38 threshold is one reliable way to reproduce. Allegedly this
>   can also happen just by running ntpd, I have not managed to
>   reproduce that other than booting with rtc at >2038 and then running
>   ntp. When this happens, anything with timers (e.g. openjdk) breaks
>   rather badly.
> 
> And included a test case (slightly edited for brevity):
>   #define _POSIX_C_SOURCE 199309L
>   #include <stdio.h>
>   #include <time.h>
>   #include <unistd.h>
>   #include <stdlib.h>
> 
>   long get_time(void) {
>     struct timespec tp;
>     clock_gettime(CLOCK_MONOTONIC, &tp);
>     return tp.tv_sec + tp.tv_nsec / 1000000000;
>   }
> 
>   int main(void) {
>     long last = get_time();
>     while(1) {
>       long now = get_time();
>       if (now < last) {
>         printf("clock went backwards by %ld seconds!\n", last - now);
>       }
>       last = now;
>       sleep(1);
>     }
>     return 0;
>   }
> 
> Which when run concurrently with:
>  # date -s 2040-1-1
>  # date -s 2037-1-1
> 
> Will detect the clock going backward.
> 
> The root cause is that wtom_clock_sec in struct vdso_data is only a
> 32-bit signed value, even though we set its value to be equal to
> tk->wall_to_monotonic.tv_sec which is 64-bits.
> 
> Because the monotonic clock starts at zero when the system boots, the
> wall_to_monotonic.tv_sec offset is negative for current and future
> dates. Currently on a freshly booted system the offset will be in the
> vicinity of negative 1.5 billion seconds.
> 
> However if the wall clock is set past the Y2038 boundary, the offset
> from wall to monotonic becomes less than negative 2^31, and no longer
> fits in 32-bits. When that value is assigned to wtom_clock_sec it is
> truncated and becomes positive, causing the VDSO assembly code to
> calculate CLOCK_MONOTONIC incorrectly.
> 
> That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
> it is not meant to do. Worse, if the time is then set back before the
> Y2038 boundary CLOCK_MONOTONIC will jump backward.
> 
> We can fix it simply by storing the full 64-bit offset in the
> vdso_data, and using that in the VDSO assembly code. We also shuffle
> some of the fields in vdso_data to avoid creating a hole.
> 
> The original commit that added the CLOCK_MONOTONIC support to the VDSO
> did actually use a 64-bit value for wtom_clock_sec, see commit
> a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to
> 32 bits kernel") (Nov 2005). However just 3 days later it was
> converted to 32-bits in commit 0c37ec2aa88b ("[PATCH] powerpc: vdso
> fixes (take #2)"), and the bug has existed since then AFAICS.
> 
> Fixes: 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)")
> Cc: sta...@vger.kernel.org # v2.6.15+
> Link: http://lkml.kernel.org/r/hac.zfes.62bwlnvavmp.1st...@seznam.cz
> Reported-by: Jakub Drnec 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/b5b4453e7912f056da1ca7572574cada

cheers


Re: [PATCH] powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038

2019-03-14 Thread Michael Ellerman
Arnd Bergmann  writes:
> On Wed, Mar 13, 2019 at 2:14 PM Michael Ellerman  wrote:
>
>> That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
>> it is not meant to do. Worse, if the time is then set back before the
>> Y2038 boundary CLOCK_MONOTONIC will jump backward.
>>
>> We can fix it simply by storing the full 64-bit offset in the
>> vdso_data, and using that in the VDSO assembly code. We also shuffle
>> some of the fields in vdso_data to avoid creating a hole.
>
> I see nothing wrong with your patch,

Thanks.

> but I would point out that there is a patch series [1] from Vincenzo
> Frascino to unify the vdso implementation across architectures that I
> hope can make it into linux-5.2, and that will resolve this issue, as
> well as allow 32-bit architectures to provide a working interface with
> 64-bit time_t.

Yeah I did see that series. I will try and keep an eye on it, though I'm
not sure I'll have time to convert powerpc to use it for 5.2.

I'm also not sure how easy it's going to be to convert to the C
versions, because our syscall ABI is not a simple function call (result
code is returned in CR0.SO).

cheers


[PATCH] powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038

2019-03-13 Thread Michael Ellerman
Jakub Drnec reported:
  Setting the realtime clock can sometimes make the monotonic clock go
  back by over a hundred years. Decreasing the realtime clock across
  the y2k38 threshold is one reliable way to reproduce. Allegedly this
  can also happen just by running ntpd, I have not managed to
  reproduce that other than booting with rtc at >2038 and then running
  ntp. When this happens, anything with timers (e.g. openjdk) breaks
  rather badly.

And included a test case (slightly edited for brevity):
  #define _POSIX_C_SOURCE 199309L
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>
  #include <stdlib.h>

  long get_time(void) {
    struct timespec tp;
    clock_gettime(CLOCK_MONOTONIC, &tp);
    return tp.tv_sec + tp.tv_nsec / 1000000000;
  }

  int main(void) {
    long last = get_time();
    while(1) {
      long now = get_time();
      if (now < last) {
        printf("clock went backwards by %ld seconds!\n", last - now);
      }
      last = now;
      sleep(1);
    }
    return 0;
  }

Which when run concurrently with:
 # date -s 2040-1-1
 # date -s 2037-1-1

Will detect the clock going backward.

The root cause is that wtom_clock_sec in struct vdso_data is only a
32-bit signed value, even though we set its value to be equal to
tk->wall_to_monotonic.tv_sec which is 64-bits.

Because the monotonic clock starts at zero when the system boots, the
wall_to_monotonic.tv_sec offset is negative for current and future
dates. Currently on a freshly booted system the offset will be in the
vicinity of negative 1.5 billion seconds.

However if the wall clock is set past the Y2038 boundary, the offset
from wall to monotonic becomes less than negative 2^31, and no longer
fits in 32-bits. When that value is assigned to wtom_clock_sec it is
truncated and becomes positive, causing the VDSO assembly code to
calculate CLOCK_MONOTONIC incorrectly.

That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
it is not meant to do. Worse, if the time is then set back before the
Y2038 boundary CLOCK_MONOTONIC will jump backward.

We can fix it simply by storing the full 64-bit offset in the
vdso_data, and using that in the VDSO assembly code. We also shuffle
some of the fields in vdso_data to avoid creating a hole.

The original commit that added the CLOCK_MONOTONIC support to the VDSO
did actually use a 64-bit value for wtom_clock_sec, see commit
a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to
32 bits kernel") (Nov 2005). However just 3 days later it was
converted to 32-bits in commit 0c37ec2aa88b ("[PATCH] powerpc: vdso
fixes (take #2)"), and the bug has existed since then AFAICS.

Fixes: 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)")
Cc: sta...@vger.kernel.org # v2.6.15+
Link: http://lkml.kernel.org/r/hac.zfes.62bwlnvavmp.1st...@seznam.cz
Reported-by: Jakub Drnec 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/vdso_datapage.h  | 8 
 arch/powerpc/kernel/vdso64/gettimeofday.S | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..bbc06bd72b1f 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,10 +82,10 @@ struct vdso_data {
__u32 icache_block_size;/* L1 i-cache block size */
__u32 dcache_log_block_size;/* L1 d-cache log block size */
__u32 icache_log_block_size;/* L1 i-cache log block size */
-   __s32 wtom_clock_sec;   /* Wall to monotonic clock */
-   __s32 wtom_clock_nsec;
-   struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
-   __u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
+   __u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
+   __s32 wtom_clock_nsec;  /* Wall to monotonic clock nsec */
+   __s64 wtom_clock_sec;   /* Wall to monotonic clock sec */
+   struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls  */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a4ed9edfd5f0..1f324c28705b 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -92,7 +92,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 * At this point, r4,r5 contain our sec/nsec values.
 */
 
-   lwa r6,WTOM_CLOCK_SEC(r3)
+   ld  r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
 
/* We now have our result in r6,r9. We create a fake dependency
@@ -125,7 +125,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
bne cr6,75f
 
/* CLOCK_MONOTONIC_

[GIT PULL] Please pull powerpc/linux.git powerpc-5.1-1 tag

2019-03-07 Thread Michael Ellerman
spufs: use struct_size() in kmalloc()

Igor Stoppa (1):
  powerpc: remove unnecessary unlikely()

Joe Lawrence (4):
  powerpc/livepatch: relax reliable stack tracer checks for first-frame
  powerpc/livepatch: small cleanups in save_stack_trace_tsk_reliable()
  powerpc/livepatch: return -ERRNO values in save_stack_trace_tsk_reliable()
  powerpc: Remove export of save_stack_trace_tsk_reliable()

Joel Stanley (2):
  powerpc: Use ALIGN instead of BLOCK
  powerpc/32: Include .branch_lt in data section

Jonathan Neuschäfer (2):
  powerpc: wii.dts: Add interrupt-related properties to GPIO node
  powerpc: wii.dts: Add GPIO keys

Jordan Niethe (1):
  powerpc/powernv: Make opal log only readable by root

Madhavan Srinivasan (1):
  powerpc/perf: Add mem access events to sysfs

Mark Cave-Ayland (1):
  powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest

Masahiro Yamada (3):
  KVM: powerpc: remove -I. header search paths
  powerpc: remove redundant header search path additions
  powerpc: math-emu: remove unneeded header search paths

Mathieu Malaterre (3):
  powerpc: Allow CPU selection of G4/74xx variant
  powerpc: Remove trailing semicolon after curly brace
  Move static keyword at beginning of declaration

Matteo Croce (1):
  powerpc/hvsi: Fix spelling mistake: "lenght" should be "length"

Michael Ellerman (19):
  powerpc: Stop using pr_cont() in __die()
  powerpc: Show PAGE_SIZE in __die() output
  powerpc/64s: Add MMU type to __die() output
  Merge branch 'fixes' into next
  KVM: PPC: Book3S HV: Context switch AMR on Power9
  Merge branch 'topic/dma' into next
  Merge branch 'topic/ppc-kvm' into next
  powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
  powerpc/mm/hash: Increase vmalloc space to 512T with hash MMU
  powerpc/44x: Force PCI on for CURRITUCK
  powerpc/64: Make sys_switch_endian() traceable
  powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64
  powerpc/64s: Fix logic when handling unknown CPU features
  powerpc/kvm: Save and restore host AMR/IAMR/UAMOR
  Revert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"
  powerpc/64s: Remove MSR_RI optimisation in system_call_exit()
  powerpc/64: Simplify __secondary_start paca->kstack handling
  selftests/powerpc: Remove duplicate header
  Merge branch 'topic/ppc-kvm' into next

Nathan Chancellor (1):
  powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc

Nathan Fontenot (1):
  powerpc/pseries: Perform full re-add of CPU for topology update post-migration

Nicholas Piggin (10):
  powerpc/64s/hash: Fix assert_slb_presence() use of the slbfee. instruction
  powerpc/smp: Fix NMI IPI timeout
  powerpc/smp: Fix NMI IPI xmon timeout
  powerpc/smp: Make __smp_send_nmi_ipi() static
  powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
  powerpc/64s: system reset interrupt preserve HSRRs
  powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
  powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
  powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C
  powerpc/64s: Fix unrelocated interrupt trampoline address test

Nicolai Stange (2):
  powerpc/64s: Clear on-stack exception marker upon exception return
  powerpc/64s: Make reliable stacktrace dependency clearer

Oliver O'Halloran (8):
  powerpc/powernv: Escalate reset when IODA reset fails
  powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
  powerpc/eeh_cache: Add pr_debug() prints for insert/remove
  powerpc/eeh_cache: Add a way to dump the EEH address cache
  powerpc/eeh_cache: Bump log level of eeh_addr_cache_print()
  powerpc/pci: Add pci_find_controller_for_domain()
  powerpc/eeh: Allow disabling recovery
  powerpc/eeh: Add eeh_force_recover to debugfs

Paul Mackerras (3):
  KVM: PPC: Book3S HV: Simplify machine check handling
  powerpc/64s: Better printing of machine check info for guest MCEs
  powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit

Peter Xu (1):
  powerpc/powernv/npu: Remove redundant change_pte() hook

PrasannaKumar Muralidharan (1):
  powerpc sstep: Add support for modsw, moduw instructions

Qian Cai (2):
  powerpc/mm: Fix "sz" set but not used warning
  powerpc/mm: fix "section_base" set but not used

Rashmica Gupta (1):
  powerpc/mm: Check secondary hash page table

Reza Arbab (1):
  powerpc/mm: Add _PAGE_SAO to _PAGE_CACHE_CTL mask

Robert P. J. Day (1):
  powerpc/dts: Standardize DTS status assignments from "ok" to "okay"

Sabyasachi Gupta (2):
  powerpc/powernv: Remove duplicate header
  powerpc/cell: Remove duplicate header

Sam 

Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

2019-03-06 Thread Michael Ellerman
Linus Torvalds  writes:
> On Mon, Mar 4, 2019 at 2:24 AM Michael Ellerman  wrote:
>>
>> Without wading into the rest of the discussion, this does raise an
>> interesting point, ie. what about eg. rwlock's?
>>
>> They're basically equivalent to spinlocks, and so could reasonably be
>> expected to have the same behaviour.
>>
>> But we don't check the io_sync flag in arch_read/write_unlock() etc. and
>> both of those use lwsync.
>
> I think technically rwlocks should do the same thing, at least when
> they are used for exclusion.

OK.

> Because of the exclusion argument, we can presubably limit it to just
> write_unlock(), although at least in theory I guess you could have
> some "one reader does IO, then a writer comes in" situation..

It's a bit hard to grep for, but I did find one case:

static void netxen_nic_io_write_128M(struct netxen_adapter *adapter,
                void __iomem *addr, u32 data)
{
        read_lock(&adapter->ahw.crb_lock);
        writel(data, addr);
        read_unlock(&adapter->ahw.crb_lock);
}

It looks like that driver is using the rwlock to exclude cases that can
just do a readl()/writel() (readers) vs another case that has to reconfigure a
window or something, before doing readl()/writel() and then configuring
the window back. So that seems like a valid use for a rwlock.

Whether we want to penalise all read_unlock() usages with a mmiowb()
check just to support that one driver is another question.

> Perhaps more importantly, what about sleeping locks? When they
> actually *block*, they get the barrier thanks to the scheduler, but
> you can have a nice non-contended sequence that never does that.

Yeah.

The mutex unlock fast path is just:

if (atomic_long_cmpxchg_release(&lock->owner, curr, 0UL) == curr)
        return true;

And because it's the "release" variant we just use lwsync, which doesn't
order MMIO. If it was just atomic_long_cmpxchg() that would work because
we use sync for those.

__up_write() uses atomic_long_sub_return_release(), so same story.

> I guess the fact that these cases have never even shown up as an issue
> means that we could just continue to ignore it.
>
> We could even give that approach some fancy name, and claim it as a
> revolutionary new programming paradigm ("ostrich programming" to go
> with "agile" and "pair programming").

Maybe. On power we have the double whammy of weaker ordering than
other arches and infinitesimal market share, which makes me worry that
there are bugs lurking that we just haven't found, it's happened before.

cheers


Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

2019-03-04 Thread Michael Ellerman
Nicholas Piggin  writes:
> Will Deacon's on March 2, 2019 12:03 am:
>> In preparation for removing all explicit mmiowb() calls from driver
>> code, implement a tracking system in asm-generic based loosely on the
>> PowerPC implementation. This allows architectures with a non-empty
>> mmiowb() definition to have the barrier automatically inserted in
>> spin_unlock() following a critical section containing an I/O write.
>
> Is there a reason to call this "mmiowb"? We already have wmb that
> orders cacheable stores vs mmio stores don't we?
>
> Yes ia64 "sn2" is broken in that case, but that can be fixed (if
> anyone really cares about the platform any more). Maybe that's
> orthogonal to what you're doing here, I just don't like seeing
> "mmiowb" spread.
>
> This series works for spin locks, but you would want a driver to
> be able to use wmb() to order locks vs mmio when using a bit lock
> or a mutex or whatever else.

Without wading into the rest of the discussion, this does raise an
interesting point, ie. what about eg. rwlock's?

They're basically equivalent to spinlocks, and so could reasonably be
expected to have the same behaviour.

But we don't check the io_sync flag in arch_read/write_unlock() etc. and
both of those use lwsync.

Seems like we just forgot they existed? Commit f007cacffc88 ("[POWERPC]
Fix MMIO ops to provide expected barrier behaviour") that added the
io_sync stuff doesn't mention them at all.

Am I missing anything? AFAICS read/write locks were never built on top
of spin locks, so seems like we're just hoping drivers using rwlock do
the right barriers?

cheers


Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

2019-03-03 Thread Michael Ellerman
Nicholas Piggin  writes:
> Michael Ellerman's on March 3, 2019 7:26 pm:
>> Nicholas Piggin  writes:
...
>>> what was broken about the powerpc one, which is basically:
>>>
>>> static inline void mmiowb_set_pending(void)
>>> {
>>> struct mmiowb_state *ms = __mmiowb_state();
>>> ms->mmiowb_pending = 1;
>>> }
>>>
>>> static inline void mmiowb_spin_lock(void)
>>> {
>>> }
>> 
>> The current powerpc code clears io_sync in spin_lock().
>> 
>> ie, it would be equivalent to:
>> 
>> static inline void mmiowb_spin_lock(void)
>> {
>>  ms->mmiowb_pending = 0;
>> }
>
> Ah okay that's what I missed. How about we just not do that?

Yeah I thought of that too but it's not great. We'd start semi-randomly
executing the sync in unlock depending on whether someone had done IO on
that CPU prior to the spinlock.

eg.

writel(x, y);   // sets paca->io_sync
... 



spin_lock(a);
...
// No IO in here
...
spin_unlock(a); // sync() here because other task did writel().


Which wouldn't be *incorrect*, but would be kind of weird.

cheers


Re: [v2] powerpc/mm: fix "section_base" set but not used

2019-03-03 Thread Michael Ellerman
On Fri, 2019-03-01 at 14:20:40 UTC, Qian Cai wrote:
> The commit 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free")
> removed a line in vmemmap_free(),
> 
> altmap = to_vmem_altmap((unsigned long) section_base);
> 
> but left a variable no longer used.
> 
> arch/powerpc/mm/init_64.c: In function 'vmemmap_free':
> arch/powerpc/mm/init_64.c:277:16: error: variable 'section_base' set but
> not used [-Werror=unused-but-set-variable]
> 
> Signed-off-by: Qian Cai 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c38ca265525a00d635219450e8fcc858

cheers


Re: powerpc/32: Clear on-stack exception marker upon exception return

2019-03-03 Thread Michael Ellerman
On Wed, 2019-02-27 at 11:45:30 UTC, Christophe Leroy wrote:
> Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
> to avoid confusing stacktrace like the one below.
> 
> Call Trace:
> [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
> [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
> [c0e9dd10] [c0895130] memchr+0x24/0x74
> [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
> [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
> [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
> --- interrupt: c0e9df00 at 0x400f330
> LR = init_stack+0x1f00/0x2000
> [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
> [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
> [c0e9df50] [c0c15434] start_kernel+0x310/0x488
> [c0e9dff0] [3484] 0x3484
> 
> With this patch the trace becomes:
> 
> Call Trace:
> [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
> [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
> [c0e9dd10] [c0895150] memchr+0x24/0x74
> [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
> [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
> [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
> [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
> [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
> [c0e9df50] [c0c15434] start_kernel+0x310/0x488
> [c0e9dff0] [3484] 0x3484
> 
> Cc: sta...@vger.kernel.org
> Cc: Nicolai Stange 
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9580b71b5a7863c24a9bd18bcd2ad759

cheers


Re: powerpc: fix "sz" set but not used

2019-03-03 Thread Michael Ellerman
On Thu, 2019-02-28 at 02:35:05 UTC, Qian Cai wrote:
> arch/powerpc/mm/hugetlbpage-hash64.c: In function '__hash_page_huge':
> arch/powerpc/mm/hugetlbpage-hash64.c:29:28: warning: variable 'sz' set
> but not used [-Wunused-but-set-variable]
> 
> Signed-off-by: Qian Cai 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8132cf115efc3b3684bb5fd3bfdf6860

cheers


Re: [1/2] powerpc: remove nargs from __SYSCALL

2019-03-03 Thread Michael Ellerman
On Wed, 2019-01-02 at 15:02:03 UTC, Firoz Khan wrote:
> The __SYSCALL macro's arguments are system call number,
> system call entry name and number of arguments for the
> system call.
> 
> Argument nargs in __SYSCALL(nr, entry, nargs) is neither
> calculated nor used anywhere. So it would be better to
> keep the implementation as __SYSCALL(nr, entry). This
> unifies the implementation with some other architectures
> too.
> 
> Signed-off-by: Firoz Khan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6b1200facc051a3e487a52cbabd745f7

cheers


Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

2019-03-03 Thread Michael Ellerman
Nicholas Piggin  writes:
> Will Deacon's on March 2, 2019 12:03 am:
>> In preparation for removing all explicit mmiowb() calls from driver
>> code, implement a tracking system in asm-generic based loosely on the
>> PowerPC implementation. This allows architectures with a non-empty
>> mmiowb() definition to have the barrier automatically inserted in
>> spin_unlock() following a critical section containing an I/O write.
>
> Is there a reason to call this "mmiowb"? We already have wmb that
> orders cacheable stores vs mmio stores don't we?
>
> Yes ia64 "sn2" is broken in that case, but that can be fixed (if
> anyone really cares about the platform any more). Maybe that's
> orthogonal to what you're doing here, I just don't like seeing
> "mmiowb" spread.
>
> This series works for spin locks, but you would want a driver to
> be able to use wmb() to order locks vs mmio when using a bit lock
> or a mutex or whatever else. Calling your wmb-if-io-is-pending
> version io_mb_before_unlock() would kind of match with existing
> patterns.
>
>> +static inline void mmiowb_set_pending(void)
>> +{
>> +struct mmiowb_state *ms = __mmiowb_state();
>> +ms->mmiowb_pending = ms->nesting_count;
>> +}
>> +
>> +static inline void mmiowb_spin_lock(void)
>> +{
>> +struct mmiowb_state *ms = __mmiowb_state();
>> +ms->nesting_count++;
>> +}
>> +
>> +static inline void mmiowb_spin_unlock(void)
>> +{
>> +struct mmiowb_state *ms = __mmiowb_state();
>> +
>> +if (unlikely(ms->mmiowb_pending)) {
>> +ms->mmiowb_pending = 0;
>> +mmiowb();
>> +}
>> +
>> +ms->nesting_count--;
>> +}
>
> Humour me for a minute and tell me what this algorithm is doing, or
> what was broken about the powerpc one, which is basically:
>
> static inline void mmiowb_set_pending(void)
> {
>   struct mmiowb_state *ms = __mmiowb_state();
>   ms->mmiowb_pending = 1;
> }
>
> static inline void mmiowb_spin_lock(void)
> {
> }

The current powerpc code clears io_sync in spin_lock().

ie, it would be equivalent to:

static inline void mmiowb_spin_lock(void)
{
ms->mmiowb_pending = 0;
}

Which means that:

spin_lock(a);
writel(x, y);
spin_lock(b);
...
spin_unlock(b);
spin_unlock(a);

Does no barrier.

cheers

