Re: [PATCH] powerpc/Makefiles: Fix clang/llvm build

2018-08-20 Thread Anton Blanchard
Hi Michael,

> This breaks GCC 4.6.3 at least, which we still support:
> 
>   Assembler messages:
>   Error: invalid switch -mpower8
>   Error: unrecognized option -mpower8
>   ../scripts/mod/empty.c:1:0: fatal error: error closing -: Broken pipe

Yuck. We have POWER8 instructions in our assembly code, and a toolchain
that doesn't understand the -mpower8 flag, but has support for
power8 instructions.

This is what I see on clang with -mpower7:

/tmp/sstep-2afa55.s:7584: Error: unrecognized opcode: `lbarx'
/tmp/sstep-2afa55.s:7626: Error: unrecognized opcode: `stbcx.'
/tmp/sstep-2afa55.s:8077: Error: unrecognized opcode: `lharx'
/tmp/sstep-2afa55.s:8140: Error: unrecognized opcode: `stbcx.'

Nick: any suggestions?

Thanks,
Anton


Re: [PATCH] powerpc/Makefiles: Fix clang/llvm build

2018-08-20 Thread Michael Ellerman
Anton Blanchard  writes:

> From: Anton Blanchard 
>
> Commit 15a3204d24a3 ("powerpc/64s: Set assembler machine type to POWER4")
> passes -mpower4 to the assembler. We have more recent instructions in our
> assembly files, but gas permits them. The clang/llvm integrated assembler
> is more strict, and we get a build failure.
>
> Fix this by calling the assembler with -mcpu=power8

This breaks GCC 4.6.3 at least, which we still support:

  Assembler messages:
  Error: invalid switch -mpower8
  Error: unrecognized option -mpower8
  ../scripts/mod/empty.c:1:0: fatal error: error closing -: Broken pipe


cheers


[PATCH] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-20 Thread Nicholas Piggin
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to bad
  address handling, etc ]

Signed-off-by: Nicholas Piggin 

Since RFC:
- Send patch 1 by itself to focus on the big change.
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh). We may
  look at consolidating these tests and tightening up the code, but for
  a first pass we decided it's better to check carefully.
---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/slb.c | 257 +
 arch/powerpc/mm/slb_low.S | 335 --
 5 files changed, 185 insertions(+), 613 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 1f4691ce4126..78ed3c3f879a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs);
 void system_reset_exception(struct pt_regs *regs);
 void machine_check_exception(struct pt_regs *regs);
 void emulation_assist_interrupt(struct pt_regs *regs);
+long do_slb_fault(struct pt_regs *regs, unsigned long ea);
+void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
 
 /* signals, syscalls and interrupts */
 long sys_swapcontext(struct ucontext __user *old_ctx,
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ea04dfb8c092..c3832243819b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -566,28 +566,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380);
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380);
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
+
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
+EXC_COMMON_BEGIN(data_access_slb_common)
+   mfspr   r10,SPRN_DAR
+   std r10,PACA_EXSLB+EX_DAR(r13)
+   EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+   ld  r4,PACA_EXSLB+EX_DAR(r13)
+   std r4,_DAR(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_slb_fault
+   cmpdi   r3,0
+   bne-    1f
+   b   fast_exception_return
+1: /* Error case */
+   std r3,RESULT(r1)
+   bl  save_nvgprs
+   RECONCILE_IRQ_STATE(r10, r11)
+   ld  r4,_DAR(r1)
+   ld  r5,RESULT(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_bad_slb_fault
+   b   ret_from_except
+
 
 EXC_REAL(instruction_access, 0x400, 0x80)
 EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400)
@@ -610,160 +618,34 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
-   mfspr   r11,SPRN_SRR1
-   crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, instruction_access_slb_common, EXC_STD, KVMTEST_PR, 0x480);
 

Re: [PATCH] powerpc/powernv: Don't select the cpufreq governors

2018-08-20 Thread Benjamin Herrenschmidt
On Tue, 2018-08-21 at 11:44 +0930, Joel Stanley wrote:
> Deciding which governors should be built into the kernel can be left to
> users to configure.
> 
> Fixes: 81f359027a3a ("cpufreq: powernv: Select CPUFreq related Kconfig options for powernv")
> Signed-off-by: Joel Stanley 

Can you add them to the defconfigs then ?

> ---
>  arch/powerpc/platforms/powernv/Kconfig | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index f8dc98d3dc01..028ac941c05c 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -15,11 +15,6 @@ config PPC_POWERNV
>   select PPC_SCOM
>   select ARCH_RANDOM
>   select CPU_FREQ
> - select CPU_FREQ_GOV_PERFORMANCE
> - select CPU_FREQ_GOV_POWERSAVE
> - select CPU_FREQ_GOV_USERSPACE
> - select CPU_FREQ_GOV_ONDEMAND
> - select CPU_FREQ_GOV_CONSERVATIVE
>   select PPC_DOORBELL
>   select MMU_NOTIFIER
>   select FORCE_SMP



[PATCH] powerpc/powernv: Don't select the cpufreq governors

2018-08-20 Thread Joel Stanley
Deciding which governors should be built into the kernel can be left to
users to configure.

Fixes: 81f359027a3a ("cpufreq: powernv: Select CPUFreq related Kconfig options for powernv")
Signed-off-by: Joel Stanley 
---
 arch/powerpc/platforms/powernv/Kconfig | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index f8dc98d3dc01..028ac941c05c 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -15,11 +15,6 @@ config PPC_POWERNV
select PPC_SCOM
select ARCH_RANDOM
select CPU_FREQ
-   select CPU_FREQ_GOV_PERFORMANCE
-   select CPU_FREQ_GOV_POWERSAVE
-   select CPU_FREQ_GOV_USERSPACE
-   select CPU_FREQ_GOV_ONDEMAND
-   select CPU_FREQ_GOV_CONSERVATIVE
select PPC_DOORBELL
select MMU_NOTIFIER
select FORCE_SMP
-- 
2.17.1



[PATCH] MAINTAINERS: add maintainer entries for RPA pci hotplug drivers

2018-08-20 Thread Tyrel Datwyler
Add myself as maintainer of the IBM RPA hotplug modules located in the
drivers/pci/hotplug directory. These modules provide kernel interfaces
for support of Dynamic Logical Partitioning (DLPAR) of Logical and
Physical IO slots, and hotplug of physical PCI slots of a PHB on
RPA-compliant ppc64 platforms (pseries).

Signed-off-by: Tyrel Datwyler 
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5df1b36..7b5dc3f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6984,6 +6984,20 @@ F:   drivers/crypto/vmx/aes*
 F: drivers/crypto/vmx/ghash*
 F: drivers/crypto/vmx/ppc-xlate.pl
 
+IBM Power PCI Hotplug Driver for RPA-compliant PPC64 platform
+M: Tyrel Datwyler 
+L: linux-...@vger.kernel.org
+L: linuxppc-dev@lists.ozlabs.org
+S: Supported
+F: drivers/pci/hotplug/rpaphp*
+
+IBM Power IO DLPAR Driver for RPA-compliant PPC64 platform
+M: Tyrel Datwyler 
+L: linux-...@vger.kernel.org
+L: linuxppc-dev@lists.ozlabs.org
+S: Supported
+F: drivers/pci/hotplug/rpadlpar*
+
 IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
-- 
1.8.3.1



[PATCH] powerpc/64: Remove static branch hints from memset()

2018-08-20 Thread Anton Blanchard
From: Anton Blanchard 

Static branch hints override dynamic branch prediction on recent
POWER CPUs. We should only use them when we are overwhelmingly
sure of the direction.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/lib/mem_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
index ec531de6..3c3be02f33b7 100644
--- a/arch/powerpc/lib/mem_64.S
+++ b/arch/powerpc/lib/mem_64.S
@@ -40,7 +40,7 @@ _GLOBAL(memset)
 .Lms:  PPC_MTOCRF(1,r0)
mr  r6,r3
blt cr1,8f
-   beq+    3f  /* if already 8-byte aligned */
+   beq 3f  /* if already 8-byte aligned */
   subf    r5,r0,r5
bf  31,1f
stb r4,0(r6)
@@ -85,7 +85,7 @@ _GLOBAL(memset)
   addi    r6,r6,8
 8: cmpwi   r5,0
PPC_MTOCRF(1,r5)
-   beqlr+
+   beqlr
bf  29,9f
stw r4,0(r6)
   addi    r6,r6,4
-- 
2.17.1



[PATCH] powerpc/Makefiles: Fix clang/llvm build

2018-08-20 Thread Anton Blanchard
From: Anton Blanchard 

Commit 15a3204d24a3 ("powerpc/64s: Set assembler machine type to POWER4")
passes -mpower4 to the assembler. We have more recent instructions in our
assembly files, but gas permits them. The clang/llvm integrated assembler
is more strict, and we get a build failure.

Fix this by calling the assembler with -mcpu=power8

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8397c7bd5880..4d9c01df0dec 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -238,7 +238,7 @@ cpu-as-$(CONFIG_4xx)+= -Wa,-m405
 cpu-as-$(CONFIG_ALTIVEC)   += $(call as-option,-Wa$(comma)-maltivec)
 cpu-as-$(CONFIG_E200)  += -Wa,-me200
 cpu-as-$(CONFIG_E500)  += -Wa,-me500
-cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4
+cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower8
 cpu-as-$(CONFIG_PPC_E500MC)+= $(call as-option,-Wa$(comma)-me500mc)
 
 KBUILD_AFLAGS += $(cpu-as-y)
-- 
2.17.1



Re: [PATCH 8/9] PCI: hotplug: Embed hotplug_slot

2018-08-20 Thread Tyrel Datwyler
On 08/19/2018 07:29 AM, Lukas Wunner wrote:
> When the PCI hotplug core and its first user, cpqphp, were introduced in
> February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
> struct for its internal use plus a hotplug_slot struct to be registered
> with the hotplug core and linked the two with pointers:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
> 
> Nowadays, the predominant pattern in the tree is to embed ("subclass")
> such structures in one another and cast to the containing struct with
> container_of().  But it wasn't until July 2002 that container_of() was
> introduced with historic commit ec4f214232cf:
> https://git.kernel.org/tglx/history/c/ec4f214232cf
> 
> pnv_php, introduced in 2016, did the right thing and embedded struct
> hotplug_slot in its internal struct pnv_php_slot, but all other drivers
> cargo-culted cpqphp's design and linked separate structs with pointers.
> 
> Embedding structs is preferable to linking them with pointers because
> it requires fewer allocations, thereby reducing overhead and simplifying
> error paths.  Casting an embedded struct to the containing struct
> becomes a cheap subtraction rather than a dereference.  And having fewer
> pointers reduces the risk of them pointing nowhere either accidentally
> or due to an attack.
> 
> Convert all drivers to embed struct hotplug_slot in their internal slot
> struct.  The "private" pointer in struct hotplug_slot thereby becomes
> unused, so drop it.
> 
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
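
The embedding pattern described in the quoted commit message can be
illustrated with a minimal userspace sketch. Everything here is an
assumption made for illustration: container_of() is re-implemented with
offsetof(), struct hotplug_slot is reduced to a single field, and
demo_slot and to_demo_slot() are hypothetical names rather than anything
taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct hotplug_slot (field is illustrative). */
struct hotplug_slot {
	const char *name;
};

/* Hypothetical driver-private slot that embeds the core struct. */
struct demo_slot {
	int hw_index;
	struct hotplug_slot hotplug_slot; /* embedded, not pointed to */
};

/* Recover the driver struct from the core struct by pointer arithmetic. */
static struct demo_slot *to_demo_slot(struct hotplug_slot *hs)
{
	return container_of(hs, struct demo_slot, hotplug_slot);
}
```

With this layout a single allocation covers both structs, and the
conversion is a constant subtraction rather than a load through a
"private" pointer.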



Re: [PATCH 7/9] PCI: hotplug: Drop hotplug_slot_info

2018-08-20 Thread Tyrel Datwyler
On 08/19/2018 07:29 AM, Lukas Wunner wrote:
> Ever since the PCI hotplug core was introduced in 2002, drivers had to
> allocate and register a struct hotplug_slot_info for every slot:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
> 
> Apparently the idea was that drivers furnish the hotplug core with an
> up-to-date card presence status, power status, latch status and
> attention indicator status as well as notify the hotplug core of changes
> thereof.  However only 4 out of 12 hotplug drivers bother to notify the
> hotplug core with pci_hp_change_slot_info() and the hotplug core never
> made any use of the information:  There is just a single macro in
> pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
> the driver lacks the corresponding callback in hotplug_slot_ops.  The
> macro is called when the user reads the attribute via sysfs.
> 
> Now, if the callback isn't defined, the attribute isn't exposed in sysfs
> in the first place (see e.g. has_power_file()).  There are only two
> situations when the hotplug_slot_info would actually be accessed:
> 
> * If the driver defines ->enable_slot or ->disable_slot but not
>   ->get_power_status.
> 
> * If the driver defines ->set_attention_status but not
>   ->get_attention_status.
> 
> There is no driver doing the former and just a single driver doing the
> latter, namely pnv_php.c.  Amend it with a ->get_attention_status
> callback.  With that, the hotplug_slot_info becomes completely unused by
> the PCI hotplug core.  But a few drivers use it internally as a cache:
> 
> cpcihp uses it to cache the latch_status and adapter_status.
> cpqhp uses it to cache the adapter_status.
> pnv_php and rpaphp use it to cache the attention_status.
> shpchp uses it to cache all four values.
> 
> Amend these drivers to cache the information in their private slot
> struct.  shpchp's slot struct already contains members to cache the
> power_status and adapter_status, so additional members are only needed
> for the other two values.  In the case of cpqphp, the cached value is
> only accessed in a single place, so instead of caching it, read the
> current value from the hardware.
> 
> Caution:  acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
> populate the hotplug_slot_info with initial values on probe.  That code
> is herewith removed.  There is a theoretical chance that the code has
> side effects without which the driver fails to function, e.g. if the
> ACPI method to read the adapter status needs to be executed at least
> once on probe.  That seems unlikely to me, still maintainers should
> review the changes carefully for this possibility.
> 
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---

With regards to drivers/pci/hotplug/rpa*

Acked-by: Tyrel Datwyler 
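
For readers unfamiliar with the fallback that the quoted message
attributes to GET_STATUS(), here is a simplified sketch of the idea: use
the driver callback when one is defined, otherwise return the cached
value. The demo_* types and functions are hypothetical stand-ins, and
module reference counting is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct demo_slot;

/* Hypothetical callback table; only one getter is modelled. */
struct demo_slot_ops {
	int (*get_attention_status)(struct demo_slot *slot,
				    unsigned char *value);
};

/* Hypothetical cache, playing the role of hotplug_slot_info. */
struct demo_slot_info {
	unsigned char attention_status;
};

struct demo_slot {
	const struct demo_slot_ops *ops;
	struct demo_slot_info *info;
};

/* Example callback that pretends to read the hardware. */
static int demo_hw_read(struct demo_slot *slot, unsigned char *value)
{
	(void)slot;
	*value = 1;
	return 0;
}

/* Prefer the callback; fall back to the cached value if it is absent. */
static int demo_get_attention_status(struct demo_slot *slot,
				     unsigned char *value)
{
	if (slot->ops->get_attention_status)
		return slot->ops->get_attention_status(slot, value);
	*value = slot->info->attention_status;
	return 0;
}
```

Once every driver that exposes the attribute also supplies the getter,
the fallback branch is never taken, which is the situation the patch
relies on to drop the cache from the core.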



Re: [PATCH 6/9] PCI: hotplug: Constify hotplug_slot_ops

2018-08-20 Thread Tyrel Datwyler
On 08/19/2018 07:29 AM, Lukas Wunner wrote:
> Hotplug drivers cannot declare their hotplug_slot_ops const, making them
> attractive targets for attackers, because upon registration of a hotplug
> slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
> in that struct.
> 
> Fix by moving these members to struct hotplug_slot and constify every
> driver's hotplug_slot_ops except for pciehp.
> 
> pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
> port's capabilities, hence cannot declare them const.  It can be
> converted to __write_rarely once that's mainlined:
> http://www.openwall.com/lists/kernel-hardening/2016/11/16/3
> 
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---

With regards to drivers/pci/hotplug/rpa*

Acked-by: Tyrel Datwyler 
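
A rough sketch of the constification idea from the quoted message:
fields written at registration time move into the per-slot struct, so
the callback table can be declared const. All demo_* names are
hypothetical, and the structs are far smaller than the real ones.

```c
#include <assert.h>
#include <stddef.h>

struct demo_slot;

/* Callback table: holds only function pointers, so it can be const. */
struct demo_slot_ops {
	int (*enable_slot)(struct demo_slot *slot);
};

/* Per-slot data: the writable "mod_name"-style field lives here. */
struct demo_slot {
	const struct demo_slot_ops *ops;
	const char *mod_name;
	int enabled;
};

static int demo_enable(struct demo_slot *slot)
{
	slot->enabled = 1;
	return 0;
}

/* const allows the toolchain to place the table in read-only data. */
static const struct demo_slot_ops demo_ops = {
	.enable_slot = demo_enable,
};

/* Registration writes to the slot, never to the shared ops table. */
static void demo_initialize(struct demo_slot *slot, const char *mod_name)
{
	slot->ops = &demo_ops;
	slot->mod_name = mod_name;
}
```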



[PATCH 5/5] arm64: dts: add LX2160ARDB board support

2018-08-20 Thread Vabhav Sharma
LX2160A reference design board (RDB) is a high-performance
computing, evaluation, and development platform with LX2160A
SoC.

Signed-off-by: Priyanka Jain 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
---
 arch/arm64/boot/dts/freescale/Makefile|  1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 95 +++
 2 files changed, 96 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts

diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
index 86e18ad..445b72b 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
new file mode 100644
index 000..70fad20
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree file for LX2160ARDB
+//
+// Copyright 2018 NXP
+
+/dts-v1/;
+
+#include "fsl-lx2160a.dtsi"
+
+/ {
+   model = "NXP Layerscape LX2160ARDB";
+   compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
+
+   aliases {
+   crypto = 
+   serial0 = 
+   serial1 = 
+   serial2 = 
+   serial3 = 
+   };
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+   pca9547@77 {
+   compatible = "nxp,pca9547";
+   reg = <0x77>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   i2c@2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x2>;
+
+   ina220@40 {
+   compatible = "ti,ina220";
+   reg = <0x40>;
+   shunt-resistor = <1000>;
+   };
+   };
+
+   i2c@3 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x3>;
+
+   sa56004@4c {
+   compatible = "nxp,sa56004";
+   reg = <0x4c>;
+   };
+
+   sa56004@4d {
+   compatible = "nxp,sa56004";
+   reg = <0x4d>;
+   };
+   };
+   };
+};
+
+ {
+   status = "okay";
+
+   rtc@51 {
+   compatible = "nxp,pcf2129";
+   reg = <0x51>;
+   // IRQ10_B
+   interrupts = <0 150 0x4>;
+   };
+
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
-- 
2.7.4



[PATCH 4/5] arm64: dts: add QorIQ LX2160A SoC support

2018-08-20 Thread Vabhav Sharma
LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.

LX2160A features 16 advanced 64-bit ARMv8 Cortex-A72 processor cores
in 8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, 4 PL011 SBSA
UARTs, etc.

Signed-off-by: Ramneek Mehresh 
Signed-off-by: Zhang Ying-22455 
Signed-off-by: Nipun Gupta 
Signed-off-by: Priyanka Jain 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
---
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 572 +
 1 file changed, 572 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
new file mode 100644
index 000..e35e494
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -0,0 +1,572 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree Include file for Layerscape-LX2160A family SoC.
+//
+// Copyright 2018 NXP
+
+#include 
+
+/memreserve/ 0x8000 0x0001;
+
+/ {
+   compatible = "fsl,lx2160a";
+   interrupt-parent = <>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   // 8 clusters having 2 Cortex-A72 cores each
+   cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x0>;
+   clocks = < 1 0>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@1 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x1>;
+   clocks = < 1 0>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@100 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x100>;
+   clocks = < 1 1>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@101 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x101>;
+   clocks = < 1 1>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@200 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x200>;
+   clocks = < 1 2>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@201 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x201>;
+   clocks = < 1 2>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@300 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x300>;
+   clocks = < 1 3>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@301 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x301>;
+   clocks = < 1 3>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@400 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x400>;
+   clocks = < 1 4>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@401 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x401>;
+   clocks = < 1 4>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@500 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x500>;
+   clocks = < 1 5>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@501 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x501>;
+   clocks = < 1 5>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@600 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x600>;
+   clocks = < 1 6>;
+   next-level-cache = <_l2>;
+   };

[PATCH 3/5] drivers: clk-qoriq: Add clockgen support for lx2160a

2018-08-20 Thread Vabhav Sharma
From: Yogesh Gaur 

Add clockgen support for LX2160A.
Add an entry for compatible 'fsl,lx2160a-clockgen'.
As LX2160A has 16 cores, raise NUM_CMUX from 8 to 16.

Signed-off-by: Tang Yuantian 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
---
 drivers/clk/clk-qoriq.c | 14 +-
 drivers/cpufreq/qoriq-cpufreq.c |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3a1812f..fc6e308 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -60,7 +60,7 @@ struct clockgen_muxinfo {
 };
 
 #define NUM_HWACCEL5
-#define NUM_CMUX   8
+#define NUM_CMUX   16
 
 struct clockgen;
 
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
.flags = CG_VER3 | CG_LITTLE_ENDIAN,
},
{
+   .compat = "fsl,lx2160a-clockgen",
+   .cmux_groups = {
+   _cmux_cga12, _cmux_cgb
+   },
+   .cmux_to_group = {
+   0, 0, 0, 0, 1, 1, 1, 1, -1
+   },
+   .pll_mask = 0x37,
+   .flags = CG_VER3 | CG_LITTLE_ENDIAN,
+   },
+   {
.compat = "fsl,p2041-clockgen",
.guts_compat = "fsl,qoriq-device-config-1.0",
.init_periph = p2041_init_periph,
@@ -1424,6 +1435,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
 
 /* Legacy nodes */
 CLK_OF_DECLARE(qoriq_sysclk_1, "fsl,qoriq-sysclk-1.0", sysclk_init);
diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 3d773f6..83921b7 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -295,6 +295,7 @@ static const struct of_device_id node_matches[] __initconst = {
{ .compatible = "fsl,ls1046a-clockgen", },
{ .compatible = "fsl,ls1088a-clockgen", },
{ .compatible = "fsl,ls2080a-clockgen", },
+   { .compatible = "fsl,lx2160a-clockgen", },
{ .compatible = "fsl,p4080-clockgen", },
{ .compatible = "fsl,qoriq-clockgen-1.0", },
{ .compatible = "fsl,qoriq-clockgen-2.0", },
-- 
2.7.4
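
A note on the -1 in the new .cmux_to_group initializer: the sketch below
shows why such a terminator matters in a fixed-size array. It assumes
the driver stops walking the array at the first negative entry, which
the patch suggests but does not show. demo_count_cmux() and the int
element type are hypothetical.

```c
#include <assert.h>

#define NUM_CMUX 16

/*
 * Hypothetical helper: count the mux entries that precede the negative
 * terminator, or NUM_CMUX if no terminator is present.
 */
static int demo_count_cmux(const int *cmux_to_group)
{
	int i;

	for (i = 0; i < NUM_CMUX; i++)
		if (cmux_to_group[i] < 0)
			break;
	return i;
}
```

Because C zero-fills the unlisted tail of a partially initialized array,
leaving out the -1 would make the remaining slots look like valid
entries for group 0. With the initializer from the patch,
{ 0, 0, 0, 0, 1, 1, 1, 1, -1 }, this helper reports 8 muxes.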



[PATCH 2/5] soc/fsl/guts: Add compatible string for LX2160A

2018-08-20 Thread Vabhav Sharma
Add the compatible string "fsl,lx2160a-dcfg" to
initialize the guts driver for LX2160A.

Signed-off-by: Vabhav Sharma 
---
 drivers/soc/fsl/guts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 302e0c8..5e1e633 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -222,6 +222,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
{ .compatible = "fsl,ls1088a-dcfg", },
{ .compatible = "fsl,ls1012a-dcfg", },
{ .compatible = "fsl,ls1046a-dcfg", },
+   { .compatible = "fsl,lx2160a-dcfg", },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
-- 
2.7.4



[PATCH 1/5] dt-bindings: arm64: add compatible for LX2160A

2018-08-20 Thread Vabhav Sharma
Add compatible strings for the LX2160A SoC and the QDS and RDB boards.

Signed-off-by: Vabhav Sharma 
---
 Documentation/devicetree/bindings/arm/fsl.txt | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt
index cdb9dd7..76256bd 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -218,3 +218,15 @@ Required root node properties:
 LS2088A ARMv8 based RDB Board
 Required root node properties:
 - compatible = "fsl,ls2088a-rdb", "fsl,ls2088a";
+
+LX2160A SoC
+Required root node properties:
+- compatible = "fsl,lx2160a";
+
+LX2160A ARMv8 based QDS Board
+Required root node properties:
+- compatible = "fsl,lx2160a-qds", "fsl,lx2160a";
+
+LX2160A ARMv8 based RDB Board
+Required root node properties:
+- compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
-- 
2.7.4



[PATCH 0/5] arm64: dts: NXP: add basic dts file for LX2160A SoC

2018-08-20 Thread Vabhav Sharma
- Add compatible string for LX2160A clockgen support
- Add compatible string to initialize LX2160A guts driver
- Add compatible string for LX2160A support in dt-bindings
- Add dts file to enable support for LX2160A SoC and LX2160A RDB
  (Reference design board)

Vabhav Sharma (4):
  dt-bindings: arm64: add compatible for LX2160A
  soc/fsl/guts: Add compatible string for LX2160A
  arm64: dts: add QorIQ LX2160A SoC support
  arm64: dts: add LX2160ARDB board support

Yogesh Gaur (1):
  drivers: clk-qoriq: Add clockgen support for lx2160a

 Documentation/devicetree/bindings/arm/fsl.txt |  12 +
 arch/arm64/boot/dts/freescale/Makefile|   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts |  95 
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi| 572 ++
 drivers/clk/clk-qoriq.c   |  14 +-
 drivers/cpufreq/qoriq-cpufreq.c   |   1 +
 drivers/soc/fsl/guts.c|   1 +
 7 files changed, 695 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

-- 
2.7.4



Re: [PATCH v7 2/2] powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores

2018-08-20 Thread Srikar Dronamraju
* Gautham R. Shenoy  [2018-08-20 11:11:44]:

> From: "Gautham R. Shenoy" 
> 
> Each of the SMT4 cores forming a big-core is a more or less independent
> unit. Thus when multiple tasks are scheduled to run on the fused
> core, we get the best performance when the tasks are spread across the
> pair of SMT4 cores.
> 
> This patch achieves this by setting the SMT level mask to correspond
> to the smallcore sibling mask on big-core systems. This patch also
> ensures that while checking for shared caches on big-core systems, we
> use the smallcore_sibling_mask to compare with the l2_cache_mask.
> This ensures that the CACHE level sched-domain is created, whose groups
> correspond to the threads of the big-core.
> 
> With this patch, the SMT sched-domain with SMT=8,4,2 on big-core
> systems are as follows:


Reviewed-by: Srikar Dronamraju 



Re: [PATCH v7 1/2] powerpc: Detect the presence of big-cores via "ibm,thread-groups"

2018-08-20 Thread Srikar Dronamraju
* Gautham R. Shenoy  [2018-08-20 11:11:43]:

> From: "Gautham R. Shenoy" 
> 
> On IBM POWER9, the device tree exposes a property array identifed by
one small nit:
s/identifed/identified/g

> "ibm,thread-groups" which will indicate which groups of threads share
> a particular set of resources.
> 
> As of today we only have one form of grouping identifying the group of
> threads in the core that share the L1 cache, translation cache and
> instruction data flow.
> 
> This patch defines the helper function to parse the contents of
> "ibm,thread-groups" and a new structure to contain the parsed output.
> 
> The patch also creates the sysfs file named "small_core_siblings" that
> returns the physical ids of the threads in the core that share the L1
> cache, translation cache and instruction data flow.
> 
> Signed-off-by: Gautham R. Shenoy 

Otherwise looks good to me.

Reviewed-by: Srikar Dronamraju 
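
To make the parsing step in the quoted description concrete, here is a
minimal sketch under a stated assumption: the property is taken to be a
flat array of cells laid out as [property, nr_groups,
threads_per_group, thread ids...]. That layout is not spelled out in the
quoted text, and the demo_* names and the 8-thread limit are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_THREAD_LIST 8

/* Hypothetical container for the parsed property. */
struct demo_thread_groups {
	uint32_t property;
	uint32_t nr_groups;
	uint32_t threads_per_group;
	uint32_t thread_list[DEMO_MAX_THREAD_LIST];
};

/* Parse the assumed layout; return 0 on success, -1 on malformed input. */
static int demo_parse_thread_groups(const uint32_t *cells, size_t ncells,
				    struct demo_thread_groups *tg)
{
	size_t total, i;

	if (ncells < 3)
		return -1;
	tg->property = cells[0];
	tg->nr_groups = cells[1];
	tg->threads_per_group = cells[2];
	if (tg->nr_groups == 0 || tg->threads_per_group == 0 ||
	    tg->nr_groups > DEMO_MAX_THREAD_LIST ||
	    tg->threads_per_group > DEMO_MAX_THREAD_LIST)
		return -1;
	total = (size_t)tg->nr_groups * tg->threads_per_group;
	if (total > DEMO_MAX_THREAD_LIST || ncells - 3 < total)
		return -1;
	for (i = 0; i < total; i++)
		tg->thread_list[i] = cells[3 + i];
	return 0;
}

/* Return the index of the group containing thread_id, or -1 if absent. */
static int demo_group_of(const struct demo_thread_groups *tg,
			 uint32_t thread_id)
{
	size_t total = (size_t)tg->nr_groups * tg->threads_per_group;
	size_t i;

	for (i = 0; i < total; i++)
		if (tg->thread_list[i] == thread_id)
			return (int)(i / tg->threads_per_group);
	return -1;
}
```

Under this layout, a big-core with two groups of four threads could be
described as { 1, 2, 4, 0, 2, 4, 6, 1, 3, 5, 7 }, and the threads that
share a group with a given thread are its small-core siblings.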



Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

2018-08-20 Thread Christophe LEROY




Le 20/08/2018 à 18:01, Michal Hocko a écrit :

On Mon 20-08-18 17:23:58, Christophe LEROY wrote:

Hello,

I have an odd issue on my powerpc 8xx board.

I am running latest 4.14 and get the following SIGSEGV which appears more or
less randomly.

[9.190354] touch[91]: unhandled signal 11 at 67807b58 nip 777cf114 lr 777cf100 code 30001
[   24.634810] ifconfig[160]: unhandled signal 11 at 67ae7b58 nip 77aaf114 lr 77aaf100 code 30001
[   30.383737] default.deconfi[231]: unhandled signal 11 at 67c8bb58 nip 77c53114 lr 77c53100 code 30001
[   37.655588] S15syslogd[251]: unhandled signal 11 at 6784fb58 nip 77817114 lr 77817100 code 30001
[   40.974649] snmpd[315]: unhandled signal 11 at 67e0bb58 nip 77dd3114 lr 77dd3100 code 30001
[   43.220964] exe[338]: unhandled signal 11 at 67cd3b58 nip 77c9b114 lr 77c9b100 code 30001
[   44.191494] exe[348]: unhandled signal 11 at 67c1fb58 nip 77be7114 lr 77be7100 code 30001
[   59.175022] sleep[655]: unhandled signal 11 at 67ca3b58 nip 77c6b114 lr 77c6b100 code 30001
[   61.853406] smcroute[705]: unhandled signal 11 at 6789bb58 nip 77863114 lr 77863100 code 30001
[   64.662431] smcroute[778]: unhandled signal 11 at 67e03b58 nip 77dcb114 lr 77dcb100 code 30001
[   65.623103] smcroute[795]: unhandled signal 11 at 67bdbb58 nip 77ba3114 lr 77ba3100 code 30001
[   66.579416] exe[825]: unhandled signal 11 at 67edbb58 nip 77ea3114 lr 77ea3100 code 30001
[   68.382941] exe[864]: unhandled signal 11 at 6789bb58 nip 77863114 lr 77863100 code 30001
[   95.187346] exe[1147]: unhandled signal 11 at 67e83b58 nip 77e4b114 lr 77e4b100 code 30001
[  105.238218] exe[1158]: unhandled signal 11 at 67ca3b58 nip 77c6b114 lr 77c6b100 code 30001
[  127.556731] exe[1181]: unhandled signal 11 at 67cc3b58 nip 77c8b114 lr 77c8b100 code 30001
[  135.558982] exe[1195]: unhandled signal 11 at 678d7b58 nip 7789f114 lr 7789f100 code 30001
[  147.579142] exe[1216]: unhandled signal 11 at 67c6bb58 nip 77c33114 lr 77c33100 code 30001
[  175.538747] exe[1262]: unhandled signal 11 at 67e2fb58 nip 77df7114 lr 77df7100 code 30001
[  186.552670] exe[1275]: unhandled signal 11 at 6781fb58 nip 777e7114 lr 777e7100 code 30001
[  230.629786] exe[1344]: unhandled signal 11 at 67cb3b58 nip 77c7b114 lr 77c7b100 code 30001
[  249.640396] repair-service.[1369]: unhandled signal 11 at 67e5fb58 nip 77e27114 lr 77e27100 code 30001
[  378.003410] exe[1593]: unhandled signal 11 at 678d7b58 nip 7789f114 lr 7789f100 code 30001
[  414.060661] exe[1656]: unhandled signal 11 at 67cc7b58 nip 77c8f114 lr 77c8f100 code 30001

The problem is present in 3.13, 3.14 and 3.15.

I bisected its appearance with commit 6b31d5955cb29 ("mm, oom: fix potential
data corruption when oom_reaper races with writer")


Do you see any oom killer invocations preceeding the SEGV? Some of those
killed tasks simply do not look like a sensible oom victims (e.g.
touch)...


No I don't see any.




And I bisected its disappearance with commit 99cd1302327a2 ("powerpc:
Deliver SEGV signal on pkey violation")


Those two seem completely unrelated.



That's my feeling too, hence my incredulity


Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

2018-08-20 Thread Michal Hocko
On Mon 20-08-18 17:23:58, Christophe LEROY wrote:
> Hello,
> 
> I have an odd issue on my powerpc 8xx board.
> 
> I am running latest 4.14 and get the following SIGSEGV which appears more or
> less randomly.
> 
> [9.190354] touch[91]: unhandled signal 11 at 67807b58 nip 777cf114 lr
> 777cf100 code 30001
> [   24.634810] ifconfig[160]: unhandled signal 11 at 67ae7b58 nip 77aaf114
> lr 77aaf100 code 30001
> [   30.383737] default.deconfi[231]: unhandled signal 11 at 67c8bb58 nip
> 77c53114 lr 77c53100 code 30001
> [   37.655588] S15syslogd[251]: unhandled signal 11 at 6784fb58 nip 77817114
> lr 77817100 code 30001
> [   40.974649] snmpd[315]: unhandled signal 11 at 67e0bb58 nip 77dd3114 lr
> 77dd3100 code 30001
> [   43.220964] exe[338]: unhandled signal 11 at 67cd3b58 nip 77c9b114 lr
> 77c9b100 code 30001
> [   44.191494] exe[348]: unhandled signal 11 at 67c1fb58 nip 77be7114 lr
> 77be7100 code 30001
> [   59.175022] sleep[655]: unhandled signal 11 at 67ca3b58 nip 77c6b114 lr
> 77c6b100 code 30001
> [   61.853406] smcroute[705]: unhandled signal 11 at 6789bb58 nip 77863114
> lr 77863100 code 30001
> [   64.662431] smcroute[778]: unhandled signal 11 at 67e03b58 nip 77dcb114
> lr 77dcb100 code 30001
> [   65.623103] smcroute[795]: unhandled signal 11 at 67bdbb58 nip 77ba3114
> lr 77ba3100 code 30001
> [   66.579416] exe[825]: unhandled signal 11 at 67edbb58 nip 77ea3114 lr
> 77ea3100 code 30001
> [   68.382941] exe[864]: unhandled signal 11 at 6789bb58 nip 77863114 lr
> 77863100 code 30001
> [   95.187346] exe[1147]: unhandled signal 11 at 67e83b58 nip 77e4b114 lr
> 77e4b100 code 30001
> [  105.238218] exe[1158]: unhandled signal 11 at 67ca3b58 nip 77c6b114 lr
> 77c6b100 code 30001
> [  127.556731] exe[1181]: unhandled signal 11 at 67cc3b58 nip 77c8b114 lr
> 77c8b100 code 30001
> [  135.558982] exe[1195]: unhandled signal 11 at 678d7b58 nip 7789f114 lr
> 7789f100 code 30001
> [  147.579142] exe[1216]: unhandled signal 11 at 67c6bb58 nip 77c33114 lr
> 77c33100 code 30001
> [  175.538747] exe[1262]: unhandled signal 11 at 67e2fb58 nip 77df7114 lr
> 77df7100 code 30001
> [  186.552670] exe[1275]: unhandled signal 11 at 6781fb58 nip 777e7114 lr
> 777e7100 code 30001
> [  230.629786] exe[1344]: unhandled signal 11 at 67cb3b58 nip 77c7b114 lr
> 77c7b100 code 30001
> [  249.640396] repair-service.[1369]: unhandled signal 11 at 67e5fb58 nip
> 77e27114 lr 77e27100 code 30001
> [  378.003410] exe[1593]: unhandled signal 11 at 678d7b58 nip 7789f114 lr
> 7789f100 code 30001
> [  414.060661] exe[1656]: unhandled signal 11 at 67cc7b58 nip 77c8f114 lr
> 77c8f100 code 30001
> 
> The problem is present in 3.13, 3.14 and 3.15.
> 
> I bisected its appearance with commit 6b31d5955cb29 ("mm, oom: fix potential
> data corruption when oom_reaper races with writer")

Do you see any oom killer invocations preceding the SEGV? Some of those
killed tasks simply do not look like sensible oom victims (e.g.
touch)...

> And I bisected its disappearance with commit 99cd1302327a2 ("powerpc:
> Deliver SEGV signal on pkey violation")

Those two seem completely unrelated.

-- 
Michal Hocko
SUSE Labs


Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

2018-08-20 Thread Christophe LEROY

Hello,

I have an odd issue on my powerpc 8xx board.

I am running latest 4.14 and get the following SIGSEGV which appears 
more or less randomly.


[9.190354] touch[91]: unhandled signal 11 at 67807b58 nip 777cf114 
lr 777cf100 code 30001
[   24.634810] ifconfig[160]: unhandled signal 11 at 67ae7b58 nip 
77aaf114 lr 77aaf100 code 30001
[   30.383737] default.deconfi[231]: unhandled signal 11 at 67c8bb58 nip 
77c53114 lr 77c53100 code 30001
[   37.655588] S15syslogd[251]: unhandled signal 11 at 6784fb58 nip 
77817114 lr 77817100 code 30001
[   40.974649] snmpd[315]: unhandled signal 11 at 67e0bb58 nip 77dd3114 
lr 77dd3100 code 30001
[   43.220964] exe[338]: unhandled signal 11 at 67cd3b58 nip 77c9b114 lr 
77c9b100 code 30001
[   44.191494] exe[348]: unhandled signal 11 at 67c1fb58 nip 77be7114 lr 
77be7100 code 30001
[   59.175022] sleep[655]: unhandled signal 11 at 67ca3b58 nip 77c6b114 
lr 77c6b100 code 30001
[   61.853406] smcroute[705]: unhandled signal 11 at 6789bb58 nip 
77863114 lr 77863100 code 30001
[   64.662431] smcroute[778]: unhandled signal 11 at 67e03b58 nip 
77dcb114 lr 77dcb100 code 30001
[   65.623103] smcroute[795]: unhandled signal 11 at 67bdbb58 nip 
77ba3114 lr 77ba3100 code 30001
[   66.579416] exe[825]: unhandled signal 11 at 67edbb58 nip 77ea3114 lr 
77ea3100 code 30001
[   68.382941] exe[864]: unhandled signal 11 at 6789bb58 nip 77863114 lr 
77863100 code 30001
[   95.187346] exe[1147]: unhandled signal 11 at 67e83b58 nip 77e4b114 
lr 77e4b100 code 30001
[  105.238218] exe[1158]: unhandled signal 11 at 67ca3b58 nip 77c6b114 
lr 77c6b100 code 30001
[  127.556731] exe[1181]: unhandled signal 11 at 67cc3b58 nip 77c8b114 
lr 77c8b100 code 30001
[  135.558982] exe[1195]: unhandled signal 11 at 678d7b58 nip 7789f114 
lr 7789f100 code 30001
[  147.579142] exe[1216]: unhandled signal 11 at 67c6bb58 nip 77c33114 
lr 77c33100 code 30001
[  175.538747] exe[1262]: unhandled signal 11 at 67e2fb58 nip 77df7114 
lr 77df7100 code 30001
[  186.552670] exe[1275]: unhandled signal 11 at 6781fb58 nip 777e7114 
lr 777e7100 code 30001
[  230.629786] exe[1344]: unhandled signal 11 at 67cb3b58 nip 77c7b114 
lr 77c7b100 code 30001
[  249.640396] repair-service.[1369]: unhandled signal 11 at 67e5fb58 
nip 77e27114 lr 77e27100 code 30001
[  378.003410] exe[1593]: unhandled signal 11 at 678d7b58 nip 7789f114 
lr 7789f100 code 30001
[  414.060661] exe[1656]: unhandled signal 11 at 67cc7b58 nip 77c8f114 
lr 77c8f100 code 30001


The problem is present in 3.13, 3.14 and 3.15.

I bisected its appearance with commit 6b31d5955cb29 ("mm, oom: fix 
potential data corruption when oom_reaper races with writer")


And I bisected its disappearance with commit 99cd1302327a2 ("powerpc: 
Deliver SEGV signal on pkey violation")


Looking at those two commits, especially the one which makes it
disappear, I'm quite sceptical. Any idea what could be the cause and/or
how to investigate further?


Thanks
Christophe


[PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE

2018-08-20 Thread Laurent Dufour
On very large systems we can see a soft lockup fire when a process is
exiting:

watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c 
xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum 
lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac 
scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP:  c00b995c LR: c00b8f64 CTR: aa18
REGS: c6b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
MSR:  80010280b033   CR: 22042082  
XER: 
CFAR: 006cf8f0 SOFTE: 0 
GPR00: 0010 c6b0645b7890 c0f99200  
GPR04: 8e01a5a4de58 400249cf1bfd5480 8e01a5a4de50 400249cf1bfd5480 
GPR08: 8e01a5a4de48 400249cf1bfd5480 8e01a5a4de40 400249cf1bfd5480 
GPR12:  c0001e690800 
NIP [c00b995c] plpar_hcall9+0x44/0x7c
LR [c00b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c6b0645b7890] [8e01a5a4dd20] 0x8e01a5a4dd20 (unreliable)
[c6b0645b7a00] [c006d5b0] flush_hash_range+0x60/0x110
[c6b0645b7a50] [c0072a2c] __flush_tlb_pending+0x4c/0xd0
[c6b0645b7a80] [c02eaf44] unmap_page_range+0x984/0xbd0
[c6b0645b7bc0] [c02eb594] unmap_vmas+0x84/0x100
[c6b0645b7c10] [c02f8afc] exit_mmap+0xac/0x1f0
[c6b0645b7cd0] [c00f2638] mmput+0x98/0x1b0
[c6b0645b7d00] [c00fc9d0] do_exit+0x330/0xc00
[c6b0645b7dc0] [c00fd384] do_group_exit+0x64/0x100
[c6b0645b7e00] [c00fd44c] sys_exit_group+0x2c/0x30
[c6b0645b7e30] [c000b960] system_call+0x58/0x6c
Instruction dump:
6000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
e9410060 e9610068 e9810070 4422 <7d806378> e9810028 f88c f8ac0008

This happens when removing PTEs through the H_BULK_REMOVE hypervisor call.
That call processes up to 4 PTEs but issues a tlbie for each PTE it
processes. This can lead to a long time spent in the hypervisor (sometimes
up to 4s) and to a soft lockup being raised because the scheduler is not
called in zap_pte_range().

Since Power7, the hypervisor has provided a new hcall, H_BLOCK_REMOVE,
which processes up to 8 PTEs with a single tlbie. By limiting the number
of tlbie instructions issued, this reduces the time spent invalidating the
PTEs.

This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".
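
As an illustration of the alignment constraint quoted above, here is a
minimal userspace sketch, not the kernel code. It assumes page numbers are
expressed in units of the page size, so that a naturally aligned 8-page
block is just the page number shifted right by 3. The names block_of() and
count_block_calls() are hypothetical, and the real implementation also has
to deal with the kernel's VPN encoding and the page and segment sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 8 pages per naturally aligned block. */
#define PAGES_PER_BLOCK_SHIFT 3
#define PAGES_PER_BLOCK       (1UL << PAGES_PER_BLOCK_SHIFT)

/* Index of the aligned 8-page block that contains page number 'vpn'. */
static unsigned long block_of(unsigned long vpn)
{
	return vpn >> PAGES_PER_BLOCK_SHIFT;
}

/*
 * Count how many block-remove calls a sequence of page numbers would need
 * if a new call is started whenever the block changes or the current
 * batch already holds 8 pages.
 */
static size_t count_block_calls(const unsigned long *vpn, size_t n)
{
	size_t calls = 0, in_batch = 0;
	unsigned long current = 0;

	assert(vpn != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned long b = block_of(vpn[i]);

		if (in_batch == 0 || b != current ||
		    in_batch == PAGES_PER_BLOCK) {
			calls++;	/* start a new batch */
			in_batch = 0;
			current = b;
		}
		in_batch++;
	}
	return calls;
}
```

With this model, pages 8, 9, 10 and 15 fit in one call because they share
block 1, while pages 6, 7, 8 and 9 straddle a block boundary and need two.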

With this patch series applied, I could not see any soft lockup raised on
the victim LPAR I was running the test on.

Changes since V1:
- Remove a call to BUG_ON() in call_block_remove() since this one can be
  handled gently.
- Remove unneeded setting of current_vpgb to 0 when retrying entries in
  hugepage_block_invalidate() and do_block_remove().

Laurent Dufour (3):
  powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  powerpc/pseries/mm: factorize PTE slot computation
  powerpc/pseries/mm: call H_BLOCK_REMOVE

 arch/powerpc/include/asm/firmware.h   |   3 +-
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 241 --
 4 files changed, 230 insertions(+), 16 deletions(-)

-- 
2.7.4



[PATCH v2 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

2018-08-20 Thread Laurent Dufour
This hypervisor call allows removing up to 8 PTEs with only one call to
tlbie.

The virtual pages must all be within the same naturally aligned 8-page
virtual address block and have the same page and segment size encodings.
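
The patch below retries entries that the hypervisor reports as busy by
moving them back to the front of the parameter array, starting at index 1
(index 0 holds the block address). The following is a simplified userspace
sketch of that compaction step only. The enum entry_status values and the
function name compact_busy() are hypothetical stand-ins for the control
bits returned by the hcall, so treat it as a model rather than as the
patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-entry result, standing in for the returned control bits. */
enum entry_status { ENTRY_DONE, ENTRY_NOT_FOUND, ENTRY_BUSY };

/*
 * param[0] is a header slot and param[1..idx-1] are the entries that were
 * submitted; status[i] is the result for param[i + 1].
 * Busy entries are copied back to param[1..], preserving their order.
 * Returns the number of entries that still have to be processed.
 */
static size_t compact_busy(unsigned long *param, size_t idx,
			   const enum entry_status *status)
{
	size_t new_idx = 0;

	assert(idx >= 1);

	for (size_t i = 0; i + 1 < idx; i++) {
		/* The write index never passes the read index, so in-place is safe. */
		if (status[i] == ENTRY_BUSY)
			param[++new_idx] = param[i + 1];
	}
	return new_idx;
}
```

A caller that wants to retry would then resubmit param[0..new_idx], which
corresponds to setting idx to new_idx + 1 in the patch's loop.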

Cc: "Aneesh Kumar K.V" 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 214 --
 2 files changed, 207 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index a0b17f9f1ea4..c349d3960d63 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
 #define H_COP  0x304
 #define H_GET_MPP_X0x314
 #define H_SET_MODE 0x31C
+#define H_BLOCK_REMOVE 0x328
 #define H_CLEAR_HPT0x358
 #define H_REQUEST_VMC  0x360
 #define H_RESIZE_HPT_PREPARE   0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index ebc852e3607d..0b5081085a44 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -417,6 +417,79 @@ static void pSeries_lpar_hpte_invalidate(unsigned long 
slot, unsigned long vpn,
BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN 0x0100UL
+#define HBLKR_CTRL_MASK0xf800UL
+#define HBLKR_CTRL_SUCCESS 0x8000UL
+#define HBLKR_CTRL_ERRNOTFOUND 0x8800UL
+#define HBLKR_CTRL_ERRBUSY 0xa000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If a PTE cannot be processed because another CPU has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries have to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * value is the number of entries yet to be processed.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+  bool retry_busy)
+{
+   unsigned long i, rc, new_idx;
+   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+   if (idx < 2) {
+   pr_warn("Unexpected empty call to H_BLOCK_REMOVE");
+   return 0;
+   }
+again:
+   new_idx = 0;
+   if (idx > PLPAR_HCALL9_BUFSIZE) {
+   pr_err("Too many PTEs (%lu) for H_BLOCK_REMOVE", idx);
+   idx = PLPAR_HCALL9_BUFSIZE;
+   } else if (idx < PLPAR_HCALL9_BUFSIZE)
+   param[idx] = HBR_END;
+
+   rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+ param[0], /* AVA */
+ param[1],  param[2],  param[3],  param[4], /* TS0-7 */
+ param[5],  param[6],  param[7],  param[8]);
+   if (rc == H_SUCCESS)
+   return 0;
+
+   BUG_ON(rc != H_PARTIAL);
+
+   /* Check that the unprocessed entries were 'not found' or 'busy' */
+   for (i = 0; i < idx-1; i++) {
+   unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+   if (ctrl == HBLKR_CTRL_ERRBUSY) {
+   param[++new_idx] = param[i+1];
+   continue;
+   }
+
+   BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+  && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+   }
+
+   /*
+* If there were entries found busy, retry these entries if requested,
+* or if all the entries have to be retried.
+*/
+   if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+   idx = new_idx + 1;
+   goto again;
+   }
+
+   return new_idx;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -424,17 +497,57 @@ static void pSeries_lpar_hpte_invalidate(unsigned long 
slot, unsigned long vpn,
  */
 #define PPC64_HUGE_HPTE_BATCH 12
 
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
-unsigned long *vpn, int count,
-int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
 {
unsigned long param[PLPAR_HCALL9_BUFSIZE];
-   int i = 0, pix = 0, rc;
-   unsigned long flags = 0;
-   int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+   unsigned long shift, current_vpgb, vpgb;
+   int i, pix = 0;
 
-   if (lock_tlbie)
-   

[PATCH v2 2/3] powerpc/pseries/mm: factorize PTE slot computation

2018-08-20 Thread Laurent Dufour
This part of the code will also be called when dealing with H_BLOCK_REMOVE.

Cc: "Aneesh Kumar K.V" 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/lpar.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index d3992ced0782..ebc852e3607d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -546,6 +546,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
return 0;
 }
 
+
+static inline unsigned long compute_slot(real_pte_t pte,
+unsigned long vpn,
+unsigned long index,
+unsigned long shift,
+int ssize)
+{
+   unsigned long slot, hash, hidx;
+
+   hash = hpt_hash(vpn, shift, ssize);
+   hidx = __rpte_to_hidx(pte, index);
+   if (hidx & _PTEIDX_SECONDARY)
+   hash = ~hash;
+   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+   slot += hidx & _PTEIDX_GROUP_IX;
+   return slot;
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -558,7 +576,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long 
number, int local)
struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
unsigned long param[PLPAR_HCALL9_BUFSIZE];
-   unsigned long hash, index, shift, hidx, slot;
+   unsigned long index, shift, slot;
real_pte_t pte;
int psize, ssize;
 
@@ -572,12 +590,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long 
number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
-   hash = hpt_hash(vpn, shift, ssize);
-   hidx = __rpte_to_hidx(pte, index);
-   if (hidx & _PTEIDX_SECONDARY)
-   hash = ~hash;
-   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-   slot += hidx & _PTEIDX_GROUP_IX;
+   slot = compute_slot(pte, vpn, index, shift, ssize);
if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
/*
 * lpar doesn't use the passed actual page size
-- 
2.7.4



[PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE

2018-08-20 Thread Laurent Dufour
This feature flag indicates whether the H_BLOCK_REMOVE hcall is available.

Cc: "Aneesh Kumar K.V" 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/firmware.h   | 3 ++-
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 7a051bd21f87..2aca2655fe30 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
 #define FW_FEATURE_PRRNASM_CONST(0x0002)
 #define FW_FEATURE_DRMEM_V2ASM_CONST(0x0004)
 #define FW_FEATURE_DRC_INFOASM_CONST(0x0008)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
 
 #ifndef __ASSEMBLY__
 
@@ -69,7 +70,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
-   FW_FEATURE_DRC_INFO,
+   FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_SET_MODE,   "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY,"hcall-best-energy-1*"},
{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
+   {FW_FEATURE_BLOCK_REMOVE,   "hcall-block-remove"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.7.4



Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Arnd Bergmann
On Mon, Aug 20, 2018 at 1:24 PM Rafael J. Wysocki  wrote:
>
> On Mon, Aug 20, 2018 at 1:20 PM Arnd Bergmann  wrote:
> >
> > On Mon, Aug 20, 2018 at 10:23 AM Rafael J. Wysocki  
> > wrote:
> > > On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
> > > > @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct 
> > > > acpi_pci_root *root,
> > > > int ret, busnum = root->secondary.start;
> > > > struct acpi_device *device = root->device;
> > > > int node = acpi_get_node(device->handle);
> > > > -   struct pci_bus *bus;
> > > > -   struct pci_host_bridge *host_bridge;
> > > > +   struct pci_host_bridge *bridge;
> > >
> > > Why "bridge" and not "host" or even something to stand for "root complex"?
> > >
> > > Or maybe it can still be "host_bridge"?
> >
> > I did this for consistency with the naming in drivers/pci/probe.c,
> > which always declares the local variable as 'struct pci_host_bridge 
> > *bridge'.
> > It's easy to change here if you feel strongly about it (I don't).
>
> I would leave host_bridge here.  It would make the patch smaller too I think.

Ok, I've changed my local copy as you suggested now.

  Arnd


Re: [PATCH v8 5/5] powernv/pseries: consolidate code for mce early handling.

2018-08-20 Thread Nicholas Piggin
On Sun, 19 Aug 2018 22:38:39 +0530
Mahesh J Salgaonkar  wrote:

> From: Mahesh Salgaonkar 
> 
> Now that other platforms also implements real mode mce handler,
> lets consolidate the code by sharing existing powernv machine check
> early code. Rename machine_check_powernv_early to
> machine_check_common_early and reuse the code.
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/kernel/exceptions-64s.S |  155 
> ++
>  1 file changed, 28 insertions(+), 127 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 12f056179112..2f85a7baf026 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -243,14 +243,13 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
>   SET_SCRATCH0(r13)   /* save r13 */
>   EXCEPTION_PROLOG_0(PACA_EXMC)
>  BEGIN_FTR_SECTION
> - b   machine_check_powernv_early
> + b   machine_check_common_early
>  FTR_SECTION_ELSE
>   b   machine_check_pSeries_0
>  ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
>  EXC_REAL_END(machine_check, 0x200, 0x100)
>  EXC_VIRT_NONE(0x4200, 0x100)
> -TRAMP_REAL_BEGIN(machine_check_powernv_early)
> -BEGIN_FTR_SECTION
> +TRAMP_REAL_BEGIN(machine_check_common_early)
>   EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>   /*
>* Register contents:
> @@ -306,7 +305,9 @@ BEGIN_FTR_SECTION
>   /* Save r9 through r13 from EXMC save area to stack frame. */
>   EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
>   mfmsr   r11 /* get MSR value */
> +BEGIN_FTR_SECTION
>   ori r11,r11,MSR_ME  /* turn on ME bit */
> +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   ori r11,r11,MSR_RI  /* turn on RI bit */
>   LOAD_HANDLER(r12, machine_check_handle_early)
>  1:   mtspr   SPRN_SRR0,r12
> @@ -325,7 +326,6 @@ BEGIN_FTR_SECTION
>   andcr11,r11,r10 /* Turn off MSR_ME */
>   b   1b
>   b   .   /* prevent speculative execution */
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>  
>  TRAMP_REAL_BEGIN(machine_check_pSeries)
>   .globl machine_check_fwnmi
> @@ -333,7 +333,7 @@ machine_check_fwnmi:
>   SET_SCRATCH0(r13)   /* save r13 */
>   EXCEPTION_PROLOG_0(PACA_EXMC)
>  BEGIN_FTR_SECTION
> - b   machine_check_pSeries_early
> + b   machine_check_common_early
>  END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
>  machine_check_pSeries_0:
>   EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
> @@ -346,103 +346,6 @@ machine_check_pSeries_0:
>  
>  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>  
> -TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> -BEGIN_FTR_SECTION
> - EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> - mr  r10,r1  /* Save r1 */
> - lhz r11,PACA_IN_MCE(r13)
> - cmpwi   r11,0   /* Are we in nested machine check */
> - bne 0f  /* Yes, we are. */
> - /* First machine check entry */
> - ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency stack */
> -0:   subir1,r1,INT_FRAME_SIZE/* alloc stack frame */
> - addir11,r11,1   /* increment paca->in_mce */
> - sth r11,PACA_IN_MCE(r13)
> - /* Limit nested MCE to level 4 to avoid stack overflow */
> - cmpwi   r11,MAX_MCE_DEPTH
> - bgt 1f  /* Check if we hit limit of 4 */
> - mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> - mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> - EXCEPTION_PROLOG_COMMON_1()
> - EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> - EXCEPTION_PROLOG_COMMON_3(0x200)
> - addir3,r1,STACK_FRAME_OVERHEAD
> - BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> - ld  r12,_MSR(r1)
> - andi.   r11,r12,MSR_PR  /* See if coming from user. */
> - bne 2f  /* continue in V mode if we are. */
> -
> - /*
> -  * At this point we are not sure about what context we come from.
> -  * We may be in the middle of swithing stack. r1 may not be valid.
> -  * Hence stay on emergency stack, call machine_check_exception and
> -  * return from the interrupt.
> -  * But before that, check if this is an un-recoverable exception.
> -  * If yes, then stay on emergency stack and panic.
> -  */
> - andi.   r11,r12,MSR_RI
> - beq 1f
> -
> - /*
> -  * Check if we have successfully handled/recovered from error, if not
> -  * then stay on emergency stack and panic.
> -  */
> - cmpdi   r3,0/* see if we handled MCE successfully */
> - beq 1f  /* if !handled then panic */
> -
> - /* Stay on emergency stack and return from interrupt. */
> - LOAD_HANDLER(r10,mce_return)
> - mtspr   SPRN_SRR0,r10
> - ld  r10,PACAKMSR(r13)
> - mtspr   SPRN_SRR1,r10
> - RFI_TO_KERNEL
> - b   .
> -
> -1:   

Re: [RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-20 Thread Rafael J. Wysocki
On Mon, Aug 20, 2018 at 1:17 PM Arnd Bergmann  wrote:
>
> On Mon, Aug 20, 2018 at 10:31 AM Rafael J. Wysocki  wrote:
> > On Fri, Aug 17, 2018 at 12:32 PM Arnd Bergmann  wrote:
>
> > > -static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
> > > -   struct pci_ops *ops, void *sysdata, struct list_head 
> > > *resources)
> > > +void pcibios_scan_root(int busnum)
> > >  {
> > > +   struct pci_sysdata *sd;
> > > struct pci_host_bridge *bridge;
> > > int error;
> > >
> > > -   bridge = pci_alloc_host_bridge(0);
> > > -   if (!bridge)
> > > -   return NULL;
> > > +   bridge = pci_alloc_host_bridge(sizeof(sd));
> > > +   if (!bridge) {
> > > +   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", 
> > > busnum);
> > > +   return;
> > > +   }
> > > +   sd = pci_host_bridge_priv(bridge);
> >
> > This looks fishy, as bridge->private is not set at this point AFAICS,
> > unless one of the previous patches changes that.
>
> bridge->private what comes after the bridge structure, and it's allocated
> by pci_alloc_host_bridge() passing the size of the structure we want
> for this private area.

I see, sorry for the noise.


Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Rafael J. Wysocki
On Mon, Aug 20, 2018 at 1:20 PM Arnd Bergmann  wrote:
>
> On Mon, Aug 20, 2018 at 10:23 AM Rafael J. Wysocki  wrote:
> > On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
> > > @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct 
> > > acpi_pci_root *root,
> > > int ret, busnum = root->secondary.start;
> > > struct acpi_device *device = root->device;
> > > int node = acpi_get_node(device->handle);
> > > -   struct pci_bus *bus;
> > > -   struct pci_host_bridge *host_bridge;
> > > +   struct pci_host_bridge *bridge;
> >
> > Why "bridge" and not "host" or even something to stand for "root complex"?
> >
> > Or maybe it can still be "host_bridge"?
>
> I did this for consistency with the naming in drivers/pci/probe.c,
> which always declares the local variable as 'struct pci_host_bridge *bridge'.
> It's easy to change here if you feel strongly about it (I don't).

I would leave host_bridge here.  It would make the patch smaller too I think.


Re: [PATCH v8 4/5] powerpc/pseries: Dump the SLB contents on SLB MCE errors.

2018-08-20 Thread Nicholas Piggin
On Sun, 19 Aug 2018 22:38:32 +0530
Mahesh J Salgaonkar  wrote:

> From: Mahesh Salgaonkar 
> 
> If we get a machine check exceptions due to SLB errors then dump the
> current SLB contents which will be very much helpful in debugging the
> root cause of SLB errors. Introduce an exclusive buffer per cpu to hold
> faulty SLB entries. In real mode mce handler saves the old SLB contents
> into this buffer accessible through paca and print it out later in virtual
> mode.
> 
> With this patch the console will log SLB contents like below on SLB MCE
> errors:
> 
> [  507.297236] SLB contents of cpu 0x1
> [  507.297237] Last SLB entry inserted at slot 16
> [  507.297238] 00 c800 400ea1b217000500
> [  507.297239]   1T  ESID=   c0  VSID=  ea1b217 LLP:100
> [  507.297240] 01 d800 400d43642f000510
> [  507.297242]   1T  ESID=   d0  VSID=  d43642f LLP:110
> [  507.297243] 11 f800 400a86c85f000500
> [  507.297244]   1T  ESID=   f0  VSID=  a86c85f LLP:100
> [  507.297245] 12 7f000800 4008119624000d90
> [  507.297246]   1T  ESID=   7f  VSID=  8119624 LLP:110
> [  507.297247] 13 1800 00092885f5150d90
> [  507.297247]  256M ESID=1  VSID=   92885f5150 LLP:110
> [  507.297248] 14 01000800 4009e7cb5d90
> [  507.297249]   1T  ESID=1  VSID=  9e7cb50 LLP:110
> [  507.297250] 15 d800 400d43642f000510
> [  507.297251]   1T  ESID=   d0  VSID=  d43642f LLP:110
> [  507.297252] 16 d800 400d43642f000510
> [  507.297253]   1T  ESID=   d0  VSID=  d43642f LLP:110
> [  507.297253] --
> [  507.297254] SLB cache ptr value = 3
> [  507.297254] Valid SLB cache entries:
> [  507.297255] 00 EA[0-35]=7f000
> [  507.297256] 01 EA[0-35]=1
> [  507.297257] 02 EA[0-35]= 1000
> [  507.297257] Rest of SLB cache entries:
> [  507.297258] 03 EA[0-35]=7f000
> [  507.297258] 04 EA[0-35]=1
> [  507.297259] 05 EA[0-35]= 1000
> [  507.297260] 06 EA[0-35]=   12
> [  507.297260] 07 EA[0-35]=7f000
> 
> Suggested-by: Aneesh Kumar K.V 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Mahesh Salgaonkar 
> ---
> 
> Changes in V8:
> - Limit the slb saving to single level of mce recursion.

Thanks, that looks good now.

Reviewed-by: Nicholas Piggin 


Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Arnd Bergmann
On Mon, Aug 20, 2018 at 10:23 AM Rafael J. Wysocki  wrote:
> On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
> > @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct 
> > acpi_pci_root *root,
> > int ret, busnum = root->secondary.start;
> > struct acpi_device *device = root->device;
> > int node = acpi_get_node(device->handle);
> > -   struct pci_bus *bus;
> > -   struct pci_host_bridge *host_bridge;
> > +   struct pci_host_bridge *bridge;
>
> Why "bridge" and not "host" or even something to stand for "root complex"?
>
> Or maybe it can still be "host_bridge"?

I did this for consistency with the naming in drivers/pci/probe.c,
which always declares the local variable as 'struct pci_host_bridge *bridge'.
It's easy to change here if you feel strongly about it (I don't).

Arnd


Re: [RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-20 Thread Arnd Bergmann
On Mon, Aug 20, 2018 at 10:31 AM Rafael J. Wysocki  wrote:
> On Fri, Aug 17, 2018 at 12:32 PM Arnd Bergmann  wrote:

> > -static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
> > -   struct pci_ops *ops, void *sysdata, struct list_head 
> > *resources)
> > +void pcibios_scan_root(int busnum)
> >  {
> > +   struct pci_sysdata *sd;
> > struct pci_host_bridge *bridge;
> > int error;
> >
> > -   bridge = pci_alloc_host_bridge(0);
> > -   if (!bridge)
> > -   return NULL;
> > +   bridge = pci_alloc_host_bridge(sizeof(sd));
> > +   if (!bridge) {
> > +   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", 
> > busnum);
> > +   return;
> > +   }
> > +   sd = pci_host_bridge_priv(bridge);
>
> This looks fishy, as bridge->private is not set at this point AFAICS,
> unless one of the previous patches changes that.

bridge->private is what comes after the bridge structure, and it's allocated
by pci_alloc_host_bridge(), which is passed the size of the structure we want
for this private area.
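
To illustrate the layout, here is a minimal userspace sketch of the
"private area behind the structure" pattern. All names in it
(host_bridge, sysdata, alloc_host_bridge, host_bridge_priv) are
hypothetical stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_host_bridge: fixed fields first,
 * then a flexible array member marking where the private area begins. */
struct host_bridge {
	int busnr;
	void *sysdata;
	unsigned long private_area[];
};

/* Hypothetical stand-in for struct pci_sysdata. */
struct sysdata {
	int domain;
	int node;
};

/* Allocate the bridge plus priv_size zeroed bytes directly behind it. */
static struct host_bridge *alloc_host_bridge(size_t priv_size)
{
	return calloc(1, sizeof(struct host_bridge) + priv_size);
}

/* No separate pointer has to be set up: the private area is the tail. */
static void *host_bridge_priv(struct host_bridge *bridge)
{
	assert(bridge != NULL);
	return bridge->private_area;
}
```

With a layout like this the size argument has to be the size of the
structure itself (the equivalent of sizeof(*sd)); the size of a pointer
would reserve too little room.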

 Arnd


Re: [PATCH v8 2/5] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-08-20 Thread Nicholas Piggin
On Sun, 19 Aug 2018 22:38:17 +0530
Mahesh J Salgaonkar  wrote:

> From: Mahesh Salgaonkar 
> 
> On pseries, as of today system crashes if we get a machine check
> exceptions due to SLB errors. These are soft errors and can be fixed by
> flushing the SLBs so the kernel can continue to function instead of
> system crash. We do this in real mode before turning on MMU. Otherwise
> we would run into nested machine checks. This patch now fetches the
> rtas error log in real mode and flushes the SLBs on SLB errors.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Signed-off-by: Michal Suchanek 
> ---
> 
> Changes in V8:
> - Use flush_and_reload_slb() from mce_power.c.
> ---
>  arch/powerpc/include/asm/machdep.h   |1 
>  arch/powerpc/include/asm/mce.h   |3 +
>  arch/powerpc/kernel/exceptions-64s.S |  129 
> ++
>  arch/powerpc/kernel/mce.c|   15 +++
>  arch/powerpc/kernel/mce_power.c  |2 
>  arch/powerpc/platforms/powernv/setup.c   |   11 +++
>  arch/powerpc/platforms/pseries/pseries.h |1 
>  arch/powerpc/platforms/pseries/ras.c |   54 -
>  arch/powerpc/platforms/pseries/setup.c   |1 
>  9 files changed, 212 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h 
> b/arch/powerpc/include/asm/machdep.h
> index a47de82fb8e2..b4831f1338db 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -108,6 +108,7 @@ struct machdep_calls {
>  
>   /* Early exception handlers called in realmode */
>   int (*hmi_exception_early)(struct pt_regs *regs);
> + long    (*machine_check_early)(struct pt_regs *regs);
>  
>   /* Called during machine check exception to retrive fixup address. */
>   bool    (*mce_check_early_recovery)(struct pt_regs *regs);
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 3a1226e9b465..78a1da95a394 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -210,4 +210,7 @@ extern void release_mce_event(void);
>  extern void machine_check_queue_event(void);
>  extern void machine_check_print_event_info(struct machine_check_event *evt,
>  bool user_mode);
> +#ifdef CONFIG_PPC_BOOK3S_64
> +extern void flush_and_reload_slb(void);
> +#endif /* CONFIG_PPC_BOOK3S_64 */
>  #endif /* __ASM_PPC64_MCE_H__ */
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 285c6465324a..12f056179112 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -332,6 +332,9 @@ TRAMP_REAL_BEGIN(machine_check_pSeries)
>  machine_check_fwnmi:
>   SET_SCRATCH0(r13)   /* save r13 */
>   EXCEPTION_PROLOG_0(PACA_EXMC)
> +BEGIN_FTR_SECTION
> + b   machine_check_pSeries_early
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
>  machine_check_pSeries_0:
>   EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
>   /*
> @@ -343,6 +346,103 @@ machine_check_pSeries_0:
>  
>  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>  
> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> +BEGIN_FTR_SECTION
> + EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> + mr  r10,r1  /* Save r1 */
> + lhz r11,PACA_IN_MCE(r13)
> + cmpwi   r11,0   /* Are we in nested machine check */
> + bne 0f  /* Yes, we are. */
> + /* First machine check entry */
> + ld  r1,PACAMCEMERGSP(r13)   /* Use MC emergency stack */
> +0:   subi    r1,r1,INT_FRAME_SIZE    /* alloc stack frame */
> + addi    r11,r11,1   /* increment paca->in_mce */
> + sth r11,PACA_IN_MCE(r13)
> + /* Limit nested MCE to level 4 to avoid stack overflow */
> + cmpwi   r11,MAX_MCE_DEPTH
> + bgt 1f  /* Check if we hit limit of 4 */
> + mfspr   r11,SPRN_SRR0   /* Save SRR0 */
> + mfspr   r12,SPRN_SRR1   /* Save SRR1 */
> + EXCEPTION_PROLOG_COMMON_1()
> + EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> + EXCEPTION_PROLOG_COMMON_3(0x200)
> + addi    r3,r1,STACK_FRAME_OVERHEAD
> + BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> + ld  r12,_MSR(r1)
> + andi.   r11,r12,MSR_PR  /* See if coming from user. */
> + bne 2f  /* continue in V mode if we are. */
> +
> + /*
> +  * At this point we are not sure about what context we come from.
> +  * We may be in the middle of switching stack. r1 may not be valid.
> +  * Hence stay on emergency stack, call machine_check_exception and
> +  * return from the interrupt.
> +  * But before that, check if this is an un-recoverable exception.
> +  * If yes, then stay on emergency stack and panic.
> +  */
> + andi.   r11,r12,MSR_RI
> + beq 1f
> +
> + /*
> +  * Check if we 

Re: [RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-20 Thread Nicholas Piggin
On Mon, 20 Aug 2018 19:41:56 +1000
Nicholas Piggin  wrote:


> +long do_slb_fault(struct pt_regs *regs, unsigned long ea)
> +{
> + unsigned long id = REGION_ID(ea);
> +
> + /* IRQs are not reconciled here, so can't check irqs_disabled */
> + VM_WARN_ON(mfmsr() & MSR_EE);
> +
> + /*
> +  * SLB kernel faults must be very careful not to touch anything
> +  * that is not bolted. E.g., PACA and global variables are okay,
> +  * mm->context stuff is not.
> +  *
> +  * SLB user faults can access all of kernel memory, but must be
> +  * careful not to touch things like IRQ state because it is not
> +  * "reconciled" here. The difficulty is that we must use
> +  * fast_exception_return to return from kernel SLB faults without
> +  * looking at possible non-bolted memory. We could test user vs
> +  * kernel faults in the interrupt handler asm and do a full fault,
> +  * reconcile, ret_from_except for user faults which would make them
> +  * first class kernel code. But for performance it's probably nicer
> +  * if they go via fast_exception_return too.
> +  */
> + if (id >= KERNEL_REGION_ID) {
> + return slb_allocate_kernel(ea, id);
> + } else {
> + struct mm_struct *mm = current->mm;
> +
> + if (unlikely(!mm))
> + return -EFAULT;
>  
> - handle_multi_context_slb_miss(context, ea);
> - exception_exit(prev_state);
> - return;
> + return slb_allocate_user(mm, ea);
> + }
> +}
>  
> -slb_bad_addr:
> +void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err)
> +{
>   if (user_mode(regs))
>   _exception(SIGSEGV, regs, SEGV_BNDERR, ea);
>   else
>   bad_page_fault(regs, ea, SIGSEGV);
> - exception_exit(prev_state);
>  }

I knew I forgot something -- forgot to test MSR[RI] here. That can be
done just by returning a different error from do_slb_fault if RI is
clear, and do_bad_slb_fault will call unrecoverable_exception() if it
sees that code.
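
Roughly, as a standalone model rather than the real handlers. The names
regs_model, slb_fault_model and bad_slb_fault_model are made up for the
sketch, the MSR_RI value is assumed, and using -EINVAL for "RI clear" is
only one possible choice of error code:

```c
#include <assert.h>
#include <errno.h>

#define MSR_RI 0x2UL	/* recoverable-interrupt bit; value assumed for this sketch */

/* Hypothetical stand-in for the part of pt_regs that matters here. */
struct regs_model {
	unsigned long msr;
};

enum bad_fault_action {
	ACT_BAD_ADDRESS,	/* SIGSEGV or bad_page_fault() path */
	ACT_UNRECOVERABLE	/* unrecoverable_exception() path */
};

/* Returns 0 on success or a negative errno describing the failure. */
static long slb_fault_model(const struct regs_model *regs, int addr_ok)
{
	if (!(regs->msr & MSR_RI))
		return -EINVAL;	/* assumed code for "RI was clear" */
	if (!addr_ok)
		return -EFAULT;
	return 0;
}

/* The bad-fault handler only has to look at the error it was handed. */
static enum bad_fault_action bad_slb_fault_model(long err)
{
	assert(err != 0);
	return err == -EINVAL ? ACT_UNRECOVERABLE : ACT_BAD_ADDRESS;
}
```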

Thanks,
Nick


[RFC PATCH 5/5] powerpc/64s/hash: Add a SLB preload cache

2018-08-20 Thread Nicholas Piggin
When switching processes, currently all user SLBEs are cleared, and
a few (exec_base, pc, and stack) are preloaded. In trivial testing
with small apps, this tends to miss the heap and low 256MB segments,
and it will also miss commonly accessed segments on large memory
workloads.

Add a simple round-robin preload cache that just inserts the last
SLB miss into the head of the cache and preloads those at context
switch time.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study
of large memory workloads. But this is a simple thing we can do now
that is an obvious win for common workloads.

This plus the previous patch reduces SLB misses of a bare bones boot
to busybox from 945 to 180 when using 256MB segments, and 900 to 100 when
using 1T segments. These could almost all be eliminated by preloading
a bit more carefully with ELF binary loading.
---
 arch/powerpc/include/asm/thread_info.h |   4 +
 arch/powerpc/kernel/process.c  |   6 ++
 arch/powerpc/mm/mmu_context_book3s64.c |  10 ++-
 arch/powerpc/mm/slb.c  | 107 -
 4 files changed, 102 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 3c0002044bc9..ee5e49ec12c7 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 
+#define SLB_PRELOAD_NR 8U
 /*
  * low level task data.
  */
@@ -44,6 +45,9 @@ struct thread_info {
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32)
struct cpu_accounting_data accounting;
 #endif
+   unsigned int slb_preload_nr;
+   unsigned long slb_preload_ea[SLB_PRELOAD_NR];
+
/* low level flags - has atomic operations done on it */
unsigned long   flags ____cacheline_aligned_in_smp;
 };
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..678a2c668270 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1710,6 +1710,8 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
return 0;
 }
 
+void preload_new_slb_context(unsigned long start, unsigned long sp);
+
 /*
  * Set up a thread for executing a new program
  */
@@ -1717,6 +1719,10 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
 {
 #ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   preload_new_slb_context(start, sp);
+#endif
 #endif
 
/*
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index 4a892d894a0f..3671a32141e2 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -85,7 +85,9 @@ int hash__alloc_context_id(void)
 }
 EXPORT_SYMBOL_GPL(hash__alloc_context_id);
 
-static int hash__init_new_context(struct mm_struct *mm)
+void init_new_slb_context(struct task_struct *tsk, struct mm_struct *mm);
+
+static int hash__init_new_context(struct task_struct *tsk, struct mm_struct 
*mm)
 {
int index;
 
@@ -107,8 +109,10 @@ static int hash__init_new_context(struct mm_struct *mm)
 * We should not be calling init_new_context() on init_mm. Hence a
 * check against 0 is OK.
 */
-   if (mm->context.id == 0)
+   if (mm->context.id == 0) {
slice_init_new_context_exec(mm);
+   init_new_slb_context(tsk, mm);
+   }
 
subpage_prot_init_new_context(mm);
 
@@ -152,7 +156,7 @@ int init_new_context(struct task_struct *tsk, struct 
mm_struct *mm)
if (radix_enabled())
index = radix__init_new_context(mm);
else
-   index = hash__init_new_context(mm);
+   index = hash__init_new_context(tsk, mm);
 
if (index < 0)
return index;
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 3de63598f7c4..e53846d4e474 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -216,14 +216,85 @@ static inline int esids_match(unsigned long addr1, 
unsigned long addr2)
return (GET_ESID_1T(addr1) == GET_ESID_1T(addr2));
 }
 
+static bool preload_hit(struct thread_info *ti, unsigned long ea)
+{
+   int i;
+
+   for (i = 0; i < min(SLB_PRELOAD_NR, ti->slb_preload_nr); i++)
+   if (esids_match(ti->slb_preload_ea[i], ea))
+   return true;
+   return false;
+}
+
+static bool preload_add(struct thread_info *ti, unsigned long ea)
+{
+   if (preload_hit(ti, ea))
+   return false;
+
+   ti->slb_preload_ea[ti->slb_preload_nr % SLB_PRELOAD_NR] = ea;
+   ti->slb_preload_nr++;
+
+   return true;
+}
+
+void preload_new_slb_context(unsigned long start, unsigned long sp)
+{
+   struct thread_info *ti = current_thread_info();
+   struct 

[RFC PATCH 4/5] powerpc/64s/hash: Add SLB allocation bitmaps

2018-08-20 Thread Nicholas Piggin
Add 32-entry bitmaps to track the allocation status of the first 32
SLB entries, and whether they are user or kernel entries. These are
used to prevent context switches rolling the SLB round robin allocator
and evicting important kernel SLBEs when there are obvious free
entries.
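
The allocation policy can be modelled in plain C as below. This is a
sketch under assumptions, not the kernel code: it fixes the SLB at 32
entries, and slb_model, first_zero and the other names are invented for
the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BOLTED	2U	/* bolted entries are never reallocated */
#define SLB_SIZE	32U	/* assumed SLB size for this model */

struct slb_model {
	uint32_t used_bitmap;	/* bit i set: entry i holds something */
	uint32_t kern_bitmap;	/* bit i set: entry i holds a kernel segment */
	unsigned int rr;	/* round-robin cursor once every bit is set */
};

static void slb_model_init(struct slb_model *s)
{
	s->kern_bitmap = (1U << NUM_BOLTED) - 1;
	s->used_bitmap = s->kern_bitmap;
	s->rr = NUM_BOLTED - 1;
}

/* Index of the lowest clear bit; the caller guarantees one exists. */
static unsigned int first_zero(uint32_t x)
{
	unsigned int i = 0;

	assert(x != UINT32_MAX);
	while (x & (1U << i))
		i++;
	return i;
}

static unsigned int slb_model_alloc(struct slb_model *s, bool kernel)
{
	unsigned int index;

	if (s->used_bitmap != UINT32_MAX) {
		/* A free entry exists: take it, nothing is evicted. */
		index = first_zero(s->used_bitmap);
		s->used_bitmap |= 1U << index;
	} else {
		/* Full: fall back to round-robin above the bolted entries. */
		index = s->rr < SLB_SIZE - 1 ? s->rr + 1 : NUM_BOLTED;
		s->rr = index;
	}
	if (kernel)
		s->kern_bitmap |= 1U << index;
	return index;
}

/* Context switch: user entries are dropped, kernel entries stay. */
static void slb_model_switch(struct slb_model *s)
{
	s->used_bitmap = s->kern_bitmap;
}
```

After a switch, new user segments fill the holes left by the old user
entries instead of advancing the round-robin cursor over kernel ones.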
---
 arch/powerpc/include/asm/paca.h |  6 +++--
 arch/powerpc/mm/slb.c   | 42 +++--
 arch/powerpc/xmon/xmon.c|  2 +-
 3 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8c258a057207..bf7ab59be3b8 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -113,7 +113,10 @@ struct paca_struct {
 * on the linear mapping */
/* SLB related definitions */
u16 vmalloc_sllp;
-   u16 slb_cache_ptr;
+   u8 slb_cache_ptr;
+   u8 stab_rr; /* stab/slb round-robin counter */
+   u32 slb_used_bitmap;/* Bitmaps for first 32 SLB entries. */
+   u32 slb_kern_bitmap;
u32 slb_cache[SLB_CACHE_ENTRIES];
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
@@ -148,7 +151,6 @@ struct paca_struct {
 */
struct task_struct *__current;  /* Pointer to current */
u64 kstack; /* Saved Kernel stack addr */
-   u64 stab_rr;/* stab/slb round-robin counter */
u64 saved_r1;   /* r1 save for RTAS calls or PM or EE=0 
*/
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 6e595d75d997..3de63598f7c4 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -267,6 +267,7 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
 
get_paca()->slb_cache_ptr = 0;
}
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 
/*
 * preload some userspace segments into the SLB.
@@ -339,6 +340,8 @@ void slb_initialize(void)
}
 
get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
+   get_paca()->slb_kern_bitmap |= (1U << SLB_NUM_BOLTED) - 1;
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 
lflags = SLB_VSID_KERNEL | linear_llp;
 
@@ -390,27 +393,42 @@ static void slb_cache_update(unsigned long esid_data)
}
 }
 
-static enum slb_index alloc_slb_index(void)
+static enum slb_index alloc_slb_index(bool kernel)
 {
enum slb_index index;
 
-   /* round-robin replacement of slb starting at SLB_NUM_BOLTED. */
-   index = get_paca()->stab_rr;
-   if (index < (mmu_slb_size - 1))
-   index++;
-   else
-   index = SLB_NUM_BOLTED;
-   get_paca()->stab_rr = index;
+   /*
+* SLBs beyond 32 entries are allocated with stab_rr only
+* POWER7/8/9 have 32 SLB entries, this could be expanded if a
+* future CPU has more.
+*/
+   if (get_paca()->slb_used_bitmap != U32_MAX) {
+   index = ffz(get_paca()->slb_used_bitmap);
+   get_paca()->slb_used_bitmap |= 1U << index;
+   if (kernel)
+   get_paca()->slb_kern_bitmap |= 1U << index;
+   } else {
+   /* round-robin replacement of slb starting at SLB_NUM_BOLTED. */
+   index = get_paca()->stab_rr;
+   if (index < (mmu_slb_size - 1))
+   index++;
+   else
+   index = SLB_NUM_BOLTED;
+   get_paca()->stab_rr = index;
+   if (kernel && index < 32)
+   get_paca()->slb_kern_bitmap |= 1U << index;
+   }
+   BUG_ON(index < SLB_NUM_BOLTED);
 
return index;
 }
 
 static void slb_insert_entry(unsigned long ea, unsigned long context,
-   unsigned long flags, int ssize)
+   unsigned long flags, int ssize, bool kernel)
 {
unsigned long vsid;
unsigned long vsid_data, esid_data;
-   enum slb_index index = alloc_slb_index();
+   enum slb_index index = alloc_slb_index(kernel);
 
vsid = get_vsid(context, ea, ssize);
vsid_data = (vsid << slb_vsid_shift(ssize)) | flags |
@@ -454,7 +472,7 @@ static long slb_allocate_kernel(unsigned long ea, unsigned 
long id)
 
context = id - KERNEL_REGION_CONTEXT_OFFSET;
 
-   slb_insert_entry(ea, context, flags, ssize);
+   slb_insert_entry(ea, context, flags, ssize, true);
 
return 0;
 }
@@ -487,7 +505,7 @@ static long slb_allocate_user(struct mm_struct *mm, 
unsigned long ea)
bpsize = get_slice_psize(mm, ea);
flags = SLB_VSID_USER | mmu_psize_defs[bpsize].sllp;
 
-   slb_insert_entry(ea, context, flags, ssize);
+   slb_insert_entry(ea, context, flags, ssize, false);
 
return 0;
 }
diff 

[RFC PATCH 3/5] powerpc/64s/hash: remove the first vmalloc segment from the bolted SLB

2018-08-20 Thread Nicholas Piggin
Remove the first vmalloc segment from bolted SLBEs. This is not
required to be bolted, and seems like it was added to help pre-load
the SLB on context switch. However there are now other segments like
the vmemmap segment that often take misses after a context switch, so
it is better to solve this a different way and save a bolted entry.
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/mm/slb.c | 16 
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 39764214aef5..4c8d413ce99a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -30,7 +30,7 @@
  * SLB
  */
 
-#define SLB_NUM_BOLTED 3
+#define SLB_NUM_BOLTED 2
 #define SLB_CACHE_ENTRIES  8
 #define SLB_MIN_SIZE   32
 
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 221d94b4f9cf..6e595d75d997 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -133,13 +133,11 @@ static void __slb_flush_and_rebolt(void)
 {
/* If you change this make sure you change SLB_NUM_BOLTED
 * and PR KVM appropriately too. */
-   unsigned long linear_llp, vmalloc_llp, lflags, vflags;
+   unsigned long linear_llp, lflags;
unsigned long ksp_esid_data, ksp_vsid_data;
 
linear_llp = mmu_psize_defs[mmu_linear_psize].sllp;
-   vmalloc_llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
lflags = SLB_VSID_KERNEL | linear_llp;
-   vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
ksp_esid_data = mk_esid_data(get_paca()->kstack, mmu_kernel_ssize, 
KSTACK_INDEX);
if ((ksp_esid_data & ~0xfffUL) <= PAGE_OFFSET) {
@@ -157,14 +155,10 @@ static void __slb_flush_and_rebolt(void)
 * the stack between the slbia and rebolting it. */
asm volatile("isync\n"
 "slbia\n"
-/* Slot 1 - first VMALLOC segment */
+/* Slot 1 - kernel stack */
 "slbmte  %0,%1\n"
-/* Slot 2 - kernel stack */
-"slbmte  %2,%3\n"
 "isync"
-:: "r"(mk_vsid_data(VMALLOC_START, mmu_kernel_ssize, 
vflags)),
-   "r"(mk_esid_data(VMALLOC_START, mmu_kernel_ssize, 
VMALLOC_INDEX)),
-   "r"(ksp_vsid_data),
+:: "r"(ksp_vsid_data),
"r"(ksp_esid_data)
 : "memory");
 }
@@ -321,7 +315,7 @@ void core_flush_all_slbs(struct mm_struct *mm)
 void slb_initialize(void)
 {
unsigned long linear_llp, vmalloc_llp, io_llp;
-   unsigned long lflags, vflags;
+   unsigned long lflags;
static int slb_encoding_inited;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
unsigned long vmemmap_llp;
@@ -347,14 +341,12 @@ void slb_initialize(void)
get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
 
lflags = SLB_VSID_KERNEL | linear_llp;
-   vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
/* Invalidate the entire SLB (even entry 0) & all the ERATS */
asm volatile("isync":::"memory");
asm volatile("slbmte  %0,%0"::"r" (0) : "memory");
asm volatile("isync; slbia; isync":::"memory");
create_shadowed_slbe(PAGE_OFFSET, mmu_kernel_ssize, lflags, 
LINEAR_INDEX);
-   create_shadowed_slbe(VMALLOC_START, mmu_kernel_ssize, vflags, 
VMALLOC_INDEX);
 
/* For the boot cpu, we're running on the stack in init_thread_union,
 * which is in the first segment of the linear mapping, and also
-- 
2.17.0



[RFC PATCH 2/5] powerpc/64s/hash: remove user SLB data from the paca

2018-08-20 Thread Nicholas Piggin
User SLB mapping data is copied into the PACA from the mm->context
so it can be accessed by the SLB miss handlers.

After the previous patch, SLB miss handlers now run with relocation
on, and user SLB misses are able to take recursive kernel SLB misses,
so the user SLB mapping data can be removed from the paca and
accessed directly.
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 +
 arch/powerpc/include/asm/paca.h   | 13 --
 arch/powerpc/kernel/asm-offsets.c |  9 
 arch/powerpc/kernel/paca.c| 21 -
 arch/powerpc/mm/hash_utils_64.c   | 46 +--
 arch/powerpc/mm/mmu_context.c |  3 +-
 arch/powerpc/mm/slb.c | 20 +++-
 arch/powerpc/mm/slice.c   | 29 
 8 files changed, 40 insertions(+), 102 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index b3520b549cba..39764214aef5 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -496,6 +496,7 @@ static inline void hpte_init_pseries(void) { }
 extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
+extern void core_flush_all_slbs(struct mm_struct *mm);
 extern void slb_flush_and_rebolt(void);
 void slb_flush_all_realmode(void);
 void __slb_restore_bolted_realmode(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4331295db0f7..8c258a057207 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -143,18 +143,6 @@ struct paca_struct {
struct tlb_core_data tcd;
 #endif /* CONFIG_PPC_BOOK3E */
 
-#ifdef CONFIG_PPC_BOOK3S
-   mm_context_id_t mm_ctx_id;
-#ifdef CONFIG_PPC_MM_SLICES
-   unsigned char mm_ctx_low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
-   unsigned char mm_ctx_high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long mm_ctx_slb_addr_limit;
-#else
-   u16 mm_ctx_user_psize;
-   u16 mm_ctx_sllp;
-#endif
-#endif
-
/*
 * then miscellaneous read-write fields
 */
@@ -256,7 +244,6 @@ struct paca_struct {
 #endif /* CONFIG_PPC_PSERIES */
 } ____cacheline_aligned;
 
-extern void copy_mm_to_paca(struct mm_struct *mm);
 extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 7834256585f1..43b67ead5b97 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -181,15 +181,6 @@ int main(void)
OFFSET(PACAIRQSOFTMASK, paca_struct, irq_soft_mask);
OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
OFFSET(PACA_FTRACE_ENABLED, paca_struct, ftrace_enabled);
-#ifdef CONFIG_PPC_BOOK3S
-   OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id);
-#ifdef CONFIG_PPC_MM_SLICES
-   OFFSET(PACALOWSLICESPSIZE, paca_struct, mm_ctx_low_slices_psize);
-   OFFSET(PACAHIGHSLICEPSIZE, paca_struct, mm_ctx_high_slices_psize);
-   OFFSET(PACA_SLB_ADDR_LIMIT, paca_struct, mm_ctx_slb_addr_limit);
-   DEFINE(MMUPSIZEDEFSIZE, sizeof(struct mmu_psize_def));
-#endif /* CONFIG_PPC_MM_SLICES */
-#endif
 
 #ifdef CONFIG_PPC_BOOK3E
OFFSET(PACAPGD, paca_struct, pgd);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 0ee3e6d50f28..6752e17f0281 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -259,24 +259,3 @@ void __init free_unused_pacas(void)
paca_ptrs_size + paca_struct_size, nr_cpu_ids);
 }
 
-void copy_mm_to_paca(struct mm_struct *mm)
-{
-#ifdef CONFIG_PPC_BOOK3S
-   mm_context_t *context = &mm->context;
-
-   get_paca()->mm_ctx_id = context->id;
-#ifdef CONFIG_PPC_MM_SLICES
-   VM_BUG_ON(!mm->context.slb_addr_limit);
-   get_paca()->mm_ctx_slb_addr_limit = mm->context.slb_addr_limit;
-   memcpy(&get_paca()->mm_ctx_low_slices_psize,
-  &context->low_slices_psize, sizeof(context->low_slices_psize));
-   memcpy(&get_paca()->mm_ctx_high_slices_psize,
-  &context->high_slices_psize, TASK_SLICE_ARRAY_SZ(mm));
-#else /* CONFIG_PPC_MM_SLICES */
-   get_paca()->mm_ctx_user_psize = context->user_psize;
-   get_paca()->mm_ctx_sllp = context->sllp;
-#endif
-#else /* !CONFIG_PPC_BOOK3S */
-   return;
-#endif
-}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f23a89d8e4ce..88c95dc8b141 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1088,16 +1088,16 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, 
pte_t pte, int trap)
 }
 
 #ifdef CONFIG_PPC_MM_SLICES
-static unsigned int get_paca_psize(unsigned long addr)
+static unsigned int get_psize(struct mm_struct *mm, unsigned long addr)
 {
unsigned char *psizes;

[RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-20 Thread Nicholas Piggin
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes and improvements to bad address
  handling ]
---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/kernel/exceptions-64s.S  | 200 +++--
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/slb.c | 237 +++
 arch/powerpc/mm/slb_low.S | 338 --
 5 files changed, 166 insertions(+), 613 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1f4691ce4126..c330ed10074a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs);
 void system_reset_exception(struct pt_regs *regs);
 void machine_check_exception(struct pt_regs *regs);
 void emulation_assist_interrupt(struct pt_regs *regs);
+long do_slb_fault(struct pt_regs *regs, unsigned long ea);
+void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, unsigned long 
err);
 
 /* signals, syscalls and interrupts */
 long sys_swapcontext(struct ucontext __user *old_ctx,
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9dad73722d1a..f22ddb301661 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -567,28 +567,35 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 
0x380);
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 
0x380);
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
+
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
+EXC_COMMON_BEGIN(data_access_slb_common)
+   mfspr   r10,SPRN_DAR
+   std r10,PACA_EXSLB+EX_DAR(r13)
+   EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+   ld  r4,PACA_EXSLB+EX_DAR(r13)
+   std r4,_DAR(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_slb_fault
+   cmpdi   r3,0
+   bne-    1f
+   b   fast_exception_return
+1: /* Error case */
+   bl  save_nvgprs
+   RECONCILE_IRQ_STATE(r10, r11)
+   ld  r4,_DAR(r1)
+   mr  r5,r3
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_bad_slb_fault
+   b   ret_from_except
+
 
 EXC_REAL(instruction_access, 0x400, 0x80)
 EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400)
@@ -611,160 +618,33 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
-   mfspr   r11,SPRN_SRR1
-   crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, instruction_access_slb_common, EXC_STD, 
KVMTEST_PR, 0x480);
 EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x480)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
-   mfspr   r11,SPRN_SRR1
-   crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_RELON_PROLOG(PACA_EXSLB, 

[RFC PATCH 0/5] rewriting SLB miss handler in C

2018-08-20 Thread Nicholas Piggin
I'd like to rewrite the SLB miss handlers in C for maintainability
and ability to more easily extend the code.

I have not benchmarked it yet but obviously setting up the stack
and going to C code rather than carefully hand optimised assembly
is likely to slow down SLB misses by a reasonable amount. So I've
started looking at a few basic optimisations we can make to justify
this change. There is still more that can be done, but SLB misses
have been reduced significantly, and with more tuning and optimization
I think we could bring it down quite a bit more.

I'm trying to get the first patch solid, and it is the big change so
would really appreciate review and comments on that. Other patches are
not quite polished but comments would still be welcome on those.

Thanks,
Nick

Nicholas Piggin (5):
  powerpc/64s/hash: convert SLB miss handlers to C
  powerpc/64s/hash: remove user SLB data from the paca
  powerpc/64s/hash: remove the first vmalloc segment from the bolted SLB
  powerpc/64s/hash: Add SLB allocation bitmaps
  powerpc/64s/hash: Add a SLB preload cache

 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   3 +-
 arch/powerpc/include/asm/paca.h   |  19 +-
 arch/powerpc/include/asm/thread_info.h|   4 +
 arch/powerpc/kernel/asm-offsets.c |   9 -
 arch/powerpc/kernel/exceptions-64s.S  | 200 ++---
 arch/powerpc/kernel/paca.c|  21 -
 arch/powerpc/kernel/process.c |   6 +
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |  46 +--
 arch/powerpc/mm/mmu_context.c |   3 +-
 arch/powerpc/mm/mmu_context_book3s64.c|  10 +-
 arch/powerpc/mm/slb.c | 382 +++---
 arch/powerpc/mm/slb_low.S | 338 
 arch/powerpc/mm/slice.c   |  29 +-
 arch/powerpc/xmon/xmon.c  |   2 +-
 16 files changed, 328 insertions(+), 748 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

-- 
2.17.0



Re: [RESEND PATCH] i2c/busses/pasemi: Remove hardcoded bus numbers on smbus

2018-08-20 Thread Wolfram Sang
On Sun, Dec 31, 2017 at 08:53:55PM +, Darren Stevens wrote:
> The pasemi smbus controller uses PCI_FUNC(dev->devfn) to define which
> number bus to attach to, however this fails when something else is 
> probed first, for example an ATI Radeon graphics card will claim 9 or
> 10 busses, including the ones the pasemi wants.
> Patch the driver to call i2c_add_adapter rather than
> i2c_add_numbered_adapter.
> 
> Signed-off-by: Darren Stevens 
> 

Applied to for-next, thanks!

Disclaimer: I usually do not like to change the bus numbering because
some people may rely on that. But numbering based on PCI functions seems
really weak and all known users of pasemi seem to have issues here, so I
make an exception.

Thanks to Michael Ellerman for the additional info.



Re: [RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-20 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 12:32 PM Arnd Bergmann  wrote:
>
> pcibios_scan_root() is now just a wrapper around pci_scan_root_bus(),
> and merging the two into one makes it shorter and more readable.
>
> We can also take advantage of pci_alloc_host_bridge() doing the
> allocation of the sysdata for us, which helps if we ever want to
> allow hot-unplugging the host bridge itself.
>
> We might be able to simplify it further using pci_host_probe(),
> but I wasn't sure about the resource registration there.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/x86/pci/common.c | 53 ++-
>  1 file changed, 17 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index e740d9aa4024..920d0885434c 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -453,54 +453,35 @@ void __init dmi_check_pciprobe(void)
> dmi_check_system(pciprobe_dmi_table);
>  }
>
> -static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
> -   struct pci_ops *ops, void *sysdata, struct list_head 
> *resources)
> +void pcibios_scan_root(int busnum)
>  {
> +   struct pci_sysdata *sd;
> struct pci_host_bridge *bridge;
> int error;
>
> -   bridge = pci_alloc_host_bridge(0);
> -   if (!bridge)
> -   return NULL;
> +   bridge = pci_alloc_host_bridge(sizeof(sd));
> +   if (!bridge) {
> +   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
> +   return;
> +   }
> +   sd = pci_host_bridge_priv(bridge);

This looks fishy, as bridge->private is not set at this point AFAICS,
unless one of the previous patches changes that.

>
> -   list_splice_init(resources, >windows);
> -   bridge->dev.parent = parent;
> -   bridge->sysdata = sysdata;
> -   bridge->busnr = bus;
> -   bridge->ops = ops;
> +   sd->node = x86_pci_root_bus_node(busnum);
> +   x86_pci_root_bus_resources(busnum, >windows);
> +   bridge->sysdata = sd;
> +   bridge->busnr = busnum;
> +   bridge->ops = _root_ops;
>
> +   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
> error = pci_scan_root_bus_bridge(bridge);
> if (error < 0)
> goto err_out;
>
> -   return bridge->bus;
> +   pci_bus_add_devices(bridge->bus);
> +   return;
>
>  err_out:
> -   kfree(bridge);
> -   return NULL;
> -}
> -
> -void pcibios_scan_root(int busnum)
> -{
> -   struct pci_bus *bus;
> -   struct pci_sysdata *sd;
> -   LIST_HEAD(resources);
> -
> -   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
> -   if (!sd) {
> -   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
> -   return;
> -   }
> -   sd->node = x86_pci_root_bus_node(busnum);
> -   x86_pci_root_bus_resources(busnum, );
> -   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
> -   bus = pci_scan_root_bus(NULL, busnum, _root_ops, sd, );
> -   if (!bus) {
> -   pci_free_resource_list();
> -   kfree(sd);
> -   return;
> -   }
> -   pci_bus_add_devices(bus);
> +   pci_free_host_bridge(bridge);
>  }
>
>  void __init pcibios_set_cache_line_size(void)
> --
> 2.18.0
>


Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
>
> The acpi_pci_create_root_bus() can be fully integrated into
> acpi_pci_root_create(), improving a few things:
>
> * We can call pci_scan_root_bus_bridge(), which registers and
>   scans the bridge in one step.
> * After a failure in pci_register_host_bridge(), we correctly
>   clean up the resources.
> * The bridge settings (release function, flags, operations etc)
>   can get set up before registering the bridge.
> * Further cleanup would be possible, removing duplication between
>   pci_host_bridge and some ACPI structures.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/acpi/pci_root.c | 68 +++--
>  1 file changed, 24 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 85dbcf47015b..5f73de3b67c8 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -873,34 +873,6 @@ static void acpi_pci_root_release_info(struct 
> pci_host_bridge *bridge)
> __acpi_pci_root_release_info(bridge->release_data);
>  }
>
> -static struct pci_bus *acpi_pci_create_root_bus(struct device *parent, int 
> bus,
> -   struct pci_ops *ops, void *sysdata, struct list_head 
> *resources)
> -{
> -   int error;
> -   struct pci_host_bridge *bridge;
> -
> -   bridge = pci_alloc_host_bridge(0);
> -   if (!bridge)
> -   return NULL;
> -
> -   bridge->dev.parent = parent;
> -
> -   list_splice_init(resources, >windows);
> -   bridge->sysdata = sysdata;
> -   bridge->busnr = bus;
> -   bridge->ops = ops;
> -
> -   error = pci_register_host_bridge(bridge);
> -   if (error < 0)
> -   goto err_out;
> -
> -   return bridge->bus;
> -
> -err_out:
> -   kfree(bridge);
> -   return NULL;
> -}
> -
>  struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
>  struct acpi_pci_root_ops *ops,
>  struct acpi_pci_root_info *info,
> @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
> *root,
> int ret, busnum = root->secondary.start;
> struct acpi_device *device = root->device;
> int node = acpi_get_node(device->handle);
> -   struct pci_bus *bus;
> -   struct pci_host_bridge *host_bridge;
> +   struct pci_host_bridge *bridge;

Why "bridge" and not "host" or even something to stand for "root complex"?

Or maybe it can still be "host_bridge"?

>
> info->root = root;
> info->bridge = device;
> @@ -930,30 +901,39 @@ struct pci_bus *acpi_pci_root_create(struct 
> acpi_pci_root *root,
>
> pci_acpi_root_add_resources(info);
> pci_add_resource(>resources, >secondary);
> -   bus = acpi_pci_create_root_bus(NULL, busnum, ops->pci_ops,
> - sysdata, >resources);
> -   if (!bus)
> +
> +   bridge = pci_alloc_host_bridge(0);
> +   if (!bridge)
> goto out_release_info;
>
> -   host_bridge = to_pci_host_bridge(bus->bridge);
> +   list_splice_init(>resources, >windows);
> +   bridge->sysdata = sysdata;
> +   bridge->busnr = busnum;
> +   bridge->ops = ops->pci_ops;
> +   pci_set_host_bridge_release(bridge, acpi_pci_root_release_info,
> +   info);
> +
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
> -   host_bridge->native_pcie_hotplug = 0;
> +   bridge->native_pcie_hotplug = 0;
> if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
> -   host_bridge->native_shpc_hotplug = 0;
> +   bridge->native_shpc_hotplug = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
> -   host_bridge->native_aer = 0;
> +   bridge->native_aer = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
> -   host_bridge->native_pme = 0;
> +   bridge->native_pme = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
> -   host_bridge->native_ltr = 0;
> +   bridge->native_ltr = 0;
> +
> +   ret = pci_scan_root_bus_bridge(bridge);
> +   if (ret < 0)
> +   goto out_release_bridge;
>
> -   pci_scan_child_bus(bus);
> -   pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
> -   info);
> if (node != NUMA_NO_NODE)
> -   dev_printk(KERN_DEBUG, >dev, "on NUMA node %d\n", node);
> -   return bus;
> +   dev_printk(KERN_DEBUG, >bus->dev, "on NUMA node 
> %d\n", node);
> +   return bridge->bus;
>
> +out_release_bridge:
> +   pci_free_host_bridge(bridge);
>  out_release_info:
> __acpi_pci_root_release_info(info);
> return NULL;
> --
> 2.18.0
>


[PATCH v7 3/3] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.

2018-08-20 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.

However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/include/asm/fadump.h   |2 +-
 arch/powerpc/kernel/fadump.c|   10 --
 arch/powerpc/platforms/pseries/hotplug-memory.c |7 +--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index e9764b541927..43825111c479 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,7 +208,7 @@ struct fad_crash_memory_ranges {
unsigned long long  size;
 };
 
-extern int is_fadump_boot_memory_area(u64 addr, ulong size);
+extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 44a29e4d419c..89bee4f4fe5c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -180,13 +180,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 
 /*
  * If fadump is registered, check if the memory provided
- * falls within boot memory area.
+ * falls within boot memory area and reserved memory area.
  */
-int is_fadump_boot_memory_area(u64 addr, ulong size)
+int is_fadump_memory_area(u64 addr, ulong size)
 {
+   u64 d_start = fw_dump.reserve_dump_area_start;
+   u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+
if (!fw_dump.dump_registered)
return 0;
 
+   if (((addr + size) > d_start) && (addr <= d_end))
+   return 1;
+
return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..e4c658cda3a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
phys_addr = lmb->base_addr;
 
 #ifdef CONFIG_FA_DUMP
-   /* Don't hot-remove memory that falls in fadump boot memory area */
-   if (is_fadump_boot_memory_area(phys_addr, block_sz))
+   /*
+* Don't hot-remove memory that falls in fadump boot memory area
+* and memory that is reserved for capturing old kernel memory.
+*/
+   if (is_fadump_memory_area(phys_addr, block_sz))
return false;
 #endif
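
The range test that the patch above adds to is_fadump_memory_area() can be
sketched in userspace C as below. This is only an illustration under stated
assumptions: struct dump_area and overlaps_dump_area() are hypothetical
stand-ins for the fw_dump fields, not kernel code. Both ranges are treated
as half-open, so the upper bound is compared with "<". The patch itself
compares with "<=", which also matches a block that begins exactly at d_end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fw_dump fields used by the patch. */
struct dump_area {
	uint64_t start;		/* first byte of the reserved dump area */
	uint64_t size;		/* length of the reserved dump area in bytes */
	bool registered;	/* mirrors fw_dump.dump_registered */
};

/*
 * Return true if [addr, addr + size) shares at least one byte with the
 * reserved dump area. Both ranges are treated as half-open here, which
 * is an assumption of this sketch.
 */
bool overlaps_dump_area(const struct dump_area *d,
			uint64_t addr, uint64_t size)
{
	uint64_t d_end = d->start + d->size;

	/* Nothing is protected until fadump has been registered. */
	if (!d->registered)
		return false;

	return (addr + size) > d->start && addr < d_end;
}
```

A caller in the position of lmb_is_removable() would refuse the hot-remove
whenever this returns true for the LMB's base address and block size.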
 



[PATCH v7 2/3] powerpc/fadump: throw proper error message on fadump registration failure.

2018-08-20 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

fadump fails to register when there are holes in reserved memory area.
This can happen if user has hot-removed a memory that falls in the fadump
reserved memory area. Throw a meaningful error message to the user in
such case.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/fadump.c |   33 +
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 166e71635921..44a29e4d419c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -234,6 +234,36 @@ static int is_boot_memory_area_contiguous(void)
return ret;
 }
 
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+static int is_reserved_memory_area_contiguous(void)
+{
+   struct memblock_region *reg;
+   unsigned long start, end;
+   unsigned long d_start = fw_dump.reserve_dump_area_start;
+   unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
+   int ret = 0;
+
+   for_each_memblock(memory, reg) {
+   start = max(d_start, (unsigned long)reg->base);
+   end = min(d_end, (unsigned long)(reg->base + reg->size));
+   if (d_start < end) {
+   /* Memory hole from d_start to start */
+   if (start > d_start)
+   break;
+
+   if (end == d_end) {
+   ret = 1;
+   break;
+   }
+   d_start = end + 1;
+   }
+   }
+   return ret;
+}
+
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -602,6 +632,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
if (!is_boot_memory_area_contiguous())
pr_err("Can't have holes in boot memory area while "
   "registering fadump\n");
+   else if (!is_reserved_memory_area_contiguous())
+   pr_err("Can't have holes in reserved memory area while"
+  " registering fadump\n");
 
printk(KERN_ERR "Failed to register firmware-assisted kernel"
" dump. Parameter Error(%d).\n", rc);

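The hole check described in the patch above can be sketched in userspace C
as follows. This is a simplified illustration, not the kernel function:
struct mem_region and range_is_contiguous() are hypothetical names, the
regions are assumed to be sorted by base address and non-overlapping (as
memblock keeps them), and all ranges are half-open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region: [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if the sorted, non-overlapping regions in regs[] cover
 * every byte of [d_start, d_end) with no hole in between.
 */
bool range_is_contiguous(const struct mem_region *regs, size_t n,
			 uint64_t d_start, uint64_t d_end)
{
	uint64_t cursor = d_start;	/* first byte not yet known to be covered */
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t start = regs[i].base;
		uint64_t end = regs[i].base + regs[i].size;

		if (end <= cursor)
			continue;	/* region lies entirely below the cursor */
		if (start > cursor)
			return false;	/* hole from cursor to start */
		if (end >= d_end)
			return true;	/* covered through the end of the range */
		cursor = end;
	}
	return false;	/* ran out of regions before reaching d_end */
}
```

With regions covering 0x0-0x2000 and 0x3000-0x4000, a reserved area of
0x800-0x1800 is reported as contiguous, while 0x1800-0x3800 is not because
of the hole at 0x2000-0x3000.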


Re: [PATCH 8/9] PCI: hotplug: Embed hotplug_slot

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:46 PM Lukas Wunner  wrote:
>
> When the PCI hotplug core and its first user, cpqphp, were introduced in
> February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
> struct for its internal use plus a hotplug_slot struct to be registered
> with the hotplug core and linked the two with pointers:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
>
> Nowadays, the predominant pattern in the tree is to embed ("subclass")
> such structures in one another and cast to the containing struct with
> container_of().  But it wasn't until July 2002 that container_of() was
> introduced with historic commit ec4f214232cf:
> https://git.kernel.org/tglx/history/c/ec4f214232cf
>
> pnv_php, introduced in 2016, did the right thing and embedded struct
> hotplug_slot in its internal struct pnv_php_slot, but all other drivers
> cargo-culted cpqphp's design and linked separate structs with pointers.
>
> Embedding structs is preferable to linking them with pointers because
> it requires fewer allocations, thereby reducing overhead and simplifying
> error paths.  Casting an embedded struct to the containing struct
> becomes a cheap subtraction rather than a dereference.  And having fewer
> pointers reduces the risk of them pointing nowhere either accidentally
> or due to an attack.
>
> Convert all drivers to embed struct hotplug_slot in their internal slot
> struct.  The "private" pointer in struct hotplug_slot thereby becomes
> unused, so drop it.
>
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---
>  drivers/pci/hotplug/acpiphp.h   |  9 ++-
>  drivers/pci/hotplug/acpiphp_core.c  | 28 +++-
>  drivers/pci/hotplug/acpiphp_ibm.c   |  2 +-
>  drivers/pci/hotplug/cpci_hotplug.h  |  9 ++-
>  drivers/pci/hotplug/cpci_hotplug_core.c | 37 --
>  drivers/pci/hotplug/cpci_hotplug_pci.c  |  6 +-
>  drivers/pci/hotplug/cpqphp.h|  9 ++-
>  drivers/pci/hotplug/cpqphp_core.c   | 37 --
>  drivers/pci/hotplug/cpqphp_ctrl.c   |  2 -
>  drivers/pci/hotplug/ibmphp.h|  7 +-
>  drivers/pci/hotplug/ibmphp_core.c   | 92 +++--
>  drivers/pci/hotplug/ibmphp_ebda.c   | 37 +++---
>  drivers/pci/hotplug/pciehp.h| 11 ++-
>  drivers/pci/hotplug/pciehp_core.c   | 37 --
>  drivers/pci/hotplug/pciehp_ctrl.c   |  4 +-
>  drivers/pci/hotplug/pciehp_hpc.c|  8 +--
>  drivers/pci/hotplug/pnv_php.c   |  9 ++-
>  drivers/pci/hotplug/rpaphp.h|  7 +-
>  drivers/pci/hotplug/rpaphp_core.c   | 14 ++--
>  drivers/pci/hotplug/rpaphp_slot.c   | 15 ++--
>  drivers/pci/hotplug/s390_pci_hpc.c  | 30 
>  drivers/pci/hotplug/sgi_hotplug.c   | 52 ++
>  drivers/pci/hotplug/shpchp.h|  6 +-
>  drivers/pci/hotplug/shpchp_core.c   | 17 ++---
>  drivers/platform/x86/asus-wmi.c | 26 +++
>  drivers/platform/x86/eeepc-laptop.c | 30 
>  include/linux/pci_hotplug.h |  3 -
>  27 files changed, 223 insertions(+), 321 deletions(-)

Good cleanup.

Reviewed-by: Rafael J. Wysocki 
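
The embedding pattern described in the quoted commit message can be
sketched in userspace C as below. This is an illustration only:
struct my_slot and to_my_slot() are hypothetical, struct hotplug_slot is
reduced to a single made-up field, and container_of() is a simplified
version of the kernel macro without its type checking.

```c
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for struct hotplug_slot. */
struct hotplug_slot {
	int number;
};

/* Hypothetical driver-private slot with hotplug_slot embedded. */
struct my_slot {
	int state;
	struct hotplug_slot hotplug_slot;	/* embedded, not pointed to */
};

/*
 * Recover the driver's slot from the embedded core struct. This is a
 * constant subtraction, with no pointer to load and no second
 * allocation to free on error paths.
 */
struct my_slot *to_my_slot(struct hotplug_slot *hs)
{
	return container_of(hs, struct my_slot, hotplug_slot);
}
```

A callback that receives a struct hotplug_slot pointer from the core would
call to_my_slot() on it instead of following a "private" pointer.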


[PATCH v7 1/3] powerpc/fadump: Reservationless firmware assisted dump

2018-08-20 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump.

Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it. With this fadump will still be able
to capture all of the kernel memory and most of the user space memory
except the user pages that were present in CMA region.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095

Changes made here are completely transparent to how fadump has
traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.

TODO:
- Handle case where CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli 
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Hari Bathini 
---
 Documentation/powerpc/firmware-assisted-dump.txt |   17 
 arch/powerpc/include/asm/fadump.h|5 +
 arch/powerpc/kernel/fadump.c |   97 --
 3 files changed, 108 insertions(+), 11 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt 
b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..18c5feef2577 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -113,7 +113,15 @@ header, is usually reserved at an offset greater than boot 
memory
 size (see Fig. 1). This area is *not* released: this region will
 be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
-and HPTE region, in the case a crash does occur.
+and HPTE region, in the case a crash does occur. Since this reserved
+memory area is used only after the system crash, there is no point in
+blocking this significant chunk of memory from production kernel.
+Hence, the implementation uses the Linux kernel's Contiguous Memory
+Allocator (CMA) for memory reservation if CMA is configured for kernel.
+With CMA reservation this memory will be available for applications to
+use it, while kernel is prevented from using it. With this fadump will
+still be able to capture all of the kernel memory and most of the user
+space memory except the user pages that were present in CMA region.
 
   o Memory Reservation during first kernel
 
@@ -162,6 +170,9 @@ How to enable firmware-assisted dump (fadump):
 
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
+   By default, fadump reserved memory will be initialized as CMA area.
+   Alternatively, user can boot linux kernel with 'fadump=nocma' to
+   prevent fadump to use CMA.
 3. Optionally, user can also set 'crashkernel=' kernel cmdline
to specify size of the memory to reserve for boot memory dump
preservation.
@@ -172,6 +183,10 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been 
deprecated. Instead
   2. If firmware-assisted dump fails to reserve memory then it
  will fallback to existing kdump mechanism if 'crashkernel='
  option is set at kernel cmdline.
+  3. if user wants to capture all of user space memory and ok with
+ reserved memory not available to production system, then
+ 'fadump=nocma' kernel parameter can be used to fallback to
+ old behaviour.
 
 Sysfs/debugfs files:
 
diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..e9764b541927 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -48,6 +48,10 @@
 
 #define 

[PATCH v7 0/3] powerpc/fadump: Improvements for firmware-assisted dump.

2018-08-20 Thread Mahesh J Salgaonkar
One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. This reserved
memory is used for saving the contents of old crashed kernel's memory before
fadump capture kernel uses old kernel's memory area to boot. However, this
reserved memory area stays unused until system crash and isn't available
for production kernel to use.

Instead of setting aside a significant chunk of memory that nobody can use,
take advantage of the Linux kernel's Contiguous Memory Allocator (CMA) feature,
to reserve a significant chunk of memory that the kernel is prevented from
using, but applications are free to use it.

Patch 1 implements the usage of CMA region to allow production kernel to
use that memory for applications usage, making fadump reservationless.
We now initialize a significant chunk of the fadump reserved memory as CMA.

Changes in V7:
- Revert back to use CMA for fadump reservation.
- Add fadump=nocma option to fall back to the default behaviour.

Changes in V6:
- Introduce an interface to mark reserved memory as ZONE_MOVABLE. Hence
  sending this series as RFC again.
- Mark reserved area as ZONE_MOVABLE instead of CMA.
- Add fadump=nonmovable parameter for users who don't want to use ZONE_MOVABLE.

Changes in V5:
- Drop the patch that does metadata movement.
- Move the kexec fix patch to top (patch 1)
- Fold CMA documentation patch into patch 2
- Fix the compilation issues when CONFIG_CMA is not set reported by Hari.
- Use the approach of using boot memory size for CMA as suggested by Hari
  except the movement of sections. Thanks to Hari.

Changes in V4:
- patch 1: Make fadump compatible irrespective of kernel versions.
- patch 4: moved out of the series and been posted separately at
  http://patchwork.ozlabs.org/patch/896716/
- Documentation update about CMA reservation.

Changes in V3:
- patch 1 & 2: move metadata region and documentation update.
- patch 7: Un-register the fadump on kexec path


---

Mahesh Salgaonkar (3):
  powerpc/fadump: Reservationless firmware assisted dump
  powerpc/fadump: throw proper error message on fadump registration failure.
  powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.


 Documentation/powerpc/firmware-assisted-dump.txt |   17 +++
 arch/powerpc/include/asm/fadump.h|7 +
 arch/powerpc/kernel/fadump.c |  140 --
 arch/powerpc/platforms/pseries/hotplug-memory.c  |7 +
 4 files changed, 155 insertions(+), 16 deletions(-)

--
Signature



Re: [PATCH 7/9] PCI: hotplug: Drop hotplug_slot_info

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:43 PM Lukas Wunner  wrote:
>
> Ever since the PCI hotplug core was introduced in 2002, drivers had to
> allocate and register a struct hotplug_slot_info for every slot:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
>
> Apparently the idea was that drivers furnish the hotplug core with an
> up-to-date card presence status, power status, latch status and
> attention indicator status as well as notify the hotplug core of changes
> thereof.  However only 4 out of 12 hotplug drivers bother to notify the
> hotplug core with pci_hp_change_slot_info() and the hotplug core never
> made any use of the information:  There is just a single macro in
> pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
> the driver lacks the corresponding callback in hotplug_slot_ops.  The
> macro is called when the user reads the attribute via sysfs.
>
> Now, if the callback isn't defined, the attribute isn't exposed in sysfs
> in the first place (see e.g. has_power_file()).  There are only two
> situations when the hotplug_slot_info would actually be accessed:
>
> * If the driver defines ->enable_slot or ->disable_slot but not
>   ->get_power_status.
>
> * If the driver defines ->set_attention_status but not
>   ->get_attention_status.
>
> There is no driver doing the former and just a single driver doing the
> latter, namely pnv_php.c.  Amend it with a ->get_attention_status
> callback.  With that, the hotplug_slot_info becomes completely unused by
> the PCI hotplug core.  But a few drivers use it internally as a cache:
>
> cpcihp uses it to cache the latch_status and adapter_status.
> cpqhp uses it to cache the adapter_status.
> pnv_php and rpaphp use it to cache the attention_status.
> shpchp uses it to cache all four values.
>
> Amend these drivers to cache the information in their private slot
> struct.  shpchp's slot struct already contains members to cache the
> power_status and adapter_status, so additional members are only needed
> for the other two values.  In the case of cpqphp, the cached value is
> only accessed in a single place, so instead of caching it, read the
> current value from the hardware.
>
> Caution:  acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
> populate the hotplug_slot_info with initial values on probe.  That code
> is herewith removed.  There is a theoretical chance that the code has
> side effects without which the driver fails to function, e.g. if the
> ACPI method to read the adapter status needs to be executed at least
> once on probe.  That seems unlikely to me, still maintainers should
> review the changes carefully for this possibility.

I'm not aware of any case in which it will break anything, so

Reviewed-by: Rafael J. Wysocki 

but if that happens, it may be necessary to add the execution of the
control methods in question directly to the initialization part.


Re: [PATCH 6/9] PCI: hotplug: Constify hotplug_slot_ops

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:36 PM Lukas Wunner  wrote:
>
> Hotplug drivers cannot declare their hotplug_slot_ops const, making them
> attractive targets for attackers, because upon registration of a hotplug
> slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
> in that struct.
>
> Fix by moving these members to struct hotplug_slot and constify every
> driver's hotplug_slot_ops except for pciehp.
>
> pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
> port's capabilities, hence cannot declare them const.  It can be
> converted to __write_rarely once that's mainlined:
> http://www.openwall.com/lists/kernel-hardening/2016/11/16/3
>
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---
>  drivers/pci/hotplug/acpiphp_core.c  |  2 +-
>  drivers/pci/hotplug/cpci_hotplug_core.c |  2 +-
>  drivers/pci/hotplug/cpqphp_core.c   |  2 +-
>  drivers/pci/hotplug/ibmphp.h|  2 +-
>  drivers/pci/hotplug/ibmphp_core.c   |  2 +-
>  drivers/pci/hotplug/pci_hotplug_core.c  | 27 +
>  drivers/pci/hotplug/pnv_php.c   |  2 +-
>  drivers/pci/hotplug/rpaphp.h|  2 +-
>  drivers/pci/hotplug/rpaphp_core.c   |  2 +-
>  drivers/pci/hotplug/s390_pci_hpc.c  |  2 +-
>  drivers/pci/hotplug/sgi_hotplug.c   |  2 +-
>  drivers/pci/hotplug/shpchp_core.c   |  2 +-
>  drivers/pci/pci.c   |  4 ++--
>  drivers/pci/slot.c  |  2 +-
>  drivers/platform/x86/asus-wmi.c |  3 +--
>  drivers/platform/x86/eeepc-laptop.c |  3 +--
>  include/linux/pci_hotplug.h | 10 -
>  17 files changed, 35 insertions(+), 36 deletions(-)

Nice!

Reviewed-by: Rafael J. Wysocki 


Re: [PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-08-20 Thread Alexandre Ghiti

Ok, my bad, sorry about that, I have just added Andrew as CC then.

Thank you,

Alex


On 08/20/2018 09:17 AM, Michal Hocko wrote:

On Mon 20-08-18 08:45:10, Alexandre Ghiti wrote:

Hi Michal,

This patchset got acked, tested and reviewed by quite a few people, and it
has been suggested that it should be included in -mm tree: could you tell
me if something else needs to be done for its inclusion ?

Thanks for your time,

I didn't really get to look at the series but seeing an Ack from Mike
and arch maintainers should be good enough for it to go. This email
doesn't have Andrew Morton in the CC list so you should add him if you
want the series to land into the mm tree.




Re: [PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-08-20 Thread Michal Hocko
On Mon 20-08-18 08:45:10, Alexandre Ghiti wrote:
> Hi Michal,
> 
> This patchset got acked, tested and reviewed by quite a few people, and it
> has been suggested
> that it should be included in -mm tree: could you tell me if something else
> needs to be done for
> its inclusion ?
> 
> Thanks for your time,

I didn't really get to look at the series but seeing an Ack from Mike
and arch maintainers should be good enough for it to go. This email
doesn't have Andrew Morton in the CC list so you should add him if you
want the series to land into the mm tree.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-08-20 Thread Alexandre Ghiti

Hi Michal,

This patchset got acked, tested and reviewed by quite a few people, and
it has been suggested that it should be included in -mm tree: could you
tell me if something else needs to be done for its inclusion ?

Thanks for your time,

Alex


On 08/06/2018 07:57 PM, Alexandre Ghiti wrote:

[CC linux-mm for inclusion in -mm tree]
  
In order to reduce copy/paste of functions across architectures and then
make riscv hugetlb port (and future ports) simpler and smaller, this
patchset intends to factorize the numerous hugetlb primitives that are
defined across all the architectures.

Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be
found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
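
The __HAVE_ARCH_*** convention mentioned above can be sketched as below.
This is a reduced userspace illustration, not the header from the series:
the pte is modelled as a plain unsigned long instead of pte_t,
DEMO_ARCH_HAS_OWN_PTE_NONE is a hypothetical switch, and the arch-specific
rule is made up. Only the #ifndef structure is the point.

```c
/*
 * Hypothetical arch header: an architecture that needs its own version
 * defines the marker macro and supplies the function itself.
 */
#ifdef DEMO_ARCH_HAS_OWN_PTE_NONE
#define __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(unsigned long pte)
{
	return (pte & ~0x1UL) == 0;	/* made-up arch-specific rule */
}
#endif

/*
 * Generic header: provide the pass-through default only if the
 * architecture did not announce its own implementation.
 */
#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(unsigned long pte)
{
	return pte == 0;	/* stand-in for the generic pte_none() */
}
#endif
```

An architecture whose helper is only a pass-through can then delete its
own copy and pick up the generic one without defining anything.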
  
s390 architecture has not been tackled in this series since it does not
use asm-generic/hugetlb.h at all.

This patchset has been compiled on all addressed architectures with
success (except for parisc, but the problem does not come from this
series).
  
v6:

   - Remove nohash/32 and book3s/32 powerpc specific implementations in
 order to use the generic ones.
   - Add all the Reviewed-by, Acked-by and Tested-by in the commits,
 thanks to everyone.
  
v5:

   As suggested by Mike Kravetz, no need to move the #include
for arm and x86 architectures, let it live at
   the top of the file.
  
v4:

   Fix powerpc build error due to misplacing of #include
outside of #ifdef CONFIG_HUGETLB_PAGE, as
   pointed by Christophe Leroy.
  
v1, v2, v3:

   Same version, just problems with email provider and misuse of
   --batch-size option of git send-email

Alexandre Ghiti (11):
   hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
   hugetlb: Introduce generic version of hugetlb_free_pgd_range
   hugetlb: Introduce generic version of set_huge_pte_at
   hugetlb: Introduce generic version of huge_ptep_get_and_clear
   hugetlb: Introduce generic version of huge_ptep_clear_flush
   hugetlb: Introduce generic version of huge_pte_none
   hugetlb: Introduce generic version of huge_pte_wrprotect
   hugetlb: Introduce generic version of prepare_hugepage_range
   hugetlb: Introduce generic version of huge_ptep_set_wrprotect
   hugetlb: Introduce generic version of huge_ptep_set_access_flags
   hugetlb: Introduce generic version of huge_ptep_get

  arch/arm/include/asm/hugetlb-3level.h        | 32 +-
  arch/arm/include/asm/hugetlb.h               | 30 --
  arch/arm64/include/asm/hugetlb.h             | 39 +++-
  arch/ia64/include/asm/hugetlb.h              | 47 ++-
  arch/mips/include/asm/hugetlb.h              | 40 +++--
  arch/parisc/include/asm/hugetlb.h            | 33 +++
  arch/powerpc/include/asm/book3s/32/pgtable.h |  6 --
  arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
  arch/powerpc/include/asm/hugetlb.h           | 43 ++
  arch/powerpc/include/asm/nohash/32/pgtable.h |  6 --
  arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
  arch/sh/include/asm/hugetlb.h                | 54 ++---
  arch/sparc/include/asm/hugetlb.h             | 40 +++--
  arch/x86/include/asm/hugetlb.h               | 69 --
  include/asm-generic/hugetlb.h                | 88 +++-
  15 files changed, 135 insertions(+), 394 deletions(-)





Re: [PATCH 1/3] arch/powerpc/hugetlb: Use pte_access_permitted for hugetlb access check

2018-08-20 Thread Christophe LEROY




On 04/12/2017 at 03:19, Aneesh Kumar K.V wrote:

No functional change in this patch. This updates gup_hugepte to use the
helper. This will help later when we add memory keys.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/mm/hugetlbpage.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index a9b9083c5e49..c7e5afe5e118 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -855,9 +855,7 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
  
  	pte = READ_ONCE(*ptep);
  
-	if (!pte_present(pte) || !pte_read(pte))
-		return 0;
-	if (write && !pte_write(pte))
+	if (!pte_access_permitted(pte, write))


Seems like pte_access_permitted() doesn't check _PAGE_RO whereas 
pte_write() does.


Christophe
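
To make the remark above concrete, here is a minimal user-space sketch, not
kernel code. The SK_* names and bit values are hypothetical: they only assume
a layout where write permission has no dedicated bit (_PAGE_WRITE is 0) and
read-only is an explicit bit pattern, and the old-style test assumes that
pte_write() rejects that pattern. The pte_read() part of the removed test is
left out to keep the comparison short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout, for illustration only. */
#define SK_PAGE_PRESENT 0x0001UL
#define SK_PAGE_USER    0x0000UL
#define SK_PAGE_WRITE   0x0000UL	/* no dedicated "writable" bit */
#define SK_PAGE_RO      0x0600UL	/* read-only is an explicit pattern */

/* Mask-style test, shaped like the helper the patch switches to. */
static inline bool sk_access_permitted_mask(unsigned long pteval, bool write)
{
	unsigned long need = SK_PAGE_PRESENT | SK_PAGE_USER;

	if (write)
		need |= SK_PAGE_WRITE;	/* adds nothing when the flag is 0 */

	return (pteval & need) == need;
}

/* Shaped like the removed test: a write needs the RO pattern to be clear. */
static inline bool sk_access_permitted_old(unsigned long pteval, bool write)
{
	if (!(pteval & SK_PAGE_PRESENT))
		return false;
	if (write && (pteval & SK_PAGE_RO))
		return false;
	return true;
}

/* The two tests disagree on a write to a present, read-only PTE. */
static inline void sk_selfcheck(void)
{
	unsigned long ro_pte = SK_PAGE_PRESENT | SK_PAGE_RO;

	assert(sk_access_permitted_mask(ro_pte, true));	/* write slips through */
	assert(!sk_access_permitted_old(ro_pte, true));	/* write is refused */
}
```

With such a layout the two tests agree for reads and differ only for writes
to a read-only PTE, which is the case the conversion would silently change.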


  		return 0;
  
  	/* hugepages are never "special" */




Re: [PATCH 3/3] powerpc/mm/ Add proper pte access check helper

2018-08-20 Thread Christophe LEROY




On 17/08/2018 at 17:12, Christophe LEROY wrote:



On 04/12/2017 at 03:19, Aneesh Kumar K.V wrote:
pte_access_permitted gets called in the get_user_pages_fast path. If we have
marked the pte PROT_NONE, we should not allow a read access on the address.
With the current implementation we are not checking the READ and only check
for WRITE. This is needed on archs like ppc64 that implement PROT_NONE using
_PAGE_USER access instead of _PAGE_PRESENT. Also add pte_user check just to
make sure we are not accessing kernel mapping.

Even though there is code duplication, keeping the low level pte accessors
different for different platforms helps in code readability.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/book3s/32/pgtable.h | 23 +++
  arch/powerpc/include/asm/nohash/pgtable.h    | 23 +++
  2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 016579ef16d3..30a155c0a6b0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -311,6 +311,29 @@ static inline int pte_present(pte_t pte)
  return pte_val(pte) & _PAGE_PRESENT;
  }
+/*
+ * We only find page table entry in the last level
+ * Hence no need for other accessors
+ */
+#define pte_access_permitted pte_access_permitted
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+	unsigned long pteval = pte_val(pte);
+	/*
+	 * A read-only access is controlled by _PAGE_USER bit.
+	 * We have _PAGE_READ set for WRITE and EXECUTE
+	 */
+	unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
+
+	if (write)
+		need_pte_bits |= _PAGE_WRITE;
+
+	if ((pteval & need_pte_bits) != need_pte_bits)
+		return false;
+
+	return true;
+}
+
  /* Conversion functions: convert a page and protection to a page entry,
   * and a page entry and page directory to the page they refer to.
   *
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 5c68f4a59f75..fc4376c8d444 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -45,6 +45,29 @@ static inline int pte_present(pte_t pte)
  return pte_val(pte) & _PAGE_PRESENT;
  }
+/*
+ * We only find page table entry in the last level
+ * Hence no need for other accessors
+ */
+#define pte_access_permitted pte_access_permitted
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+	unsigned long pteval = pte_val(pte);
+	/*
+	 * A read-only access is controlled by _PAGE_USER bit.
+	 * We have _PAGE_READ set for WRITE and EXECUTE
+	 */


Not fully right. asm/pte-common.h defines:

#define PAGE_NONE    __pgprot(_PAGE_BASE | _PAGE_NA)
#define PAGE_SHARED    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
#define PAGE_SHARED_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \
  _PAGE_EXEC)
#define PAGE_COPY    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO)
#define PAGE_COPY_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \
  _PAGE_EXEC)
#define PAGE_READONLY    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO)
#define PAGE_READONLY_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \
  _PAGE_EXEC)

On the 8xx, _PAGE_USER = 0


+	unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;


_PAGE_PRIVILEGED should be checked as well.

Indeed, it seems like our patches crossed each other. My patch for
adding _PAGE_PRIVILEGED is dated January 16th while yours was merged on
the 17th.


I'm wondering if there are other places that are missing _PAGE_RO
handling and _PAGE_PRIVILEGED handling. Do you remember if you added any
recently?


Christophe


+
+	if (write)
+		need_pte_bits |= _PAGE_WRITE;
+
+	if ((pteval & need_pte_bits) != need_pte_bits)
+		return false;


This test is not fully correct:
- To check access (read) permission, you also have to check that _PAGE_NA
is not set.
- To check write permission, you also have to check that neither
_PAGE_NA nor _PAGE_RO is set.


On the 8xx, you have:
_PAGE_RW = _PAGE_WRITE = 0
_PAGE_NA = 0x0200
_PAGE_RO = 0x0600

Christophe
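
One possible shape for such a check, as a user-space sketch rather than a
proposed patch. The _PAGE_NA and _PAGE_RO values are the 8xx ones quoted
above; the SK_* names, the value used for the present bit and the helper
names are made up for the example. Since 0x0600 contains the 0x0200 bit, the
sketch assumes the two values are encodings of a single protection field
(masked here with the _PAGE_RO pattern), so it compares the field instead of
testing the _PAGE_NA bit alone. The _PAGE_PRIVILEGED test is left out because
its bit value does not appear in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PAGE_PRESENT   0x0001UL	/* assumed value, not from the thread */
#define SK_PAGE_NA        0x0200UL	/* 8xx value quoted above */
#define SK_PAGE_RO        0x0600UL	/* 8xx value quoted above */
#define SK_PAGE_PROT_MASK SK_PAGE_RO	/* assumption: NA/RO share one field */

/* Readable unless the protection field holds the "no access" encoding. */
static inline bool sk_pte_read(unsigned long pteval)
{
	return (pteval & SK_PAGE_PROT_MASK) != SK_PAGE_NA;
}

/* Writable only when no protection bit is set (_PAGE_RW == 0). */
static inline bool sk_pte_write(unsigned long pteval)
{
	return (pteval & SK_PAGE_PROT_MASK) == 0;
}

static inline bool sk_pte_access_permitted(unsigned long pteval, bool write)
{
	if (!(pteval & SK_PAGE_PRESENT))
		return false;
	if (!sk_pte_read(pteval))
		return false;
	if (write && !sk_pte_write(pteval))
		return false;
	return true;
}

/* Walk the three encodings mentioned in the review. */
static inline void sk_selfcheck(void)
{
	assert(sk_pte_access_permitted(SK_PAGE_PRESENT, true));		      /* RW */
	assert(sk_pte_access_permitted(SK_PAGE_PRESENT | SK_PAGE_RO, false)); /* RO read */
	assert(!sk_pte_access_permitted(SK_PAGE_PRESENT | SK_PAGE_RO, true)); /* RO write */
	assert(!sk_pte_access_permitted(SK_PAGE_PRESENT | SK_PAGE_NA, false));/* NA read */
}
```

Under that assumption, a plain "(pteval & _PAGE_NA) == 0" test would also
refuse reads on read-only pages, which is why the field is compared as a
whole.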


+
+	return true;
+}
+
  /* Conversion functions: convert a page and protection to a page entry,
   * and a page entry and page directory to the page they refer to.
   *