Re: [PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()

2020-11-24 Thread Laurent Vivier
On 24/11/2020 23:19, Thomas Gleixner wrote:
> On Tue, Nov 24 2020 at 21:03, Laurent Vivier wrote:
>> This parameter is needed to pass it to irq_domain_alloc_descs().
>>
>> This seems to have been missed by
>> 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")
> 
> No, this has not been missed at all. There was and is no reason to do
> this.
> 
>> This is needed to implement proper support for multiqueue with
>> pseries.
> 
> And because pseries needs this _all_ callers need to be changed?
> 
>>  123 files changed, 171 insertions(+), 146 deletions(-)
> 
> Lots of churn for nothing. 99% of the callers will never need that.
> 
> What's wrong with simply adding an interface which takes that parameter,
> make the existing one an inline wrapper and leave the rest alone?

Nothing. I'm going to do it that way.
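
Something along these lines (just a sketch, naming still to be
finalized in v2):

unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
					 irq_hw_number_t hwirq,
					 const struct irq_affinity_desc *affinity);

static inline unsigned int irq_create_mapping(struct irq_domain *domain,
					      irq_hw_number_t hwirq)
{
	return irq_create_mapping_affinity(domain, hwirq, NULL);
}

That way only the pseries multiqueue path needs to call the new
_affinity variant, and the existing callers are left untouched.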

Thank you for your comment.

Laurent



[PATCH V2] powerpc/perf: Exclude kernel samples while counting events in user space.

2020-11-24 Thread Athira Rajeev
The perf event attribute supports the exclude_kernel flag
to avoid sampling/profiling in supervisor state (kernel).
Based on this event attr flag, the Monitor Mode Control
Register bit is set to freeze on supervisor state. But
sometimes (due to a hardware limitation), the Sampled
Instruction Address Register (SIAR) locks on to a kernel
address even when freeze on supervisor is set. Patch here
adds a check to drop those samples.

Signed-off-by: Athira Rajeev 
---
Changes in v2:
- Initial patch was sent along with series:
  https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=209195
  Moving this patch as separate since this change is applicable
  for all PMU platforms.

 arch/powerpc/perf/core-book3s.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cb..40aa117 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2122,6 +2122,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
perf_event_update_userpage(event);
 
/*
+* Due to hardware limitation, sometimes SIAR could
+* lock on to a kernel address even when freeze on
+* supervisor state (kernel) is set in MMCR2.
+* Check attr.exclude_kernel and address
+* to drop the sample in these cases.
+*/
+   if (event->attr.exclude_kernel && record)
+   if (is_kernel_addr(mfspr(SPRN_SIAR)))
+   record = 0;
+
+   /*
 * Finally record data if requested.
 */
if (record) {
-- 
1.8.3.1



Re: [PATCH v4] dt-bindings: misc: convert fsl,qoriq-mc from txt to YAML

2020-11-24 Thread Ioana Ciornei
On Mon, Nov 23, 2020 at 11:00:35AM +0200, Laurentiu Tudor wrote:
> From: Ionut-robert Aron 
> 
> Convert fsl,qoriq-mc to YAML in order to automate the verification
> process of dts files. In addition, update MAINTAINERS accordingly
> and, while at it, add some missing files.
> 
> Signed-off-by: Ionut-robert Aron 
> [laurentiu.tu...@nxp.com: update MAINTAINERS, updates & fixes in schema]
> Signed-off-by: Laurentiu Tudor 

Acked-by: Ioana Ciornei 


> ---
> Changes in v4:
>  - use $ref to point to fsl,qoriq-mc-dpmac binding
> 
> Changes in v3:
>  - dropped duplicated "fsl,qoriq-mc-dpmac" schema and replaced with
>reference to it
>  - fixed a dt_binding_check warning
> 
> Changes in v2:
>  - fixed errors reported by yamllint
>  - dropped multiple unnecessary quotes
>  - used schema instead of text in description
>  - added constraints on dpmac reg property
> 
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 196 --
>  .../bindings/misc/fsl,qoriq-mc.yaml   | 186 +
>  .../ethernet/freescale/dpaa2/overview.rst |   5 +-
>  MAINTAINERS   |   4 +-
>  4 files changed, 193 insertions(+), 198 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
>  create mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
> 
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> deleted file mode 100644
> index 7b486d4985dc..000000000000
> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +++ /dev/null
> @@ -1,196 +0,0 @@
> -* Freescale Management Complex
> -
> -The Freescale Management Complex (fsl-mc) is a hardware resource
> -manager that manages specialized hardware objects used in
> -network-oriented packet processing applications. After the fsl-mc
> -block is enabled, pools of hardware resources are available, such as
> -queues, buffer pools, I/O interfaces. These resources are building
> -blocks that can be used to create functional hardware objects/devices
> -such as network interfaces, crypto accelerator instances, L2 switches,
> -etc.
> -
> -For an overview of the DPAA2 architecture and fsl-mc bus see:
> -Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> -
> -As described in the above overview, all DPAA2 objects in a DPRC share the
> -same hardware "isolation context" and a 10-bit value called an ICID
> -(isolation context id) is expressed by the hardware to identify
> -the requester.
> -
> -The generic 'iommus' property is insufficient to describe the relationship
> -between ICIDs and IOMMUs, so an iommu-map property is used to define
> -the set of possible ICIDs under a root DPRC and how they map to
> -an IOMMU.
> -
> -For generic IOMMU bindings, see
> -Documentation/devicetree/bindings/iommu/iommu.txt.
> -
> -For arm-smmu binding, see:
> -Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
> -
> -The MSI writes are accompanied by sideband data which is derived from the ICID.
> -The msi-map property is used to associate the devices with both the ITS
> -controller and the sideband data which accompanies the writes.
> -
> -For generic MSI bindings, see
> -Documentation/devicetree/bindings/interrupt-controller/msi.txt.
> -
> -For GICv3 and GIC ITS bindings, see:
> -Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
> -
> -Required properties:
> -
> -- compatible
> -Value type: <string>
> -Definition: Must be "fsl,qoriq-mc".  A Freescale Management Complex
> -compatible with this binding must have Block Revision
> -Registers BRR1 and BRR2 at offset 0x0BF8 and 0x0BFC in
> -the MC control register region.
> -
> -- reg
> -Value type: <prop-encoded-array>
> -Definition: A standard property.  Specifies one or two regions
> -defining the MC's registers:
> -
> -   -the first region is the command portal for the
> -this machine and must always be present
> -
> -   -the second region is the MC control registers. This
> -region may not be present in some scenarios, such
> -as in the device tree presented to a virtual machine.
> -
> -- ranges
> -Value type: <prop-encoded-array>
> -Definition: A standard property.  Defines the mapping between the child
> -MC address space and the parent system address space.
> -
> -The MC address space is defined by 3 components:
> - <region type> <offset hi> <offset lo>
> -
> -Valid values for region type are
> -   0x0 - MC portals
> -   0x1 - QBMAN portals
> -
> -- #address-cells
> -Value type: <u32>
> -Definition: Must be 3.  (see definition in 'ranges' property)
> -
> -- #size-cells
> -Value type: <u32>
> - 

[PATCH v1 8/8] powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs

2020-11-24 Thread Christophe Leroy
Use SPRN_SPRG_SCRATCH2 as a third scratch register in
exception prologs in order to simplify them and avoid
data going back and forth from/to CR.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 5e3393122d29..a1ee1e12241e 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -40,7 +40,7 @@
 
 .macro EXCEPTION_PROLOG_1 for_rtas=0
 #ifdef CONFIG_VMAP_STACK
-   mr  r11, r1
+   mtspr   SPRN_SPRG_SCRATCH2,r1
subi    r1, r1, INT_FRAME_SIZE  /* use r1 if kernel */
beq 1f
mfspr   r1,SPRN_SPRG_THREAD
@@ -61,15 +61,10 @@
 
 .macro EXCEPTION_PROLOG_2 handle_dar_dsisr=0
 #ifdef CONFIG_VMAP_STACK
-   mtcr    r10
-   li  r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
-   mtmsr   r10
+   li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
+   mtmsr   r11
isync
-#else
-   stw r10,_CCR(r11)   /* save registers */
-#endif
-   mfspr   r10, SPRN_SPRG_SCRATCH0
-#ifdef CONFIG_VMAP_STACK
+   mfspr   r11, SPRN_SPRG_SCRATCH2
stw r11,GPR1(r1)
stw r11,0(r1)
mr  r11, r1
@@ -78,14 +73,12 @@
stw r1,0(r11)
tovirt(r1, r11) /* set new kernel sp */
 #endif
+   stw r10,_CCR(r11)   /* save registers */
stw r12,GPR12(r11)
stw r9,GPR9(r11)
-   stw r10,GPR10(r11)
-#ifdef CONFIG_VMAP_STACK
-   mfcr    r10
-   stw r10, _CCR(r11)
-#endif
+   mfspr   r10,SPRN_SPRG_SCRATCH0
mfspr   r12,SPRN_SPRG_SCRATCH1
+   stw r10,GPR10(r11)
stw r12,GPR11(r11)
mflr    r10
stw r10,_LINK(r11)
@@ -99,7 +92,6 @@
stw r10, _DSISR(r11)
.endif
lwz r9, SRR1(r12)
-   andi.   r10, r9, MSR_PR
lwz r12, SRR0(r12)
 #else
mfspr   r12,SPRN_SRR0
-- 
2.25.0



[PATCH v1 7/8] powerpc/32s: Use SPRN_SPRG_SCRATCH2 in DSI prolog

2020-11-24 Thread Christophe Leroy
Use SPRN_SPRG_SCRATCH2 as an alternative scratch register in
the early part of DSI prolog in order to avoid clobbering
SPRN_SPRG_SCRATCH0/1 used by other prologs.

The 603 doesn't like a jump from DataLoadTLBMiss to the 10 nops
that are now at the beginning of the DSI exception as a result of
the feature section. To work around this, add a jump as alternative.
It also avoids fetching 10 nops for nothing.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/kernel/head_book3s_32.S | 24 
 2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a37ce826f6f6..acd334ee3936 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,6 +1203,7 @@
 #ifdef CONFIG_PPC_BOOK3S_32
 #define SPRN_SPRG_SCRATCH0 SPRN_SPRG0
 #define SPRN_SPRG_SCRATCH1 SPRN_SPRG1
+#define SPRN_SPRG_SCRATCH2 SPRN_SPRG2
 #define SPRN_SPRG_603_LRU  SPRN_SPRG4
 #endif
 
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 51eef7b82f9c..22d670263222 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -288,9 +288,9 @@ MachineCheck:
DO_KVM  0x300
 DataAccess:
 #ifdef CONFIG_VMAP_STACK
-   mtspr   SPRN_SPRG_SCRATCH0,r10
-   mfspr   r10, SPRN_SPRG_THREAD
 BEGIN_MMU_FTR_SECTION
+   mtspr   SPRN_SPRG_SCRATCH2,r10
+   mfspr   r10, SPRN_SPRG_THREAD
stw r11, THR11(r10)
mfspr   r10, SPRN_DSISR
mfcr    r11
@@ -304,19 +304,11 @@ BEGIN_MMU_FTR_SECTION
 .Lhash_page_dsi_cont:
mtcr    r11
lwz r11, THR11(r10)
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
-   mtspr   SPRN_SPRG_SCRATCH1,r11
-   mfspr   r11, SPRN_DAR
-   stw r11, DAR(r10)
-   mfspr   r11, SPRN_DSISR
-   stw r11, DSISR(r10)
-   mfspr   r11, SPRN_SRR0
-   stw r11, SRR0(r10)
-   mfspr   r11, SPRN_SRR1  /* check whether user or kernel */
-   stw r11, SRR1(r10)
-   mfcr    r10
-   andi.   r11, r11, MSR_PR
-
+   mfspr   r10, SPRN_SPRG_SCRATCH2
+MMU_FTR_SECTION_ELSE
+   b   1f
+ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
+1: EXCEPTION_PROLOG_0 handle_dar_dsisr=1
EXCEPTION_PROLOG_1
b   handle_page_fault_tramp_1
 #else  /* CONFIG_VMAP_STACK */
@@ -760,7 +752,7 @@ fast_hash_page_return:
/* DSI */
mtcr    r11
lwz r11, THR11(r10)
-   mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r10, SPRN_SPRG_SCRATCH2
RFI
 
 1: /* ISI */
-- 
2.25.0



[PATCH v1 6/8] powerpc/32: Simplify EXCEPTION_PROLOG_1 macro

2020-11-24 Thread Christophe Leroy
Make the code more readable with a clear CONFIG_VMAP_STACK
section and a clear non-CONFIG_VMAP_STACK section.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 7c767765071d..5e3393122d29 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -46,18 +46,16 @@
mfspr   r1,SPRN_SPRG_THREAD
lwz r1,TASK_STACK-THREAD(r1)
addi    r1, r1, THREAD_SIZE - INT_FRAME_SIZE
+1:
+   mtcrf   0x7f, r1
+   bt  32 - THREAD_ALIGN_SHIFT, stack_overflow
 #else
subi    r11, r1, INT_FRAME_SIZE /* use r1 if kernel */
beq 1f
mfspr   r11,SPRN_SPRG_THREAD
lwz r11,TASK_STACK-THREAD(r11)
addi    r11, r11, THREAD_SIZE - INT_FRAME_SIZE
-#endif
-1:
-   tophys_novmstack r11, r11
-#ifdef CONFIG_VMAP_STACK
-   mtcrf   0x7f, r1
-   bt  32 - THREAD_ALIGN_SHIFT, stack_overflow
+1: tophys(r11, r11)
 #endif
 .endm
 
-- 
2.25.0



[PATCH v1 5/8] powerpc/603: Use SPRN_SDR1 to store the pgdir phys address

2020-11-24 Thread Christophe Leroy
On the 603, SDR1 is not used.

In order to free SPRN_SPRG2, use SPRN_SDR1 to store the pgdir
phys addr.

But only some bits of SDR1 can be used (0xffff01ff).
As the pgdir is 4k aligned, rotate it by 4 bits to the left.
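
A worked example of the round trip (illustrative values, written as C
for clarity; the patch does it with the rlwinm pair below):

	u32 pgdir = 0x00c03000;	/* 4k aligned: low 12 bits are zero */
	/* store side: rlwinm r4, r4, 4, 0xffff01ff */
	u32 sdr1 = ((pgdir << 4) | (pgdir >> 28)) & 0xffff01ff;	/* 0x0c030000 */
	/* load side: rlwinm r2, r2, 28, 0xfffff000 */
	u32 back = ((sdr1 << 28) | (sdr1 >> 4)) & 0xfffff000;	/* 0x00c03000 */

A top nibble that wraps around during the left rotation lands in SDR1's
low 0x01ff field and rotates back into place on the load, so all 20
significant bits of the 4k-aligned address survive.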

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h   |  1 -
 arch/powerpc/kernel/head_book3s_32.S | 31 +---
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..a37ce826f6f6 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,7 +1203,6 @@
 #ifdef CONFIG_PPC_BOOK3S_32
 #define SPRN_SPRG_SCRATCH0 SPRN_SPRG0
 #define SPRN_SPRG_SCRATCH1 SPRN_SPRG1
-#define SPRN_SPRG_PGDIRSPRN_SPRG2
 #define SPRN_SPRG_603_LRU  SPRN_SPRG4
 #endif
 
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 236a95d163be..51eef7b82f9c 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -457,8 +457,9 @@ InstructionTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
-   mfspr   r2, SPRN_SPRG_PGDIR
+   mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+   rlwinm  r2, r2, 28, 0xfffff000
 #ifdef CONFIG_MODULES
bgt-    112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
@@ -519,8 +520,9 @@ DataLoadTLBMiss:
mfspr   r3,SPRN_DMISS
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
-   mfspr   r2, SPRN_SPRG_PGDIR
+   mfspr   r2, SPRN_SDR1
li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
+   rlwinm  r2, r2, 28, 0xfffff000
bgt-    112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l    /* kernel page table */
@@ -595,8 +597,9 @@ DataStoreTLBMiss:
mfspr   r3,SPRN_DMISS
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
-   mfspr   r2, SPRN_SPRG_PGDIR
+   mfspr   r2, SPRN_SDR1
li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+   rlwinm  r2, r2, 28, 0xfffff000
bgt-    112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l    /* kernel page table */
@@ -889,9 +892,12 @@ __secondary_start:
tophys(r4,r2)
addi    r4,r4,THREAD    /* phys address of our thread_struct */
mtspr   SPRN_SPRG_THREAD,r4
+BEGIN_MMU_FTR_SECTION
lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
-   mtspr   SPRN_SPRG_PGDIR, r4
+   rlwinm  r4, r4, 4, 0xffff01ff
+   mtspr   SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
 
/* enable MMU and jump to start_secondary */
li  r4,MSR_KERNEL
@@ -931,11 +937,13 @@ load_up_mmu:
tlbia   /* Clear all TLB entries */
sync/* wait for tlbia/tlbie to finish */
TLBSYNC /* ... on all CPUs */
+BEGIN_MMU_FTR_SECTION
/* Load the SDR1 register (hash table base & size) */
lis r6,_SDR1@ha
tophys(r6,r6)
lwz r6,_SDR1@l(r6)
mtspr   SPRN_SDR1,r6
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 
 /* Load the BAT registers with the values set up by MMU_init. */
lis r3,BATS@ha
@@ -991,9 +999,12 @@ start_here:
tophys(r4,r2)
addi    r4,r4,THREAD    /* init task's THREAD */
mtspr   SPRN_SPRG_THREAD,r4
+BEGIN_MMU_FTR_SECTION
lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
-   mtspr   SPRN_SPRG_PGDIR, r4
+   rlwinm  r4, r4, 4, 0xffff01ff
+   mtspr   SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
 
/* stack */
lis r1,init_thread_union@ha
@@ -1073,16 +1084,22 @@ _ENTRY(switch_mmu_context)
li  r0,NUM_USER_SEGMENTS
mtctr   r0
 
-   lwz r4, MM_PGD(r4)
 #ifdef CONFIG_BDI_SWITCH
/* Context switch the PTE pointer for the Abatron BDI2000.
 * The PGDIR is passed as second argument.
 */
+   lwz r4, MM_PGD(r4)
lis r5, abatron_pteptrs@ha
stw r4, abatron_pteptrs@l + 0x4(r5)
+#endif
+BEGIN_MMU_FTR_SECTION
+#ifndef CONFIG_BDI_SWITCH
+   lwz r4, MM_PGD(r4)
 #endif
tophys(r4, r4)
-   mtspr   SPRN_SPRG_PGDIR, r4
+   rlwinm  r4, r4, 4, 0xffff01ff
+   mtspr   SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
li  r4,0
isync
 3:
-- 
2.25.0



[PATCH v1 2/8] powerpc/32s: Don't hash_preload() kernel text

2020-11-24 Thread Christophe Leroy
We now always map kernel text with BATs. There is thus no longer any
need to preload the hash with kernel text addresses, nor to ensure
they are never evicted.

This is more or less a revert of commit ee4f2ea48674 ("[POWERPC] Fix
32-bit mm operations when not using BATs")

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/hash_low.S | 18 +-
 arch/powerpc/mm/book3s32/mmu.c  |  2 +-
 arch/powerpc/mm/mmu_decl.h  |  2 --
 arch/powerpc/mm/pgtable_32.c|  4 
 4 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index b2c912e517b9..48415c857d80 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -411,30 +411,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 * and we know there is a definite (although small) speed
 * advantage to putting the PTE in the primary PTEG, we always
 * put the PTE in the primary PTEG.
-*
-* In addition, we skip any slot that is mapping kernel text in
-* order to avoid a deadlock when not using BAT mappings if
-* trying to hash in the kernel hash code itself after it has
-* already taken the hash table lock. This works in conjunction
-* with pre-faulting of the kernel text.
-*
-* If the hash table bucket is full of kernel text entries, we'll
-* lockup here but that shouldn't happen
 */
 
-1: lis r4, (next_slot - PAGE_OFFSET)@ha    /* get next evict slot */
+   lis r4, (next_slot - PAGE_OFFSET)@ha    /* get next evict slot */
lwz r6, (next_slot - PAGE_OFFSET)@l(r4)
addi    r6,r6,HPTE_SIZE /* search for candidate */
andi.   r6,r6,7*HPTE_SIZE
stw r6,next_slot@l(r4)
add r4,r3,r6
-   LDPTE   r0,HPTE_SIZE/2(r4)  /* get PTE second word */
-   clrrwi  r0,r0,12
-   lis r6,etext@h
-   ori r6,r6,etext@l   /* get etext */
-   tophys(r6,r6)
-   cmpl    cr0,r0,r6   /* compare and try again */
-   blt 1b
 
 #ifndef CONFIG_SMP
/* Store PTE in PTEG */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 5c60dcade90a..23f60e97196e 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -302,7 +302,7 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
 /*
  * Preload a translation in the hash table
  */
-void hash_preload(struct mm_struct *mm, unsigned long ea)
+static void hash_preload(struct mm_struct *mm, unsigned long ea)
 {
pmd_t *pmd;
 
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 1b6d39e9baed..0ad6d476d01d 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -91,8 +91,6 @@ void print_system_hash_info(void);
 
 #ifdef CONFIG_PPC32
 
-void hash_preload(struct mm_struct *mm, unsigned long ea);
-
 extern void mapin_ram(void);
 extern void setbat(int index, unsigned long virt, phys_addr_t phys,
   unsigned int size, pgprot_t prot);
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 079159e97bca..6e0083e7f008 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -112,10 +112,6 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
ktext = ((char *)v >= _stext && (char *)v < etext) ||
((char *)v >= _sinittext && (char *)v < _einittext);
map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
-#ifdef CONFIG_PPC_BOOK3S_32
-   if (ktext)
-   hash_preload(&init_mm, v);
-#endif
v += PAGE_SIZE;
p += PAGE_SIZE;
}
-- 
2.25.0



[PATCH v1 3/8] powerpc/32s: Fix an FTR_SECTION_ELSE

2020-11-24 Thread Christophe Leroy
An FTR_SECTION_ELSE is in the middle of a
BEGIN_MMU_FTR_SECTION/ALT_MMU_FTR_SECTION_END_IFSET pair.

Change it to MMU_FTR_SECTION_ELSE.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_book3s_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 27767f3e7ec1..236a95d163be 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -332,7 +332,7 @@ BEGIN_MMU_FTR_SECTION
rlwinm  r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */
bl  hash_page
b   handle_page_fault_tramp_1
-FTR_SECTION_ELSE
+MMU_FTR_SECTION_ELSE
b   handle_page_fault_tramp_2
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
 #endif /* CONFIG_VMAP_STACK */
-- 
2.25.0



[PATCH v1 4/8] powerpc/32s: Don't use SPRN_SPRG_PGDIR in hash_page

2020-11-24 Thread Christophe Leroy
SPRN_SPRG_PGDIR is there mainly to speed up SW TLB miss handlers
for powerpc 603.

We need to free SPRN_SPRG2 to reduce the mess with CONFIG_VMAP_STACK.

In hash_page(), reading the PGDIR from thread_struct will be in the
noise performance-wise.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/hash_low.S | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 48415c857d80..aca353d1c5f4 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -65,13 +65,14 @@ _GLOBAL(hash_page)
/* Get PTE (linux-style) and check access */
lis r0, TASK_SIZE@h /* check if kernel address */
cmplw   0,r4,r0
+   mfspr   r8,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
ori r3,r3,_PAGE_USER|_PAGE_PRESENT /* test low addresses as user */
-   mfspr   r5, SPRN_SPRG_PGDIR /* phys page-table root */
+   lwz r5,PGDIR(r8)    /* virt page-table root */
blt+    112f    /* assume user more likely */
-   lis r5, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
-   addi    r5, r5, (swapper_pg_dir - PAGE_OFFSET)@l    /* kernel page table */
+   lis r5,swapper_pg_dir@ha    /* if kernel address, use */
+   addi    r5,r5,swapper_pg_dir@l  /* kernel page table */
rlwimi  r3,r9,32-12,29,29   /* MSR_PR -> _PAGE_USER */
-112:
+112:   tophys(r5, r5)
 #ifndef CONFIG_PTE_64BIT
rlwimi  r5,r4,12,20,29  /* insert top 10 bits of address */
lwz r8,0(r5)/* get pmd entry */
-- 
2.25.0



[PATCH v1 1/8] powerpc/32s: Always map kernel text and rodata with BATs

2020-11-24 Thread Christophe Leroy
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with
DEBUG_PAGEALLOC"), there is no real situation where mapping without
BATs is required.

In order to simplify memory handling, always map kernel text
and rodata with BATs even when the "nobats" kernel parameter is set.

Also fix the 603 TLB miss exceptions, which no longer require the
kernel page table when DEBUG_PAGEALLOC is enabled.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_book3s_32.S | 4 ++--
 arch/powerpc/mm/book3s32/mmu.c   | 8 +++-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index a0dda2a1f2df..27767f3e7ec1 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,13 +453,13 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
+#ifdef CONFIG_MODULES
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SPRG_PGDIR
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
+#ifdef CONFIG_MODULES
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l    /* kernel page table */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index a59e7ec98180..5c60dcade90a 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -157,11 +157,9 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
unsigned long done;
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
 
-   if (__map_without_bats) {
-   pr_debug("RAM mapped without BATs\n");
-   return base;
-   }
-   if (debug_pagealloc_enabled()) {
+
+   if (debug_pagealloc_enabled() || __map_without_bats) {
+   pr_debug_once("Read-Write memory mapped without BATs\n");
if (base >= border)
return base;
if (top >= border)
-- 
2.25.0



Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-11-24 Thread Christophe Leroy




On 21/05/2020 at 12:38, Christophe Leroy wrote:



On 21/05/2020 at 09:02, Michael Ellerman wrote:

Arnd Bergmann  writes:

On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:

Benjamin Herrenschmidt  writes:

On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:

Benjamin Herrenschmidt  writes:

IBM still put 40x cores inside POWER chips, no?


Oh yeah that's true. I guess most folks don't know that, or that they
run RHEL on them.


Is there a reason for not having those dts files in mainline then?
If nothing else, it would document what machines are still being
used with future kernels.


Sorry that part was a joke :D  Those chips don't run Linux.



Nice to know :)

What's the plan then, do we still want to keep 40x in the kernel?

If yes, is it ok to drop the oldies anyway as done in my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172630 ?


(Note that this series will conflict with my series on hugepages on 8xx
due to the PTE_ATOMIC_UPDATES stuff. I can rebase the 40x modernisation
series on top of the 8xx hugepages series if it is worth it.)




Do we still want to keep 40x in the kernel? We don't even have a running
40x QEMU machine as far as I know.


I'm asking because I'd like to drop the non-CONFIG_VMAP_STACK code to
simplify and ease things (code that works with vmalloc'ed stacks also
works with stacks in linear memory), but I can't do it because 40x
doesn't have VMAP_STACK, and should I implement it for 40x, I have no
means to test it.


So it would ease things if we could drop 40x completely, unless someone
out there has a 40x platform to test stuff on.


Thanks
Christophe


Re: [PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered

2020-11-24 Thread Michael Ellerman
Thomas Falcon  writes:
> Ensure that received Subordinate Command-Response Queue (SCRQ)
> entries are properly read in order by the driver. These queues
> are used in the ibmvnic device to process RX buffer and TX completion
> descriptors. dma_rmb barriers have been added after checking for a
> pending descriptor to ensure the correct descriptor entry is checked
> and after reading the SCRQ descriptor to ensure the entire
> descriptor is read before processing.
>
> Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol")
> Signed-off-by: Thomas Falcon 
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 2aa40b2..489ed5e 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
>  
>   if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num]))
>   break;
> + /* ensure that we do not prematurely exit the polling loop */
> + dma_rmb();

I'd be happier if these comments were more specific about which read(s)
they are ordering vs which other read(s).

I'm sure it's obvious to you, but it may not be to a future author,
and/or after the code has been refactored over time.
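
For example, something along these lines (wording is illustrative only)
would make the pairing explicit:

	/*
	 * Order the pending_scrq() read of the descriptor's valid bit
	 * above before the reads of the descriptor contents done by
	 * ibmvnic_next_scrq() below.
	 */
	dma_rmb();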

>   next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]);
>   rx_buff =
>   (struct ibmvnic_rx_buff *)be64_to_cpu(next->
> @@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
>   unsigned int pool = scrq->pool_index;
>   int num_entries = 0;
>  
> + /* ensure that the correct descriptor entry is read */
> + dma_rmb();
> +
>   next = ibmvnic_next_scrq(adapter, scrq);
>   for (i = 0; i < next->tx_comp.num_comps; i++) {
>   if (next->tx_comp.rcs[i]) {
> @@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *adapter,
>   }
>   spin_unlock_irqrestore(&scrq->lock, flags);
>  
> + /* ensure that the entire SCRQ descriptor is read */
> + dma_rmb();
> +
>   return entry;
>  }

cheers


[PATCH v6 22/22] powerpc/book3s64/pkeys: Optimize FTR_KUAP and FTR_KUEP disabled case

2020-11-24 Thread Aneesh Kumar K.V
If FTR_KUAP is disabled, the kernel will continue to run with the same AMR
value with which it was entered. Hence there is a high chance that
we can return without restoring the AMR value. This also helps the case
when applications are not using the pkey feature. In this case, different
applications will have the same AMR values and hence we can avoid restoring
AMR in this case too.

Also avoid isync() if not really needed.

Do the same for IAMR.

null-syscall benchmark results:

With smap/smep disabled:
Without patch:
	957.95 ns	2778.17 cycles
With patch:
	858.38 ns	2489.30 cycles

With smap/smep enabled:
Without patch:
	1017.26 ns	2950.36 cycles
With patch:
	1021.51 ns	2962.44 cycles

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 61 +---
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/syscall_64.c | 12 +++--
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 7026d1b5d0c6..e063e439b0a8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -12,28 +12,54 @@
 
 #ifdef __ASSEMBLY__
 
-.macro kuap_restore_user_amr gpr1
+.macro kuap_restore_user_amr gpr1, gpr2
 #if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
+   b   100f  // skip_restore_amr
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
/*
 * AMR and IAMR are going to be different when
 * returning to userspace.
 */
ld  \gpr1, STACK_REGS_AMR(r1)
+
+   /*
+* If kuap feature is not enabled, do the mtspr
+* only if AMR value is different.
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(68)
+   mfspr   \gpr2, SPRN_AMR
+   cmpd    \gpr1, \gpr2
+   beq 99f
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUAP, 68)
+
isync
mtspr   SPRN_AMR, \gpr1
+99:
/*
 * Restore IAMR only when returning to userspace
 */
ld  \gpr1, STACK_REGS_IAMR(r1)
+
+   /*
+* If kuep feature is not enabled, do the mtspr
+* only if IAMR value is different.
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(69)
+   mfspr   \gpr2, SPRN_IAMR
+   cmpd    \gpr1, \gpr2
+   beq 100f
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUEP, 69)
+
+   isync
mtspr   SPRN_IAMR, \gpr1
 
+100: //skip_restore_amr
/* No isync required, see kuap_restore_user_amr() */
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
 #endif
 .endm
 
-.macro kuap_restore_kernel_amr gpr1, gpr2
+.macro kuap_restore_kernel_amr gpr1, gpr2
 #if defined(CONFIG_PPC_PKEY)
 
BEGIN_MMU_FTR_SECTION_NESTED(67)
@@ -197,18 +223,41 @@ static inline u64 current_thread_iamr(void)
 
 static inline void kuap_restore_user_amr(struct pt_regs *regs)
 {
+   bool restore_amr = false, restore_iamr = false;
+   unsigned long amr, iamr;
+
if (!mmu_has_feature(MMU_FTR_PKEY))
return;
 
-   isync();
-   mtspr(SPRN_AMR, regs->amr);
-   mtspr(SPRN_IAMR, regs->iamr);
+   if (!mmu_has_feature(MMU_FTR_KUAP)) {
+   amr = mfspr(SPRN_AMR);
+   if (amr != regs->amr)
+   restore_amr = true;
+   } else
+   restore_amr = true;
+
+   if (!mmu_has_feature(MMU_FTR_KUEP)) {
+   iamr = mfspr(SPRN_IAMR);
+   if (iamr != regs->iamr)
+   restore_iamr = true;
+   } else
+   restore_iamr = true;
+
+
+   if (restore_amr || restore_iamr) {
+   isync();
+   if (restore_amr)
+   mtspr(SPRN_AMR, regs->amr);
+   if (restore_iamr)
+   mtspr(SPRN_IAMR, regs->iamr);
+   }
/*
 * No isync required here because we are about to rfi
 * back to previous context before any user accesses
 * would be made, which is a CSI.
 */
 }
+
 static inline void kuap_restore_kernel_amr(struct pt_regs *regs,
   unsigned long amr)
 {
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index e49291594c68..a68517e99fd2 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -675,7 +675,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
bne-    .Lrestore_nvgprs
 
 .Lfast_user_interrupt_return_amr:
-   kuap_restore_user_amr r3
+   kuap_restore_user_amr r3, r4
 .Lfast_user_interrupt_return:
ld  r11,_NIP(r1)
ld  r12,_MSR(r1)
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 60c57609d316..681f9afafc6f 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -38,6 +38,7 @@ notrace long system_call_exception(long r3, long r4, lon

[PATCH v6 21/22] powerpc/book3s64/hash/kup: Don't hardcode kup key

2020-11-24 Thread Aneesh Kumar K.V
Make the KUAP/KUEP key a variable and also check whether the platform
limits the max key such that we can't use the key for KUAP/KUEP.

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 22 +---
 arch/powerpc/include/asm/book3s/64/pkeys.h|  1 +
 arch/powerpc/mm/book3s64/pkeys.c  | 53 ---
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
index 9f44e208f036..ff9907c72ee3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-pkey.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -2,9 +2,7 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 
-/*  We use key 3 for KERNEL */
-#define HASH_DEFAULT_KERNEL_KEY (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)
-
+u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags);
 static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
@@ -14,24 +12,6 @@ static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
-{
-   unsigned long pte_pkey;
-
-   pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
-
-   if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
-   if ((pte_pkey == 0) && (flags & HPTE_USE_KERNEL_KEY))
-   return HASH_DEFAULT_KERNEL_KEY;
-   }
-
-   return pte_pkey;
-}
-
 static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
 {
return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 3b8640498f5b..a2b6c4a7275f 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -8,6 +8,7 @@
 extern u64 __ro_after_init default_uamor;
 extern u64 __ro_after_init default_amr;
 extern u64 __ro_after_init default_iamr;
+extern int kup_key;
 
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f029e7bf5ca2..204e4598b45c 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -37,7 +37,10 @@ u64 default_uamor __ro_after_init;
  */
 static int execute_only_key = 2;
 static bool pkey_execute_disable_supported;
-
+/*
+ * key used to implement KUAP/KUEP with hash translation.
+ */
+int kup_key = 3;
 
 #define AMR_BITS_PER_PKEY 2
 #define AMR_RD_BIT 0x1UL
@@ -185,6 +188,25 @@ void __init pkey_early_init_devtree(void)
default_uamor &= ~(0x3ul << pkeyshift(execute_only_key));
}
 
+   if (unlikely(num_pkey <= kup_key)) {
+   /*
+* Insufficient number of keys to support
+* KUAP/KUEP feature.
+*/
+   kup_key = -1;
+   } else {
+   /*  handle key which is used by kernel for KUAP */
+   reserved_allocation_mask |= (0x1 << kup_key);
+   /*
+* Mark access for kup_key in default amr so that
+* we continue to operate with that AMR in
+* copy_to/from_user().
+*/
+   default_amr   &= ~(0x3ul << pkeyshift(kup_key));
+   default_iamr  &= ~(0x1ul << pkeyshift(kup_key));
+   default_uamor &= ~(0x3ul << pkeyshift(kup_key));
+   }
+
/*
 * Allow access for only key 0. And prevent any other modification.
 */
@@ -205,9 +227,6 @@ void __init pkey_early_init_devtree(void)
reserved_allocation_mask |= (0x1 << 1);
default_uamor &= ~(0x3ul << pkeyshift(1));
 
-   /*  handle key 3 which is used by kernel for KUAP */
-   reserved_allocation_mask |= (0x1 << 3);
-   default_uamor &= ~(0x3ul << pkeyshift(3));
 
/*
 * Prevent the usage of OS reserved keys. Update UAMOR
@@ -236,7 +255,7 @@ void __init pkey_early_init_devtree(void)
 #ifdef CONFIG_PPC_KUEP
 void __init setup_kuep(bool disabled)
 {
-   if (disabled)
+   if (disabled || kup_key == -1)
return;
/*
 * On hash if PKEY feature is not enabled, disable KUAP too.
@@ -262,7 +281,7 @@ void __init setup_kuep(bool disabled)
 #ifdef CONFIG_PPC_KUAP
 void __init setup_kuap(bool disabled)
 {
-   if (disabled)
+   if (disabled || kup_key == -1)
return;
/*
 * On 

[PATCH v6 20/22] powerpc/book3s64/hash/kuep: Enable KUEP on hash

2020-11-24 Thread Aneesh Kumar K.V
Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 84f8664ffc47..f029e7bf5ca2 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -236,7 +236,12 @@ void __init pkey_early_init_devtree(void)
 #ifdef CONFIG_PPC_KUEP
 void __init setup_kuep(bool disabled)
 {
-   if (disabled || !early_radix_enabled())
+   if (disabled)
+   return;
+   /*
+* On hash if PKEY feature is not enabled, disable KUAP too.
+*/
+   if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
return;
 
if (smp_processor_id() == boot_cpuid) {
-- 
2.28.0



[PATCH v6 19/22] powerpc/book3s64/hash/kuap: Enable kuap on hash

2020-11-24 Thread Aneesh Kumar K.V
Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f747d66cc87d..84f8664ffc47 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -257,7 +257,12 @@ void __init setup_kuep(bool disabled)
 #ifdef CONFIG_PPC_KUAP
 void __init setup_kuap(bool disabled)
 {
-   if (disabled || !early_radix_enabled())
+   if (disabled)
+   return;
+   /*
+* On hash if PKEY feature is not enabled, disable KUAP too.
+*/
+   if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
return;
 
if (smp_processor_id() == boot_cpuid) {
-- 
2.28.0



[PATCH v6 18/22] powerpc/book3s64/kuep: Use Key 3 to implement KUEP with hash translation.

2020-11-24 Thread Aneesh Kumar K.V
Radix uses IAMR key 0 and hash translation uses IAMR key 3.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index b8861cc2b6c7..7026d1b5d0c6 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -7,7 +7,7 @@
 
#define AMR_KUAP_BLOCK_READ    UL(0x5455555555555555)
#define AMR_KUAP_BLOCK_WRITE   UL(0xa8aaaaaaaaaaaaaa)
-#define AMR_KUEP_BLOCKED   (1UL << 62)
+#define AMR_KUEP_BLOCKED   UL(0x5455555555555555)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 
 #ifdef __ASSEMBLY__
-- 
2.28.0



[PATCH v6 17/22] powerpc/book3s64/kuap: Use Key 3 to implement KUAP with hash translation.

2020-11-24 Thread Aneesh Kumar K.V
Radix uses AMR key 0 and hash translation uses AMR key 3.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 2922c442a218..b8861cc2b6c7 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -5,11 +5,10 @@
 #include 
 #include 
 
-#define AMR_KUAP_BLOCK_READ    UL(0x4000000000000000)
-#define AMR_KUAP_BLOCK_WRITE   UL(0x8000000000000000)
+#define AMR_KUAP_BLOCK_READ    UL(0x5455555555555555)
+#define AMR_KUAP_BLOCK_WRITE   UL(0xa8aaaaaaaaaaaaaa)
 #define AMR_KUEP_BLOCKED   (1UL << 62)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
-#define AMR_KUAP_SHIFT 62
 
 #ifdef __ASSEMBLY__
 
@@ -62,8 +61,8 @@
 #ifdef CONFIG_PPC_KUAP_DEBUG
BEGIN_MMU_FTR_SECTION_NESTED(67)
mfspr   \gpr1, SPRN_AMR
-   li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
-   sldi    \gpr2, \gpr2, AMR_KUAP_SHIFT
+   /* Prevent access to userspace using any key values */
+   LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
999:   tdne    \gpr1, \gpr2
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
-- 
2.28.0



[PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP

2020-11-24 Thread Aneesh Kumar K.V
With hash translation, use DSISR_KEYFAULT to identify a wrong access.
With radix, we look at the AMR value and the type of fault.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/kup.h |  4 +--
 arch/powerpc/include/asm/book3s/64/kup.h | 27 
 arch/powerpc/include/asm/kup.h   |  4 +--
 arch/powerpc/include/asm/nohash/32/kup-8xx.h |  4 +--
 arch/powerpc/mm/fault.c  |  2 +-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 32fd4452e960..b18cd931e325 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long flags)
allow_user_access(to, to, end - addr, KUAP_READ_WRITE);
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
unsigned long begin = regs->kuap & 0xf0000000;
unsigned long end = regs->kuap << 28;
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 4a3d0d601745..2922c442a218 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value)
isync();
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+#define RADIX_KUAP_BLOCK_READ  UL(0x4000000000000000)
+#define RADIX_KUAP_BLOCK_WRITE UL(0x8000000000000000)
+
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
-   return WARN(mmu_has_feature(MMU_FTR_KUAP) &&
-   (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
-   "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
+   if (!mmu_has_feature(MMU_FTR_KUAP))
+   return false;
+
+   if (radix_enabled()) {
+   /*
+* Will be a storage protection fault.
+* Only check the details of AMR[0]
+*/
+   return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : RADIX_KUAP_BLOCK_READ)),
+   "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
+   }
+   /*
+* We don't want to WARN here because userspace can setup
+* keys such that a kernel access to user address can cause
+* fault
+*/
+   return !!(error_code & DSISR_KEYFAULT);
 }
 
static __always_inline void allow_user_access(void __user *to, const void __user *from,
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index a06e50b68d40..952be0414f43 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -59,8 +59,8 @@ void setup_kuap(bool disabled);
 #else
 static inline void setup_kuap(bool disabled) { }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
return false;
 }
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 567cdc557402..7bdd9e5b63ed 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -60,8 +60,8 @@ static inline void restore_user_access(unsigned long flags)
mtspr(SPRN_MD_AP, flags);
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xff000000),
"Bug: fault blocked by AP register !");
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0add963a849b..c91621df0c61 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -227,7 +227,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
 
// Read/write fault in a valid region (the exception table search passed
// above), but blocked by KUAP is bad, it can never succeed.
-   if (bad_kuap_fault(regs, address, is_write))
+   if (bad_kuap_fault(regs, address, is_write, error_code))
return true;
 
// What's left? Kernel fault on user in well defined regions (extable
-- 
2.28.0



[PATCH v6 15/22] powerpc/book3s64/kuap: Restrict access to userspace based on userspace AMR

2020-11-24 Thread Aneesh Kumar K.V
If an application has configured address protection such that read/write
is denied using a pkey, even the kernel should receive a fault when
accessing that address.

This patch uses the user AMR value stored in pt_regs.amr to achieve that.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 47270596215b..4a3d0d601745 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -312,14 +312,20 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
static __always_inline void allow_user_access(void __user *to, const void __user *from,
  unsigned long size, unsigned long dir)
 {
+   unsigned long thread_amr = 0;
+
// This is written so we can resolve to a single case at build time
BUILD_BUG_ON(!__builtin_constant_p(dir));
+
+   if (mmu_has_feature(MMU_FTR_PKEY))
+   thread_amr = current_thread_amr();
+
if (dir == KUAP_READ)
-   set_kuap(AMR_KUAP_BLOCK_WRITE);
+   set_kuap(thread_amr | AMR_KUAP_BLOCK_WRITE);
else if (dir == KUAP_WRITE)
-   set_kuap(AMR_KUAP_BLOCK_READ);
+   set_kuap(thread_amr | AMR_KUAP_BLOCK_READ);
else if (dir == KUAP_READ_WRITE)
-   set_kuap(0);
+   set_kuap(thread_amr);
else
BUILD_BUG();
 }
-- 
2.28.0



[PATCH v6 14/22] powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.

2020-11-24 Thread Aneesh Kumar K.V
Now that the kernel correctly stores/restores the userspace AMR/IAMR values,
avoid manipulating AMR and IAMR from the kernel on behalf of userspace.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 21 +
 arch/powerpc/include/asm/processor.h |  4 --
 arch/powerpc/kernel/process.c|  4 --
 arch/powerpc/kernel/traps.c  |  6 ---
 arch/powerpc/mm/book3s64/pkeys.c | 57 +---
 5 files changed, 31 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 4dbb2d53fd8f..47270596215b 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -175,6 +175,27 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
 #include 
 #include 
 
+/*
+ * For kernel thread that doesn't have thread.regs return
+ * default AMR/IAMR values.
+ */
+static inline u64 current_thread_amr(void)
+{
+   if (current->thread.regs)
+   return current->thread.regs->amr;
+   return AMR_KUAP_BLOCKED;
+}
+
+static inline u64 current_thread_iamr(void)
+{
+   if (current->thread.regs)
+   return current->thread.regs->iamr;
+   return AMR_KUEP_BLOCKED;
+}
+#endif /* CONFIG_PPC_PKEY */
+
+#ifdef CONFIG_PPC_KUAP
+
 static inline void kuap_restore_user_amr(struct pt_regs *regs)
 {
if (!mmu_has_feature(MMU_FTR_PKEY))
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c61c859b51a8..c3df3a420c92 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -230,10 +230,6 @@ struct thread_struct {
struct thread_vr_state ckvr_state; /* Checkpointed VR state */
unsigned long   ckvrsave; /* Checkpointed VRSAVE */
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#ifdef CONFIG_PPC_MEM_KEYS
-   unsigned long   amr;
-   unsigned long   iamr;
-#endif
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
 #endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 98f7e9ec766f..5ffdac46a187 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -589,7 +589,6 @@ static void save_all(struct task_struct *tsk)
__giveup_spe(tsk);
 
msr_check_and_clear(msr_all_available);
-   thread_pkey_regs_save(&tsk->thread);
 }
 
 void flush_all_to_thread(struct task_struct *tsk)
@@ -1160,8 +1159,6 @@ static inline void save_sprs(struct thread_struct *t)
t->tar = mfspr(SPRN_TAR);
}
 #endif
-
-   thread_pkey_regs_save(t);
 }
 
 static inline void restore_sprs(struct thread_struct *old_thread,
@@ -1202,7 +1199,6 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TIDR, new_thread->tidr);
 #endif
 
-   thread_pkey_regs_restore(new_thread, old_thread);
 }
 
 struct task_struct *__switch_to(struct task_struct *prev,
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 5006dcbe1d9f..419028d53fd6 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -347,12 +347,6 @@ static bool exception_common(int signr, struct pt_regs *regs, int code,
 
current->thread.trap_nr = code;
 
-   /*
-* Save all the pkey registers AMR/IAMR/UAMOR. Eg: Core dumps need
-* to capture the content, if the task gets killed.
-*/
-   thread_pkey_regs_save(&current->thread);
-
return true;
 }
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f47d11f2743d..f747d66cc87d 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -273,30 +273,17 @@ void __init setup_kuap(bool disabled)
 }
 #endif
 
-static inline u64 read_amr(void)
+static inline void update_current_thread_amr(u64 value)
 {
-   return mfspr(SPRN_AMR);
+   current->thread.regs->amr = value;
 }
 
-static inline void write_amr(u64 value)
-{
-   mtspr(SPRN_AMR, value);
-}
-
-static inline u64 read_iamr(void)
-{
-   if (!likely(pkey_execute_disable_supported))
-   return 0x0UL;
-
-   return mfspr(SPRN_IAMR);
-}
-
-static inline void write_iamr(u64 value)
+static inline void update_current_thread_iamr(u64 value)
 {
if (!likely(pkey_execute_disable_supported))
return;
 
-   mtspr(SPRN_IAMR, value);
+   current->thread.regs->iamr = value;
 }
 
 #ifdef CONFIG_PPC_MEM_KEYS
@@ -311,17 +298,17 @@ void pkey_mm_init(struct mm_struct *mm)
 static inline void init_amr(int pkey, u8 init_bits)
 {
u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
-   u64 old_amr = read_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
+   u64 old_amr = current_thread_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
 
-   write_amr(old_amr | new_am

[PATCH v6 13/22] powerpc/ptrace-view: Use pt_regs values instead of thread_struct based one.

2020-11-24 Thread Aneesh Kumar K.V
We will remove thread.amr/iamr/uamor in a later patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/ptrace/ptrace-view.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index 7e6478e7ed07..bdbe8cfdafc7 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -470,12 +470,12 @@ static int pkey_active(struct task_struct *target, const struct user_regset *reg
static int pkey_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
 {
-   BUILD_BUG_ON(TSO(amr) + sizeof(unsigned long) != TSO(iamr));
 
if (!arch_pkeys_enabled())
return -ENODEV;
 
-   membuf_write(&to, &target->thread.amr, 2 * sizeof(unsigned long));
+   membuf_store(&to, target->thread.regs->amr);
+   membuf_store(&to, target->thread.regs->iamr);
return membuf_store(&to, default_uamor);
 }
 
@@ -508,7 +508,8 @@ static int pkey_set(struct task_struct *target, const struct user_regset *regset
 * Pick the AMR values for the keys that kernel is using. This
 * will be indicated by the ~default_uamor bits.
 */
-   target->thread.amr = (new_amr & default_uamor) | (target->thread.amr & ~default_uamor);
+   target->thread.regs->amr = (new_amr & default_uamor) |
+   (target->thread.regs->amr & ~default_uamor);
 
return 0;
 }
-- 
2.28.0



[PATCH v6 12/22] powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec

2020-11-24 Thread Aneesh Kumar K.V
On fork, we inherit from the parent and on exec, we should switch to
default_amr values.

Also, avoid changing the AMR register value within the kernel. The
kernel now runs with different AMR values.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pkeys.h |  2 ++
 arch/powerpc/kernel/process.c  |  6 +-
 arch/powerpc/mm/book3s64/pkeys.c   | 16 ++--
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index b7d9f4267bcd..3b8640498f5b 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -6,6 +6,8 @@
 #include 
 
 extern u64 __ro_after_init default_uamor;
+extern u64 __ro_after_init default_amr;
+extern u64 __ro_after_init default_iamr;
 
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 733680de0ba4..98f7e9ec766f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1542,6 +1542,11 @@ void arch_setup_new_exec(void)
current->thread.regs = regs - 1;
}
 
+#ifdef CONFIG_PPC_MEM_KEYS
+   current->thread.regs->amr  = default_amr;
+   current->thread.regs->iamr  = default_iamr;
+#endif
+
 }
 #else
 void arch_setup_new_exec(void)
@@ -1902,7 +1907,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.load_tm = 0;
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-   thread_pkey_regs_init(¤t->thread);
 }
 EXPORT_SYMBOL(start_thread);
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 640f090b9f9d..f47d11f2743d 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -28,8 +28,8 @@ static u32 initial_allocation_mask __ro_after_init;
  * Even if we allocate keys with sys_pkey_alloc(), we need to make sure
  * other thread still find the access denied using the same keys.
  */
-static u64 default_amr = ~0x0UL;
-static u64 default_iamr = 0x5555555555555555UL;
+u64 default_amr __ro_after_init  = ~0x0UL;
+u64 default_iamr __ro_after_init = 0x5555555555555555UL;
 u64 default_uamor __ro_after_init;
 /*
  * Key used to implement PROT_EXEC mmap. Denies READ/WRITE
@@ -388,18 +388,6 @@ void thread_pkey_regs_restore(struct thread_struct *new_thread,
write_iamr(new_thread->iamr);
 }
 
-void thread_pkey_regs_init(struct thread_struct *thread)
-{
-   if (!mmu_has_feature(MMU_FTR_PKEY))
-   return;
-
-   thread->amr   = default_amr;
-   thread->iamr  = default_iamr;
-
-   write_amr(default_amr);
-   write_iamr(default_iamr);
-}
-
 int execute_only_pkey(struct mm_struct *mm)
 {
return mm->context.execute_only_pkey;
-- 
2.28.0



[PATCH v6 11/22] powerpc/book3s64/pkeys: Inherit correctly on fork.

2020-11-24 Thread Aneesh Kumar K.V
The child's thread.kuap value is inherited from the parent in
copy_thread_tls. We still need to make sure that when the child returns
from fork in the kernel, it starts with the kernel default AMR value.

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/process.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b6b8a845e454..733680de0ba4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1768,6 +1768,17 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
childregs->ppr = DEFAULT_PPR;
 
p->thread.tidr = 0;
+#endif
+   /*
+* Run with the current AMR value of the kernel
+*/
+#ifdef CONFIG_PPC_KUAP
+   if (mmu_has_feature(MMU_FTR_KUAP))
+   kregs->kuap = AMR_KUAP_BLOCKED;
+#endif
+#ifdef CONFIG_PPC_KUEP
+   if (mmu_has_feature(MMU_FTR_KUEP))
+   kregs->iamr = AMR_KUEP_BLOCKED;
 #endif
kregs->nip = ppc_function_entry(f);
return 0;
-- 
2.28.0



[PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel

2020-11-24 Thread Aneesh Kumar K.V
This prepares the kernel to operate with AMR/IAMR values different from
userspace. For this, AMR/IAMR need to be saved and restored on entry to and
return from the kernel.

With KUAP we modify kernel AMR when accessing user address from the kernel
via copy_to/from_user interfaces. We don't need to modify IAMR value in
similar fashion.

If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering
kernel from userspace. If not we can assume that AMR/IAMR is not modified
from userspace.

We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are
interrupted within kernel. This is required so that if we get interrupted
within copy_to/from_user we continue with the right AMR value.

If we have MMU_FTR_KUEP enabled we need to restore IAMR on return to
userspace, because the kernel will be running with a different IAMR value.
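
In rough C terms, the return-to-userspace path described above amounts to
the sketch below. This is illustrative only: restore_user_amr_iamr is a
made-up name, the real code is the kuap_restore_user_amr asm in this patch,
and regs->amr/regs->iamr are the values saved in pt_regs on entry.

	static inline void restore_user_amr_iamr(struct pt_regs *regs)
	{
		if (!mmu_has_feature(MMU_FTR_PKEY))
			return;

		isync();
		mtspr(SPRN_AMR, regs->amr);	/* userspace AMR saved on entry */
		mtspr(SPRN_IAMR, regs->iamr);	/* userspace IAMR saved on entry */
		/* no trailing isync needed, see kuap_restore_user_amr() */
	}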

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 222 +++
 arch/powerpc/include/asm/ptrace.h|   5 +-
 arch/powerpc/kernel/asm-offsets.c|   2 +
 arch/powerpc/kernel/entry_64.S   |   6 +-
 arch/powerpc/kernel/exceptions-64s.S |   4 +-
 arch/powerpc/kernel/syscall_64.c |  32 +++-
 6 files changed, 225 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 1d38eab83d48..4dbb2d53fd8f 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -13,17 +13,46 @@
 
 #ifdef __ASSEMBLY__
 
-.macro kuap_restore_amrgpr1, gpr2
-#ifdef CONFIG_PPC_KUAP
+.macro kuap_restore_user_amr gpr1
+#if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
-   mfspr   \gpr1, SPRN_AMR
+   /*
+* AMR and IAMR are going to be different when
+* returning to userspace.
+*/
+   ld  \gpr1, STACK_REGS_AMR(r1)
+   isync
+   mtspr   SPRN_AMR, \gpr1
+   /*
+* Restore IAMR only when returning to userspace
+*/
+   ld  \gpr1, STACK_REGS_IAMR(r1)
+   mtspr   SPRN_IAMR, \gpr1
+
+   /* No isync required, see kuap_restore_user_amr() */
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
+#endif
+.endm
+
+.macro kuap_restore_kernel_amr gpr1, gpr2
+#if defined(CONFIG_PPC_PKEY)
+
+   BEGIN_MMU_FTR_SECTION_NESTED(67)
+   /*
+* AMR is going to be mostly the same since we are
+* returning to the kernel. Compare and do a mtspr.
+*/
ld  \gpr2, STACK_REGS_AMR(r1)
+   mfspr   \gpr1, SPRN_AMR
cmpd\gpr1, \gpr2
-   beq 998f
+   beq 100f
isync
mtspr   SPRN_AMR, \gpr2
-   /* No isync required, see kuap_restore_amr() */
-998:
+   /*
+* No isync required, see kuap_restore_amr()
+* No need to restore IAMR when returning to kernel space.
+*/
+100:
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
@@ -42,23 +71,98 @@
 .endm
 #endif
 
+/*
+ * if (pkey) {
+ *
+ * save AMR -> stack;
+ * if (kuap) {
+ * if (AMR != BLOCKED)
+ * KUAP_BLOCKED -> AMR;
+ * }
+ * if (from_user) {
+ * save IAMR -> stack;
+ * if (kuep) {
+ * KUEP_BLOCKED ->IAMR
+ * }
+ * }
+ * return;
+ * }
+ *
+ * if (kuap) {
+ * if (from_kernel) {
+ * save AMR -> stack;
+ * if (AMR != BLOCKED)
+ * KUAP_BLOCKED -> AMR;
+ * }
+ *
+ * }
+ */
 .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr
-#ifdef CONFIG_PPC_KUAP
+#if defined(CONFIG_PPC_PKEY)
+
+   /*
+* if both pkey and kuap is disabled, nothing to do
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(68)
+   b   100f  // skip_save_amr
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68)
+
+   /*
+* if pkey is disabled and we are entering from userspace
+* don't do anything.
+*/
BEGIN_MMU_FTR_SECTION_NESTED(67)
.ifnb \msr_pr_cr
-   bne \msr_pr_cr, 99f
+   /*
+* Without pkey we are not changing AMR outside the kernel
+* hence skip this completely.
+*/
+   bne \msr_pr_cr, 100f  // from userspace
.endif
+END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
+
+   /*
+* pkey is enabled or pkey is disabled but entering from kernel
+*/
mfspr   \gpr1, SPRN_AMR
std \gpr1, STACK_REGS_AMR(r1)
-   li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
-   sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
+
+   /*
+* update kernel AMR with AMR_KUAP_BLOCKED only
+* if KUAP feature is enabled
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(69)
+   LO

[PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec

2020-11-24 Thread Aneesh Kumar K.V
In later patches, during exec we would like to update regs->amr with the
default value to control access to the user mapping. Having thread.regs set
early makes the code changes simpler.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/process.c  | 37 +-
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 46a210b03d2b..de4c911d9ced 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -77,10 +77,8 @@ struct thread_info {
 /* how to get the thread information struct from C */
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct 
*src);
 
-#ifdef CONFIG_PPC_BOOK3S_64
 void arch_setup_new_exec(void);
 #define arch_setup_new_exec arch_setup_new_exec
-#endif
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d421a2c7f822..b6b8a845e454 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1530,10 +1530,32 @@ void flush_thread(void)
 #ifdef CONFIG_PPC_BOOK3S_64
 void arch_setup_new_exec(void)
 {
-   if (radix_enabled())
-   return;
-   hash__setup_new_exec();
+   if (!radix_enabled())
+   hash__setup_new_exec();
+
+   /*
+* If we exec out of a kernel thread then thread.regs will not be
+* set.  Do it now.
+*/
+   if (!current->thread.regs) {
+   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
+   current->thread.regs = regs - 1;
+   }
+
+}
+#else
+void arch_setup_new_exec(void)
+{
+   /*
+* If we exec out of a kernel thread then thread.regs will not be
+* set.  Do it now.
+*/
+   if (!current->thread.regs) {
+   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
+   current->thread.regs = regs - 1;
+   }
 }
+
 #endif
 
 #ifdef CONFIG_PPC64
@@ -1765,15 +1787,6 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
preload_new_slb_context(start, sp);
 #endif
 
-   /*
-* If we exec out of a kernel thread then thread.regs will not be
-* set.  Do it now.
-*/
-   if (!current->thread.regs) {
-   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
-   current->thread.regs = regs - 1;
-   }
-
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
 * Clear any transactional state, we're exec()ing. The cause is
-- 
2.28.0



[PATCH v6 08/22] powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation

2020-11-24 Thread Aneesh Kumar K.V
This patch updates the kernel hash page table entries to use storage key 3
for their mapping. This implies all kernel accesses will now use key 3 to
control READ/WRITE. The patch also prevents the allocation of key 3 from
userspace, and the UAMOR value is updated such that userspace cannot modify
key 3.
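
As a rough illustration of the reservation side (a sketch only, since the
UAMOR hunk is truncated from the excerpt below; pkeyshift() is the helper
pkeys.c already uses, and the mask names follow that file):

	/* Two AMR/IAMR/UAMOR bits per key, so key 3 owns bits 56-57 */
	#define AMR_BITS_PER_PKEY	2
	#define pkeyshift(pkey)		(64 - ((pkey) + 1) * AMR_BITS_PER_PKEY)

	reserved_allocation_mask |= (0x1 << 3);		/* key 3 not allocatable */
	default_uamor &= ~(0x3ul << pkeyshift(3));	/* userspace can't write key 3 */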

Reviewed-by: Sandipan Das 
Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 24 ++-
 arch/powerpc/include/asm/book3s/64/hash.h |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 +
 arch/powerpc/include/asm/mmu_context.h|  2 +-
 arch/powerpc/mm/book3s64/hash_4k.c|  2 +-
 arch/powerpc/mm/book3s64/hash_64k.c   |  4 ++--
 arch/powerpc/mm/book3s64/hash_hugepage.c  |  2 +-
 arch/powerpc/mm/book3s64/hash_hugetlbpage.c   |  2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c   |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c | 10 
 arch/powerpc/mm/book3s64/pkeys.c  |  4 
 11 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h 
b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
index 795010897e5d..9f44e208f036 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-pkey.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -2,6 +2,9 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 
+/*  We use key 3 for KERNEL */
+#define HASH_DEFAULT_KERNEL_KEY (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)
+
 static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
@@ -11,13 +14,22 @@ static inline u64 hash__vmflag_to_pte_pkey_bits(u64 
vm_flags)
((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
 {
-   return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+   unsigned long pte_pkey;
+
+   pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+
+   if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
+   if ((pte_pkey == 0) && (flags & HPTE_USE_KERNEL_KEY))
+   return HASH_DEFAULT_KERNEL_KEY;
+   }
+
+   return pte_pkey;
 }
 
 static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 73ad038ed10b..d959b0195ad9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -145,7 +145,7 @@ extern void hash__mark_initmem_nx(void);
 
 extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
-extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
+unsigned long htab_convert_pte_flags(unsigned long pteflags, unsigned long 
flags);
 /* Atomic PTE updates */
 static inline unsigned long hash__pte_update(struct mm_struct *mm,
 unsigned long addr,
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 683a9c7d1b03..9192cb05a6ab 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -452,6 +452,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
 
 #define HPTE_LOCAL_UPDATE  0x1
 #define HPTE_NOHPTE_UPDATE 0x2
+#define HPTE_USE_KERNEL_KEY0x4
 
 extern int __hash_page_4K(unsigned long ea, unsigned long access,
  unsigned long vsid, pte_t *ptep, unsigned long trap,
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index e02aa793420b..4b5e1cb49dce 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -284,7 +284,7 @@ static inline bool arch_vma_access_permitted(struct 
vm_area_struct *vma,
 #define thread_pkey_regs_init(thread)
 #define arch_dup_pkeys(oldmm, mm)
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
 {
return 0x0UL;
 }
diff --git a/arch/powerpc/mm/book3s64/hash_4k.c 
b/arch/powerp

[PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP

2020-11-24 Thread Aneesh Kumar K.V
This is in preparation for adding support for KUAP with hash translation.
Rename/move the KUAP-related functions to non-radix names, and move the
feature bit closer to MMU_FTR_KUEP.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 18 +-
 arch/powerpc/include/asm/mmu.h   | 14 +++---
 arch/powerpc/mm/book3s64/pkeys.c |  2 +-
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 39d2e3a0d64d..1d38eab83d48 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -24,7 +24,7 @@
mtspr   SPRN_AMR, \gpr2
/* No isync required, see kuap_restore_amr() */
 998:
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 
@@ -37,7 +37,7 @@
sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
 999:   tdne\gpr1, \gpr2
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 #endif
@@ -58,7 +58,7 @@
mtspr   SPRN_AMR, \gpr2
isync
 99:
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 
@@ -73,7 +73,7 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
 
 static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
 {
-   if (mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) 
{
+   if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) {
isync();
mtspr(SPRN_AMR, regs->kuap);
/*
@@ -86,7 +86,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, 
unsigned long amr)
 
 static inline unsigned long kuap_get_and_check_amr(void)
 {
-   if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
+   if (mmu_has_feature(MMU_FTR_KUAP)) {
unsigned long amr = mfspr(SPRN_AMR);
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */
WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED);
@@ -97,7 +97,7 @@ static inline unsigned long kuap_get_and_check_amr(void)
 
 static inline void kuap_check_amr(void)
 {
-   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP))
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
 }
 
@@ -116,7 +116,7 @@ static inline unsigned long get_kuap(void)
 * This has no effect in terms of actually blocking things on hash,
 * so it doesn't break anything.
 */
-   if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (!early_mmu_has_feature(MMU_FTR_KUAP))
return AMR_KUAP_BLOCKED;
 
return mfspr(SPRN_AMR);
@@ -124,7 +124,7 @@ static inline unsigned long get_kuap(void)
 
 static inline void set_kuap(unsigned long value)
 {
-   if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (!early_mmu_has_feature(MMU_FTR_KUAP))
return;
 
/*
@@ -139,7 +139,7 @@ static inline void set_kuap(unsigned long value)
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
-   return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
+   return WARN(mmu_has_feature(MMU_FTR_KUAP) &&
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 255a1837e9f7..f5c7a17c198a 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -28,6 +28,11 @@
  * Individual features below.
  */
 
+/*
+ * Supports KUAP (key 0 controlling userspace addresses) on radix
+ */
+#define MMU_FTR_KUAP   ASM_CONST(0x0200)
+
 /*
  * Support for KUEP feature.
  */
@@ -120,11 +125,6 @@
  */
 #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000)
 
-/*
- * Supports KUAP (key 0 controlling userspace addresses) on radix
- */
-#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000)
-
 /* MMU feature bit sets for various CPUs */
 #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2  \
MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2
@@ -187,10 +187,10 @@ enum {
 #ifdef CONFIG_PPC_RADIX_MMU
MMU_FTR_TYPE_RADIX |
MMU_FTR_GTSE |
+#endif /* CONFIG_PPC_RADIX_MMU */
 #ifdef CONFIG_PPC_KUAP
-   MMU_FTR_RADIX_KUAP |
+   MMU_FTR_KUAP |
 #endif /* CONFIG_PPC_KUAP */
-#endif /* CONFIG_PPC_RADIX_MMU */
 #ifdef CONFIG_PPC_MEM_KEYS
MMU_FTR_PKEY |
 #endif
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch

[PATCH v6 06/22] powerpc/book3s64/kuep: Move KUEP related function outside radix

2020-11-24 Thread Aneesh Kumar K.V
The next set of patches adds support for KUEP with hash translation.
In preparation for that, rename/move the KUEP-related functions to
non-radix names.

Also set MMU_FTR_KUEP and add the missing isync().

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h |  1 +
 arch/powerpc/mm/book3s64/pkeys.c | 21 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 20 
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 56dbe3666dc8..39d2e3a0d64d 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -7,6 +7,7 @@
 
 #define AMR_KUAP_BLOCK_READUL(0x4000)
 #define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
+#define AMR_KUEP_BLOCKED   (1UL << 62)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 #define AMR_KUAP_SHIFT 62
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index c75994cf50a7..82c722fbce52 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -229,6 +229,27 @@ void __init pkey_early_init_devtree(void)
return;
 }
 
+#ifdef CONFIG_PPC_KUEP
+void __init setup_kuep(bool disabled)
+{
+   if (disabled || !early_radix_enabled())
+   return;
+
+   if (smp_processor_id() == boot_cpuid) {
+   pr_info("Activating Kernel Userspace Execution Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
+   }
+
+   /*
+* Radix always uses key0 of the IAMR to determine if an access is
+* allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
+* fetch.
+*/
+   mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
+   isync();
+}
+#endif
+
 #ifdef CONFIG_PPC_KUAP
 void __init setup_kuap(bool disabled)
 {
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index cd9872894552..fd22a5e9f0ff 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -589,26 +589,6 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, (3ul << 62));
 }
 
-#ifdef CONFIG_PPC_KUEP
-void setup_kuep(bool disabled)
-{
-   if (disabled || !early_radix_enabled())
-   return;
-
-   if (smp_processor_id() == boot_cpuid) {
-   pr_info("Activating Kernel Userspace Execution Prevention\n");
-   cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
-   }
-
-   /*
-* Radix always uses key0 of the IAMR to determine if an access is
-* allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
-* fetch.
-*/
-   mtspr(SPRN_IAMR, (1ul << 62));
-}
-#endif
-
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
-- 
2.28.0



[PATCH v6 05/22] powerpc/book3s64/kuap: Move KUAP related function outside radix

2020-11-24 Thread Aneesh Kumar K.V
The next set of patches adds support for KUAP with hash translation.
In preparation for that, rename/move the KUAP-related functions to
non-radix names.

Signed-off-by: Aneesh Kumar K.V 
---
 .../asm/book3s/64/{kup-radix.h => kup.h}  |  6 ++---
 arch/powerpc/include/asm/kup.h|  4 +++-
 arch/powerpc/mm/book3s64/pkeys.c  | 22 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c  | 19 
 4 files changed, 28 insertions(+), 23 deletions(-)
 rename arch/powerpc/include/asm/book3s/64/{kup-radix.h => kup.h} (97%)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
similarity index 97%
rename from arch/powerpc/include/asm/book3s/64/kup-radix.h
rename to arch/powerpc/include/asm/book3s/64/kup.h
index 68eaa2fac3ab..56dbe3666dc8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
-#define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
+#ifndef _ASM_POWERPC_BOOK3S_64_KUP_H
+#define _ASM_POWERPC_BOOK3S_64_KUP_H
 
 #include 
 #include 
@@ -200,4 +200,4 @@ static inline void restore_user_access(unsigned long flags)
 }
 #endif /* __ASSEMBLY__ */
 
-#endif /* _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H */
+#endif /* _ASM_POWERPC_BOOK3S_64_KUP_H */
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 0d93331d0fab..a06e50b68d40 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -15,11 +15,13 @@
 #define KUAP_CURRENT   (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
 
 #ifdef CONFIG_PPC_BOOK3S_64
-#include 
+#include 
 #endif
+
 #ifdef CONFIG_PPC_8xx
 #include 
 #endif
+
 #ifdef CONFIG_PPC_BOOK3S_32
 #include 
 #endif
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 7dc71f85683d..c75994cf50a7 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -9,9 +9,12 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 
+
 int  num_pkey; /* Max number of pkeys supported */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -226,6 +229,25 @@ void __init pkey_early_init_devtree(void)
return;
 }
 
+#ifdef CONFIG_PPC_KUAP
+void __init setup_kuap(bool disabled)
+{
+   if (disabled || !early_radix_enabled())
+   return;
+
+   if (smp_processor_id() == boot_cpuid) {
+   pr_info("Activating Kernel Userspace Access Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
+   }
+
+   /*
+* Set the default kernel AMR values on all cpus.
+*/
+   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
+   isync();
+}
+#endif
+
 static inline u64 read_amr(void)
 {
return mfspr(SPRN_AMR);
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index bfe441af916a..cd9872894552 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -609,25 +609,6 @@ void setup_kuep(bool disabled)
 }
 #endif
 
-#ifdef CONFIG_PPC_KUAP
-void setup_kuap(bool disabled)
-{
-   if (disabled || !early_radix_enabled())
-   return;
-
-   if (smp_processor_id() == boot_cpuid) {
-   pr_info("Activating Kernel Userspace Access Prevention\n");
-   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
-   }
-
-   /*
-* Set the default kernel AMR values on all cpus.
-*/
-   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
-   isync();
-}
-#endif
-
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
-- 
2.28.0



[PATCH v6 04/22] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init

2020-11-24 Thread Aneesh Kumar K.V
This patch consolidates the UAMOR update across the pkey, kuap and kuep
features. The boot CPU initializes UAMOR via pkey init, and both radix and
hash do the secondary-CPU UAMOR init in early_init_mmu_secondary.

We don't check for the mmu_feature in the radix secondary init because
UAMOR is a supported SPR on all CPUs supporting radix translation.
The old code was not updating UAMOR if we had SMAP disabled and SMEP
enabled; this change handles that case.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 3adcf730f478..bfe441af916a 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -620,9 +620,6 @@ void setup_kuap(bool disabled)
cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
}
 
-   /* Make sure userspace can't change the AMR */
-   mtspr(SPRN_UAMOR, 0);
-
/*
 * Set the default kernel AMR values on all cpus.
 */
@@ -721,6 +718,11 @@ void radix__early_init_mmu_secondary(void)
 
radix__switch_mmu_context(NULL, &init_mm);
tlbiel_all();
+
+#ifdef CONFIG_PPC_PKEY
+   /* Make sure userspace can't change the AMR */
+   mtspr(SPRN_UAMOR, 0);
+#endif
 }
 
 void radix__mmu_cleanup_all(void)
-- 
2.28.0



[PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS

2020-11-24 Thread Aneesh Kumar K.V
The next set of patches adds support for KUAP with hash translation.
Hence make KUAP a BOOK3S_64 feature, and also make it a subfeature of
PPC_MEM_KEYS. Hash translation is going to use pkeys to support
KUAP/KUEP. Adding this dependency reduces the code complexity and
enables us to move some of the initialization code to pkeys.c.
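
The resulting umbrella symbol would presumably look something like the
sketch below in Kconfig.cputype (the actual hunk is truncated from the
excerpt that follows):

	config PPC_PKEY
		def_bool y
		depends on PPC_BOOK3S_64
		depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP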

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/kup-radix.h |  4 ++--
 arch/powerpc/include/asm/book3s/64/mmu.h  |  2 +-
 arch/powerpc/include/asm/ptrace.h |  7 +-
 arch/powerpc/kernel/asm-offsets.c |  3 +++
 arch/powerpc/mm/book3s64/Makefile |  2 +-
 arch/powerpc/mm/book3s64/pkeys.c  | 24 ---
 arch/powerpc/platforms/Kconfig.cputype|  5 
 7 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 28716e2f13e3..68eaa2fac3ab 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -16,7 +16,7 @@
 #ifdef CONFIG_PPC_KUAP
BEGIN_MMU_FTR_SECTION_NESTED(67)
mfspr   \gpr1, SPRN_AMR
-   ld  \gpr2, STACK_REGS_KUAP(r1)
+   ld  \gpr2, STACK_REGS_AMR(r1)
cmpd\gpr1, \gpr2
beq 998f
isync
@@ -48,7 +48,7 @@
bne \msr_pr_cr, 99f
.endif
mfspr   \gpr1, SPRN_AMR
-   std \gpr1, STACK_REGS_KUAP(r1)
+   std \gpr1, STACK_REGS_AMR(r1)
li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
cmpd\use_cr, \gpr1, \gpr2
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index e0b52940e43c..a2a015066bae 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -199,7 +199,7 @@ extern int mmu_io_psize;
 void mmu_early_init_devtree(void);
 void hash__early_init_devtree(void);
 void radix__early_init_devtree(void);
-#ifdef CONFIG_PPC_MEM_KEYS
+#ifdef CONFIG_PPC_PKEY
 void pkey_early_init_devtree(void);
 #else
 static inline void pkey_early_init_devtree(void) {}
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index e2c778c176a3..e7f1caa007a4 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -53,9 +53,14 @@ struct pt_regs
 #ifdef CONFIG_PPC64
unsigned long ppr;
 #endif
+   union {
 #ifdef CONFIG_PPC_KUAP
-   unsigned long kuap;
+   unsigned long kuap;
 #endif
+#ifdef CONFIG_PPC_PKEY
+   unsigned long amr;
+#endif
+   };
};
unsigned long __pad[2]; /* Maintain 16 byte interrupt stack 
alignment */
};
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index c2722ff36e98..418a0b314a33 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -354,6 +354,9 @@ int main(void)
STACK_PT_REGS_OFFSET(_PPR, ppr);
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_PPC_PKEY
+   STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr);
+#endif
 #ifdef CONFIG_PPC_KUAP
STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap);
 #endif
diff --git a/arch/powerpc/mm/book3s64/Makefile 
b/arch/powerpc/mm/book3s64/Makefile
index fd393b8be14f..1b56d3af47d4 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -17,7 +17,7 @@ endif
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage_prot.o
 obj-$(CONFIG_SPAPR_TCE_IOMMU)  += iommu_api.o
-obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o
+obj-$(CONFIG_PPC_PKEY) += pkeys.o
 
 # Instrumenting the SLB fault path can lead to duplicate SLB entries
 KCOV_INSTRUMENT_slb.o := n
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index b1d091a97611..7dc71f85683d 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -89,12 +89,14 @@ static int scan_pkey_feature(void)
}
}
 
+#ifdef CONFIG_PPC_MEM_KEYS
/*
 * Adjust the upper limit, based on the number of bits supported by
 * arch-neutral code.
 */
pkeys_total = min_t(int, pkeys_total,
((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
+#endif
return pkeys_total;
 }
 
@@ -102,6 +104,7 @@ void __init pkey_early_init_devtree(void)
 {
int pkeys_total, i;
 
+#ifdef CONFIG_PPC_MEM_KEYS
/*
 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
 * generic defines for PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.
@@ -117,7 +120,7 @@ void __init pkey_early_init_devtree(void)
BUILD_BUG_ON(__builtin_clzl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) +
 __builti

[PATCH v6 00/22] Kernel userspace access/execution prevention with hash translation

2020-11-24 Thread Aneesh Kumar K.V
This patch series implements KUAP and KUEP with hash translation mode using
memory keys. The kernel now uses memory protection key 3 to control access
to kernel memory. Kernel page table entries are now configured with key 3.
Access to locations configured with any other key value is denied when in
kernel mode (MSR_PR=0); this includes userspace, which is by default
configured with key 0.

null-syscall benchmark results:

With smap/smep disabled:
Without patch:
	845.29 ns	2451.44 cycles
With patch series:
	858.38 ns	2489.30 cycles

With smap/smep enabled:
Without patch:
	NA
With patch series:
	1021.51 ns	2962.44 cycles

Changes from v5:
* Rework the patch based on suggestion from Michael to avoid the
  usage of CONFIG_PPC_PKEY on BOOKE platforms. 

Changes from v4:
* Repost with other pkey related changes split out as a separate series.
* Improve null-syscall benchmark by optimizing SPRN save and restore.

Changes from v3:
* Fix build error reported by kernel test robot 

Changes from v2:
* Rebase to the latest kernel.
* Fixed a bug with disabling KUEP/KUAP on kernel command line
* Added a patch to make kup key dynamic.

Changes from V1:
* Rebased on latest kernel

Aneesh Kumar K.V (22):
  powerpc: Add new macro to handle NESTED_IFCLR
  KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR
  powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of
PPC_MEM_KEYS
  powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
  powerpc/book3s64/kuap: Move KUAP related function outside radix
  powerpc/book3s64/kuep: Move KUEP related function outside radix
  powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
  powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash
translation
  powerpc/exec: Set thread.regs early during exec
  powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on
entry and exit from kernel
  powerpc/book3s64/pkeys: Inherit correctly on fork.
  powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec
  powerpc/ptrace-view: Use pt_regs values instead of thread_struct based
one.
  powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.
  powerpc/book3s64/kuap: Restrict access to userspace based on userspace
AMR
  powerpc/book3s64/kuap: Improve error reporting with KUAP
  powerpc/book3s64/kuap: Use Key 3 to implement KUAP with hash
translation.
  powerpc/book3s64/kuep: Use Key 3 to implement KUEP with hash
translation.
  powerpc/book3s64/hash/kuap: Enable kuap on hash
  powerpc/book3s64/hash/kuep: Enable KUEP on hash
  powerpc/book3s64/hash/kup: Don't hardcode kup key
  powerpc/book3s64/pkeys: Optimize FTR_KUAP and FTR_KUEP disabled case

 arch/powerpc/include/asm/book3s/32/kup.h  |   4 +-
 .../powerpc/include/asm/book3s/64/hash-pkey.h |  10 +-
 arch/powerpc/include/asm/book3s/64/hash.h |   2 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h | 203 
 arch/powerpc/include/asm/book3s/64/kup.h  | 440 ++
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
 arch/powerpc/include/asm/book3s/64/mmu.h  |   2 +-
 arch/powerpc/include/asm/book3s/64/pkeys.h|   3 +
 arch/powerpc/include/asm/feature-fixups.h |   3 +
 arch/powerpc/include/asm/kup.h|   8 +-
 arch/powerpc/include/asm/mmu.h|  14 +-
 arch/powerpc/include/asm/mmu_context.h|   2 +-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |   4 +-
 arch/powerpc/include/asm/processor.h  |   4 -
 arch/powerpc/include/asm/ptrace.h |  12 +-
 arch/powerpc/include/asm/thread_info.h|   2 -
 arch/powerpc/kernel/asm-offsets.c |   5 +
 arch/powerpc/kernel/entry_64.S|   6 +-
 arch/powerpc/kernel/exceptions-64s.S  |   4 +-
 arch/powerpc/kernel/process.c |  58 ++-
 arch/powerpc/kernel/ptrace/ptrace-view.c  |   7 +-
 arch/powerpc/kernel/syscall_64.c  |  38 +-
 arch/powerpc/kernel/traps.c   |   6 -
 arch/powerpc/kvm/book3s_emulate.c |   6 +
 arch/powerpc/mm/book3s64/Makefile |   2 +-
 arch/powerpc/mm/book3s64/hash_4k.c|   2 +-
 arch/powerpc/mm/book3s64/hash_64k.c   |   4 +-
 arch/powerpc/mm/book3s64/hash_hugepage.c  |   2 +-
 arch/powerpc/mm/book3s64/hash_hugetlbpage.c   |   2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c   |   2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  10 +-
 arch/powerpc/mm/book3s64/pkeys.c  | 177 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  47 +-
 arch/powerpc/mm/fault.c   |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|   5 +
 35 files changed, 715 insertions(+), 384 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/kup.h

-- 
2.28.0



[PATCH v6 02/22] KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR

2020-11-24 Thread Aneesh Kumar K.V
With POWER7 and above we expect the CPU to support keys. The
number of keys is firmware controlled, based on the device tree.
PR KVM does not expose key details via the device tree, hence when running
with PR KVM we run with MMU_FTR_PKEY support disabled. But we can still
get updates to UAMOR. Hence ignore accesses to these SPRs, and for mfspr
return 0, indicating that no AMR/IAMR update is allowed.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kvm/book3s_emulate.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 0effd48c8f4d..b08cc15f31c7 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -840,6 +840,9 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
case SPRN_MMCR1:
case SPRN_MMCR2:
case SPRN_UMMCR2:
+   case SPRN_UAMOR:
+   case SPRN_IAMR:
+   case SPRN_AMR:
 #endif
break;
 unprivileged:
@@ -1004,6 +1007,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_val
case SPRN_MMCR2:
case SPRN_UMMCR2:
case SPRN_TIR:
+   case SPRN_UAMOR:
+   case SPRN_IAMR:
+   case SPRN_AMR:
 #endif
*spr_val = 0;
break;
-- 
2.28.0



[PATCH v6 01/22] powerpc: Add new macro to handle NESTED_IFCLR

2020-11-24 Thread Aneesh Kumar K.V
This will be used by the following patches.
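
For context, the nested-IFCLR form is used later in the series like this
(taken from the kuap_save_amr_and_lock hunk in patch 10):

	BEGIN_MMU_FTR_SECTION_NESTED(68)
	b	100f  // skip_save_amr
	END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68)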

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/feature-fixups.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/feature-fixups.h 
b/arch/powerpc/include/asm/feature-fixups.h
index fbd406cd6916..5cdba929a8ae 100644
--- a/arch/powerpc/include/asm/feature-fixups.h
+++ b/arch/powerpc/include/asm/feature-fixups.h
@@ -100,6 +100,9 @@ label##5:   
\
 #define END_MMU_FTR_SECTION_NESTED_IFSET(msk, label)   \
END_MMU_FTR_SECTION_NESTED((msk), (msk), label)
 
+#define END_MMU_FTR_SECTION_NESTED_IFCLR(msk, label)   \
+   END_MMU_FTR_SECTION_NESTED((msk), 0, label)
+
 #define END_MMU_FTR_SECTION_IFSET(msk) END_MMU_FTR_SECTION((msk), (msk))
 #define END_MMU_FTR_SECTION_IFCLR(msk) END_MMU_FTR_SECTION((msk), 0)
 
-- 
2.28.0



Re: [PATCH 2/3] powerpc: Make NUMA default y for powernv

2020-11-24 Thread Srikar Dronamraju
* Michael Ellerman  [2020-11-24 23:05:46]:

> Our NUMA option is default y for pseries, but not powernv. The bulk of
> powernv systems are NUMA, so make NUMA default y for powernv also.
> 
> Signed-off-by: Michael Ellerman 

Looks good to me.

Reviewed-by: Srikar Dronamraju 
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a22db3db6b96..4d688b426353 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -661,7 +661,7 @@ config IRQ_ALL_CPUS
>  config NUMA
>   bool "NUMA support"
>   depends on PPC64 && SMP
> - default y if SMP && PPC_PSERIES
> + default y if PPC_PSERIES || PPC_POWERNV
> 
>  config NODES_SHIFT
>   int
> -- 
> 2.25.1
> 

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 1/3] powerpc: Make NUMA depend on SMP

2020-11-24 Thread Srikar Dronamraju
* Michael Ellerman  [2020-11-24 23:05:45]:

> Our Kconfig allows NUMA to be enabled without SMP, but none of
> our defconfigs use that combination. This means it can easily be
> broken inadvertently by code changes, which has happened recently.
> 
> Although it's theoretically possible to have a machine with a single
> CPU and multiple memory nodes, I can't think of any real systems where
> that's the case. Even so if such a system exists, it can just run an
> SMP kernel anyway.
> 
> So to avoid the need to add extra #ifdefs and/or build breaks, make
> NUMA depend on SMP.
> 
> Reported-by: kernel test robot 
> Reported-by: Randy Dunlap 
> Signed-off-by: Michael Ellerman 

Looks good to me.

Reviewed-by: Srikar Dronamraju 
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e9f13fe08492..a22db3db6b96 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -660,7 +660,7 @@ config IRQ_ALL_CPUS
> 
>  config NUMA
>   bool "NUMA support"
> - depends on PPC64
> + depends on PPC64 && SMP
>   default y if SMP && PPC_PSERIES
> 
>  config NODES_SHIFT
> -- 
> 2.25.1
> 

-- 
Thanks and Regards
Srikar Dronamraju


[PATCH] powerpc/configs: Add ppc64le_allnoconfig target

2020-11-24 Thread Michael Ellerman
Add a phony target for ppc64le_allnoconfig, which tests some
combinations of CONFIG symbols that aren't covered by any of our
defconfigs.
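
With this applied the target is invoked like the other generated configs,
e.g. (cross compiler assumed, toolchain name illustrative):

	$ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- ppc64le_allnoconfig
	$ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- -j$(nproc)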

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile   | 5 +
 arch/powerpc/configs/ppc64le.config | 2 ++
 2 files changed, 7 insertions(+)
 create mode 100644 arch/powerpc/configs/ppc64le.config

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index a4d56f0a41d9..26a17798c815 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -376,6 +376,11 @@ PHONY += ppc64le_allmodconfig
$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/le.config 
\
-f $(srctree)/Makefile allmodconfig
 
+PHONY += ppc64le_allnoconfig
+ppc64le_allnoconfig:
+   $(Q)$(MAKE) 
KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/ppc64le.config \
+   -f $(srctree)/Makefile allnoconfig
+
 PHONY += ppc64_book3e_allmodconfig
 ppc64_book3e_allmodconfig:
$(Q)$(MAKE) 
KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/85xx-64bit.config \
diff --git a/arch/powerpc/configs/ppc64le.config 
b/arch/powerpc/configs/ppc64le.config
new file mode 100644
index ..14dca1062c1b
--- /dev/null
+++ b/arch/powerpc/configs/ppc64le.config
@@ -0,0 +1,2 @@
+CONFIG_PPC64=y
+CONFIG_CPU_LITTLE_ENDIAN=y
-- 
2.25.1



Re: linux-next: build failure in Linus' tree

2020-11-24 Thread Michael Ellerman
Daniel Axtens  writes:
> Thanks sfr and mpe.
>
>> Applied to powerpc/fixes.
>>
>> [1/1] powerpc/64s: Fix allnoconfig build since uaccess flush
>>   
>> https://git.kernel.org/powerpc/c/b6b79dd53082db11070b4368d85dd6699ff0b063
>
> We also needed a similar fix for stable, which has also been applied.
>
> I guess I should build some sort of build process that tests a whole
> range of configs. I did test a few but clearly not enough. Is there a
> known list that I should be using? Something from kisskb?

It's basically unsolvable in general. I guess allnoconfig is a good one
to build, although by default that gets you a 32-bit config.

I'll send a patch to add ppc64le_allnoconfig.

cheers


Re: C vdso

2020-11-24 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 03/11/2020 à 19:13, Christophe Leroy a écrit :
>> Le 23/10/2020 à 15:24, Michael Ellerman a écrit :
>>> Christophe Leroy  writes:
 Le 24/09/2020 à 15:17, Christophe Leroy a écrit :
> Le 17/09/2020 à 14:33, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>>>
>>> What is the status with the generic C vdso merge ?
>>> In some mail, you mentionned having difficulties getting it working on
>>> ppc64, any progress ? What's the problem ? Can I help ?
>>
>> Yeah sorry I was hoping to get time to work on it but haven't been able
>> to.
>>
>> It's causing crashes on ppc64 ie. big endian.
>>> ...
>
> Can you tell what defconfig you are using ? I have been able to setup a 
> full glibc PPC64 cross
> compilation chain and been able to test it under QEMU with success, using 
> Nathan's vdsotest tool.

 What config are you using ?
>>>
>>> ppc64_defconfig + guest.config
>>>
>>> Or pseries_defconfig.
>>>
>>> I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other toolchains too.
>>>
>>> At a minimum we're seeing relocations in the output, which is a problem:
>>>
>>>    $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so
>>>    Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
>>>  Offset  Info   Type   Sym. Value    Sym. Name 
>>> + Addend
>>>    1368  0016 R_PPC64_RELATIVE 7c0
>>>    1370  0016 R_PPC64_RELATIVE 9300
>>>    1380  0016 R_PPC64_RELATIVE 970
>>>    1388  0016 R_PPC64_RELATIVE 9300
>>>    1398  0016 R_PPC64_RELATIVE a90
>>>    13a0  0016 R_PPC64_RELATIVE 9300
>>>    13b0  0016 R_PPC64_RELATIVE b20
>>>    13b8  0016 R_PPC64_RELATIVE 9300
>> 
>> Looks like it's due to the OPD and relation between the function() and 
>> .function()
>> 
>> By using DOTSYM() in the 'bl' call, that's directly the dot function which 
>> is called and the OPD is 
>> not used anymore, it can get dropped.
>> 
>> Now I get .rela.dyn full of 0, don't know if we should drop it explicitely.
>
> What is the status now with latest version of CVDSO ? I saw you had it in 
> next-test for some time, 
> it is not there anymore today.

Still having some trouble with the compat VDSO.

eg:

$ ./vdsotest clock-gettime-monotonic verify
timestamp obtained from kernel predates timestamp
previously obtained from libc/vDSO:
[1346, 821441653] (vDSO)
[570, 769440040] (kernel)


And similar for all clocks except the coarse ones.

Hopefully I can find time to dig into it.

cheers
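
(For context, the DOTSYM() change mentioned above amounts to branching to
the dot-symbol text entry directly instead of going through the OPD,
roughly:

	bl	DOTSYM(__c_kernel_clock_gettime)	/* call .sym, no OPD */

so the function-descriptor relocations can be dropped. The symbol name here
is illustrative.)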


Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages

2020-11-24 Thread liwei (GF)
Hi Yang,

On 2020/11/25 6:13, Li Yang wrote:
> On Tue, Nov 24, 2020 at 3:44 PM Li Yang  wrote:
>>
>> On Tue, Nov 24, 2020 at 12:24 AM Wei Li  wrote:
>>>
>>> IS_ERR_VALUE macro should be used only with unsigned long type.
>>> Especially it works incorrectly with unsigned shorter types on
>>> 64bit machines.
>>
>> This is truly a problem for the driver to run on 64-bit architectures.
>> But from an earlier discussion
>> https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-a...@arndb.de/,
>> the preferred solution would be removing the IS_ERR_VALUE() usage or
>> make the values to be unsigned long.
>>
>> It looks like we are having a bigger problem with the 64-bit support
>> for the driver that the offset variables can also be real pointers
>> which cannot be held with 32-bit data types(when uf_info->bd_mem_part
>> == MEM_PART_SYSTEM).  So actually we have to change these offsets to
>> unsigned long, otherwise we are having more serious issues on 64-bit
>> systems.  Are you willing to make such changes or you want us to deal
>> with it?
> 
> Well, it looks like this hardware block was never integrated on a
> 64-bit SoC and will very likely to keep so.  So probably we can keep
> the driver 32-bit only.  It is currently limited to PPC32 in Kconfig,
> how did you build it for 64-bit?
> 
>>

Thank you for providing the earlier discussion archive. In fact, this
issue was detected by our static analysis tool.

From my view, there is no harm in fixing these potential misuses. But if
you have really decided to keep the driver 32-bit only, please just ignore
this patch.
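
For what it's worth, a minimal sketch of the failure mode (assuming, as the
driver's checks imply, that qe_muram_alloc() encodes -ENOMEM in the
returned offset):

	u32 off = (u32)-ENOMEM;		/* 0xfffffff4 stored in a 32-bit offset */

	IS_ERR_VALUE(off);		/* false on 64-bit: off zero-extends to
					   0x00000000fffffff4, below -MAX_ERRNO */
	IS_ERR_VALUE((unsigned long)(int)off);	/* true: sign-extend first */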

Thanks,
Wei

>>>
>>> Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs 
>>> instead of pointers")
>>> Signed-off-by: Wei Li 
>>> ---
>>>  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++
>>>  1 file changed, 15 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
>>> b/drivers/net/ethernet/freescale/ucc_geth.c
>>> index 714b501be7d0..8656d9be256a 100644
>>> --- a/drivers/net/ethernet/freescale/ucc_geth.c
>>> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
>>> @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct 
>>> ucc_geth_private *ugeth,
>>> else {
>>> init_enet_offset =
>>> qe_muram_alloc(thread_size, thread_alignment);
>>> -   if (IS_ERR_VALUE(init_enet_offset)) {
>>> +   if (IS_ERR_VALUE((unsigned 
>>> long)(int)init_enet_offset)) {
>>> if (netif_msg_ifup(ugeth))
>>> pr_err("Can not allocate DPRAM 
>>> memory\n");
>>> qe_put_snum((u8) snum);
>>> @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private 
>>> *ugeth)
>>> ugeth->tx_bd_ring_offset[j] =
>>> qe_muram_alloc(length,
>>>UCC_GETH_TX_BD_RING_ALIGNMENT);
>>> -   if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
>>> +   if (!IS_ERR_VALUE((unsigned 
>>> long)(int)ugeth->tx_bd_ring_offset[j]))
>>> ugeth->p_tx_bd_ring[j] =
>>> (u8 __iomem *) qe_muram_addr(ugeth->
>>>  
>>> tx_bd_ring_offset[j]);
>>> @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private 
>>> *ugeth)
>>> ugeth->rx_bd_ring_offset[j] =
>>> qe_muram_alloc(length,
>>>UCC_GETH_RX_BD_RING_ALIGNMENT);
>>> -   if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
>>> +   if (!IS_ERR_VALUE((unsigned 
>>> long)(int)ugeth->rx_bd_ring_offset[j]))
>>> ugeth->p_rx_bd_ring[j] =
>>> (u8 __iomem *) qe_muram_addr(ugeth->
>>>  
>>> rx_bd_ring_offset[j]);
>>> @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
>>> *ugeth)
>>> ugeth->tx_glbl_pram_offset =
>>> qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
>>>UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
>>> -   if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
>>> +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_glbl_pram_offset)) {
>>> if (netif_msg_ifup(ugeth))
>>> pr_err("Can not allocate DPRAM memory for 
>>> p_tx_glbl_pram\n");
>>> return -ENOMEM;
>>> @@ -2530,7 +2530,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
>>> *ugeth)
>>>sizeof(struct ucc_geth_thread_data_tx) +
>>>32 * (numThreadsTxNumerical == 1),
>>>  

Re: [PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()

2020-11-24 Thread kernel test robot
Hi Laurent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on gpio/for-next]
[also build test ERROR on linus/master v5.10-rc5 next-20201124]
[cannot apply to powerpc/next tip/irq/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Laurent-Vivier/powerpc-pseries-fix-MSI-X-IRQ-affinity-on-pseries/20201125-040537
base:   https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git 
for-next
config: powerpc64-randconfig-r024-20201124 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
df9ae5992889560a8f3c6760b54d5051b47c7bf5)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/86de9fd2e4f360722119b69bb2269330ae9e1d54
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Laurent-Vivier/powerpc-pseries-fix-MSI-X-IRQ-affinity-on-pseries/20201125-040537
git checkout 86de9fd2e4f360722119b69bb2269330ae9e1d54
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/mfd/wm831x-core.c:21:
>> include/linux/mfd/wm831x/core.h:424:51: error: too few arguments to function 
>> call, expected 3, have 2
   return irq_create_mapping(wm831x->irq_domain, irq);
  ~~^
   include/linux/irqdomain.h:387:21: note: 'irq_create_mapping' declared here
   extern unsigned int irq_create_mapping(struct irq_domain *host,
   ^
   1 error generated.

vim +424 include/linux/mfd/wm831x/core.h

7d4d0a3e7343e31 Mark Brown 2009-07-27  421  
cd99758ba3bde64 Mark Brown 2012-05-14  422  static inline int wm831x_irq(struct 
wm831x *wm831x, int irq)
cd99758ba3bde64 Mark Brown 2012-05-14  423  {
cd99758ba3bde64 Mark Brown 2012-05-14 @424  return 
irq_create_mapping(wm831x->irq_domain, irq);
cd99758ba3bde64 Mark Brown 2012-05-14  425  }
cd99758ba3bde64 Mark Brown 2012-05-14  426  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()

2020-11-24 Thread kernel test robot
Hi Laurent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on gpio/for-next]
[also build test ERROR on linus/master v5.10-rc5 next-20201124]
[cannot apply to powerpc/next tip/irq/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Laurent-Vivier/powerpc-pseries-fix-MSI-X-IRQ-affinity-on-pseries/20201125-040537
base:   https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git 
for-next
config: parisc-randconfig-r014-20201124 (attached as .config)
compiler: hppa64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/86de9fd2e4f360722119b69bb2269330ae9e1d54
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Laurent-Vivier/powerpc-pseries-fix-MSI-X-IRQ-affinity-on-pseries/20201125-040537
git checkout 86de9fd2e4f360722119b69bb2269330ae9e1d54
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/regulator/wm831x-dcdc.c:21:
   include/linux/mfd/wm831x/core.h: In function 'wm831x_irq':
>> include/linux/mfd/wm831x/core.h:424:9: error: too few arguments to function 
>> 'irq_create_mapping'
 424 |  return irq_create_mapping(wm831x->irq_domain, irq);
 | ^~
   In file included from include/linux/acpi.h:13,
from include/linux/i2c.h:13,
from drivers/regulator/wm831x-dcdc.c:14:
   include/linux/irqdomain.h:387:21: note: declared here
 387 | extern unsigned int irq_create_mapping(struct irq_domain *host,
 | ^~
   In file included from drivers/regulator/wm831x-dcdc.c:21:
   include/linux/mfd/wm831x/core.h:425:1: error: control reaches end of 
non-void function [-Werror=return-type]
 425 | }
 | ^
   cc1: some warnings being treated as errors

vim +/irq_create_mapping +424 include/linux/mfd/wm831x/core.h

7d4d0a3e7343e31 Mark Brown 2009-07-27  421  
cd99758ba3bde64 Mark Brown 2012-05-14  422  static inline int wm831x_irq(struct 
wm831x *wm831x, int irq)
cd99758ba3bde64 Mark Brown 2012-05-14  423  {
cd99758ba3bde64 Mark Brown 2012-05-14 @424  return 
irq_create_mapping(wm831x->irq_domain, irq);
cd99758ba3bde64 Mark Brown 2012-05-14  425  }
cd99758ba3bde64 Mark Brown 2012-05-14  426  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

2020-11-24 Thread Thomas Gleixner
On Tue, Nov 24 2020 at 21:03, Laurent Vivier wrote:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> This problem cannot be shown on x86_64 for two reasons:

There is only _ONE_ reason why this is not a problem on x86. x86 uses
the generic PCI/MSI domain which supports this out of the box.

> - the call path traverses arch_setup_msi_irqs() that is arch specific:
>
>virtscsi_probe()
>   virtscsi_init()
>  vp_modern_find_vqs()
> vp_find_vqs()
>vp_find_vqs_msix()
>   pci_alloc_irq_vectors_affinity()
>  __pci_enable_msix_range()
> pci_msi_setup_msi_irqs()
>arch_setup_msi_irqs()
>   rtas_setup_msi_irqs()

This is a problem on _all_ variants of PPC MSI providers, not only for
pseries. It's not restricted to virtscsi devices either, that's just the
device which made you discover this.

Thanks,

tglx







Re: [PATCH v2 2/2] kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1

2020-11-24 Thread Kees Cook
On Thu, Nov 19, 2020 at 01:13:27PM -0800, Nick Desaulniers wrote:
> On Thu, Nov 19, 2020 at 12:57 PM Nathan Chancellor
>  wrote:
> >
> > ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
> > along with a few others. Newer versions of ld.lld do not have these
> > warnings. As a result, do not add '--orphan-handling=warn' to
> > LDFLAGS_vmlinux if ld.lld's version is not new enough.
> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1187
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1193
> > Reported-by: Arvind Sankar 
> > Reported-by: kernelci.org bot 
> > Reported-by: Mark Brown 
> > Reviewed-by: Kees Cook 
> > Signed-off-by: Nathan Chancellor 
> 
> Thanks for the additions in v2.
> Reviewed-by: Nick Desaulniers 

I'm going to carry this for a few days in -next, and if no one screams,
ask Linus to pull it for v5.10-rc6.

Thanks!

-- 
Kees Cook


Re: [PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()

2020-11-24 Thread Thomas Gleixner
On Tue, Nov 24 2020 at 21:03, Laurent Vivier wrote:
> This parameter is needed to pass it to irq_domain_alloc_descs().
>
> This seems to have been missed by
> o06ee6d571f0e ("genirq: Add affinity hint to irq allocation")

No, this has not been missed at all. There was and is no reason to do
this.

> This is needed to implement proper support for multiqueue with
> pseries.

And because pseries needs this _all_ callers need to be changed?

>  123 files changed, 171 insertions(+), 146 deletions(-)

Lots of churn for nothing. 99% of the callers will never need that.

What's wrong with simply adding an interface which takes that parameter,
make the existing one an inline wrapper and leave the rest alone?

Thanks,

tglx
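
For reference, the wrapper shape suggested above would presumably look
something like this (sketch only; the _affinity name is illustrative):

	extern unsigned int
	irq_create_mapping_affinity(struct irq_domain *host, irq_hw_number_t hwirq,
				    const struct irq_affinity_desc *affinity);

	static inline unsigned int
	irq_create_mapping(struct irq_domain *host, irq_hw_number_t hwirq)
	{
		return irq_create_mapping_affinity(host, hwirq, NULL);
	}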





Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages

2020-11-24 Thread Li Yang
On Tue, Nov 24, 2020 at 3:44 PM Li Yang  wrote:
>
> On Tue, Nov 24, 2020 at 12:24 AM Wei Li  wrote:
> >
> > The IS_ERR_VALUE() macro should be used only with unsigned long.
> > In particular, it works incorrectly with shorter unsigned types on
> > 64-bit machines.
>
> This is truly a problem for running the driver on 64-bit architectures.
> But from an earlier discussion
> https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-a...@arndb.de/,
> the preferred solution would be removing the IS_ERR_VALUE() usage or
> making the values unsigned long.
>
> It looks like we have a bigger problem with the driver's 64-bit
> support: the offset variables can also be real pointers, which cannot
> be held in 32-bit data types (when uf_info->bd_mem_part ==
> MEM_PART_SYSTEM).  So we actually have to change these offsets to
> unsigned long; otherwise we will have more serious issues on 64-bit
> systems.  Are you willing to make such changes, or do you want us to
> deal with it?

Well, it looks like this hardware block was never integrated on a
64-bit SoC and will very likely remain so.  So we can probably keep
the driver 32-bit only.  It is currently limited to PPC32 in Kconfig;
how did you build it for 64-bit?
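
For reference, a small userspace re-creation of the failure mode (the
macro body is simplified from include/linux/err.h, minus the
unlikely() hint; the program is only an illustration):

#include <stdio.h>
#include <stdint.h>

#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

int main(void)
{
	uint32_t offset = (uint32_t)-12;	/* -ENOMEM squeezed into a u32 */

	/*
	 * On an LP64 machine the u32 zero-extends to 0x00000000fffffff4,
	 * far below (unsigned long)-MAX_ERRNO, so the error is missed.
	 * On a 32-bit build the same test correctly returns true.
	 */
	printf("error detected: %d\n", IS_ERR_VALUE(offset) ? 1 : 0);
	return 0;
}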

>
> Regards,
> Leo
> >
> > Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs 
> > instead of pointers")
> > Signed-off-by: Wei Li 
> > ---
> >  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
> > b/drivers/net/ethernet/freescale/ucc_geth.c
> > index 714b501be7d0..8656d9be256a 100644
> > --- a/drivers/net/ethernet/freescale/ucc_geth.c
> > +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> > @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct 
> > ucc_geth_private *ugeth,
> > else {
> > init_enet_offset =
> > qe_muram_alloc(thread_size, thread_alignment);
> > -   if (IS_ERR_VALUE(init_enet_offset)) {
> > +   if (IS_ERR_VALUE((unsigned 
> > long)(int)init_enet_offset)) {
> > if (netif_msg_ifup(ugeth))
> > pr_err("Can not allocate DPRAM 
> > memory\n");
> > qe_put_snum((u8) snum);
> > @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private 
> > *ugeth)
> > ugeth->tx_bd_ring_offset[j] =
> > qe_muram_alloc(length,
> >UCC_GETH_TX_BD_RING_ALIGNMENT);
> > -   if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
> > +   if (!IS_ERR_VALUE((unsigned 
> > long)(int)ugeth->tx_bd_ring_offset[j]))
> > ugeth->p_tx_bd_ring[j] =
> > (u8 __iomem *) qe_muram_addr(ugeth->
> >  
> > tx_bd_ring_offset[j]);
> > @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private 
> > *ugeth)
> > ugeth->rx_bd_ring_offset[j] =
> > qe_muram_alloc(length,
> >UCC_GETH_RX_BD_RING_ALIGNMENT);
> > -   if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
> > +   if (!IS_ERR_VALUE((unsigned 
> > long)(int)ugeth->rx_bd_ring_offset[j]))
> > ugeth->p_rx_bd_ring[j] =
> > (u8 __iomem *) qe_muram_addr(ugeth->
> >  
> > rx_bd_ring_offset[j]);
> > @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
> > *ugeth)
> > ugeth->tx_glbl_pram_offset =
> > qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
> >UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
> > -   if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
> > +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_glbl_pram_offset)) {
> > if (netif_msg_ifup(ugeth))
> > pr_err("Can not allocate DPRAM memory for 
> > p_tx_glbl_pram\n");
> > return -ENOMEM;
> > @@ -2530,7 +2530,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
> > *ugeth)
> >sizeof(struct ucc_geth_thread_data_tx) +
> >32 * (numThreadsTxNumerical == 1),
> >UCC_GETH_THREAD_DATA_ALIGNMENT);
> > -   if (IS_ERR_VALUE(ugeth->thread_dat_tx_offset)) {
> > +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_tx_offset)) {
> > if (netif_msg_ifup(ugeth))
> > pr_err("Can not allocate DPRAM memory for 
> > p_thread_data_tx\n");
> > re

Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages

2020-11-24 Thread Li Yang
On Tue, Nov 24, 2020 at 12:24 AM Wei Li  wrote:
>
> The IS_ERR_VALUE() macro should be used only with unsigned long.
> In particular, it works incorrectly with shorter unsigned types on
> 64-bit machines.

This is truly a problem for running the driver on 64-bit architectures.
But from an earlier discussion
https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-a...@arndb.de/,
the preferred solution would be removing the IS_ERR_VALUE() usage or
making the values unsigned long.

It looks like we have a bigger problem with the driver's 64-bit
support: the offset variables can also be real pointers, which cannot
be held in 32-bit data types (when uf_info->bd_mem_part ==
MEM_PART_SYSTEM).  So we actually have to change these offsets to
unsigned long; otherwise we will have more serious issues on 64-bit
systems.  Are you willing to make such changes, or do you want us to
deal with it?

Regards,
Leo
>
> Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs instead 
> of pointers")
> Signed-off-by: Wei Li 
> ---
>  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
> b/drivers/net/ethernet/freescale/ucc_geth.c
> index 714b501be7d0..8656d9be256a 100644
> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct ucc_geth_private 
> *ugeth,
> else {
> init_enet_offset =
> qe_muram_alloc(thread_size, thread_alignment);
> -   if (IS_ERR_VALUE(init_enet_offset)) {
> +   if (IS_ERR_VALUE((unsigned 
> long)(int)init_enet_offset)) {
> if (netif_msg_ifup(ugeth))
> pr_err("Can not allocate DPRAM 
> memory\n");
> qe_put_snum((u8) snum);
> @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private 
> *ugeth)
> ugeth->tx_bd_ring_offset[j] =
> qe_muram_alloc(length,
>UCC_GETH_TX_BD_RING_ALIGNMENT);
> -   if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
> +   if (!IS_ERR_VALUE((unsigned 
> long)(int)ugeth->tx_bd_ring_offset[j]))
> ugeth->p_tx_bd_ring[j] =
> (u8 __iomem *) qe_muram_addr(ugeth->
>  
> tx_bd_ring_offset[j]);
> @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private 
> *ugeth)
> ugeth->rx_bd_ring_offset[j] =
> qe_muram_alloc(length,
>UCC_GETH_RX_BD_RING_ALIGNMENT);
> -   if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
> +   if (!IS_ERR_VALUE((unsigned 
> long)(int)ugeth->rx_bd_ring_offset[j]))
> ugeth->p_rx_bd_ring[j] =
> (u8 __iomem *) qe_muram_addr(ugeth->
>  
> rx_bd_ring_offset[j]);
> @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
> *ugeth)
> ugeth->tx_glbl_pram_offset =
> qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
>UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
> -   if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
> +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_glbl_pram_offset)) {
> if (netif_msg_ifup(ugeth))
> pr_err("Can not allocate DPRAM memory for 
> p_tx_glbl_pram\n");
> return -ENOMEM;
> @@ -2530,7 +2530,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
> *ugeth)
>sizeof(struct ucc_geth_thread_data_tx) +
>32 * (numThreadsTxNumerical == 1),
>UCC_GETH_THREAD_DATA_ALIGNMENT);
> -   if (IS_ERR_VALUE(ugeth->thread_dat_tx_offset)) {
> +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_tx_offset)) {
> if (netif_msg_ifup(ugeth))
> pr_err("Can not allocate DPRAM memory for 
> p_thread_data_tx\n");
> return -ENOMEM;
> @@ -2557,7 +2557,7 @@ static int ucc_geth_startup(struct ucc_geth_private 
> *ugeth)
> qe_muram_alloc(ug_info->numQueuesTx *
>sizeof(struct ucc_geth_send_queue_qd),
>UCC_GETH_SEND_QUEUE_QUEUE_DESCRIPTOR_ALIGNMENT);
> -   if (IS_ERR_VALUE(ugeth->send_q_mem_reg_offset)) {
> +   if (IS_ERR_VALUE((unsigned long)(int)ugeth->send_q_mem_reg_offset)) {
> if (netif_msg_if

Re: [PATCH 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries

2020-11-24 Thread Michael S. Tsirkin
On Tue, Nov 24, 2020 at 09:03:06PM +0100, Laurent Vivier wrote:
> With virtio, in the multiqueue case, each queue IRQ is normally
> bound to a different CPU using the affinity mask.
>
> This works fine on x86_64 but is totally ignored on pseries.
>
> This is not obvious at first glance because irqbalance does
> some balancing that improves things.
> 
> It appears that the "managed" flag set in the MSI entry
> is never copied to the system IRQ entry.
> 
> This series passes the affinity mask from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs() by adding an affinity parameter to
> irq_create_mapping().
> 
> The first patch adds the parameter (no functional change), the
> second patch passes the actual affinity mask to irq_create_mapping()
> in rtas_setup_msi_irqs().
> 
> For instance, with 32 CPUs VM and 32 queues virtio-scsi interface:
> 
> ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32
> 
> for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
> for file in /proc/irq/$IRQ/ ; do
> echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
> done
> done
> 
> Without the patch (and without irqbalanced)
> 
> IRQ: 268 CPU: 0-31
> IRQ: 269 CPU: 0-31
> IRQ: 270 CPU: 0-31
> IRQ: 271 CPU: 0-31
> IRQ: 272 CPU: 0-31
> IRQ: 273 CPU: 0-31
> IRQ: 274 CPU: 0-31
> IRQ: 275 CPU: 0-31
> IRQ: 276 CPU: 0-31
> IRQ: 277 CPU: 0-31
> IRQ: 278 CPU: 0-31
> IRQ: 279 CPU: 0-31
> IRQ: 280 CPU: 0-31
> IRQ: 281 CPU: 0-31
> IRQ: 282 CPU: 0-31
> IRQ: 283 CPU: 0-31
> IRQ: 284 CPU: 0-31
> IRQ: 285 CPU: 0-31
> IRQ: 286 CPU: 0-31
> IRQ: 287 CPU: 0-31
> IRQ: 288 CPU: 0-31
> IRQ: 289 CPU: 0-31
> IRQ: 290 CPU: 0-31
> IRQ: 291 CPU: 0-31
> IRQ: 292 CPU: 0-31
> IRQ: 293 CPU: 0-31
> IRQ: 294 CPU: 0-31
> IRQ: 295 CPU: 0-31
> IRQ: 296 CPU: 0-31
> IRQ: 297 CPU: 0-31
> IRQ: 298 CPU: 0-31
> IRQ: 299 CPU: 0-31
> 
> With the patch:
> 
> IRQ: 265 CPU: 0
> IRQ: 266 CPU: 1
> IRQ: 267 CPU: 2
> IRQ: 268 CPU: 3
> IRQ: 269 CPU: 4
> IRQ: 270 CPU: 5
> IRQ: 271 CPU: 6
> IRQ: 272 CPU: 7
> IRQ: 273 CPU: 8
> IRQ: 274 CPU: 9
> IRQ: 275 CPU: 10
> IRQ: 276 CPU: 11
> IRQ: 277 CPU: 12
> IRQ: 278 CPU: 13
> IRQ: 279 CPU: 14
> IRQ: 280 CPU: 15
> IRQ: 281 CPU: 16
> IRQ: 282 CPU: 17
> IRQ: 283 CPU: 18
> IRQ: 284 CPU: 19
> IRQ: 285 CPU: 20
> IRQ: 286 CPU: 21
> IRQ: 287 CPU: 22
> IRQ: 288 CPU: 23
> IRQ: 289 CPU: 24
> IRQ: 290 CPU: 25
> IRQ: 291 CPU: 26
> IRQ: 292 CPU: 27
> IRQ: 293 CPU: 28
> IRQ: 294 CPU: 29
> IRQ: 295 CPU: 30
> IRQ: 299 CPU: 31
> 
> This matches what we have on an x86_64 system.


Makes sense to me. FWIW

Acked-by: Michael S. Tsirkin 

> Laurent Vivier (2):
>   genirq: add an affinity parameter to irq_create_mapping()
>   powerpc/pseries: pass MSI affinity to irq_create_mapping()
> 
>  arch/arc/kernel/intc-arcv2.c  | 4 ++--
>  arch/arc/kernel/mcip.c| 2 +-
>  arch/arm/common/sa.c  | 2 +-
>  arch/arm/mach-s3c/irq-s3c24xx.c   | 3 ++-
>  arch/arm/plat-orion/gpio.c| 2 +-
>  arch/mips/ath25/ar2315.c  | 4 ++--
>  arch/mips/ath25/ar5312.c  | 4 ++--
>  arch/mips/lantiq/irq.c| 2 +-
>  arch/mips/pci/pci-ar2315.c| 3 ++-
>  arch/mips/pic32/pic32mzda/time.c  | 2 +-
>  arch/mips/ralink/irq.c| 2 +-
>  arch/powerpc/kernel/pci-common.c  | 2 +-
>  arch/powerpc/kvm/book3s_xive.c| 2 +-
>  arch/powerpc/platforms/44x/ppc476.c   | 4 ++--
>  arch/powerpc/platforms/cell/interrupt.c   | 4 ++--
>  arch/powerpc/platforms/cell/iommu.c   | 3 ++-
>  arch/powerpc/platforms/cell/pmu.c | 2 +-
>  arch/powerpc/platforms/cell/spider-pic.c  | 2 +-
>  arch/powerpc/platforms/cell/spu_manage.c  | 6 +++---
>  arch/powerpc/platforms/maple/pci.c| 2 +-
>  arch/powerpc/platforms/pasemi/dma_lib.c   | 5 +++--
>  arch/powerpc/platforms/pasemi/msi.c   | 2 +-
>  arch/powerpc/platforms/pasemi/setup.c | 4 ++--
>  arch/powerpc/platforms/powermac/pci.c | 2 +-
>  arch/powerpc/platforms/powermac/pic.c | 2 +-
>  arch/powerpc/platforms/powermac/smp.c | 2 +-
>  arch/powerpc/platforms/powernv/opal-irqchip.c | 5 +++--
>  arch/powerpc/platforms/powernv/pci.c  | 2 +-
>  arch/powerpc/platforms/powernv/vas.c  | 2 +-
>  arch/powerpc/platforms/ps3/interrupt.c| 2 +-
>  arch/powerpc/platforms/pseries/ibmebus.c  | 2 +-
>  arch/powerpc/platforms/pseries/msi.c  | 2 +-
>  arch/powerpc/sysdev/fsl_mpic_err.c| 2 +-
>  arch/powerpc/sysdev/fsl_msi.c | 2 +-
>  arch/powerpc/sysdev/mpic.c| 3 ++-
>  arch/powerpc/sysdev/mpic_u3msi.c  | 2 +-
>  arch/powerpc/sysdev/xics/xics-common.c| 2 +-
>  arch/powerpc/sysdev/xive/common.c | 2 +-
>  arch/sh/boards/mach-se/7343/irq.c | 2 +-
>  arch/sh/boards/mach-se/7722/irq.c | 2 +-

Re: [PATCH V2 4/5] ocxl: Add mmu notifier

2020-11-24 Thread Christophe Lombard


On 24/11/2020 at 14:45, Jason Gunthorpe wrote:

On Tue, Nov 24, 2020 at 09:17:38AM +, Christoph Hellwig wrote:


@@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void 
*link_handle)
  }
  EXPORT_SYMBOL_GPL(ocxl_link_release);
  
+static void invalidate_range(struct mmu_notifier *mn,

+struct mm_struct *mm,
+unsigned long start, unsigned long end)
+{
+   struct pe_data *pe_data = container_of(mn, struct pe_data, 
mmu_notifier);
+   struct ocxl_link *link = pe_data->link;
+   unsigned long addr, pid, page_size = PAGE_SIZE;

The page_size variable seems unnecessary


+
+   pid = mm->context.id;
+
+   spin_lock(&link->atsd_lock);
+   for (addr = start; addr < end; addr += page_size)
+   pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
+   spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+   .invalidate_range = invalidate_range,
+};
+
  static u64 calculate_cfg_state(bool kernel)
  {
u64 state;
@@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
pe_data->mm = mm;
pe_data->xsl_err_cb = xsl_err_cb;
pe_data->xsl_err_data = xsl_err_data;
+   pe_data->link = link;
+   pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
  
  	memset(pe, 0, sizeof(struct ocxl_process_element));

pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 * by the nest MMU. If we have a kernel context, TLBIs are
 * already global.
 */
-   if (mm)
+   if (mm) {
mm_context_add_copro(mm);
+   if (link->arva) {
+   /* Use MMIO registers for the TLB Invalidate
+* operations.
+*/
+   mmu_notifier_register(&pe_data->mmu_notifier, mm);

Every other place doing stuff like this is de-duplicating the
notifier. If you have multiple clients this will do multiple redundant
invalidations?


We could have multiple clients, although that is not something we see
often.  We have only one attach per process.  But when there are
several, we must still issue an invalidation for each.




The notifier get/put API is designed to solve that problem, you'd get
a single notifier for the mm and then add the impacted arva's to some
list at the notifier.


Thanks for the information.


Jason
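
A hedged sketch of the get/put pattern being suggested; the ocxl_*
names and the per-mm arva list are assumptions for illustration, not
the driver's real structures (needs <linux/mmu_notifier.h> and
<linux/slab.h>):

struct ocxl_mm_notifier {		/* hypothetical: one per mm */
	struct mmu_notifier mn;
	struct list_head arvas;		/* every ATSD region for this mm */
	spinlock_t lock;
};

static struct mmu_notifier *ocxl_alloc_notifier(struct mm_struct *mm)
{
	struct ocxl_mm_notifier *omn = kzalloc(sizeof(*omn), GFP_KERNEL);

	if (!omn)
		return ERR_PTR(-ENOMEM);
	INIT_LIST_HEAD(&omn->arvas);
	spin_lock_init(&omn->lock);
	return &omn->mn;
}

static void ocxl_free_notifier(struct mmu_notifier *mn)
{
	kfree(container_of(mn, struct ocxl_mm_notifier, mn));
}

static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
	.invalidate_range = invalidate_range,	/* walks omn->arvas once */
	.alloc_notifier	  = ocxl_alloc_notifier,
	.free_notifier	  = ocxl_free_notifier,
};

/* At attach time, this returns the one notifier for the mm (creating
 * it on first use), so N attaches no longer mean N redundant ATSD
 * invalidations per range: */
mn = mmu_notifier_get(&ocxl_mmu_notifier_ops, mm);
/* ...add this PE's arva to the omn->arvas list.  On detach: */
mmu_notifier_put(mn);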


[PATCH 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()

2020-11-24 Thread Laurent Vivier
With virtio multiqueue, normally each queue IRQ is mapped to a CPU.

But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
this is broken on pseries.

The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added the affinity parameter to
irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.

This problem cannot be shown on x86_64 for two reasons:

- the call path traverses arch_setup_msi_irqs() that is arch specific:

   virtscsi_probe()
  virtscsi_init()
 vp_modern_find_vqs()
vp_find_vqs()
   vp_find_vqs_msix()
  pci_alloc_irq_vectors_affinity()
 __pci_enable_msix_range()
pci_msi_setup_msi_irqs()
   arch_setup_msi_irqs()
  rtas_setup_msi_irqs()
 irq_create_mapping()
irq_domain_alloc_descs()
  __irq_alloc_descs()

- and x86_64 has CONFIG_PCI_MSI_IRQ_DOMAIN that uses another path:

   virtscsi_probe()
  virtscsi_init()
 vp_modern_find_vqs()
vp_find_vqs()
   vp_find_vqs_msix()
  pci_alloc_irq_vectors_affinity()
 __pci_enable_msix_range()
__msi_domain_alloc_irqs()
   __irq_domain_alloc_irqs()
  __irq_alloc_descs()

Signed-off-by: Laurent Vivier 
---
 arch/powerpc/platforms/pseries/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 42ba08eaea91..58197f92c6a2 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
return hwirq;
}
 
-   virq = irq_create_mapping(NULL, hwirq, NULL);
+   virq = irq_create_mapping(NULL, hwirq, entry->affinity);
 
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.28.0



[PATCH 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries

2020-11-24 Thread Laurent Vivier
With virtio, in the multiqueue case, each queue IRQ is normally
bound to a different CPU using the affinity mask.

This works fine on x86_64 but is totally ignored on pseries.

This is not obvious at first glance because irqbalance does
some balancing that improves things.

It appears that the "managed" flag set in the MSI entry
is never copied to the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs()
to irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping().

The first patch adds the parameter (no functional change), the
second patch passes the actual affinity mask to irq_create_mapping()
in rtas_setup_msi_irqs().

For instance, with 32 CPUs VM and 32 queues virtio-scsi interface:

... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
for file in /proc/irq/$IRQ/ ; do
echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
done
done

Without the patch (and without irqbalanced)

IRQ: 268 CPU: 0-31
IRQ: 269 CPU: 0-31
IRQ: 270 CPU: 0-31
IRQ: 271 CPU: 0-31
IRQ: 272 CPU: 0-31
IRQ: 273 CPU: 0-31
IRQ: 274 CPU: 0-31
IRQ: 275 CPU: 0-31
IRQ: 276 CPU: 0-31
IRQ: 277 CPU: 0-31
IRQ: 278 CPU: 0-31
IRQ: 279 CPU: 0-31
IRQ: 280 CPU: 0-31
IRQ: 281 CPU: 0-31
IRQ: 282 CPU: 0-31
IRQ: 283 CPU: 0-31
IRQ: 284 CPU: 0-31
IRQ: 285 CPU: 0-31
IRQ: 286 CPU: 0-31
IRQ: 287 CPU: 0-31
IRQ: 288 CPU: 0-31
IRQ: 289 CPU: 0-31
IRQ: 290 CPU: 0-31
IRQ: 291 CPU: 0-31
IRQ: 292 CPU: 0-31
IRQ: 293 CPU: 0-31
IRQ: 294 CPU: 0-31
IRQ: 295 CPU: 0-31
IRQ: 296 CPU: 0-31
IRQ: 297 CPU: 0-31
IRQ: 298 CPU: 0-31
IRQ: 299 CPU: 0-31

With the patch:

IRQ: 265 CPU: 0
IRQ: 266 CPU: 1
IRQ: 267 CPU: 2
IRQ: 268 CPU: 3
IRQ: 269 CPU: 4
IRQ: 270 CPU: 5
IRQ: 271 CPU: 6
IRQ: 272 CPU: 7
IRQ: 273 CPU: 8
IRQ: 274 CPU: 9
IRQ: 275 CPU: 10
IRQ: 276 CPU: 11
IRQ: 277 CPU: 12
IRQ: 278 CPU: 13
IRQ: 279 CPU: 14
IRQ: 280 CPU: 15
IRQ: 281 CPU: 16
IRQ: 282 CPU: 17
IRQ: 283 CPU: 18
IRQ: 284 CPU: 19
IRQ: 285 CPU: 20
IRQ: 286 CPU: 21
IRQ: 287 CPU: 22
IRQ: 288 CPU: 23
IRQ: 289 CPU: 24
IRQ: 290 CPU: 25
IRQ: 291 CPU: 26
IRQ: 292 CPU: 27
IRQ: 293 CPU: 28
IRQ: 294 CPU: 29
IRQ: 295 CPU: 30
IRQ: 299 CPU: 31

This matches what we have on an x86_64 system.

Laurent Vivier (2):
  genirq: add an affinity parameter to irq_create_mapping()
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/arc/kernel/intc-arcv2.c  | 4 ++--
 arch/arc/kernel/mcip.c| 2 +-
 arch/arm/common/sa.c  | 2 +-
 arch/arm/mach-s3c/irq-s3c24xx.c   | 3 ++-
 arch/arm/plat-orion/gpio.c| 2 +-
 arch/mips/ath25/ar2315.c  | 4 ++--
 arch/mips/ath25/ar5312.c  | 4 ++--
 arch/mips/lantiq/irq.c| 2 +-
 arch/mips/pci/pci-ar2315.c| 3 ++-
 arch/mips/pic32/pic32mzda/time.c  | 2 +-
 arch/mips/ralink/irq.c| 2 +-
 arch/powerpc/kernel/pci-common.c  | 2 +-
 arch/powerpc/kvm/book3s_xive.c| 2 +-
 arch/powerpc/platforms/44x/ppc476.c   | 4 ++--
 arch/powerpc/platforms/cell/interrupt.c   | 4 ++--
 arch/powerpc/platforms/cell/iommu.c   | 3 ++-
 arch/powerpc/platforms/cell/pmu.c | 2 +-
 arch/powerpc/platforms/cell/spider-pic.c  | 2 +-
 arch/powerpc/platforms/cell/spu_manage.c  | 6 +++---
 arch/powerpc/platforms/maple/pci.c| 2 +-
 arch/powerpc/platforms/pasemi/dma_lib.c   | 5 +++--
 arch/powerpc/platforms/pasemi/msi.c   | 2 +-
 arch/powerpc/platforms/pasemi/setup.c | 4 ++--
 arch/powerpc/platforms/powermac/pci.c | 2 +-
 arch/powerpc/platforms/powermac/pic.c | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c | 5 +++--
 arch/powerpc/platforms/powernv/pci.c  | 2 +-
 arch/powerpc/platforms/powernv/vas.c  | 2 +-
 arch/powerpc/platforms/ps3/interrupt.c| 2 +-
 arch/powerpc/platforms/pseries/ibmebus.c  | 2 +-
 arch/powerpc/platforms/pseries/msi.c  | 2 +-
 arch/powerpc/sysdev/fsl_mpic_err.c| 2 +-
 arch/powerpc/sysdev/fsl_msi.c | 2 +-
 arch/powerpc/sysdev/mpic.c| 3 ++-
 arch/powerpc/sysdev/mpic_u3msi.c  | 2 +-
 arch/powerpc/sysdev/xics/xics-common.c| 2 +-
 arch/powerpc/sysdev/xive/common.c | 2 +-
 arch/sh/boards/mach-se/7343/irq.c | 2 +-
 arch/sh/boards/mach-se/7722/irq.c | 2 +-
 arch/sh/boards/mach-x3proto/gpio.c| 2 +-
 arch/xtensa/kernel/perf_event.c   | 2 +-
 arch/xtensa/kernel/smp.c  | 2 +-
 arch/xtensa/kernel/time.c | 2 +-
 drivers/ata/pata_macio.c  | 2 +-
 drivers/base/regmap/regmap-irq.c  | 2 +-
 drivers/bus/moxtet.c  | 2 +-
 drivers/clocksource/ingenic-t

[PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()

2020-11-24 Thread Laurent Vivier
This parameter is needed so it can be passed on to irq_domain_alloc_descs().

This seems to have been missed by
commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation").

This is needed to implement proper support for multiqueue with pseries.

All irq_create_mapping() callers have been updated with the help
of the following coccinelle script:
@@
expression a, b;
@@
<...
- irq_create_mapping(a, b)
+ irq_create_mapping(a, b, NULL)
...>
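
For instance, on a typical call site (an illustrative caller, not a
hunk from this diff) the script rewrites:

-	virq = irq_create_mapping(domain, hwirq);
+	virq = irq_create_mapping(domain, hwirq, NULL);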

With some manual changes to comply with checkpatch errors.

No functional change.

Signed-off-by: Laurent Vivier 
---
 arch/arc/kernel/intc-arcv2.c  | 4 ++--
 arch/arc/kernel/mcip.c| 2 +-
 arch/arm/common/sa.c  | 2 +-
 arch/arm/mach-s3c/irq-s3c24xx.c   | 3 ++-
 arch/arm/plat-orion/gpio.c| 2 +-
 arch/mips/ath25/ar2315.c  | 4 ++--
 arch/mips/ath25/ar5312.c  | 4 ++--
 arch/mips/lantiq/irq.c| 2 +-
 arch/mips/pci/pci-ar2315.c| 3 ++-
 arch/mips/pic32/pic32mzda/time.c  | 2 +-
 arch/mips/ralink/irq.c| 2 +-
 arch/powerpc/kernel/pci-common.c  | 2 +-
 arch/powerpc/kvm/book3s_xive.c| 2 +-
 arch/powerpc/platforms/44x/ppc476.c   | 4 ++--
 arch/powerpc/platforms/cell/interrupt.c   | 4 ++--
 arch/powerpc/platforms/cell/iommu.c   | 3 ++-
 arch/powerpc/platforms/cell/pmu.c | 2 +-
 arch/powerpc/platforms/cell/spider-pic.c  | 2 +-
 arch/powerpc/platforms/cell/spu_manage.c  | 6 +++---
 arch/powerpc/platforms/maple/pci.c| 2 +-
 arch/powerpc/platforms/pasemi/dma_lib.c   | 5 +++--
 arch/powerpc/platforms/pasemi/msi.c   | 2 +-
 arch/powerpc/platforms/pasemi/setup.c | 4 ++--
 arch/powerpc/platforms/powermac/pci.c | 2 +-
 arch/powerpc/platforms/powermac/pic.c | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c | 5 +++--
 arch/powerpc/platforms/powernv/pci.c  | 2 +-
 arch/powerpc/platforms/powernv/vas.c  | 2 +-
 arch/powerpc/platforms/ps3/interrupt.c| 2 +-
 arch/powerpc/platforms/pseries/ibmebus.c  | 2 +-
 arch/powerpc/platforms/pseries/msi.c  | 2 +-
 arch/powerpc/sysdev/fsl_mpic_err.c| 2 +-
 arch/powerpc/sysdev/fsl_msi.c | 2 +-
 arch/powerpc/sysdev/mpic.c| 3 ++-
 arch/powerpc/sysdev/mpic_u3msi.c  | 2 +-
 arch/powerpc/sysdev/xics/xics-common.c| 2 +-
 arch/powerpc/sysdev/xive/common.c | 2 +-
 arch/sh/boards/mach-se/7343/irq.c | 2 +-
 arch/sh/boards/mach-se/7722/irq.c | 2 +-
 arch/sh/boards/mach-x3proto/gpio.c| 2 +-
 arch/xtensa/kernel/perf_event.c   | 2 +-
 arch/xtensa/kernel/smp.c  | 2 +-
 arch/xtensa/kernel/time.c | 2 +-
 drivers/ata/pata_macio.c  | 2 +-
 drivers/base/regmap/regmap-irq.c  | 2 +-
 drivers/bus/moxtet.c  | 2 +-
 drivers/clocksource/ingenic-timer.c   | 2 +-
 drivers/clocksource/timer-riscv.c | 2 +-
 drivers/extcon/extcon-max8997.c   | 3 ++-
 drivers/gpio/gpio-bcm-kona.c  | 2 +-
 drivers/gpio/gpio-brcmstb.c   | 2 +-
 drivers/gpio/gpio-davinci.c   | 2 +-
 drivers/gpio/gpio-em.c| 3 ++-
 drivers/gpio/gpio-grgpio.c| 2 +-
 drivers/gpio/gpio-mockup.c| 2 +-
 drivers/gpio/gpio-mpc8xxx.c   | 2 +-
 drivers/gpio/gpio-mvebu.c | 2 +-
 drivers/gpio/gpio-tb10x.c | 2 +-
 drivers/gpio/gpio-tegra.c | 2 +-
 drivers/gpio/gpio-wm831x.c| 2 +-
 drivers/gpio/gpiolib.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   | 3 ++-
 drivers/gpu/ipu-v3/ipu-common.c   | 2 +-
 drivers/hid/hid-rmi.c | 2 +-
 drivers/i2c/busses/i2c-cht-wc.c   | 2 +-
 drivers/i2c/i2c-core-base.c   | 2 +-
 drivers/i2c/muxes/i2c-mux-pca954x.c   | 2 +-
 drivers/ide/pmac.c| 2 +-
 drivers/iio/dummy/iio_dummy_evgen.c   | 3 ++-
 drivers/input/rmi4/rmi_bus.c  | 2 +-
 drivers/irqchip/irq-ath79-misc.c  | 3 ++-
 drivers/irqchip/irq-bcm2835.c | 3 ++-
 drivers/irqchip/irq-csky-mpintc.c | 2 +-
 drivers/irqchip/irq-eznps.c   | 2 +-
 drivers/irqchip/irq-mips-gic.c| 8 +---
 drivers/irqchip/irq-mmp.c | 4 ++--
 drivers/irqchip/irq-versatile-fpga.c  | 2 +-
 drivers/irqchip/irq-vic.c | 2 +-
 drivers/macintosh/macio_asic.c| 2 +-
 drivers/memory/omap-gpmc.c| 2 +-
 drivers/mfd/ab8500-core.c  

[PATCH v1 3/3] powerpc/32s: Cleanup around PTE_FLAGS_OFFSET in hash_low.S

2020-11-24 Thread Christophe Leroy
PTE_FLAGS_OFFSET is defined in asm/page_32.h and used only
in hash_low.S.

Whether PTE_FLAGS_OFFSET is zero depends on CONFIG_PTE_64BIT.

Instead of tests like #if (PTE_FLAGS_OFFSET != 0), use
CONFIG_PTE_64BIT-conditional code.

Also move the definition of PTE_FLAGS_OFFSET directly into
hash_low.S; that improves readability.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/page_32.h  |  6 --
 arch/powerpc/mm/book3s32/hash_low.S | 23 +--
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/page_32.h 
b/arch/powerpc/include/asm/page_32.h
index d64dfe3ac712..56f217606327 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -16,12 +16,6 @@
 #define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
 #endif
 
-#ifdef CONFIG_PTE_64BIT
-#define PTE_FLAGS_OFFSET   4   /* offset of PTE flags, in bytes */
-#else
-#define PTE_FLAGS_OFFSET   0
-#endif
-
 #if defined(CONFIG_PPC_256K_PAGES) || \
 (defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
 #define PTE_SHIFT  (PAGE_SHIFT - PTE_T_LOG2 - 2)   /* 1/4 of a page */
diff --git a/arch/powerpc/mm/book3s32/hash_low.S 
b/arch/powerpc/mm/book3s32/hash_low.S
index 1366e8e4fc05..f559a931b9a8 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -26,6 +26,12 @@
 #include 
 #include 
 
+#ifdef CONFIG_PTE_64BIT
+#define PTE_FLAGS_OFFSET   4   /* offset of PTE flags, in bytes */
+#else
+#define PTE_FLAGS_OFFSET   0
+#endif
+
 #ifdef CONFIG_SMP
.section .bss
.align  2
@@ -94,6 +100,11 @@ _GLOBAL(hash_page)
rlwimi  r8,r4,22,20,29  /* insert next 10 bits of address */
 #else
rlwimi  r8,r4,23,20,28  /* compute pte address */
+   /*
+* If PTE_64BIT is set, the low word is the flags word; use that
+* word for locking since it contains all the interesting bits.
+*/
+   addir8,r8,PTE_FLAGS_OFFSET
 #endif
 
/*
@@ -101,13 +112,7 @@ _GLOBAL(hash_page)
 * because almost always, there won't be a permission violation
 * and there won't already be an HPTE, and thus we will have
 * to update the PTE to set _PAGE_HASHPTE.  -- paulus.
-*
-* If PTE_64BIT is set, the low word is the flags word; use that
-* word for locking since it contains all the interesting bits.
 */
-#if (PTE_FLAGS_OFFSET != 0)
-   addir8,r8,PTE_FLAGS_OFFSET
-#endif
 .Lretry:
lwarx   r6,0,r8 /* get linux-style pte, flag word */
 #ifdef CONFIG_PPC_KUAP
@@ -511,8 +516,9 @@ _GLOBAL(flush_hash_pages)
rlwimi  r5,r4,22,20,29
 #else
rlwimi  r5,r4,23,20,28
+   addir5,r5,PTE_FLAGS_OFFSET
 #endif
-1: lwz r0,PTE_FLAGS_OFFSET(r5)
+1: lwz r0,0(r5)
cmpwi   cr1,r6,1
andi.   r0,r0,_PAGE_HASHPTE
bne 2f
@@ -556,9 +562,6 @@ _GLOBAL(flush_hash_pages)
 * already clear, we're done (for this pte).  If not,
 * clear it (atomically) and proceed.  -- paulus.
 */
-#if (PTE_FLAGS_OFFSET != 0)
-   addir5,r5,PTE_FLAGS_OFFSET
-#endif
 33:lwarx   r8,0,r5 /* fetch the pte flags word */
andi.   r0,r8,_PAGE_HASHPTE
beq 8f  /* done if HASHPTE is already clear */
-- 
2.25.0
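
For reference, a small endianness demonstration of why the flags land
at byte offset 4 (it relies on the comment in hash_low.S that with
PTE_64BIT the low word is the flags word; run it on a big-endian host,
as book3s/32 is, to reproduce):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
	/* CONFIG_PTE_64BIT: high word = extended RPN, low word = flags */
	uint64_t pte = 0x0000000100000191ULL;	/* flags 0x191 in low word */
	uint32_t word;

	/*
	 * Big-endian storage puts the most-significant word first, so
	 * the low (flags) word begins 4 bytes in: PTE_FLAGS_OFFSET = 4.
	 * (On a little-endian host this prints the high word instead.)
	 */
	memcpy(&word, (const unsigned char *)&pte + 4, sizeof(word));
	printf("word at offset 4: 0x%08x\n", word);
	return 0;
}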



[PATCH v1 1/3] powerpc/32s: Remove unused counters incremented by create_hpte()

2020-11-24 Thread Christophe Leroy
primary_pteg_full and htab_hash_searches are not used.

Remove them.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/hash_low.S | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S 
b/arch/powerpc/mm/book3s32/hash_low.S
index 9a56ba4f68f2..f964fd34dad9 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -359,11 +359,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
beq+10f /* no PTE: go look for an empty slot */
tlbie   r4
 
-   lis r4, (htab_hash_searches - PAGE_OFFSET)@ha
-   lwz r6, (htab_hash_searches - PAGE_OFFSET)@l(r4)
-   addir6,r6,1 /* count how many searches we do */
-   stw r6, (htab_hash_searches - PAGE_OFFSET)@l(r4)
-
/* Search the primary PTEG for a PTE whose 1st (d)word matches r5 */
mtctr   r0
addir4,r3,-HPTE_SIZE
@@ -393,12 +388,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
bdnzf   2,1b/* loop while ctr != 0 && !cr0.eq */
beq+.Lfound_empty
 
-   /* update counter of times that the primary PTEG is full */
-   lis r4, (primary_pteg_full - PAGE_OFFSET)@ha
-   lwz r6, (primary_pteg_full - PAGE_OFFSET)@l(r4)
-   addir6,r6,1
-   stw r6, (primary_pteg_full - PAGE_OFFSET)@l(r4)
-
patch_site  0f, patch__hash_page_C
/* Search the secondary PTEG for an empty slot */
ori r5,r5,PTE_H /* set H (secondary hash) bit */
@@ -491,10 +480,6 @@ _ASM_NOKPROBE_SYMBOL(create_hpte)
.align  2
 next_slot:
.space  4
-primary_pteg_full:
-   .space  4
-htab_hash_searches:
-   .space  4
.previous
 
 /*
-- 
2.25.0



[PATCH v1 2/3] powerpc/32s: In add_hash_page(), calculate VSID later

2020-11-24 Thread Christophe Leroy
The VSID is only needed by create_hpte().  When _PAGE_HASHPTE is
already set, add_hash_page() bails out without calling
create_hpte(), and so doesn't need the VSID value.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/hash_low.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S 
b/arch/powerpc/mm/book3s32/hash_low.S
index f964fd34dad9..1366e8e4fc05 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -188,12 +188,6 @@ _GLOBAL(add_hash_page)
mflrr0
stw r0,4(r1)
 
-   /* Convert context and va to VSID */
-   mulli   r3,r3,897*16/* multiply context by context skew */
-   rlwinm  r0,r4,4,28,31   /* get ESID (top 4 bits of va) */
-   mulli   r0,r0,0x111 /* multiply by ESID skew */
-   add r3,r3,r0/* note create_hpte trims to 24 bits */
-
 #ifdef CONFIG_SMP
lwz r8,TASK_CPU(r2) /* to go in mmu_hash_lock */
orisr8,r8,12
@@ -257,6 +251,12 @@ _GLOBAL(add_hash_page)
stwcx.  r5,0,r8
bne-1b
 
+   /* Convert context and va to VSID */
+   mulli   r3,r3,897*16/* multiply context by context skew */
+   rlwinm  r0,r4,4,28,31   /* get ESID (top 4 bits of va) */
+   mulli   r0,r0,0x111 /* multiply by ESID skew */
+   add r3,r3,r0/* note create_hpte trims to 24 bits */
+
bl  create_hpte
 
 9:
-- 
2.25.0



Re: eBPF on powerpc

2020-11-24 Thread Naveen N. Rao

Christophe Leroy wrote:



On 24/11/2020 at 17:35, Naveen N. Rao wrote:

Hi Christophe,

Christophe Leroy wrote:

Hi Naveen,

A few years ago, you implemented eBPF on PPC64.

Is there any reason for implementing it for PPC64 only?


I focused on ppc64 since eBPF is a 64-bit VM and it was more straightforward
to target.


Is there something that makes it impossible to have eBPF for PPC32 as well?


No, I just wasn't sure if it would be performant enough to warrant it. Since then, however, there
have been arm32 and riscv 32-bit JIT implementations, and at least the arm32 JIT seems to show
~50% better performance compared to the interpreter (*). So it would be worthwhile to add support
for ppc32.


That's great.

I know close to nothing about eBPF. Is there any interesting documentation on it somewhere that
would allow me to easily understand how it works and let me extend the 64-bit powerpc JIT to 32 bits?


I don't think there was ever a formal spec written for the eBPF VM. Here 
are a few resources which should help, alongside the existing JIT 
implementations:
- BPF Kernel Internals:  
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.rst#n604

- 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/bpf
- BPF and XDP Reference Guide: https://docs.cilium.io/en/stable/bpf/


- Naveen



Re: [PATCH 3/3] powerpc: Update NUMA Kconfig description & help text

2020-11-24 Thread Randy Dunlap
On 11/24/20 4:05 AM, Michael Ellerman wrote:
> Update the NUMA Kconfig description to match other architectures, and
> add some help text. Shamelessly borrowed from x86/arm64.
> 
> Signed-off-by: Michael Ellerman 

Reviewed-by: Randy Dunlap 

Thanks.

> ---
>  arch/powerpc/Kconfig | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4d688b426353..7f4995b245a3 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -659,9 +659,15 @@ config IRQ_ALL_CPUS
> reported with SMP Power Macintoshes with this option enabled.
>  
>  config NUMA
> - bool "NUMA support"
> + bool "NUMA Memory Allocation and Scheduler Support"
>   depends on PPC64 && SMP
>   default y if PPC_PSERIES || PPC_POWERNV
> + help
> +   Enable NUMA (Non-Uniform Memory Access) support.
> +
> +   The kernel will try to allocate memory used by a CPU on the
> +   local memory controller of the CPU and add some more
> +   NUMA awareness to the kernel.
>  
>  config NODES_SHIFT
>   int
> 


-- 
~Randy



Re: [PATCH 1/3] powerpc: Make NUMA depend on SMP

2020-11-24 Thread Randy Dunlap
On 11/24/20 4:05 AM, Michael Ellerman wrote:
> Our Kconfig allows NUMA to be enabled without SMP, but none of
> our defconfigs use that combination. This means it can easily be
> broken inadvertently by code changes, which has happened recently.
> 
> Although it's theoretically possible to have a machine with a single
> CPU and multiple memory nodes, I can't think of any real systems where
> that's the case. Even so if such a system exists, it can just run an
> SMP kernel anyway.
> 
> So to avoid the need to add extra #ifdefs and/or build breaks, make
> NUMA depend on SMP.
> 
> Reported-by: kernel test robot 
> Reported-by: Randy Dunlap 
> Signed-off-by: Michael Ellerman 

Reviewed-by: Randy Dunlap 

Thanks.

> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e9f13fe08492..a22db3db6b96 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -660,7 +660,7 @@ config IRQ_ALL_CPUS
>  
>  config NUMA
>   bool "NUMA support"
> - depends on PPC64
> + depends on PPC64 && SMP
>   default y if SMP && PPC_PSERIES
>  
>  config NODES_SHIFT
> 


-- 
~Randy


Re: eBPF on powerpc

2020-11-24 Thread Christophe Leroy




On 24/11/2020 at 17:35, Naveen N. Rao wrote:

Hi Christophe,

Christophe Leroy wrote:

Hi Naveen,

A few years ago, you implemented eBPF on PPC64.

Is there any reason for implementing it for PPC64 only?


I focused on ppc64 since eBPF is a 64-bit VM and it was more straightforward
to target.


Is there something that makes it impossible to have eBPF for PPC32 as well?


No, I just wasn't sure if it would be performant enough to warrant it. Since then, however, there
have been arm32 and riscv 32-bit JIT implementations, and at least the arm32 JIT seems to show
~50% better performance compared to the interpreter (*). So it would be worthwhile to add support
for ppc32.


That's great.

I know close to nothing about eBPF. Is there any interesting documentation on it somewhere that
would allow me to easily understand how it works and let me extend the 64-bit powerpc JIT to 32 bits?




Note that there might be a few instructions which would be difficult to support on 32-bit, but those
can fall back to the interpreter, while allowing other programs to be JIT'ed.



- Naveen

(*) 
http://lkml.kernel.org/r/cagxu5jlyunvcjgcfhpebkdaoq71hdmgq4hhddxtypbqw_hx...@mail.gmail.com
(*) http://lkml.kernel.org/r/b63fae4b-cb74-1928-b210-80914f3c8...@fb.com
(*) http://lkml.kernel.org/r/20200305050207.4159-1-luke.r.n...@gmail.com


Christophe


Re: [PATCH kernel v4 1/8] genirq/ipi: Simplify irq_reserve_ipi

2020-11-24 Thread Cédric Le Goater
On 11/24/20 7:17 AM, Alexey Kardashevskiy wrote:
> __irq_domain_alloc_irqs() can already handle virq == -1 and free the
> descriptors if it fails to allocate hardware interrupts, so let's skip
> this extra step.
> 
> Signed-off-by: Alexey Kardashevskiy 

LGTM,

Reviewed-by: Cédric Le Goater 

Copying the MIPS folks since the IPI interface is only used under arch/mips.

C.
 
> ---
>  kernel/irq/ipi.c | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/irq/ipi.c b/kernel/irq/ipi.c
> index 43e3d1be622c..1b2807318ea9 100644
> --- a/kernel/irq/ipi.c
> +++ b/kernel/irq/ipi.c
> @@ -75,18 +75,12 @@ int irq_reserve_ipi(struct irq_domain *domain,
>   }
>   }
>  
> - virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE, NULL);
> - if (virq <= 0) {
> - pr_warn("Can't reserve IPI, failed to alloc descs\n");
> - return -ENOMEM;
> - }
> -
> - virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
> -(void *) dest, true, NULL);
> + virq = __irq_domain_alloc_irqs(domain, -1, nr_irqs, NUMA_NO_NODE,
> +(void *) dest, false, NULL);
>  
>   if (virq <= 0) {
>   pr_warn("Can't reserve IPI, failed to alloc hw irqs\n");
> - goto free_descs;
> + return -EBUSY;
>   }
>  
>   for (i = 0; i < nr_irqs; i++) {
> @@ -96,10 +90,6 @@ int irq_reserve_ipi(struct irq_domain *domain,
>   irq_set_status_flags(virq + i, IRQ_NO_BALANCING);
>   }
>   return virq;
> -
> -free_descs:
> - irq_free_descs(virq, nr_irqs);
> - return -EBUSY;
>  }
>  
>  /**
> 



[PATCH net 2/2] ibmvnic: Fix TX completion error handling

2020-11-24 Thread Thomas Falcon
TX completions received with an error return code are not
being processed properly. When an error code is seen, do not
proceed to the next completion before cleaning up the existing
entry's data structures.

Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 489ed5e..7097bcb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3105,11 +3105,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter 
*adapter,
 
next = ibmvnic_next_scrq(adapter, scrq);
for (i = 0; i < next->tx_comp.num_comps; i++) {
-   if (next->tx_comp.rcs[i]) {
+   if (next->tx_comp.rcs[i])
dev_err(dev, "tx error %x\n",
next->tx_comp.rcs[i]);
-   continue;
-   }
index = be32_to_cpu(next->tx_comp.correlators[i]);
if (index & IBMVNIC_TSO_POOL_MASK) {
tx_pool = &adapter->tso_pool[pool];
-- 
1.8.3.1



[PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered

2020-11-24 Thread Thomas Falcon
Ensure that received Subordinate Command-Response Queue (SCRQ)
entries are properly read in order by the driver. These queues
are used in the ibmvnic device to process RX buffer and TX completion
descriptors. dma_rmb() barriers have been added in two places: after
checking for a pending descriptor, to ensure the correct descriptor
entry is read, and after reading the SCRQ descriptor, to ensure the
entire descriptor is read before processing.

Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 2aa40b2..489ed5e 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int 
budget)
 
if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num]))
break;
+   /* ensure that we do not prematurely exit the polling loop */
+   dma_rmb();
next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]);
rx_buff =
(struct ibmvnic_rx_buff *)be64_to_cpu(next->
@@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter 
*adapter,
unsigned int pool = scrq->pool_index;
int num_entries = 0;
 
+   /* ensure that the correct descriptor entry is read */
+   dma_rmb();
+
next = ibmvnic_next_scrq(adapter, scrq);
for (i = 0; i < next->tx_comp.num_comps; i++) {
if (next->tx_comp.rcs[i]) {
@@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct 
ibmvnic_adapter *adapter,
}
spin_unlock_irqrestore(&scrq->lock, flags);
 
+   /* ensure that the entire SCRQ descriptor is read */
+   dma_rmb();
+
return entry;
 }
 
-- 
1.8.3.1
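
The ordering rule the two barriers enforce, as a kernel-style sketch
with hypothetical helper and field names (not the driver's actual
ones):

#include <linux/types.h>
#include <linux/compiler.h>
#include <asm/barrier.h>

struct scrq_desc {			/* illustrative layout */
	u8 flags;			/* valid bit, set by the device last */
	u8 data[31];
};

static struct scrq_desc *scrq_poll_one(struct scrq_desc *ring, int idx)
{
	/* 1. Check the valid bit the device sets after the payload. */
	if (!(READ_ONCE(ring[idx].flags) & 0x80))
		return NULL;

	/*
	 * 2. dma_rmb() forbids the CPU from using payload bytes it may
	 * have speculatively read *before* the flags check succeeded,
	 * so the caller never acts on a half-written descriptor.
	 */
	dma_rmb();
	return &ring[idx];
}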



[PATCH net 0/2] ibmvnic: Bug fixes for queue descriptor processing

2020-11-24 Thread Thomas Falcon
This series resolves a few issues in the ibmvnic driver's
RX buffer and TX completion processing. The first patch
includes memory barriers to synchronize queue descriptor
reads. The second patch fixes a memory leak that could
occur if the device returns a TX completion with an error
code in the descriptor, in which case the respective socket
buffer and other relevant data structures may not be freed
or updated properly.

Thomas Falcon (2):
  ibmvnic: Ensure that SCRQ entry reads are correctly ordered
  ibmvnic: Fix TX completion error handling

 drivers/net/ethernet/ibm/ibmvnic.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

-- 
1.8.3.1



eBPF on powerpc

2020-11-24 Thread Naveen N. Rao

Hi Christophe,

Christophe Leroy wrote:

Hi Naveen,

A few years ago, you implemented eBPF on PPC64.

Is there any reason for implementing it for PPC64 only?


I focused on ppc64 since eBPF is a 64-bit VM and it was more
straightforward to target.


Is there something that makes it impossible to have eBPF for PPC32 as
well?


No, I just wasn't sure if it would be performant enough to warrant it.
Since then, however, there have been arm32 and riscv 32-bit JIT
implementations, and at least the arm32 JIT seems to show ~50%
better performance compared to the interpreter (*). So it would be
worthwhile to add support for ppc32.


Note that there might be a few instructions which would be difficult to
support on 32-bit, but those can fall back to the interpreter, while
allowing other programs to be JIT'ed.
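
For example, a single 64-bit BPF add has to become a carry-propagating
pair of 32-bit instructions on ppc32 (the register assignments below
are hypothetical, not an actual JIT's mapping):

/* BPF_ALU64 | BPF_ADD | BPF_X:  dst += src,
 * with dst held in r4(hi):r5(lo) and src in r6(hi):r7(lo) */
addc	r5, r5, r7	/* add low words, recording the carry */
adde	r4, r4, r6	/* add high words plus the carry */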



- Naveen

(*) 
http://lkml.kernel.org/r/cagxu5jlyunvcjgcfhpebkdaoq71hdmgq4hhddxtypbqw_hx...@mail.gmail.com

(*) http://lkml.kernel.org/r/b63fae4b-cb74-1928-b210-80914f3c8...@fb.com
(*) http://lkml.kernel.org/r/20200305050207.4159-1-luke.r.n...@gmail.com


Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events

2020-11-24 Thread Liang, Kan




On 11/24/2020 12:42 AM, Madhavan Srinivasan wrote:


On 11/24/20 10:21 AM, Namhyung Kim wrote:

Hello,

On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman  
wrote:

Namhyung Kim  writes:

Hi Peter and Kan,

(Adding PPC folks)

On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim  
wrote:

Hello,

On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan 
 wrote:



On 11/11/2020 11:25 AM, Peter Zijlstra wrote:

On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:

- When the large PEBS was introduced (9c964efa4330), sched_task()
should be invoked to flush the PEBS buffer on each context switch.
However, the perf_sched_events count in account_event() was not
updated accordingly, so perf_event_task_sched_* is never invoked for
a pure per-CPU context; only per-task events work.
 At that time, the perf_pmu_sched_task() is outside of
perf_event_context_sched_in/out. It means that perf has to double
perf_pmu_disable() for per-task event.
- The patch 1 tries to fix broken per-CPU events. The CPU 
context cannot be
retrieved from the task->perf_event_ctxp. So it has to be 
tracked in the
sched_cb_list. Yes, the code is very similar to the original 
codes, but it
is actually the new code for per-CPU events. The optimization 
for per-task

events is still kept.
    For the case which has both a CPU context and a task context:
yes, the __perf_pmu_sched_task() in this patch is not invoked there,
because sched_task() only needs to be invoked once per context
switch; it will eventually be invoked in the task context.
The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB 
and
only set that for large pebs. Are you sure the other users (Intel 
LBR

and PowerPC BHRB) don't need it?
I didn't set it for LBR, because the perf_sched_events is always 
enabled

for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
for LBR.

 if (has_branch_stack(event))
 inc = true;


If they indeed do not require the pmu::sched_task() callback for CPU
events, then I still think the whole perf_sched_cb_{inc,dec}() 
interface

No, LBR requires the pmu::sched_task() callback for CPU events.

Now, the LBR registers have to be reset on sched-in even for CPU
events.


To fix the shorter LBR callstack issue for CPU events, we also 
need to

save/restore LBRs in pmu::sched_task().
https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.li...@linux.intel.com/ 




is confusing at best.

Can't we do something like this instead?


I think the below patch may have two issues.
- PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as 
well) now.

- We may disable the large PEBS later if not all PEBS events support
large PEBS. The PMU needs a way to notify the generic code to decrease
nr_sched_task.

Any updates on this?  I've reviewed and tested Kan's patches
and they all look good.

Maybe we can talk to PPC folks to confirm the BHRB case?
Can we move this forward?  I saw patch 3/3 also adds 
PERF_ATTACH_SCHED_CB

for PowerPC too.  But it'd be nice if ppc folks can confirm the change.

Sorry I've read the whole thread, but I'm still not entirely sure I
understand the question.

Thanks for your time and sorry about not being clear enough.

We found per-cpu events are not calling pmu::sched_task()
on context switches.  So PERF_ATTACH_SCHED_CB was
added to indicate the core logic that it needs to invoke the
callback.

The patch 3/3 added the flag to PPC (for BHRB) with other
changes (I think it should be split, like patch 2/3 was), and we
want to get ACKs from the PPC folks.


Sorry for the delay.

I guess it would be better to first split the ppc change into a
separate patch,


Both PPC and X86 invoke perf_sched_cb_inc() directly, and the patch
changes its parameters, so I think we have to update the PPC and X86
code together. Otherwise there will be a compile error if someone
applies the change to perf_sched_cb_inc() but forgets to apply the
corresponding changes in the PPC or X86 specific code.




secondly, we are missing the changes needed in the power_pmu_bhrb_disable()

where perf_sched_cb_dec() needs the "state" to be included.



Ah, right. The below patch should fix the issue.

diff --git a/arch/powerpc/perf/core-book3s.c 
b/arch/powerpc/perf/core-book3s.c

index bced502f64a1..6756d1602a67 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -391,13 +391,18 @@ static void power_pmu_bhrb_enable(struct 
perf_event *event)

 static void power_pmu_bhrb_disable(struct perf_event *event)
 {
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+   int state = PERF_SCHED_CB_SW_IN;

if (!ppmu->bhrb_nr)
return;

WARN_ON_ONCE(!cpuhw->bhrb_users);
cpuhw->bhrb_users--;
-   perf_sched_cb_dec(event->ctx->pmu);
+
+   if (!(event->attach_state & PERF_ATTACH_TASK))
+   state |= PERF_SCHED_CB_CPU;
+
+   perf_sched_cb_dec(event->ctx->pmu, state);

  

[PATCH v1 6/6] powerpc/ppc-opcode: Add PPC_RAW_MFSPR()

2020-11-24 Thread Christophe Leroy
Add PPC_RAW_MFSPR() to replace open coding done in 8xx-pmu.c

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ppc-opcode.h | 3 ++-
 arch/powerpc/perf/8xx-pmu.c   | 5 +
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..da6f300e9788 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -230,7 +230,6 @@
 #define PPC_INST_POPCNTB_MASK  0xfc0007fe
 #define PPC_INST_RFEBB 0x4c000124
 #define PPC_INST_RFID  0x4c24
-#define PPC_INST_MFSPR 0x7c0002a6
 #define PPC_INST_MFSPR_DSCR0x7c1102a6
 #define PPC_INST_MFSPR_DSCR_MASK   0xfc1e
 #define PPC_INST_MTSPR_DSCR0x7c1103a6
@@ -507,6 +506,8 @@
 
 #define PPC_RAW_NEG(d, a)  (0x7cd0 | ___PPC_RT(d) | 
___PPC_RA(a))
 
+#define PPC_RAW_MFSPR(d, spr)  (0x7c0002a6 | ___PPC_RT(d) | 
__PPC_SPR(spr))
+
 /* Deal with instructions that older assemblers aren't aware of */
 #definePPC_BCCTR_FLUSH stringify_in_c(.long 
PPC_INST_BCCTR_FLUSH)
 #definePPC_CP_ABORTstringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 93004ee586a1..f970d1510d3d 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -153,10 +153,7 @@ static void mpc8xx_pmu_read(struct perf_event *event)
 
 static void mpc8xx_pmu_del(struct perf_event *event, int flags)
 {
-   struct ppc_inst insn;
-
-   /* mfspr r10, SPRN_SPRG_SCRATCH2 */
-   insn = ppc_inst(PPC_INST_MFSPR | __PPC_RS(R10) | 
__PPC_SPR(SPRN_SPRG_SCRATCH2));
+   struct ppc_inst insn = ppc_inst(PPC_RAW_MFSPR(10, SPRN_SPRG_SCRATCH2));
 
mpc8xx_pmu_read(event);
 
-- 
2.25.0



[PATCH v1 5/6] powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in DTLB miss exception

2020-11-24 Thread Christophe Leroy
Use SPRN_SPRG_SCRATCH2 in the DTLB miss exception instead of DAR,
in order to be similar to the ITLB miss exception.

This also simplifies mpc8xx_pmu_del().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S |  9 -
 arch/powerpc/perf/8xx-pmu.c| 19 +++
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 45239b06b6ce..35707e86c5f3 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -247,7 +247,7 @@ InstructionTLBMiss:
 
. = 0x1200
 DataStoreTLBMiss:
-   mtspr   SPRN_DAR, r10
+   mtspr   SPRN_SPRG_SCRATCH2, r10
mtspr   SPRN_M_TW, r11
mfcrr11
 
@@ -286,11 +286,11 @@ DataStoreTLBMiss:
li  r11, RPN_PATTERN
rlwimi  r10, r11, 0, 24, 27 /* Set 24-27 */
mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
 
/* Restore registers */
 
-0: mfspr   r10, SPRN_DAR
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
+0: mfspr   r10, SPRN_SPRG_SCRATCH2
mfspr   r11, SPRN_M_TW
rfi
patch_site  0b, patch__dtlbmiss_exit_1
@@ -300,8 +300,7 @@ DataStoreTLBMiss:
 0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
addir10, r10, 1
stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-   mfspr   r10, SPRN_DAR
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r10, SPRN_SPRG_SCRATCH2
mfspr   r11, SPRN_M_TW
rfi
 #endif
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 02db58c7427a..93004ee586a1 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -153,6 +153,11 @@ static void mpc8xx_pmu_read(struct perf_event *event)
 
 static void mpc8xx_pmu_del(struct perf_event *event, int flags)
 {
+   struct ppc_inst insn;
+
+   /* mfspr r10, SPRN_SPRG_SCRATCH2 */
+   insn = ppc_inst(PPC_INST_MFSPR | __PPC_RS(R10) | 
__PPC_SPR(SPRN_SPRG_SCRATCH2));
+
mpc8xx_pmu_read(event);
 
/* If it was the last user, stop counting to avoid useles overhead */
@@ -164,22 +169,12 @@ static void mpc8xx_pmu_del(struct perf_event *event, int 
flags)
mtspr(SPRN_ICTRL, 7);
break;
case PERF_8xx_ID_ITLB_LOAD_MISS:
-   if (atomic_dec_return(&itlb_miss_ref) == 0) {
-   /* mfspr r10, SPRN_SPRG_SCRATCH2 */
-   struct ppc_inst insn = ppc_inst(PPC_INST_MFSPR | 
__PPC_RS(R10) |
-   __PPC_SPR(SPRN_SPRG_SCRATCH2));
-
+   if (atomic_dec_return(&itlb_miss_ref) == 0)
patch_instruction_site(&patch__itlbmiss_exit_1, insn);
-   }
break;
case PERF_8xx_ID_DTLB_LOAD_MISS:
-   if (atomic_dec_return(&dtlb_miss_ref) == 0) {
-   /* mfspr r10, SPRN_DAR */
-   struct ppc_inst insn = ppc_inst(PPC_INST_MFSPR | 
__PPC_RS(R10) |
-   __PPC_SPR(SPRN_DAR));
-
+   if (atomic_dec_return(&dtlb_miss_ref) == 0)
patch_instruction_site(&patch__dtlbmiss_exit_1, insn);
-   }
break;
}
 }
-- 
2.25.0



[PATCH v1 2/6] powerpc/8xx: Always pin kernel text TLB

2020-11-24 Thread Christophe Leroy
There is no longer any big point in not pinning kernel text, as we
can now keep pinned TLB entries even with things like DEBUG_PAGEALLOC.

Remove CONFIG_PIN_TLB_TEXT, making kernel text pinning unconditional.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  3 +--
 arch/powerpc/kernel/head_8xx.S | 20 +++-
 arch/powerpc/mm/nohash/8xx.c   |  3 +--
 arch/powerpc/platforms/8xx/Kconfig |  7 ---
 4 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..bf088b5b0a89 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -795,8 +795,7 @@ config DATA_SHIFT_BOOL
bool "Set custom data alignment"
depends on ADVANCED_OPTIONS
depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC
-   depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && \
-(!PIN_TLB_TEXT || !STRICT_KERNEL_RWX))
+   depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && 
!STRICT_KERNEL_RWX)
help
  This option allows you to set the kernel data alignment. When
  RAM is mapped by blocks, the alignment needs to fit the size and
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 66ee62f30d36..775b4f4d011e 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -42,15 +42,6 @@
 #endif
 .endm
 
-/*
- * We need an ITLB miss handler for kernel addresses if:
- * - Either we have modules
- * - Or we have not pinned the first 8M
- */
-#if defined(CONFIG_MODULES) || !defined(CONFIG_PIN_TLB_TEXT)
-#define ITLB_MISS_KERNEL   1
-#endif
-
 /*
  * Value for the bits that have fixed value in RPN entries.
  * Also used for tagging DAR for DTLBerror.
@@ -209,12 +200,12 @@ InstructionTLBMiss:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr   SPRN_MD_EPN, r10
-#ifdef ITLB_MISS_KERNEL
+#ifdef CONFIG_MODULES
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef ITLB_MISS_KERNEL
+#ifdef CONFIG_MODULES
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
@@ -618,10 +609,6 @@ start_here:
lis r0, (MD_TWAM | MD_RSV4I)@h
mtspr   SPRN_MD_CTR, r0
 #endif
-#ifndef CONFIG_PIN_TLB_TEXT
-   li  r0, 0
-   mtspr   SPRN_MI_CTR, r0
-#endif
 #if !defined(CONFIG_PIN_TLB_DATA) && !defined(CONFIG_PIN_TLB_IMMR)
lis r0, MD_TWAM@h
mtspr   SPRN_MD_CTR, r0
@@ -739,7 +726,6 @@ _GLOBAL(mmu_pin_tlb)
mtspr   SPRN_MD_CTR, r6
tlbia
 
-#ifdef CONFIG_PIN_TLB_TEXT
LOAD_REG_IMMEDIATE(r5, 28 << 8)
LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED)
@@ -760,7 +746,7 @@ _GLOBAL(mmu_pin_tlb)
bdnzt   lt, 2b
lis r0, MI_RSV4I@h
mtspr   SPRN_MI_CTR, r0
-#endif
+
LOAD_REG_IMMEDIATE(r5, 28 << 8 | MD_TWAM)
 #ifdef CONFIG_PIN_TLB_DATA
LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 231ca95f9ffb..19a3eec1d8c5 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -186,8 +186,7 @@ void mmu_mark_initmem_nx(void)
mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
 
-   if (IS_ENABLED(CONFIG_PIN_TLB_TEXT))
-   mmu_pin_tlb(block_mapped_ram, false);
+   mmu_pin_tlb(block_mapped_ram, false);
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/platforms/8xx/Kconfig 
b/arch/powerpc/platforms/8xx/Kconfig
index cdda034733ff..1a8400bfbe82 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -202,13 +202,6 @@ config PIN_TLB_IMMR
  CONFIG_PIN_TLB_DATA is also selected, it will reduce
  CONFIG_PIN_TLB_DATA to 24 Mbytes.
 
-config PIN_TLB_TEXT
-   bool "Pinned TLB for TEXT"
-   depends on PIN_TLB
-   default y
-   help
- This pins kernel text with 8M pages.
-
 endmenu
 
 endmenu
-- 
2.25.0



[PATCH v1 4/6] powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in ITLB miss exception

2020-11-24 Thread Christophe Leroy
In order to re-enable the MMU earlier, ensure the ITLB miss exception
cannot clobber SPRN_SPRG_SCRATCH0 and SPRN_SPRG_SCRATCH1.
Do so by using SPRN_SPRG_SCRATCH2 and SPRN_M_TW instead, like
the DTLB miss exception does.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 12 ++--
 arch/powerpc/perf/8xx-pmu.c|  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 558c8e615ef9..45239b06b6ce 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -190,8 +190,8 @@ SystemCall:
 #endif
 
 InstructionTLBMiss:
-   mtspr   SPRN_SPRG_SCRATCH0, r10
-   mtspr   SPRN_SPRG_SCRATCH1, r11
+   mtspr   SPRN_SPRG_SCRATCH2, r10
+   mtspr   SPRN_M_TW, r11
 
/* If we are faulting a kernel address, we have to use the
 * kernel page tables.
@@ -230,8 +230,8 @@ InstructionTLBMiss:
mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
 
/* Restore registers */
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+0: mfspr   r10, SPRN_SPRG_SCRATCH2
+   mfspr   r11, SPRN_M_TW
rfi
patch_site  0b, patch__itlbmiss_exit_1
 
@@ -240,8 +240,8 @@ InstructionTLBMiss:
 0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
addir10, r10, 1
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
-   mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+   mfspr   r10, SPRN_SPRG_SCRATCH2
+   mfspr   r11, SPRN_M_TW
rfi
 #endif
 
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index e53c3c161257..02db58c7427a 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -165,9 +165,9 @@ static void mpc8xx_pmu_del(struct perf_event *event, int 
flags)
break;
case PERF_8xx_ID_ITLB_LOAD_MISS:
if (atomic_dec_return(&itlb_miss_ref) == 0) {
-   /* mfspr r10, SPRN_SPRG_SCRATCH0 */
+   /* mfspr r10, SPRN_SPRG_SCRATCH2 */
struct ppc_inst insn = ppc_inst(PPC_INST_MFSPR | 
__PPC_RS(R10) |
-   __PPC_SPR(SPRN_SPRG_SCRATCH0));
+   __PPC_SPR(SPRN_SPRG_SCRATCH2));
 
patch_instruction_site(&patch__itlbmiss_exit_1, insn);
}
-- 
2.25.0



[PATCH v1 1/6] powerpc/8xx: DEBUG_PAGEALLOC doesn't require an ITLB miss exception handler

2020-11-24 Thread Christophe Leroy
Since commit e611939fc8ec ("powerpc/mm: Ensure change_page_attr()
doesn't invalidate pinned TLBs"), pinned TLBs are no longer
invalidated by __kernel_map_pages() when CONFIG_DEBUG_PAGEALLOC is
selected.

Remove the dependency on CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index ee0bfebc375f..66ee62f30d36 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -47,8 +47,7 @@
  * - Either we have modules
  * - Or we have not pinned the first 8M
  */
-#if defined(CONFIG_MODULES) || !defined(CONFIG_PIN_TLB_TEXT) || \
-defined(CONFIG_DEBUG_PAGEALLOC)
+#if defined(CONFIG_MODULES) || !defined(CONFIG_PIN_TLB_TEXT)
 #define ITLB_MISS_KERNEL   1
 #endif
 
-- 
2.25.0



[PATCH v1 3/6] powerpc/8xx: Simplify INVALIDATE_ADJACENT_PAGES_CPU15

2020-11-24 Thread Christophe Leroy
We now have r11 available as a scratch register so
INVALIDATE_ADJACENT_PAGES_CPU15() can be simplified.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 775b4f4d011e..558c8e615ef9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -180,14 +180,13 @@ SystemCall:
  */
 
 #ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)  \
-   addiaddr, addr, PAGE_SIZE;  \
-   tlbie   addr;   \
-   addiaddr, addr, -(PAGE_SIZE << 1);  \
-   tlbie   addr;   \
-   addiaddr, addr, PAGE_SIZE
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr, tmp) \
+   additmp, addr, PAGE_SIZE;   \
+   tlbie   tmp;\
+   additmp, addr, -PAGE_SIZE;  \
+   tlbie   tmp
 #else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr, tmp)
 #endif
 
 InstructionTLBMiss:
@@ -198,7 +197,7 @@ InstructionTLBMiss:
 * kernel page tables.
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
-   INVALIDATE_ADJACENT_PAGES_CPU15(r10)
+   INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
 #ifdef CONFIG_MODULES
mfcrr11
-- 
2.25.0



eBPF on powerpc

2020-11-24 Thread Christophe Leroy

Hi Naveen,

A few years ago, you implemented eBPF on PPC64.

Is there any reason for implementing it for PPC64 only? Is there something that makes it impossible 
to have eBPF for PPC32 as well?


Thanks
Christophe


Re: [PATCH] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()

2020-11-24 Thread Stefan Berger

On 11/24/20 8:52 AM, Wang Hai wrote:

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before 
proceeding")
Reported-by: Hulk Robot 
Signed-off-by: Wang Hai 
---
  drivers/char/tpm/tpm_ibmvtpm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 994385bf37c0..813eb2cac0ce 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
ibmvtpm->rtce_buf != NULL,
HZ)) {
dev_err(dev, "CRQ response timed out\n");
+   rc = -ETIMEDOUT;
goto init_irq_cleanup;
}
  


Reviewed-by: Stefan Berger 



[PATCH] ASoC: fsl_xcvr: fix potential resource leak

2020-11-24 Thread Viorel Suman (OSS)
From: Viorel Suman 

"fw" variable must be relased before return.

Signed-off-by: Viorel Suman 
---
 sound/soc/fsl/fsl_xcvr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index 2a28810d0e29..3d58c88ea603 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -706,6 +706,7 @@ static int fsl_xcvr_load_firmware(struct fsl_xcvr *xcvr)
/* RAM is 20KiB = 16KiB code + 4KiB data => max 10 pages 2KiB each */
if (rem > 16384) {
dev_err(dev, "FW size %d is bigger than 16KiB.\n", rem);
+   release_firmware(fw);
return -ENOMEM;
}
 
-- 
2.26.2



[PATCH] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()

2020-11-24 Thread Wang Hai
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before 
proceeding")
Reported-by: Hulk Robot 
Signed-off-by: Wang Hai 
---
 drivers/char/tpm/tpm_ibmvtpm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 994385bf37c0..813eb2cac0ce 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
ibmvtpm->rtce_buf != NULL,
HZ)) {
dev_err(dev, "CRQ response timed out\n");
+   rc = -ETIMEDOUT;
goto init_irq_cleanup;
}
 
-- 
2.17.1



Re: [PATCH V2 4/5] ocxl: Add mmu notifier

2020-11-24 Thread Jason Gunthorpe
On Tue, Nov 24, 2020 at 09:17:38AM +, Christoph Hellwig wrote:

> > @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void 
> > *link_handle)
> >  }
> >  EXPORT_SYMBOL_GPL(ocxl_link_release);
> >  
> > +static void invalidate_range(struct mmu_notifier *mn,
> > +struct mm_struct *mm,
> > +unsigned long start, unsigned long end)
> > +{
> > +   struct pe_data *pe_data = container_of(mn, struct pe_data, 
> > mmu_notifier);
> > +   struct ocxl_link *link = pe_data->link;
> > +   unsigned long addr, pid, page_size = PAGE_SIZE;

The page_size variable seems unnecessary

> > +
> > +   pid = mm->context.id;
> > +
> > +   spin_lock(&link->atsd_lock);
> > +   for (addr = start; addr < end; addr += page_size)
> > +   pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
> > +   spin_unlock(&link->atsd_lock);
> > +}
> > +
> > +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> > +   .invalidate_range = invalidate_range,
> > +};
> > +
> >  static u64 calculate_cfg_state(bool kernel)
> >  {
> > u64 state;
> > @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
> > pidr, u32 tidr,
> > pe_data->mm = mm;
> > pe_data->xsl_err_cb = xsl_err_cb;
> > pe_data->xsl_err_data = xsl_err_data;
> > +   pe_data->link = link;
> > +   pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
> >  
> > memset(pe, 0, sizeof(struct ocxl_process_element));
> > pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> > @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
> > pidr, u32 tidr,
> >  * by the nest MMU. If we have a kernel context, TLBIs are
> >  * already global.
> >  */
> > -   if (mm)
> > +   if (mm) {
> > mm_context_add_copro(mm);
> > +   if (link->arva) {
> > +   /* Use MMIO registers for the TLB Invalidate
> > +* operations.
> > +*/
> > +   mmu_notifier_register(&pe_data->mmu_notifier, mm);

Every other place doing stuff like this is de-duplicating the
notifier. If you have multiple clients this will do multiple redundant
invalidations?

The notifier get/put API is designed to solve that problem: you'd get
a single notifier for the mm and then add the impacted arvas to some
list at the notifier.

Jason


[PATCH 2/3] powerpc: Make NUMA default y for powernv

2020-11-24 Thread Michael Ellerman
Our NUMA option is default y for pseries, but not powernv. The bulk of
powernv systems are NUMA, so make NUMA default y for powernv also.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a22db3db6b96..4d688b426353 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -661,7 +661,7 @@ config IRQ_ALL_CPUS
 config NUMA
bool "NUMA support"
depends on PPC64 && SMP
-   default y if SMP && PPC_PSERIES
+   default y if PPC_PSERIES || PPC_POWERNV
 
 config NODES_SHIFT
int
-- 
2.25.1



[PATCH 3/3] powerpc: Update NUMA Kconfig description & help text

2020-11-24 Thread Michael Ellerman
Update the NUMA Kconfig description to match other architectures, and
add some help text. Shamelessly borrowed from x86/arm64.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4d688b426353..7f4995b245a3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -659,9 +659,15 @@ config IRQ_ALL_CPUS
  reported with SMP Power Macintoshes with this option enabled.
 
 config NUMA
-   bool "NUMA support"
+   bool "NUMA Memory Allocation and Scheduler Support"
depends on PPC64 && SMP
default y if PPC_PSERIES || PPC_POWERNV
+   help
+ Enable NUMA (Non-Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory controller of the CPU and add some more
+ NUMA awareness to the kernel.
 
 config NODES_SHIFT
int
-- 
2.25.1



[PATCH 1/3] powerpc: Make NUMA depend on SMP

2020-11-24 Thread Michael Ellerman
Our Kconfig allows NUMA to be enabled without SMP, but none of
our defconfigs use that combination. This means it can easily be
broken inadvertently by code changes, which has happened recently.

Although it's theoretically possible to have a machine with a single
CPU and multiple memory nodes, I can't think of any real systems where
that's the case. Even so if such a system exists, it can just run an
SMP kernel anyway.

So to avoid the need to add extra #ifdefs and/or build breaks, make
NUMA depend on SMP.

Reported-by: kernel test robot 
Reported-by: Randy Dunlap 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..a22db3db6b96 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -660,7 +660,7 @@ config IRQ_ALL_CPUS
 
 config NUMA
bool "NUMA support"
-   depends on PPC64
+   depends on PPC64 && SMP
default y if SMP && PPC_PSERIES
 
 config NODES_SHIFT
-- 
2.25.1



Re: [PATCH 3/3] selftests/powerpc: Add VF recovery tests

2020-11-24 Thread Oliver O'Halloran
On Tue, Nov 24, 2020 at 9:14 PM Frederic Barrat  wrote:
>
> Is it possible to run those tests on pseries? I haven't managed to set
> up a LPAR with a physical function which would let me enable a virtual
> function. All I could do is assign a virtual function to a LPAR. When
> assigning a physical function to the LPAR, enabling a virtual function
> fails because of missing properties in the device tree, so it looks like
> the hypervisor doesn't support it (?).
>
> Same story on qemu.
>
>Fred

IIRC having the guest manage SR-IOV was a half-baked feature that
never made it into a production phyp build. I never managed to get any
real documentation from the phyp folks about how it worked either. As
far as I can tell it's pretty similar to what we do on PowerNV with
the PE configuration being handled by h-call rather than OPAL call.
The main difference would be in how EEH freezes are handled and I know
there's *something* going on there, but I never really understood it
due to the lack of documentation.

I've been tempted to rip out all that crap a few times, but never
really got around to it. There was also some noise about implementing
support for guest-managed SRIOV in pseries qemu, but I'm not sure
whatever happened to that.

Oliver


[PATCH v2 4/4] KVM: PPC: Introduce new capability for 2nd DAWR

2020-11-24 Thread Ravi Bangoria
Introduce KVM_CAP_PPC_DAWR1, which can be used by Qemu to query whether
kvm supports the 2nd DAWR.
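
A minimal userspace sketch of the query (standard KVM_CHECK_EXTENSION
usage; only the KVM_CAP_PPC_DAWR1 constant itself comes from this
patch):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* returns 1 if the host kvm supports the second DAWR */
    static int kvm_has_dawr1(void)
    {
            int kvm = open("/dev/kvm", O_RDWR);
            int ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PPC_DAWR1);

            close(kvm);
            return ret > 0;
    }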

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..48763fe59fc5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -679,6 +679,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
!kvmppc_hv_ops->enable_svm(NULL);
break;
 #endif
+   case KVM_CAP_PPC_DAWR1:
+   r = cpu_has_feature(CPU_FTR_DAWR1);
+   break;
default:
r = 0;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6d86033c4fa..0f32d6cbabc2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_LAST_CPU 184
 #define KVM_CAP_SMALLER_MAXPHYADDR 185
 #define KVM_CAP_S390_DIAG318 186
+#define KVM_CAP_PPC_DAWR1 187
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2



[PATCH v2 0/4] KVM: PPC: Power10 2nd DAWR enablement

2020-11-24 Thread Ravi Bangoria
Enable the p10 2nd DAWR feature for Book3S kvm guests. DAWR is a
hypervisor resource and thus the H_SET_MODE hcall is used to set/unset
it. A new case, H_SET_MODE_RESOURCE_SET_DAWR1, is introduced in the
H_SET_MODE hcall for setting/unsetting the 2nd DAWR. Also, a new
capability, KVM_CAP_PPC_DAWR1, has been added to query 2nd DAWR support
via the kvm ioctl.

This feature also needs to be enabled in Qemu to really use it. I'll
post Qemu patches once kvm patches get accepted.
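
For reference, a guest would then program the second watchpoint with
something like the sketch below (plpar_set_mode() is the existing
pseries wrapper around H_SET_MODE; the H_SET_MODE_RESOURCE_SET_DAWR1
resource constant is the one introduced by patch 3):

    /* sketch: set DAWR1/DAWRX1 from a pseries guest */
    long rc = plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR1,
                             dawr1_value, dawrx1_value);
    if (rc != H_SUCCESS)
            pr_warn("2nd DAWR could not be set (rc=%ld)\n", rc);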

v1: 
https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com

v1->v2:
 - patch #1: New patch
 - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro
 - patch #3: Increment HV_GUEST_STATE_VERSION
 - Split kvm and selftests patches into different series
 - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few
   other watchpoint patches which are yet to be merged in
   paulus/kvm-ppc-next.

Ravi Bangoria (4):
  KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
  KVM: PPC: Rename current DAWR macros and variables
  KVM: PPC: Add infrastructure to support 2nd DAWR
  KVM: PPC: Introduce new capability for 2nd DAWR

 Documentation/virt/kvm/api.rst|  2 +
 arch/powerpc/include/asm/hvcall.h | 25 -
 arch/powerpc/include/asm/kvm_host.h   |  6 +-
 arch/powerpc/include/uapi/asm/kvm.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c |  6 +-
 arch/powerpc/kvm/book3s_hv.c  | 65 ++
 arch/powerpc/kvm/book3s_hv_nested.c   | 68 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 43 ++
 arch/powerpc/kvm/powerpc.c|  3 +
 include/uapi/linux/kvm.h  |  1 +
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 +
 11 files changed, 181 insertions(+), 42 deletions(-)

-- 
2.26.2



[PATCH v2 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1

2020-11-24 Thread Ravi Bangoria
On powerpc, the L1 hypervisor takes the help of L0, using the
H_ENTER_NESTED hcall, to load L2 guest state into the cpu. The L1
hypervisor prepares the L2 state in struct hv_guest_state and passes a
pointer to it via the hcall. Using that pointer, L0 reads/writes that
state directly from/to L1 memory. Thus L0 must be aware of L1's
hv_guest_state layout. Currently it uses the version field to achieve
this, i.e. if L0 hv_guest_state.version != L1 hv_guest_state.version,
L0 won't allow a nested kvm guest.

This restriction can be loosened a bit. L0 can be taught to understand
older layouts of hv_guest_state, if we restrict new members to be added
only at the end, i.e. we can allow a nested guest even when
L0 hv_guest_state.version > L1 hv_guest_state.version. The other way
around is not possible, though.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hvcall.h   | 17 +++--
 arch/powerpc/kvm/book3s_hv_nested.c | 53 -
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index fbb377055471..a7073fddb657 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -524,9 +524,12 @@ struct h_cpu_char_result {
u64 behaviour;
 };
 
-/* Register state for entering a nested guest with H_ENTER_NESTED */
+/*
+ * Register state for entering a nested guest with H_ENTER_NESTED.
+ * New member must be added at the end.
+ */
 struct hv_guest_state {
-   u64 version;/* version of this structure layout */
+   u64 version;/* version of this structure layout, must be 
first */
u32 lpid;
u32 vcpu_token;
/* These registers are hypervisor privileged (at least for writing) */
@@ -560,6 +563,16 @@ struct hv_guest_state {
 /* Latest version of hv_guest_state structure */
 #define HV_GUEST_STATE_VERSION 1
 
+static inline int hv_guest_state_size(unsigned int version)
+{
+   switch (version) {
+   case 1:
+   return offsetofend(struct hv_guest_state, ppr);
+   default:
+   return -1;
+   }
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 33b58549a9aa..2b433c3bacea 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -215,6 +215,45 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu 
*vcpu, u64 regs_ptr)
}
 }
 
+static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu,
+  struct hv_guest_state *l2_hv,
+  struct pt_regs *l2_regs,
+  u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   if (kvm_vcpu_read_guest(vcpu, hv_ptr, &(l2_hv->version),
+   sizeof(l2_hv->version)))
+   return -1;
+
+   if (kvmppc_need_byteswap(vcpu))
+   l2_hv->version = swab64(l2_hv->version);
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs,
+   sizeof(struct pt_regs));
+}
+
+static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu,
+   struct hv_guest_state *l2_hv,
+   struct pt_regs *l2_regs,
+   u64 hv_ptr, u64 regs_ptr)
+{
+   int size;
+
+   size = hv_guest_state_size(l2_hv->version);
+   if (size < 0)
+   return -1;
+
+   return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) ||
+   kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs,
+sizeof(struct pt_regs));
+}
+
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 {
long int err, r;
@@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
hv_ptr = kvmppc_get_gpr(vcpu, 4);
regs_ptr = kvmppc_get_gpr(vcpu, 5);
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   err = kvm_vcpu_read_guest(vcpu, hv_ptr, &l2_hv,
- sizeof(struct hv_guest_state)) ||
-   kvm_vcpu_read_guest(vcpu, regs_ptr, &l2_regs,
-   sizeof(struct pt_regs));
+   err = kvmhv_read_guest_state_and_regs(vcpu, &l2_hv, &l2_regs,
+ hv_ptr, regs_ptr);
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
if (err)
return H_PARAMETER;
 
if (kvmppc_need_byteswap(vcpu))
byteswap_hv_regs(&l2_hv);
-   if (l2_hv.version != HV_GUEST_STATE_VERSION)
+   if (l2_hv.version > HV_GUEST_STATE_VERSION)
return H_P2;

[PATCH v2 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR

2020-11-24 Thread Ravi Bangoria
kvm code assumes a single DAWR everywhere. Add code to support a 2nd
DAWR. DAWR is a hypervisor resource and thus the H_SET_MODE hcall is
used to set/unset it. Introduce a new case,
H_SET_MODE_RESOURCE_SET_DAWR1, for the 2nd DAWR. Also, kvm will support
the 2nd DAWR only if CPU_FTR_DAWR1 is set.
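
With this, userspace can also save/restore the new registers through the
standard one-reg interface, e.g. (sketch, using the register IDs added
by this patch; vcpu_fd is a KVM vcpu file descriptor):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* sketch: program the second watchpoint of a vcpu from userspace */
    static int set_dawr1(int vcpu_fd, __u64 dawr1, __u64 dawrx1)
    {
            struct kvm_one_reg reg = {
                    .id   = KVM_REG_PPC_DAWR1,
                    .addr = (__u64)(unsigned long)&dawr1,
            };

            if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg))
                    return -1;
            reg.id   = KVM_REG_PPC_DAWRX1;
            reg.addr = (__u64)(unsigned long)&dawrx1;
            return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
    }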

Signed-off-by: Ravi Bangoria 
---
 Documentation/virt/kvm/api.rst|  2 ++
 arch/powerpc/include/asm/hvcall.h |  8 -
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/include/uapi/asm/kvm.h   |  2 ++
 arch/powerpc/kernel/asm-offsets.c |  2 ++
 arch/powerpc/kvm/book3s_hv.c  | 41 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  7 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 +
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 9 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eb3a1316f03e..72c98735aa52 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2249,6 +2249,8 @@ registers, find a list below:
   PPC KVM_REG_PPC_PSSCR   64
   PPC KVM_REG_PPC_DEC_EXPIRY  64
   PPC KVM_REG_PPC_PTCR64
+  PPC KVM_REG_PPC_DAWR1   64
+  PPC KVM_REG_PPC_DAWRX1  64
   PPC KVM_REG_PPC_TM_GPR0 64
   ...
   PPC KVM_REG_PPC_TM_GPR3164
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index a7073fddb657..4bacd27a348b 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -558,16 +558,22 @@ struct hv_guest_state {
u64 pidr;
u64 cfar;
u64 ppr;
+   /* Version 1 ends here */
+   u64 dawr1;
+   u64 dawrx1;
+   /* Version 2 ends here */
 };
 
 /* Latest version of hv_guest_state structure */
-#define HV_GUEST_STATE_VERSION 1
+#define HV_GUEST_STATE_VERSION 2
 
 static inline int hv_guest_state_size(unsigned int version)
 {
switch (version) {
case 1:
return offsetofend(struct hv_guest_state, ppr);
+   case 2:
+   return offsetofend(struct hv_guest_state, dawrx1);
default:
return -1;
}
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 62cadf1a596e..9804afdf8578 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -586,6 +586,8 @@ struct kvm_vcpu_arch {
ulong dabr;
ulong dawr0;
ulong dawrx0;
+   ulong dawr1;
+   ulong dawrx1;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index c3af3f324c5a..9f18fa090f1f 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_MMCR3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1)
 #define KVM_REG_PPC_SIER2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2)
 #define KVM_REG_PPC_SIER3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
+#define KVM_REG_PPC_DAWR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
+#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index e4256f5b4602..26d4fa8fe51e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -549,6 +549,8 @@ int main(void)
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
+   OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1);
+   OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d5c6efc8a76e..2ff645789e9e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -785,6 +785,20 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
vcpu->arch.dawr0  = value1;
vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
+   case H_SET_MODE_RESOURCE_SET_DAWR1:
+   if (!kvmppc_power8_compatible(vcpu))
+   return H_P2;
+   if (!ppc_breakpoint_available())
+   return H_P2;
+   if (!cpu_has_feature(CPU_FTR_DAWR1))
+   return H_P2;
+   if (mflags)
+   return H_UNSUPPORTED_FLAG_START;
+   if (value2 & DABRX_HYP)
+   return H_P4;
+   vcpu->arch.dawr1  = value1;
+   vcpu->a

[PATCH v2 2/4] KVM: PPC: Rename current DAWR macros and variables

2020-11-24 Thread Ravi Bangoria
Power10 is introducing a second DAWR. Use the real register names from
the ISA (with suffix 0) for the current macros and variables used by
kvm. One exception is KVM_REG_PPC_DAWR: keep it as it is because it's
uapi, so changing it would break userspace.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/kvm_host.h |  4 ++--
 arch/powerpc/kernel/asm-offsets.c   |  4 ++--
 arch/powerpc/kvm/book3s_hv.c| 24 
 arch/powerpc/kvm/book3s_hv_nested.c |  8 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++--
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d67a470e95a3..62cadf1a596e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -584,8 +584,8 @@ struct kvm_vcpu_arch {
u32 ctrl;
u32 dabrx;
ulong dabr;
-   ulong dawr;
-   ulong dawrx;
+   ulong dawr0;
+   ulong dawrx0;
ulong ciabr;
ulong cfar;
ulong ppr;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..e4256f5b4602 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -547,8 +547,8 @@ int main(void)
OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl);
OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr);
OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
-   OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr);
-   OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx);
+   OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
+   OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 490a0f6a7285..d5c6efc8a76e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_UNSUPPORTED_FLAG_START;
if (value2 & DABRX_HYP)
return H_P4;
-   vcpu->arch.dawr  = value1;
-   vcpu->arch.dawrx = value2;
+   vcpu->arch.dawr0  = value1;
+   vcpu->arch.dawrx0 = value2;
return H_SUCCESS;
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
/* KVM does not support mflags=2 (AIL=2) */
@@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.vcore->vtb);
break;
case KVM_REG_PPC_DAWR:
-   *val = get_reg_val(id, vcpu->arch.dawr);
+   *val = get_reg_val(id, vcpu->arch.dawr0);
break;
case KVM_REG_PPC_DAWRX:
-   *val = get_reg_val(id, vcpu->arch.dawrx);
+   *val = get_reg_val(id, vcpu->arch.dawrx0);
break;
case KVM_REG_PPC_CIABR:
*val = get_reg_val(id, vcpu->arch.ciabr);
@@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
vcpu->arch.vcore->vtb = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWR:
-   vcpu->arch.dawr = set_reg_val(id, *val);
+   vcpu->arch.dawr0 = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DAWRX:
-   vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP;
+   vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
break;
case KVM_REG_PPC_CIABR:
vcpu->arch.ciabr = set_reg_val(id, *val);
@@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
int trap;
unsigned long host_hfscr = mfspr(SPRN_HFSCR);
unsigned long host_ciabr = mfspr(SPRN_CIABR);
-   unsigned long host_dawr = mfspr(SPRN_DAWR0);
-   unsigned long host_dawrx = mfspr(SPRN_DAWRX0);
+   unsigned long host_dawr0 = mfspr(SPRN_DAWR0);
+   unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
unsigned long host_psscr = mfspr(SPRN_PSSCR);
unsigned long host_pidr = mfspr(SPRN_PID);
 
@@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
mtspr(SPRN_SPURR, vcpu->arch.spurr);
 
if (dawr_enabled()) {
-   mtspr(SPRN_DAWR0, vcpu->arch.dawr);
-   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx);
+   mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
+   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
}
mtspr(SPRN_CIABR, vcpu->arch.ciabr);
mtspr(SPRN_IC, vcpu->arch.ic);
@@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
m

Re: [PATCH kernel v4 2/8] genirq/irqdomain: Clean legacy IRQ allocation

2020-11-24 Thread Alexey Kardashevskiy




On 11/24/20 8:19 PM, Andy Shevchenko wrote:
> On Tue, Nov 24, 2020 at 8:20 AM Alexey Kardashevskiy  wrote:
>> There are 10 users of __irq_domain_alloc_irqs() and only one - IOAPIC -
>> passes realloc==true. There is no obvious reason for handling this
>> specific case in the generic code.
>>
>> This splits out __irq_domain_alloc_irqs_data() to make it clear what
>> IOAPIC does and makes __irq_domain_alloc_irqs() cleaner.
>>
>> This should cause no behavioral change.
>
>> +   ret = __irq_domain_alloc_irqs_data(domain, virq, nr_irqs, node, arg, affinity);
>> +   if (ret <= 0)
>>  goto out_free_desc;
>
> Was or wasn't 0 considered as error code previously?

Oh. I need to clean this up: the idea is that since this does not allocate
IRQs, it should return an error code and not an irq. I'll make this explicit.

>>  return virq;
>
>>   out_free_desc:
>>  irq_free_descs(virq, nr_irqs);
>>  return ret;




--
Alexey


Re: [PATCH 3/3] selftests/powerpc: Add VF recovery tests

2020-11-24 Thread Frederic Barrat




On 03/11/2020 05:45, Oliver O'Halloran wrote:
> --- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
> +++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
> @@ -135,3 +135,111 @@ eeh_one_dev() {
>   return 0;
> }
>
> +eeh_has_driver() {
> +   test -e /sys/bus/pci/devices/$1/driver;
> +   return $?
> +}
> +
> +eeh_can_recover() {
> +   # we'll get an IO error if the device's current driver doesn't support
> +   # error recovery
> +   echo $1 > '/sys/kernel/debug/powerpc/eeh_dev_can_recover' 2>/dev/null
> +
> +   return $?
> +}
> +
> +eeh_find_all_pfs() {
> +   devices=""
> +
> +   # SR-IOV on pseries requires hypervisor support, so check for that
> +   is_pseries=""
> +   if grep -q pSeries /proc/cpuinfo ; then
> +   if [ ! -f /proc/device-tree/rtas/ibm,open-sriov-allow-unfreeze ] ||
> +  [ ! -f /proc/device-tree/rtas/ibm,open-sriov-map-pe-number ] ; then
> +   return 1;
> +   fi



Is it possible to run those tests on pseries? I haven't managed to set 
up a LPAR with a physical function which would let me enable a virtual 
function. All I could do is assign a virtual function to a LPAR. When 
assigning a physical function to the LPAR, enabling a virtual function 
fails because of missing properties in the device tree, so it looks like 
the hypervisor doesn't support it (?).


Same story on qemu.

  Fred




Re: C vdso

2020-11-24 Thread Christophe Leroy

Hi Michael,

On 03/11/2020 at 19:13, Christophe Leroy wrote:



On 23/10/2020 at 15:24, Michael Ellerman wrote:

Christophe Leroy  writes:

On 24/09/2020 at 15:17, Christophe Leroy wrote:

On 17/09/2020 at 14:33, Michael Ellerman wrote:

Christophe Leroy  writes:


What is the status with the generic C vdso merge ?
In some mail, you mentioned having difficulties getting it working on
ppc64, any progress ? What's the problem ? Can I help ?


Yeah sorry I was hoping to get time to work on it but haven't been able
to.

It's causing crashes on ppc64 ie. big endian.

...


Can you tell me what defconfig you are using? I have been able to set up a full
glibc PPC64 cross-compilation chain and to test it under QEMU with success,
using Nathan's vdsotest tool.


What config are you using ?


ppc64_defconfig + guest.config

Or pseries_defconfig.

I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other toolchains too.

At a minimum we're seeing relocations in the output, which is a problem:

   $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so
   Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
 Offset  Info   Type   Sym. Value    Sym. Name + 
Addend
   1368  0016 R_PPC64_RELATIVE 7c0
   1370  0016 R_PPC64_RELATIVE 9300
   1380  0016 R_PPC64_RELATIVE 970
   1388  0016 R_PPC64_RELATIVE 9300
   1398  0016 R_PPC64_RELATIVE a90
   13a0  0016 R_PPC64_RELATIVE 9300
   13b0  0016 R_PPC64_RELATIVE b20
   13b8  0016 R_PPC64_RELATIVE 9300


Looks like it's due to the OPD and the relation between function() and
.function().

By using DOTSYM() in the 'bl' call, the dot function is called directly and the OPD is
not used anymore, so it can get dropped.
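(Concretely, the call site ends up as something like
"bl DOTSYM(__c_kernel_clock_gettime)" (function name illustrative), so
the text branches straight to the dot symbol, and the OPD entries, with
their R_PPC64_RELATIVE relocations, can be discarded.)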


Now I get .rela.dyn full of zeros; I don't know if we should drop it explicitly.



What is the status now with the latest version of the C VDSO? I saw you had it in next-test for some time,
but it is not there anymore today.


Thanks,
Christophe


[PATCH V3 5/5] ocxl: Add new kernel traces

2020-11-24 Thread Christophe Lombard
Add specific kernel tracepoints which provide information on the mmu
notifier and on the invalidated page ranges.
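
Once applied, the new events appear under the "ocxl" trace system and
can be enabled like any other tracepoint, e.g. (assuming the usual
tracefs mount point):

    echo 1 > /sys/kernel/debug/tracing/events/ocxl/ocxl_mmu_notifier_range/enable
    cat /sys/kernel/debug/tracing/trace_pipe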

Signed-off-by: Christophe Lombard 
---
 drivers/misc/ocxl/link.c  |  4 +++
 drivers/misc/ocxl/trace.h | 64 +++
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 129d4eddc4d2..ab039c115381 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn,
unsigned long addr, pid, page_size = PAGE_SIZE;
 
pid = mm->context.id;
+   trace_ocxl_mmu_notifier_range(start, end, pid);
 
spin_lock(&link->atsd_lock);
for (addr = start; addr < end; addr += page_size)
@@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
/* Use MMIO registers for the TLB Invalidate
 * operations.
 */
+   trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
mmu_notifier_register(&pe_data->mmu_notifier, mm);
}
}
@@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
} else {
if (pe_data->mm) {
if (link->arva) {
+   trace_ocxl_release_mmu_notifier(pasid,
+   
pe_data->mm->context.id);
mmu_notifier_unregister(&pe_data->mmu_notifier,
pe_data->mm);
spin_lock(&link->atsd_lock);
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..a33a5094ff6c 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,70 @@
 
 #include 
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+   TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+   TP_ARGS(start, end, pidr),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, start)
+   __field(unsigned long, end)
+   __field(unsigned long, pidr)
+   ),
+
+   TP_fast_assign(
+   __entry->start = start;
+   __entry->end = end;
+   __entry->pidr = pidr;
+   ),
+
+   TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+   __entry->start,
+   __entry->end,
+   __entry->pidr
+   )
+);
+
+TRACE_EVENT(ocxl_init_mmu_notifier,
+   TP_PROTO(int pasid, unsigned long pidr),
+   TP_ARGS(pasid, pidr),
+
+   TP_STRUCT__entry(
+   __field(int, pasid)
+   __field(unsigned long, pidr)
+   ),
+
+   TP_fast_assign(
+   __entry->pasid = pasid;
+   __entry->pidr = pidr;
+   ),
+
+   TP_printk("pasid=%d, pidr=0x%lx",
+   __entry->pasid,
+   __entry->pidr
+   )
+);
+
+TRACE_EVENT(ocxl_release_mmu_notifier,
+   TP_PROTO(int pasid, unsigned long pidr),
+   TP_ARGS(pasid, pidr),
+
+   TP_STRUCT__entry(
+   __field(int, pasid)
+   __field(unsigned long, pidr)
+   ),
+
+   TP_fast_assign(
+   __entry->pasid = pasid;
+   __entry->pidr = pidr;
+   ),
+
+   TP_printk("pasid=%d, pidr=0x%lx",
+   __entry->pasid,
+   __entry->pidr
+   )
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0



[PATCH V3 1/5] ocxl: Assign a register set to a Logical Partition

2020-11-24 Thread Christophe Lombard
Add a platform-specific function to assign a register set to a Logical
Partition. The "ibm,mmio-atsd" property, provided by the firmware,
contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of
the sets of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status
registers).

For the time being, the ATSD0 set of registers is used by default.

Signed-off-by: Christophe Lombard 
---
 arch/powerpc/include/asm/pnv-ocxl.h   |  3 ++
 arch/powerpc/platforms/powernv/ocxl.c | 45 +++
 2 files changed, 48 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index d37ededca3ee..60c3c74427d9 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, 
int PE_mask, void **p
 void pnv_ocxl_spa_release(void *platform_data);
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+ uint64_t lpcr, void __iomem **arva);
+void pnv_ocxl_unmap_lpar(void __iomem *arva);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index ecdad219d704..57fc1062677b 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, 
int pe_handle)
return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
+
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+ uint64_t lpcr, void __iomem **arva)
+{
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   u64 mmio_atsd;
+   int rc;
+
+   /* ATSD physical address.
+* ATSD LAUNCH register: write access initiates a shoot down to
+* initiate the TLB Invalidate command.
+*/
+   rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+   0, &mmio_atsd);
+   if (rc) {
+   dev_info(&dev->dev, "No available ATSD found\n");
+   return rc;
+   }
+
+   /* Assign a register set to a Logical Partition and MMIO ATSD
+* LPARID register to the required value.
+*/
+   rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
+  lparid, lpcr);
+   if (rc) {
+   dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+   return rc;
+   }
+
+   *arva = ioremap(mmio_atsd, 24);
+   if (!(*arva)) {
+   dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", 
mmio_atsd);
+   rc = -ENOMEM;
+   }
+
+   return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
+
+void pnv_ocxl_unmap_lpar(void __iomem *arva)
+{
+   iounmap(arva);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
-- 
2.28.0



[PATCH V3 0/5] ocxl: Mmio invalidation support

2020-11-24 Thread Christophe Lombard
OpenCAPI 4.0/5.0 TLBI/SLBI snooping is not used, due to performance
problems caused by the PAU having to process all incoming TLBI/SLBI
commands, which causes them to back up on the PowerBus.

When the Address Translation Mode requires TLB operations to be initiated
using MMIO registers, a set of registers like the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE
commands from the PowerBus.

The Shootdown commands (ATSD) will be generated using MMIO registers
in the NPU/PAU and sent to the device.

Signed-off-by: Christophe Lombard 

---
Changelog[v3]
 - Rebase to latest upstream.
 - Add page_size argument in pnv_ocxl_tlb_invalidate()
 - Remove double pointer
 
Changelog[v2]
 - Rebase to latest upstream.
 - Create a set of smaller patches
 - Move the device tree parsing and ioremap() for the shootdown page in a
   platform-specific file (powernv)
 - Release the shootdown page in release_xsl()
 - Initialize atsd_lock
 - Move the code to initiate the TLB Invalidate command in a
   platform-specific file (powernv)
 - Use the notifier invalidate_range
---
Christophe Lombard (5):
  ocxl: Assign a register set to a Logical Partition
  ocxl: Initiate a TLB invalidate command
  ocxl: Update the Process Element Entry
  ocxl: Add mmu notifier
  ocxl: Add new kernel traces

 arch/powerpc/include/asm/pnv-ocxl.h   |  54 
 arch/powerpc/platforms/powernv/ocxl.c | 115 ++
 drivers/misc/ocxl/context.c   |   4 +-
 drivers/misc/ocxl/link.c  |  70 +++-
 drivers/misc/ocxl/ocxl_internal.h |   9 +-
 drivers/misc/ocxl/trace.h |  64 ++
 drivers/scsi/cxlflash/ocxl_hw.c   |   6 +-
 include/misc/ocxl.h   |   2 +-
 8 files changed, 315 insertions(+), 9 deletions(-)

-- 
2.28.0



[PATCH V3 2/5] ocxl: Initiate a TLB invalidate command

2020-11-24 Thread Christophe Lombard
When a TLB Invalidate is required for the Logical Partition, the following
sequence has to be performed:

1. Load MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
   has been completed.
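
Expressed as a minimal sketch (assuming the PNV_OCXL_ATSD_* offsets
defined below, and that completion is signalled by the high bit of the
status register clearing):

    out_be64(arva + PNV_OCXL_ATSD_AVA, ava);      /* 1. load the AVA, if required */
    out_be64(arva + PNV_OCXL_ATSD_LNCH, launch);  /* 2. initiate the shoot down */
    do {                                          /* 3. poll for completion */
            cpu_relax();
            val = in_be64(arva + PNV_OCXL_ATSD_STAT);
    } while ((val >> 63) && !time_after_eq(jiffies, timeout));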

Signed-off-by: Christophe Lombard 
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 51 +++
 arch/powerpc/platforms/powernv/ocxl.c | 70 +++
 2 files changed, 121 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index 60c3c74427d9..9acd1fbf1197 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -3,12 +3,59 @@
 #ifndef _ASM_PNV_OCXL_H
 #define _ASM_PNV_OCXL_H
 
+#include 
 #include 
 
 #define PNV_OCXL_TL_MAX_TEMPLATE63
 #define PNV_OCXL_TL_BITS_PER_RATE   4
 #define PNV_OCXL_TL_RATE_BUF_SIZE   ((PNV_OCXL_TL_MAX_TEMPLATE+1) * 
PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+#define PNV_OCXL_ATSD_TIMEOUT  1
+
+/* TLB Management Instructions */
+#define PNV_OCXL_ATSD_LNCH 0x00
+/* Radix Invalidate */
+#define   PNV_OCXL_ATSD_LNCH_R PPC_BIT(0)
+/* Radix Invalidation Control
+ * 0b00 Just invalidate TLB.
+ * 0b01 Invalidate just Page Walk Cache.
+ * 0b10 Invalidate TLB, Page Walk Cache, and any
+ * caching of Partition and Process Table Entries.
+ */
+#define   PNV_OCXL_ATSD_LNCH_RIC   PPC_BITMASK(1, 2)
+/* Number and Page Size of translations to be invalidated */
+#define   PNV_OCXL_ATSD_LNCH_LPPPC_BITMASK(3, 10)
+/* Invalidation Criteria
+ * 0b00 Invalidate just the target VA.
+ * 0b01 Invalidate matching PID.
+ */
+#define   PNV_OCXL_ATSD_LNCH_ISPPC_BITMASK(11, 12)
+/* 0b1: Process Scope, 0b0: Partition Scope */
+#define   PNV_OCXL_ATSD_LNCH_PRS   PPC_BIT(13)
+/* Invalidation Flag */
+#define   PNV_OCXL_ATSD_LNCH_B PPC_BIT(14)
+/* Actual Page Size to be invalidated
+ * 000 4KB
+ * 101 64KB
+ * 001 2MB
+ * 010 1GB
+ */
+#define   PNV_OCXL_ATSD_LNCH_APPPC_BITMASK(15, 17)
+/* Defines the large page select
+ * L=0b0 for 4KB pages
+ * L=0b1 for large pages)
+ */
+#define   PNV_OCXL_ATSD_LNCH_L PPC_BIT(18)
+/* Process ID */
+#define   PNV_OCXL_ATSD_LNCH_PID   PPC_BITMASK(19, 38)
+/* NoFlush – Assumed to be 0b0 */
+#define   PNV_OCXL_ATSD_LNCH_F PPC_BIT(39)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SLBIPPC_BIT(40)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON   PPC_BIT(41)
+#define PNV_OCXL_ATSD_AVA  0x08
+#define   PNV_OCXL_ATSD_AVA_AVAPPC_BITMASK(0, 51)
+#define PNV_OCXL_ATSD_STAT 0x10
+
 int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 
*supported);
 int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
 
@@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, 
int pe_handle);
 int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
  uint64_t lpcr, void __iomem **arva);
 void pnv_ocxl_unmap_lpar(void __iomem *arva);
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+unsigned long pid,
+unsigned long addr,
+unsigned long page_size);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index 57fc1062677b..f665846d2b28 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -528,3 +528,73 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva)
iounmap(arva);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
+
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+unsigned long pid,
+unsigned long addr,
+unsigned long page_size)
+{
+   unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
+   u64 val = 0ull;
+   int pend;
+   u8 size;
+
+   if (!(arva))
+   return;
+
+   if (addr) {
+   /* load Abbreviated Virtual Address register with
+* the necessary value
+*/
+   val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
+   out_be64(arva + PNV_OCXL_ATSD_AVA, val);
+   }
+
+   /* Write access initiates a shoot down to initiate the
+* TLB Invalidate command
+*/
+   val = PNV_OCXL_ATSD_LNCH_R;
+   if (addr) {
+   val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00);
+   val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
+   } else {
+   val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
+   val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
+   val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
+   }
+   val |= PNV_OCXL_ATSD_LNCH_PRS;
+   /* Actual

[PATCH V3 4/5] ocxl: Add mmu notifier

2020-11-24 Thread Christophe Lombard
Add an invalidate_range mmu notifier, when required (i.e. when ATSD
access to the MMIO registers is available), to initiate TLB invalidate
commands. For the time being, the ATSD0 set of registers is used by
default.

The pasid and bdf values have to be configured in the Process Element
Entry: the PEE must be set up to match the BDF/PASID of the AFU.

Signed-off-by: Christophe Lombard 
---
 drivers/misc/ocxl/link.c | 62 +++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 77381dda2c45..129d4eddc4d2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -2,8 +2,10 @@
 // Copyright 2017 IBM Corp.
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +35,7 @@
 
 #define SPA_PE_VALID   0x8000
 
+struct ocxl_link;
 
 struct pe_data {
struct mm_struct *mm;
@@ -41,6 +44,8 @@ struct pe_data {
/* opaque pointer to be passed to the above callback */
void *xsl_err_data;
struct rcu_head rcu;
+   struct ocxl_link *link;
+   struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -83,6 +88,8 @@ struct ocxl_link {
int domain;
int bus;
int dev;
+   void __iomem *arva; /* ATSD register virtual address */
+   spinlock_t atsd_lock;   /* to serialize shootdowns */
atomic_t irq_available;
struct spa *spa;
void *platform_data;
@@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
struct ocxl_link **out_l
link->bus = dev->bus->number;
link->dev = PCI_SLOT(dev->devfn);
atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
+   spin_lock_init(&link->atsd_lock);
 
rc = alloc_spa(dev, link);
if (rc)
@@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
struct ocxl_link **out_l
if (rc)
goto err_xsl_irq;
 
+   /* if link->arva is not defined, MMIO registers are not used to
+* generate TLB invalidate. PowerBus snooping is enabled.
+* Otherwise, PowerBus snooping is disabled. TLB Invalidates are
+* initiated using MMIO registers.
+*/
+   pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, &link->arva);
+
*out_link = link;
return 0;
 
@@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref)
 {
struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
+   if (link->arva) {
+   pnv_ocxl_unmap_lpar(link->arva);
+   link->arva = NULL;
+   }
+
list_del(&link->list);
/* call platform code before releasing data */
pnv_ocxl_spa_release(link->platform_data);
@@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void 
*link_handle)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void invalidate_range(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end)
+{
+   struct pe_data *pe_data = container_of(mn, struct pe_data, 
mmu_notifier);
+   struct ocxl_link *link = pe_data->link;
+   unsigned long addr, pid, page_size = PAGE_SIZE;
+
+   pid = mm->context.id;
+
+   spin_lock(&link->atsd_lock);
+   for (addr = start; addr < end; addr += page_size)
+   pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size);
+   spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+   .invalidate_range = invalidate_range,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
u64 state;
@@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
pe_data->mm = mm;
pe_data->xsl_err_cb = xsl_err_cb;
pe_data->xsl_err_data = xsl_err_data;
+   pe_data->link = link;
+   pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
memset(pe, 0, sizeof(struct ocxl_process_element));
pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 * by the nest MMU. If we have a kernel context, TLBIs are
 * already global.
 */
-   if (mm)
+   if (mm) {
mm_context_add_copro(mm);
+   if (link->arva) {
+   /* Use MMIO registers for the TLB Invalidate
+* operations.
+*/
+   mmu_notifier_register(&pe_data->mmu_notifier, mm);
+   }
+   }
+
/*
 * Barrier is to make sure PE is visible in the SPA before it
 * is used by the device. It also helps with the global TLBI
@@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
WARN(1, "Couldn't find pe data when removing PE\n");
} else {
  

[PATCH V3 3/5] ocxl: Update the Process Element Entry

2020-11-24 Thread Christophe Lombard
To complete the MMIO-based mechanism, the PASID, bus, device and
function fields of the Process Element Entry have to be filled in (see
the OpenCAPI Power Platform Architecture document).

              Hypervisor Process Element Entry
Word  bits: 0 ..... 7 | 8 .. 12 | 13 .. 15 | 16 .. 19 | 20 ..... 31
0     OSL Configuration State (0:31)
1     OSL Configuration State (32:63)
2     PASID                             | Reserved
3     Bus       | Device   | Function   | Reserved
4     Reserved
5     Reserved
6     ...

Signed-off-by: Christophe Lombard 
---
 drivers/misc/ocxl/context.c   | 4 +++-
 drivers/misc/ocxl/link.c  | 4 +++-
 drivers/misc/ocxl/ocxl_internal.h | 9 ++---
 drivers/scsi/cxlflash/ocxl_hw.c   | 6 --
 include/misc/ocxl.h   | 2 +-
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index c21f65a5c762..9eb0d93b01c6 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, 
struct mm_struct *mm)
 {
int rc;
unsigned long pidr = 0;
+   struct pci_dev *dev;
 
// Locks both status & tidr
mutex_lock(&ctx->status_mutex);
@@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, 
struct mm_struct *mm)
if (mm)
pidr = mm->context.id;
 
+   dev = to_pci_dev(ctx->afu->fn->dev.parent);
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr,
- amr, mm, xsl_fault_error, ctx);
+ amr, pci_dev_id(dev), mm, xsl_fault_error, ctx);
if (rc)
goto out;
 
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fd73d3bc0eb6..77381dda2c45 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel)
 }
 
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-   u64 amr, struct mm_struct *mm,
+   u64 amr, u16 bdf, struct mm_struct *mm,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
 {
@@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 
memset(pe, 0, sizeof(struct ocxl_process_element));
pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
+   pe->pasid = cpu_to_be32(pasid << (31 - 19));
+   pe->bdf = cpu_to_be16(bdf);
pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
pe->pid = cpu_to_be32(pidr);
pe->tid = cpu_to_be32(tidr);
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 0bad0a123af6..10125a22d5a5 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -84,13 +84,16 @@ struct ocxl_context {
 
 struct ocxl_process_element {
__be64 config_state;
-   __be32 reserved1[11];
+   __be32 pasid;
+   __be16 bdf;
+   __be16 reserved1;
+   __be32 reserved2[9];
__be32 lpid;
__be32 tid;
__be32 pid;
-   __be32 reserved2[10];
+   __be32 reserved3[10];
__be64 amr;
-   __be32 reserved3[3];
+   __be32 reserved4[3];
__be32 software_state;
 };
 
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index e4e0d767b98e..244fc27215dc 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx)
struct ocxl_hw_afu *afu = ctx->hw_afu;
struct ocxl_afu_config *acfg = &afu->acfg;
void *link_token = afu->link_token;
+   struct pci_dev *pdev = afu->pdev;
struct device *dev = afu->dev;
bool master = ctx->master;
struct mm_struct *mm;
@@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx)
mm = current->mm;
}
 
-   rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm,
- ocxlflash_xsl_fault, ctx);
+   rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0,
+ pci_dev_id(pdev), mm, ocxlflash_xsl_fault,
+ ctx);
if (unlikely(rc)) {
dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n",
__func__, rc);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index e013736e275d..3ed736da02c8 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void 
*link_handle);
  * defined
  */
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-   u64 amr, struct mm_struct *mm,
+   u64 amr, u16

Re: [PATCH v1 0/2] Use H_RPT_INVALIDATE for nested guest

2020-11-24 Thread Bharata B Rao
Hi,

Any comments on this patchset? Anything specific to be addressed
before it could be considered for inclusion?

Regards,
Bharata.

On Mon, Oct 19, 2020 at 04:56:40PM +0530, Bharata B Rao wrote:
> This patchset adds support for the new hcall H_RPT_INVALIDATE
> (currently handles nested case only) and replaces the nested tlb flush
> calls with this new hcall if the support for the same exists.
> 
> Changes in v1:
> -
> - Removed the bits that added the FW_FEATURE_RPT_INVALIDATE feature
>   as they are already upstream.
> 
> v0: 
> https://lore.kernel.org/linuxppc-dev/20200703104420.21349-1-bhar...@linux.ibm.com/T/#m1800c5f5b3d4f6a154ae58fc1c617c06f286358f
> 
> H_RPT_INVALIDATE
> 
> Syntax:
> int64   /* H_Success: Return code on successful completion */
>     /* H_Busy - repeat the call with the same */
>     /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
>     hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation 
> lookaside information */
>   uint64 pid,   /* PID/LPID to invalidate */
>   uint64 target,    /* Invalidation target */
>   uint64 type,  /* Type of lookaside information */
>   uint64 pageSizes, /* Page sizes */
>   uint64 start, /* Start of Effective Address (EA) range 
> (inclusive) */
>   uint64 end)   /* End of EA range (exclusive) */
> 
> Invalidation targets (target)
> -
> Core MMU    0x01 /* All virtual processors in the partition */
> Core local MMU  0x02 /* Current virtual processor */
> Nest MMU    0x04 /* All nest/accelerator agents in use by the partition */
> 
> A combination of the above can be specified, except core and core local.
> 
> Type of translation to invalidate (type)
> ---
> NESTED   0x0001  /* Invalidate nested guest partition-scope */
> TLB  0x0002  /* Invalidate TLB */
> PWC  0x0004  /* Invalidate Page Walk Cache */
> PRT  0x0008  /* Invalidate Process Table Entries if NESTED is clear */
> PAT  0x0008  /* Invalidate Partition Table Entries if NESTED is set */
> 
> A combination of the above can be specified.
> 
> Page size mask (pageSizes)
> --
> 4K  0x01
> 64K 0x02
> 2M  0x04
> 1G  0x08
> All sizes   (-1UL)
> 
> A combination of the above can be specified.
> All page sizes can be selected with -1.
> 
> Semantics: Invalidate radix tree lookaside information
>    matching the parameters given.
> * Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
>   different from the defined values.
> * Return H_PARAMETER if NESTED is set and pid is not a valid nested
>   LPID allocated to this partition.
> * Return H_P5 if (start, end) doesn't form a valid range. Start and end
>   should be valid quadrant addresses and end > start.
> * Return H_NotSupported if the partition is not running in radix
>   translation mode.
> * May invalidate more translation information than requested.
> * If start = 0 and end = -1, set the range to cover all valid addresses.
>   Else start and end should be aligned to 4kB (lower 11 bits clear).
> * If NESTED is clear, then invalidate process scoped lookaside information.
>   Else pid specifies a nested LPID, and the invalidation is performed
>   on nested guest partition table and nested guest partition scope real
>   addresses.
> * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
>   quadrant 0 spaces; else valid addresses are quadrant 0.
> * Pages which are fully covered by the range are to be invalidated.
>   Those which are partially covered are considered outside invalidation
>   range, which allows a caller to optimally invalidate ranges that may
>   contain mixed page sizes.
> * Return H_SUCCESS on success.
> 
> Bharata B Rao (2):
>   KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case
> only)
>   KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
> 
>  Documentation/virt/kvm/api.rst|  17 +++
>  .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
>  arch/powerpc/include/asm/kvm_book3s.h |   3 +
>  arch/powerpc/kvm/book3s_64_mmu_radix.c|  26 -
>  arch/powerpc/kvm/book3s_hv.c  |  32 ++
>  arch/powerpc/kvm/book3s_hv_nested.c   | 107 +-
>  arch/powerpc/kvm/powerpc.c|   3 +
>  arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
>  include/uapi/linux/kvm.h  |   1 +
>  9 files changed, 200 insertions(+), 11 deletions(-)
> 
> -- 
> 2.26.2


Re: [PATCH kernel v4 2/8] genirq/irqdomain: Clean legacy IRQ allocation

2020-11-24 Thread Andy Shevchenko
On Tue, Nov 24, 2020 at 8:20 AM Alexey Kardashevskiy  wrote:
>
> There are 10 users of __irq_domain_alloc_irqs() and only one - IOAPIC -
> passes realloc==true. There is no obvious reason for handling this
> specific case in the generic code.
>
> This splits out __irq_domain_alloc_irqs_data() to make it clear what
> IOAPIC does and makes __irq_domain_alloc_irqs() cleaner.
>
> This should cause no behavioral change.

> +   ret = __irq_domain_alloc_irqs_data(domain, virq, nr_irqs, node, arg, 
> affinity);
> +   if (ret <= 0)
> goto out_free_desc;

Was or wasn't 0 considered an error code previously?

> return virq;

>  out_free_desc:
> irq_free_descs(virq, nr_irqs);
> return ret;

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH V2 4/5] ocxl: Add mmu notifier

2020-11-24 Thread Christoph Hellwig
You probably want to add Jason for an audit of new notifier uses.

On Fri, Nov 20, 2020 at 06:32:40PM +0100, Christophe Lombard wrote:
> Add an invalidate_range mmu notifier, when required (i.e. when ATSD
> access to MMIO registers is available), to initiate TLB invalidation
> commands.
> For the time being, the ATSD0 set of registers is used by default.
> 
> The pasid and bdf values have to be configured in the Process Element
> Entry (PEE), which must be set up to match the BDF/PASID of the AFU.
> 
> Signed-off-by: Christophe Lombard 
> ---
>  drivers/misc/ocxl/link.c | 58 +++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 20444db8a2bb..100bdfe9ec37 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -2,8 +2,10 @@
>  // Copyright 2017 IBM Corp.
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -33,6 +35,7 @@
>  
>  #define SPA_PE_VALID 0x8000
>  
> +struct ocxl_link;
>  
>  struct pe_data {
>   struct mm_struct *mm;
> @@ -41,6 +44,8 @@ struct pe_data {
>   /* opaque pointer to be passed to the above callback */
>   void *xsl_err_data;
>   struct rcu_head rcu;
> + struct ocxl_link *link;
> + struct mmu_notifier mmu_notifier;
>  };
>  
>  struct spa {
> @@ -83,6 +88,8 @@ struct ocxl_link {
>   int domain;
>   int bus;
>   int dev;
> + void __iomem *arva; /* ATSD register virtual address */
> + spinlock_t atsd_lock;   /* to serialize shootdowns */
>   atomic_t irq_available;
>   struct spa *spa;
>   void *platform_data;
> @@ -403,6 +410,11 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
> struct ocxl_link **out_l
>   if (rc)
>   goto err_xsl_irq;
>  
> + rc = pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0,
> +   &link->arva);
> + if (!rc)
> + spin_lock_init(&link->atsd_lock);
> +
>   *out_link = link;
>   return 0;
>  
> @@ -454,6 +466,11 @@ static void release_xsl(struct kref *ref)
>  {
>   struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
>  
> + if (link->arva) {
> + pnv_ocxl_unmap_lpar(&link->arva);
> + link->arva = NULL;
> + }
> +
>   list_del(&link->list);
>   /* call platform code before releasing data */
>   pnv_ocxl_spa_release(link->platform_data);
> @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void 
> *link_handle)
>  }
>  EXPORT_SYMBOL_GPL(ocxl_link_release);
>  
> +static void invalidate_range(struct mmu_notifier *mn,
> +  struct mm_struct *mm,
> +  unsigned long start, unsigned long end)
> +{
> + struct pe_data *pe_data = container_of(mn, struct pe_data, 
> mmu_notifier);
> + struct ocxl_link *link = pe_data->link;
> + unsigned long addr, pid, page_size = PAGE_SIZE;
> +
> + pid = mm->context.id;
> +
> + spin_lock(&link->atsd_lock);
> + for (addr = start; addr < end; addr += page_size)
> + pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
> + spin_unlock(&link->atsd_lock);
> +}
> +
> +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> + .invalidate_range = invalidate_range,
> +};
> +
>  static u64 calculate_cfg_state(bool kernel)
>  {
>   u64 state;
> @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
> pidr, u32 tidr,
>   pe_data->mm = mm;
>   pe_data->xsl_err_cb = xsl_err_cb;
>   pe_data->xsl_err_data = xsl_err_data;
> + pe_data->link = link;
> + pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
>  
>   memset(pe, 0, sizeof(struct ocxl_process_element));
>   pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
> pidr, u32 tidr,
>* by the nest MMU. If we have a kernel context, TLBIs are
>* already global.
>*/
> - if (mm)
> + if (mm) {
>   mm_context_add_copro(mm);
> + if (link->arva) {
> + /* Use MMIO registers for the TLB Invalidate
> +  * operations.
> +  */
> + mmu_notifier_register(&pe_data->mmu_notifier, mm);
> + }
> + }
> +
>   /*
>* Barrier is to make sure PE is visible in the SPA before it
>* is used by the device. It also helps with the global TLBI
> @@ -674,6 +721,15 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   WARN(1, "Couldn't find pe data when removing PE\n");
>   } else {
>   if (pe_data->mm) {
> + if (link->arva) {
> + mmu_notifier_unregister(&pe_data->mmu_notifier,
> + pe_data->mm);