Hello all, and thank you very much for the first and second fast responses.

I do not have a long history with the PowerPC MMU environment; I have been digging into this topic for about 3 months to analyze this problem - so, sorry if I am wrong on some points ...

What I have learned so far about this MPC5121e (a variant of the e300c4 core):
- It uses the book3s32 hash code, but it DOES NOT provide the hash (KEY) method, so the branch "if (!Hash) ..." is always taken; I therefore assume that the "key 0" and "key 1" setups are not used on this CPU (it does not support MMU_FTR_HPTE_TABLE).
- The PP bits are NOT checked by the CPU in HW; even if they are set to 00, the CPU does not react. As far as I understand, the TLB miss routines are responsible for checking permissions: they check the Linux-style PTE entries and generate the PP bits for the TLB entry. The PowerPC PP bits are never checked anywhere else on these CPU models ...
- The PTE entries in Linux are completely "void" as far as this CPU type is concerned, since this CPU does not read any PTEs from RAM (no HW support, in contrast to x86, ARM, or later PPC ...).
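To illustrate the point about PP-bit generation, here is a small Python model (the flag values are symbolic, purely for illustration - they are not the real kernel constants) of how the software TLB miss path derives PP bits from a Linux-style PTE, following the mapping described in Christophe's patch quoted below:

```python
# Symbolic Linux PTE flag values for illustration only (not the kernel's).
PAGE_PRESENT  = 0x01
PAGE_RW       = 0x02
PAGE_USER     = 0x04
PAGE_ACCESSED = 0x08

def pp_bits(pte: int) -> int:
    """PP bits generated for the TLB entry from the Linux PTE flags:
    PP00 for pages without _PAGE_USER, PP01 for user RW pages,
    PP11 for user read-only pages."""
    if not pte & PAGE_USER:
        return 0b00
    return 0b01 if pte & PAGE_RW else 0b11
```

On these 603-style cores the CPU itself never re-checks the Linux PTE against a hash table, so this software step is the only place where the Linux flags take effect.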

In summary - as far as I understand it now - we either have to handle the PTE bits differently (Linux style) for PROT_NONE permissions, OR we have to expand the permission checking as in my proposed experimental patch. (PROT_NONE is not only NUMA related, but it may not be used very often ...)

Another related point:
According to the e300 RM (reference manual), the ACCESSED bit in the PTE shall be set on a TLB miss, as it is an indication that the page is in use. In the 4.4 kernel this write-back of the _PAGE_ACCESSED bit was performed after a successful permission check:

        bne-    DataAddressInvalid      /* return if access not permitted */
        ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
        /*
         * NOTE! We are assuming this is not an SMP system, otherwise
         * we would need to update the pte atomically with lwarx/stwcx.
         */
        stw     r0,0(r2)                /* update PTE (accessed bit) */
        /* Convert linux-style PTE to low word of PPC-style PTE */

The bit is set (ori ...) and written back (stw ...) to the Linux PTE. Maybe this is not needed, as the PTE is never seen by the PPC chip. But I do not understand WHY _PAGE_ACCESSED is used for the permission check in the later 5.4 kernel (it was not used in the 4.4 kernel):

        cmplw   0,r1,r3
        mfspr   r2, SPRN_SDR1
        li      r1, _PAGE_PRESENT | _PAGE_ACCESSED
        rlwinm  r2, r2, 28, 0xfffff000
        bgt-    112f

What is the reason or relevance of checking this here?
It was not checked in 4.4; the bit was simply OR-ed in afterwards, as the page is accessed now.
Do you know the reason for the change on this point?

Another remark on the core manual relevant to this:
There is a reference manual available for the e300 core (the e300 RM). Its Memory Management section contains many remarks that certain features are optional or implementation-specific. On the other hand, the MPC5121e reference manual refers to the e300 core RM but DOES NOT state which of these optional features are implemented. According to my analysis, the MPC5121e does not include any of the optional features.


Thanks a lot for the first reactions,
Christoph


-----Original Message-----
From: Christophe Leroy <christophe.le...@csgroup.eu> 
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt <b...@kernel.crashing.org>; Paul Mackerras 
<pau...@samba.org>; Michael Ellerman <m...@ellerman.id.au>; PLATTNER Christoph 
<christoph.platt...@thalesgroup.com>
Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE which 
provide the following protection depending on the access keys defined in the 
matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.
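(For reference, the table above can be expressed as a small Python helper - purely illustrative, not kernel code:)

```python
def access_allowed(pp: int, key: int, write: bool) -> bool:
    """Access decision from the PP bits and the segment-register key,
    per the PP table: each entry is (key 0 permission, key 1 permission),
    where the permission is 'rw', 'ro', or None for no access."""
    perm = {
        0b00: ("rw", None),
        0b01: ("rw", "ro"),
        0b10: ("rw", "rw"),
        0b11: ("ro", "ro"),
    }[pp][key]
    if perm is None:
        return False
    return perm == "rw" if write else True
```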

Since the implementation of kernel userspace access protection, PP bits have 
been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user 
accesses are performed with key 1. As PP00 is used for non _PAGE_USER pages, 
user can't access kernel pages not flagged _PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, 
therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to 
the user. But unlike most other architectures, powerpc implements PROT_NONE 
protection by removing _PAGE_USER flag instead of flagging the page as not 
valid. This means that pages in userspace that are not flagged _PAGE_USER shall 
remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss 
handler by checking _PAGE_USER permission on userspace accesses as if it was 
the _PAGE_PRESENT bit.
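(In other words, the fix amounts to adding _PAGE_USER to the required-flag mask for userspace addresses - sketched here in Python with symbolic flag values, not the kernel's:)

```python
# Symbolic flag values for illustration only (not the kernel's).
PAGE_PRESENT  = 0x01
PAGE_USER     = 0x04
PAGE_ACCESSED = 0x08

def required_flags(kernel_address: bool) -> int:
    """Flags the data TLB miss handler demands before inserting a
    translation; _PAGE_USER is checked like _PAGE_PRESENT for user
    addresses."""
    req = PAGE_PRESENT | PAGE_ACCESSED
    if not kernel_address:
        req |= PAGE_USER
    return req

def miss_permitted(pte: int, kernel_address: bool) -> bool:
    return not (required_flags(kernel_address) & ~pte)
```

A PROT_NONE page - present and accessed but without _PAGE_USER - now faults on a userspace access instead of being silently mapped.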

Note that this problem exists only on 603 cores. The 604+ have a hash table, and the hash_page() function already implements the verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner <christoph.platt...@thalesgroup.com>
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.le...@csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
        cmplw   0,r1,r3
 #endif
        mfspr   r2, SPRN_SDR1
-       li      r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+       li      r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
        rlwinm  r2, r2, 28, 0xfffff000
 #ifdef CONFIG_MODULES
        bgt-    112f
        lis     r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
+       li      r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
        addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l        /* kernel page table */
 #endif
 112:   rlwimi  r2,r3,12,20,29          /* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
        lis     r1, TASK_SIZE@h         /* check if kernel address */
        cmplw   0,r1,r3
        mfspr   r2, SPRN_SDR1
-       li      r1, _PAGE_PRESENT | _PAGE_ACCESSED
+       li      r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
        rlwinm  r2, r2, 28, 0xfffff000
        bgt-    112f
        lis     r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
+       li      r1, _PAGE_PRESENT | _PAGE_ACCESSED
        addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l        /* kernel page table */
 112:   rlwimi  r2,r3,12,20,29          /* insert top 10 bits of address */
        lwz     r2,0(r2)                /* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
        lis     r1, TASK_SIZE@h         /* check if kernel address */
        cmplw   0,r1,r3
        mfspr   r2, SPRN_SDR1
-       li      r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+       li      r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
        rlwinm  r2, r2, 28, 0xfffff000
        bgt-    112f
        lis     r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
+       li      r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
        addi    r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l        /* kernel page table */
 112:   rlwimi  r2,r3,12,20,29          /* insert top 10 bits of address */
        lwz     r2,0(r2)                /* get pmd entry */
--
2.25.0
