RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-03-15 Thread PLATTNER Christoph


Thank you for maintenance and for following this request.

Regards
Christoph

-Original Message-
From: Michael Ellerman  
Sent: Sonntag, 14. März 2021 11:01
To: Michael Ellerman ; Paul Mackerras ; 
Benjamin Herrenschmidt ; PLATTNER Christoph 
; Christophe Leroy 

Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with 
PROT_NONE

On Mon, 1 Feb 2021 06:29:50 + (UTC), Christophe Leroy wrote:
> On book3s/32, page protection is defined by the PP bits in the PTE 
> which provide the following protection depending on the access keys 
> defined in the matching segment register:
> - PP 00 means RW with key 0 and N/A with key 1.
> - PP 01 means RW with key 0 and RO with key 1.
> - PP 10 means RW with both key 0 and key 1.
> - PP 11 means RO with both key 0 and key 1.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/603: Fix protection of user pages mapped with PROT_NONE
  https://git.kernel.org/powerpc/c/c119565a15a628efdfa51352f9f6c5186e506a1c

cheers


Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-03-14 Thread Michael Ellerman
On Mon, 1 Feb 2021 06:29:50 + (UTC), Christophe Leroy wrote:
> On book3s/32, page protection is defined by the PP bits in the PTE
> which provide the following protection depending on the access
> keys defined in the matching segment register:
> - PP 00 means RW with key 0 and N/A with key 1.
> - PP 01 means RW with key 0 and RO with key 1.
> - PP 10 means RW with both key 0 and key 1.
> - PP 11 means RO with both key 0 and key 1.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/603: Fix protection of user pages mapped with PROT_NONE
  https://git.kernel.org/powerpc/c/c119565a15a628efdfa51352f9f6c5186e506a1c

cheers


RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-02-01 Thread PLATTNER Christoph
Thank you very much, I appreciate your fast responses.
Thank you also for clarification, I did completely oversee
the permission settings in the segment setup and expected
the fault reaction on the PP bits in the TLB.
And I will re-read the chapters, got get deeper into this topic.

Greetings
Christoph 


-Original Message-
From: Christophe Leroy  
Sent: Montag, 1. Februar 2021 12:39
To: PLATTNER Christoph ; Benjamin 
Herrenschmidt ; Paul Mackerras ; 
Michael Ellerman 
Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org; HAMETNER 
Reinhard ; REITHER Robert - Contractor 
; KOENIG Werner 

Subject: Re: [PATCH] powerpc/603: Fix protection of user pages mapped with 
PROT_NONE



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into 
> this topic for about 3 months for analyzing that problem- so, sorry, if I am 
> wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so 
> always the
> branch  "if (! Hash) " is taken, so, I assume that "key 0" and "key 
> 1" setups are not
> used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all 
the PP and Ks/Kp keys defined in the segment register are used, see e300 core 
reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU 
> does not react.
> As far I have understood, the TLB miss routines are responsible for 
> checking permissions.
> The TLB miss routines check the Linux PTE styled entries and generates 
> the PP bits
> for the TLB entry. The PowerPC PP bits are never check elsewhere on that 
> CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss 
routines would install a TLB on a read, then the user could do a write without 
any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the 
permission. It is that user accessed been done with key 0 as programmed in the 
segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as 
> this CPU does not
> read any PTEs from RAM (no HW support in contrast to x86 or ARM or later 
> ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB 
entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE 
> bits differently (Linux style) for PROT_NONE permissions - OR - we 
> have to expand the permission checking like my proposed experimental 
> patch. (PROT_NONE is not NUMA related only, but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch 
I sent out based on your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on 
> TLB miss, as it is an indication, that page is used. In 4.4 kernel 
> this write back of the _PAGE_ACCESSED bit was performed after successful 
> permission check:
> 
>  bne-DataAddressInvalid  /* return if access not permitted */
>  ori r0,r0,_PAGE_ACCESSED/* set _PAGE_ACCESSED in pte */
>  /*
>   * NOTE! We are assuming this is not an SMP system, otherwise
>   * we would need to update the pte atomically with lwarx/stwcx.
>   */
>  stw r0,0(r2)/* update PTE (accessed bit) */
>  /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, 
> this is not needed, as the PTE is never seen by the PPC chip. But I do 
> not understand, WHY the PAGE_ACCCESSED is used for permission check in the 
> late 5.4 kernel (not used in 4.4 kernel):
> 
>   cmplw   0,r1,r3
>   mfspr   r2, SPRN_SDR1
>   li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
>   rlwinm  r2, r2, 28, 0xf000
>   bgt-112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a 
no-op in 99% of cases, we prefer bailing out to the page_fault logic when the 
accessed

Re: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-02-01 Thread Christophe Leroy




Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :

Hello to all, and thank you very much for first and second fast response.

I do not have a long history on PowerPC MMU environment, I hacked into this 
topic
for about 3 months for analyzing that problem- so, sorry, if I am wrong in some 
points ...


Yes you are wrong on some points, sorry, see below.




What I learn so far from this MPC5121e (variant of e300c4 core):
- It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so 
always the
branch  "if (! Hash) " is taken, so, I assume that "key 0" and "key 1" 
setups are not
used on this CPU (not supporting MMU_FTR_HPTE_TABLE)


hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys 
defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection



- The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does 
not react.
As far I have understood, the TLB miss routines are responsible for 
checking permissions.
The TLB miss routines check the Linux PTE styled entries and generates the 
PP bits
for the TLB entry. The PowerPC PP bits are never check elsewhere on that 
CPU models ...


PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a 
TLB on a read, then the user could do a write without any verification being done ?


Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that 
user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.



- The PTE entries in Linux are fully "void" in sense of this CPU type, as this 
CPU does not
read any PTEs from RAM (no HW support in contrast to x86 or ARM or later 
ppc...).


No, the PTE are read by the TLB miss exception handlers and writen into TLB 
entries.



In summary - as far as I understand it now - we have to handle the PTE bits 
differently
(Linux style) for PROT_NONE permissions - OR - we have to expand the permission
checking like my proposed experimental patch. (PROT_NONE is not NUMA related 
only,
but may not used very often ...).


Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on 
your proposal.




Another related point:
According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB 
miss, as
it is an indication, that page is used. In 4.4 kernel this write back of the 
_PAGE_ACCESSED
bit was performed after successful permission check:

 bne-DataAddressInvalid  /* return if access not permitted */
 ori r0,r0,_PAGE_ACCESSED/* set _PAGE_ACCESSED in pte */
 /*
  * NOTE! We are assuming this is not an SMP system, otherwise
  * we would need to update the pte atomically with lwarx/stwcx.
  */
 stw r0,0(r2)/* update PTE (accessed bit) */
 /* Convert linux-style PTE to low word of PPC-style PTE */

Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is 
not needed, as the
PTE is never seen by the PPC chip. But I do not understand, WHY the 
PAGE_ACCCESSED
is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):

cmplw   0,r1,r3
mfspr   r2, SPRN_SDR1
li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
rlwinm  r2, r2, 28, 0xf000
bgt-112f

What is the reason or relevance for checking this here ?
Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
Do you know the reason of change on this point ?


PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, 
we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault 
logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the 
update of PTEs.




Another remark to Core manual relevant for this:
There is the reference manual for e300 core available (e300 RM). It includes
many remarks in range of Memory Management section, that many features
are optional or variable for dedicated implementations. On the other hand,
the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
information, which of the optional points are there or nor. According my
analysis, MPC5121e does not include any of the optional features.



Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.





Thanks a lot for first reactions


You are welcome, don't hesitate if you have additional questions.

Christophe


RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-02-01 Thread PLATTNER Christoph
Hello to all, and thank you very much for first and second fast response.

I do not have a long history on PowerPC MMU environment, I hacked into this 
topic
for about 3 months for analyzing that problem- so, sorry, if I am wrong in some 
points ...

What I learn so far from this MPC5121e (variant of e300c4 core):
- It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so 
always the 
   branch  "if (! Hash) " is taken, so, I assume that "key 0" and "key 1" 
setups are not
   used on this CPU (not supporting MMU_FTR_HPTE_TABLE)
- The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does 
not react.
   As far I have understood, the TLB miss routines are responsible for checking 
permissions.
   The TLB miss routines check the Linux PTE styled entries and generates the 
PP bits
   for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU 
models ...
- The PTE entries in Linux are fully "void" in sense of this CPU type, as this 
CPU does not
   read any PTEs from RAM (no HW support in contrast to x86 or ARM or later 
ppc...).

In summary - as far as I understand it now - we have to handle the PTE bits 
differently
(Linux style) for PROT_NONE permissions - OR - we have to expand the permission 
checking like my proposed experimental patch. (PROT_NONE is not NUMA related 
only,
but may not used very often ...).

Another related point:
According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB 
miss, as
it is an indication, that page is used. In 4.4 kernel this write back of the 
_PAGE_ACCESSED 
bit was performed after successful permission check:

bne-DataAddressInvalid  /* return if access not permitted */
ori r0,r0,_PAGE_ACCESSED/* set _PAGE_ACCESSED in pte */
/*
 * NOTE! We are assuming this is not an SMP system, otherwise
 * we would need to update the pte atomically with lwarx/stwcx.
 */
stw r0,0(r2)/* update PTE (accessed bit) */
/* Convert linux-style PTE to low word of PPC-style PTE */

Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is 
not needed, as the
PTE is never seen by the PPC chip. But I do not understand, WHY the 
PAGE_ACCCESSED 
is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):

cmplw   0,r1,r3
mfspr   r2, SPRN_SDR1
li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
rlwinm  r2, r2, 28, 0xf000
bgt-112f

What is the reason or relevance for checking this here ?
Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
Do you know the reason of change on this point ?

Another remark to Core manual relevant for this:
There is the reference manual for e300 core available (e300 RM). It includes
many remarks in range of Memory Management section, that many features
are optional or variable for dedicated implementations. On the other hand, 
the MPC5121e reference manual refers to the e300 core RM, but DOES NOT 
information, which of the optional points are there or nor. According my
analysis, MPC5121e does not include any of the optional features.


Thanks a lot for first reactions
Christoph


-Original Message-
From: Christophe Leroy  
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt ; Paul Mackerras 
; Michael Ellerman ; PLATTNER Christoph 

Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE which 
provide the following protection depending on the access keys defined in the 
matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection, PP bits have 
been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user 
accesses are performed with key 1. As PP00 is used for non _PAGE_USER pages, 
user can't access kernel pages not flagged _PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, 
therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to 
the user. But unlike most other architectures, powerpc implements PROT_NONE 
protection by removing _PAGE_USER flag instead of flagging the page as not 
valid. This means that pages in userspace that are not flagged _PAGE_USER shall 
remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss 
h

[PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

2021-01-31 Thread Christophe Leroy
On book3s/32, page protection is defined by the PP bits in the PTE
which provide the following protection depending on the access
keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection,
PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0
and user accesses are performed with key 1. As PP00 is used for
non _PAGE_USER pages, user can't access kernel pages not flagged
_PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed
with key 0, therefore pages not flagged _PAGE_USER are still
accessible to the user.

This shouldn't be an issue, because userspace is expected to be
accessible to the user. But unlike most other architectures, powerpc
implements PROT_NONE protection by removing _PAGE_USER flag instead of
flagging the page as not valid. This means that pages in userspace
that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the
TLB miss handler by checking _PAGE_USER permission on userspace
accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have
an hash table, and hash_page() function already implement the
verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner 
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_book3s_32.S | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
-   li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+   li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
rlwinm  r2, r2, 28, 0xf000
 #ifdef CONFIG_MODULES
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
+   li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
addir2, r2, (swapper_pg_dir - PAGE_OFFSET)@l/* kernel page 
table */
 #endif
 112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
mfspr   r2, SPRN_SDR1
-   li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
+   li  r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
rlwinm  r2, r2, 28, 0xf000
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
+   li  r1, _PAGE_PRESENT | _PAGE_ACCESSED
addir2, r2, (swapper_pg_dir - PAGE_OFFSET)@l/* kernel page 
table */
 112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
lwz r2,0(r2)/* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
mfspr   r2, SPRN_SDR1
-   li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+   li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | 
_PAGE_USER
rlwinm  r2, r2, 28, 0xf000
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
+   li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
addir2, r2, (swapper_pg_dir - PAGE_OFFSET)@l/* kernel page 
table */
 112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
lwz r2,0(r2)/* get pmd entry */
-- 
2.25.0