Re: [PATCH v3 2/7] uaccess: Tell user_access_begin() if it's for a write or not

2020-01-24 Thread Tony Luck
On Thu, Jan 23, 2020 at 10:03 AM Linus Torvalds
 wrote:
> We used to have a read/write argument to the old "verify_area()" and
> "access_ok()" model, and it was a mistake. It was due to odd i386 user
> access issues. We got rid of it. I'm not convinced this is any better
> - it looks very similar and for odd ppc access issues.

If the mode (read or write) were made visible to the trap handler, I'd
find that useful for machine check recovery.  If I'm in the middle of a
copy_from_user() and I get a machine check reading poison from a
user address ... then I could try to recover in the same way as for the
user accessing the poison (offline the page, SIGBUS the task). But if
the poison is in kernel memory and we are doing a copy_to_user(), then
we are hosed (or would need some more complex recovery plan).

[Note that we only get recoverable machine checks on loads... writes
are posted, so if something goes wrong it isn't synchronous with the store
instruction that initiated it]
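
As a rough sketch of that recovery split (the helpers and the
direction test are hypothetical, not existing kernel API;
memory_failure() and force_sig() are the routines already used when
userspace consumes poison):

	if (mce_hit_user_access(regs) && mce_was_load(regs)) {
		/* Poison read from a user page during copy_from_user():
		 * recover as for a user access - offline the page and
		 * SIGBUS the task. */
		memory_failure(pfn, MF_ACTION_REQUIRED);
		force_sig(SIGBUS);
	} else {
		/* Poison in kernel memory during copy_to_user():
		 * no simple recovery. */
		panic("Unrecoverable machine check on kernel memory");
	}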

-Tony


Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent

2020-01-24 Thread Dan Williams
On Fri, Jan 24, 2020 at 9:07 AM Aneesh Kumar K.V
 wrote:
>
> On 1/24/20 10:15 PM, Dan Williams wrote:
> > On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
> >  wrote:
> >>
> >> On 1/24/20 11:27 AM, Dan Williams wrote:
> >>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
> >>>
> >>
> >> 
> >>
> 
>  +unsigned long arch_namespace_map_size(void)
>  +{
>  +   return PAGE_SIZE;
>  +}
>  +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>  +
>  +
> static void __cpa_flush_all(void *arg)
> {
>    unsigned long cache = (unsigned long)arg;
>  diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>  index 9df091bd30ba..a3476dbd2656 100644
>  --- a/include/linux/libnvdimm.h
>  +++ b/include/linux/libnvdimm.h
>  @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
> }
> #endif
> 
>  +unsigned long arch_namespace_map_size(void);
> >>>
> >>> This property is more generic than the nvdimm namespace mapping size,
> >>> it's more the fundamental remap granularity that the architecture
> >>> supports. So I would expect this to be defined in core header files.
> >>> Something like:
> >>>
> >>> diff --git a/include/linux/io.h b/include/linux/io.h
> >>> index a59834bc0a11..58b3b2091dbb 100644
> >>> --- a/include/linux/io.h
> >>> +++ b/include/linux/io.h
> >>> @@ -155,6 +155,13 @@ enum {
> >>>    void *memremap(resource_size_t offset, size_t size, unsigned long flags);
> >>>void memunmap(void *addr);
> >>>
> >>> +#ifndef memremap_min_align
> >>> +static inline unsigned int memremap_min_align(void)
> >>> +{
> >>> +   return PAGE_SIZE;
> >>> +}
> >>> +#endif
> >>> +
> >>
> >>
> >> Should that be memremap_pages_min_align()?
> >
> > No, and on second look it needs to be a common value that results in
> > properly aligned / sized namespaces across architectures.
> >
> > What would it take for Power to make its minimum mapping granularity
> > SUBSECTION_SIZE? The minute that the minimum alignment changes across
> > architectures we lose compatibility.
> >
> > The namespaces need to be sized such that the mode can be changed freely.
> >
>
> Linux on ppc64 with hash translation uses just one page size for direct
> mapping and that is 16MB.

Ok, I think this means that the dream of SUBSECTION_SIZE being the
minimum compat alignment is dead, or at least a dream deferred.

Let's do this, change the name of this function to:

memremap_compat_align()

...and define it to be the max() of all the alignment constraints that
the arch may require through either memremap(), or memremap_pages().
Then, teach ndctl to default to a compatible alignment, 16MiB, with
an override to allow namespace creation with the current
architecture's memremap_compat_align(), exported via sysfs, if it
happens to be less than 16MiB. Finally, cross our fingers and hope
that Power remains the only arch that violates the SUBSECTION_SIZE
minimum value for memremap_compat_align().
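
A minimal sketch of the shape being proposed (the names follow this
thread; the fallback default is an assumption, not a final
implementation):

	/* include/linux/memremap.h (sketch) */
	#ifndef ARCH_MEMREMAP_COMPAT_ALIGN
	static inline unsigned long memremap_compat_align(void)
	{
		/* max() of all memremap()/memremap_pages() constraints;
		 * SUBSECTION_SIZE unless the arch says otherwise */
		return SUBSECTION_SIZE;
	}
	#endif

An arch like ppc64 with hash translation would override this to
return its direct-map granularity, i.e. SZ_16M.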


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-6 tag

2020-01-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 Jan 2020 23:13:30 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.5-6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3c45d7510cf563be2ebc04f6864b70bc69f68cb9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent

2020-01-24 Thread Aneesh Kumar K.V

On 1/24/20 10:15 PM, Dan Williams wrote:

On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
 wrote:


On 1/24/20 11:27 AM, Dan Williams wrote:

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V

+unsigned long arch_namespace_map_size(void)
+{
+   return PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(arch_namespace_map_size);
+
+
   static void __cpa_flush_all(void *arg)
   {
  unsigned long cache = (unsigned long)arg;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 9df091bd30ba..a3476dbd2656 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
   }
   #endif

+unsigned long arch_namespace_map_size(void);


This property is more generic than the nvdimm namespace mapping size,
it's more the fundamental remap granularity that the architecture
supports. So I would expect this to be defined in core header files.
Something like:

diff --git a/include/linux/io.h b/include/linux/io.h
index a59834bc0a11..58b3b2091dbb 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -155,6 +155,13 @@ enum {
   void *memremap(resource_size_t offset, size_t size, unsigned long flags);
   void memunmap(void *addr);

+#ifndef memremap_min_align
+static inline unsigned int memremap_min_align(void)
+{
+   return PAGE_SIZE;
+}
+#endif
+



Should that be memremap_pages_min_align()?


No, and on second look it needs to be a common value that results in
properly aligned / sized namespaces across architectures.

What would it take for Power to make its minimum mapping granularity
SUBSECTION_SIZE? The minute that the minimum alignment changes across
architectures we lose compatibility.

The namespaces need to be sized such that the mode can be changed freely.



Linux on ppc64 with hash translation uses just one page size for direct 
mapping and that is 16MB.


-aneesh



[PATCH v2] powerpc/32: Warn and return ENOSYS on syscalls from kernel

2020-01-24 Thread Christophe Leroy
Since commit b86fb88855ea ("powerpc/32: implement fast entry for
syscalls on non BOOKE") and commit 1a4b739bbb4f ("powerpc/32:
implement fast entry for syscalls on BOOKE"), syscalls from the
kernel are unexpected and can have catastrophic consequences, as
they will destroy the kernel stack.

Test MSR_PR on syscall entry. If the syscall comes from the kernel,
emit a warning and return an ENOSYS error.

Signed-off-by: Christophe Leroy 
---
v2: Rebased on powerpc/next-test, ie on top of VMAP_STACK series
---
 arch/powerpc/kernel/entry_32.S   | 26 ++
 arch/powerpc/kernel/head_32.h| 18 +++---
 arch/powerpc/kernel/head_booke.h |  5 -
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3795654d15d1..73b80143ffac 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -574,6 +574,32 @@ syscall_exit_work:
bl  do_syscall_trace_leave
b   ret_from_except_full
 
+   /*
+* System call was called from kernel. We get here with SRR1 in r9.
+* Mark the exception as recoverable once we have retrieved SRR0,
+* trap a warning and return ENOSYS with CR[SO] set.
+*/
+   .globl  ret_from_kernel_syscall
+ret_from_kernel_syscall:
+   mfspr   r11, SPRN_SRR0
+#if !defined(CONFIG_4xx) && !defined(CONFIG_BOOKE)
+   LOAD_REG_IMMEDIATE(r12, MSR_KERNEL & ~(MSR_IR|MSR_DR))
+   MTMSRD(r12)
+#endif
+
+0: trap
+   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+
+   li  r3, ENOSYS
+   crset   so
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
+   mtspr   SPRN_NRI, r0
+#endif
+   mtspr   SPRN_SRR1, r9
+   mtspr   SPRN_SRR0, r11
+   SYNC
+   RFI
+
 /*
  * The fork/clone functions need to copy the full register set into
  * the child process. Therefore we need to save all the nonvolatile
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index a6a5fbbf8504..5f3cfc9ef1b6 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -112,13 +112,17 @@
 .macro SYSCALL_ENTRY trapno
mfspr   r12,SPRN_SPRG_THREAD
 #ifdef CONFIG_VMAP_STACK
-   mfspr   r9, SPRN_SRR0
-   mfspr   r11, SPRN_SRR1
-   stw r9, SRR0(r12)
-   stw r11, SRR1(r12)
+   mfspr   r11, SPRN_SRR0
+   mfspr   r9, SPRN_SRR1
+   stw r11, SRR0(r12)
+   stw r9, SRR1(r12)
+#else
+   mfspr   r9, SPRN_SRR1
 #endif
mfcrr10
+   andi.   r11, r9, MSR_PR
lwz r11,TASK_STACK-THREAD(r12)
+   beq-99f
rlwinm  r10,r10,0,4,2   /* Clear SO bit in CR */
addir11, r11, THREAD_SIZE - INT_FRAME_SIZE
 #ifdef CONFIG_VMAP_STACK
@@ -128,15 +132,14 @@
 #endif
tovirt_vmstack r12, r12
tophys_novmstack r11, r11
-   mflrr9
stw r10,_CCR(r11)   /* save registers */
-   stw r9, _LINK(r11)
+   mflrr10
+   stw r10, _LINK(r11)
 #ifdef CONFIG_VMAP_STACK
lwz r10, SRR0(r12)
lwz r9, SRR1(r12)
 #else
mfspr   r10,SPRN_SRR0
-   mfspr   r9,SPRN_SRR1
 #endif
stw r1,GPR1(r11)
stw r1,0(r11)
@@ -209,6 +212,7 @@
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+99:b   ret_from_kernel_syscall
 .endm
 
 .macro save_dar_dsisr_on_stack reg1, reg2, sp
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 37fc84ed90e3..bd2e5ed8dd50 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -104,16 +104,18 @@ FTR_SECTION_ELSE
 #ifdef CONFIG_KVM_BOOKE_HV
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
+   mfspr   r9, SPRN_SRR1
BOOKE_CLEAR_BTB(r11)
+   andi.   r11, r9, MSR_PR
lwz r11, TASK_STACK - THREAD(r10)
rlwinm  r12,r12,0,4,2   /* Clear SO bit in CR */
+   beq-99f
ALLOC_STACK_FRAME(r11, THREAD_SIZE - INT_FRAME_SIZE)
stw r12, _CCR(r11)  /* save various registers */
mflrr12
stw r12,_LINK(r11)
mfspr   r12,SPRN_SRR0
stw r1, GPR1(r11)
-   mfspr   r9,SPRN_SRR1
stw r1, 0(r11)
mr  r1, r11
stw r12,_NIP(r11)
@@ -176,6 +178,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+99:b   ret_from_kernel_syscall
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
-- 
2.25.0



[PATCH] powerpc/32: Add missing context synchronisation with CONFIG_VMAP_STACK

2020-01-24 Thread Christophe Leroy
After reactivation of data translation by modifying MSR[DR], an isync
is required to ensure the translation is effective.

Signed-off-by: Christophe Leroy 
---
Rebased on powerpc/merge-test

@mpe: If not too late:
- change to head_32.h should be squashed into "powerpc/32: prepare for 
CONFIG_VMAP_STACK"
- change to head_32.S should be squashed into "powerpc/32s: Enable 
CONFIG_VMAP_STACK"

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S | 1 +
 arch/powerpc/kernel/head_32.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index cb7864091641..0493fcac6409 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -277,6 +277,7 @@ MachineCheck:
 #ifdef CONFIG_VMAP_STACK
li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
mtmsr   r11
+   isync
 #endif
 #ifdef CONFIG_PPC_CHRP
mfspr   r11, SPRN_SPRG_THREAD
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 73a035b40dbf..a6a5fbbf8504 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -43,6 +43,7 @@
.ifeq   \for_rtas
li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
mtmsr   r11
+   isync
.endif
subir11, r1, INT_FRAME_SIZE /* use r1 if kernel */
 #else
@@ -123,6 +124,7 @@
 #ifdef CONFIG_VMAP_STACK
li  r9, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
mtmsr   r9
+   isync
 #endif
tovirt_vmstack r12, r12
tophys_novmstack r11, r11
-- 
2.25.0



Re: vmlinux ELF header sometimes corrupt

2020-01-24 Thread Andreas Schwab
On Jan 22 2020, Rasmus Villemoes wrote:

> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.

Note that the size of the ELF header (Elf32_Ehdr) is 52 bytes.
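
A quick userspace check of that observation:

	#include <elf.h>
	#include <stdio.h>

	int main(void)
	{
		/* Prints 52, i.e. exactly the number of wiped bytes. */
		printf("sizeof(Elf32_Ehdr) = %zu\n", sizeof(Elf32_Ehdr));
		return 0;
	}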

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[GIT PULL] Please pull powerpc/linux.git powerpc-5.5-6 tag

2020-01-24 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.5:

The following changes since commit 6da3eced8c5f3b03340b0c395bacd552c4d52411:

  powerpc/spinlocks: Include correct header for static key (2019-12-30 21:20:41 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.5-6

for you to fetch changes up to 5d2e5dd5849b4ef5e8ec35e812cdb732c13cd27e:

  powerpc/mm/hash: Fix sharing context ids between kernel & userspace (2020-01-23 21:26:20 +1100)

- --
powerpc fixes for 5.5 #6

Fix our hash MMU code to avoid having overlapping ids between user and kernel,
which isn't as bad as it sounds but led to crashes on some machines.

A fix for the Power9 XIVE interrupt code, which could return the wrong interrupt
state in obscure error conditions.

A minor Kconfig fix for the recently added CONFIG_PPC_UV code.

Thanks to:
  Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic Barrat.

- --
Aneesh Kumar K.V (1):
  powerpc/mm/hash: Fix sharing context ids between kernel & userspace

Bharata B Rao (1):
  powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV

Frederic Barrat (1):
  powerpc/xive: Discard ESB load value when interrupt is invalid


 arch/powerpc/Kconfig  |  6 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 -
 arch/powerpc/include/asm/xive-regs.h  |  1 +
 arch/powerpc/sysdev/xive/common.c | 15 ---
 4 files changed, 18 insertions(+), 9 deletions(-)
-BEGIN PGP SIGNATURE-

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl4q30AACgkQUevqPMjh
pYCwSQ//Xbc8BofjbbiGambesHUG3Bkho8K38HSXfj9wdFdDS1f5PnUnTVNtNsch
GAXYEnreN33I5MZnOQO3Zaa6Ub+DPyDIn5dq1alo2uv/sJY4zyC/lwCeLNOAbC93
lj1s7laC7SQG6phf7hVT0xFMYA/j0GhQG8w4RIxiyJEXnm+Vb9SGUgbrArxmYRwF
dI7wHxEHdixMBgdA1q00inv51UIqmNJS/nPyBJUxBbKp1Kkzy0fOTLhgFCj4um6g
kX1AEzEkiNNpWDG30Hu6qrapa5181tet07ABgSxdyTB0pElbsigoFR5mRgeTWuAk
mP14couPhOx3NkW90yvuI9AvLAQIlvMH4rGbB61rqNgiLnfTGiYxy1Hvq2ihbXQy
TVdlLdjW2pa3vqz04vZa09NOjz9CHY1/llunpkJvUhd/ddXa+Cieu9fzDDC0fheF
ftZD4o9LtloGbWpa+re81uuQ/V6MhEOPbP44PJI70bVG+OWY0GPTuvqgKS/yckTv
Xxs5lTsU9qKDGsNOFzqwjTCLd3o4hPurfo6xrzyyMin2LRm4t8sTWLRm8SVoL+Wh
KRydvI8oE9qXNNCwwDnMYtiDmisZPWpWWRyPG+e34TXjc0SSEoeZUoVV+B/u5pVu
BIbzhRbbKY80E/m1virbaB+7Z92omfMnenIUdt3ZDTarh/INMdA=
=coVk
-END PGP SIGNATURE-


[PATCH v4 7/7] powerpc: Implement user_access_save() and user_access_restore()

2020-01-24 Thread Christophe Leroy
Implement user_access_save() and user_access_restore()

On 8xx and radix:
- On save, get the value of the associated special register
then prevent user access.
- On restore, set back the saved value to the associated special
register.

On book3s/32:
- On save, get the value stored in current->thread.kuap and prevent
user access.
- On restore, regenerate address range from the stored value and
reopen read/write access for that range.
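
For reference, the generic pattern these implement backs code like
lib/ubsan.c and mm/kasan/common.c:

	/* Close any open user-access window, run code that must not
	 * execute with user access enabled, then restore the state. */
	unsigned long ua_flags = user_access_save();
	/* ... report/trace code ... */
	user_access_restore(ua_flags);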

Signed-off-by: Christophe Leroy 
---
v4: new
---
 arch/powerpc/include/asm/book3s/32/kup.h  | 23 +++
 .../powerpc/include/asm/book3s/64/kup-radix.h | 22 ++
 arch/powerpc/include/asm/kup.h|  2 ++
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  | 14 +++
 arch/powerpc/include/asm/uaccess.h|  5 ++--
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 17e069291c72..3c0ba22dc360 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -153,6 +153,29 @@ static __always_inline void prevent_user_access(void __user *to, const void __us
kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);/* set Ks */
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+   unsigned long flags = current->thread.kuap;
+   unsigned long addr = flags & 0xf0000000;
+   unsigned long end = flags << 28;
+   void __user *to = (__force void __user *)addr;
+
+   if (flags)
+   prevent_user_access(to, to, end - addr, KUAP_READ_WRITE);
+
+   return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+   unsigned long addr = flags & 0xf0000000;
+   unsigned long end = flags << 28;
+   void __user *to = (__force void __user *)addr;
+
+   if (flags)
+   allow_user_access(to, to, end - addr, KUAP_READ_WRITE);
+}
+
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index a0263e94df33..90dd3a3fc8c7 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -63,6 +63,14 @@
  * because that would require an expensive read/modify write of the AMR.
  */
 
+static inline unsigned long get_kuap(void)
+{
+   if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   return 0;
+
+   return mfspr(SPRN_AMR);
+}
+
 static inline void set_kuap(unsigned long value)
 {
if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -98,6 +106,20 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
set_kuap(AMR_KUAP_BLOCKED);
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+   unsigned long flags = get_kuap();
+
+   set_kuap(AMR_KUAP_BLOCKED);
+
+   return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+   set_kuap(flags);
+}
+
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index c3ce7e8ae9ea..92bcd1a26d73 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -55,6 +55,8 @@ static inline void allow_user_access(void __user *to, const void __user *from,
 				     unsigned long size, unsigned long dir) { }
 static inline void prevent_user_access(void __user *to, const void __user *from,
 				       unsigned long size, unsigned long dir) { }
+static inline unsigned long prevent_user_access_return(void) { return 0UL; }
+static inline void restore_user_access(unsigned long flags) { }
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 1d70c80366fd..85ed2390fb99 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -46,6 +46,20 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
mtspr(SPRN_MD_AP, MD_APG_KUAP);
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+   unsigned long flags = mfspr(SPRN_MD_AP);
+
+   mtspr(SPRN_MD_AP, MD_APG_KUAP);
+
+   return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+   mtspr(SPRN_MD_AP, flags);
+}
+
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index af905d7fc1df..2f500debae21 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -465,9 +465,8 @@ 

[PATCH v4 6/7] powerpc: Implement user_access_begin and friends

2020-01-24 Thread Christophe Leroy
Today, when a function like strncpy_from_user() is called,
the userspace access protection is de-activated and re-activated
for every word read.

By implementing user_access_begin and friends, the protection
is de-activated at the beginning of the copy and re-activated at the
end.

Implement user_access_begin(), user_access_end() and
unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()

For the time being, we keep user_access_save() and
user_access_restore() as nops.
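
The calling convention implemented here is the generic one: a single
begin/end pair bracketing a batch of unsafe accesses (the struct and
field names below are illustrative only):

	if (!user_access_begin(uptr, sizeof(*uptr)))
		return -EFAULT;
	unsafe_put_user(val, &uptr->field_a, efault);
	unsafe_get_user(tmp, &uptr->field_b, efault);
	user_access_end();
	return 0;
efault:
	user_access_end();
	return -EFAULT;

On x86 the begin/end pair amounts to STAC/CLAC; on powerpc it becomes
the KUAP open/close implemented in this patch.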

Signed-off-by: Christophe Leroy 
---
v2: no change
v3: adapted to the new format of user_access_begin/end()
v4: back to original format of user_access_begin/end() ; No duplication of 
raw_copy_to_user()
---
 arch/powerpc/include/asm/uaccess.h | 85 +++---
 1 file changed, 66 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index cafad1960e76..af905d7fc1df 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -91,9 +91,14 @@ static inline int __access_ok(unsigned long addr, unsigned long size,
__put_user_check((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
 #define __get_user(x, ptr) \
-   __get_user_nocheck((x), (ptr), sizeof(*(ptr)))
+   __get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)
 #define __put_user(x, ptr) \
-   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
+   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), true)
+
+#define __get_user_allowed(x, ptr) \
+   __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
+#define __put_user_allowed(x, ptr) \
+	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), false)
 
 #define __get_user_inatomic(x, ptr) \
__get_user_nosleep((x), (ptr), sizeof(*(ptr)))
@@ -138,10 +143,9 @@ extern long __put_user_bad(void);
: "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
 #endif /* __powerpc64__ */
 
-#define __put_user_size(x, ptr, size, retval)  \
+#define __put_user_size_allowed(x, ptr, size, retval)  \
 do {   \
retval = 0; \
-   allow_write_to_user(ptr, size); \
switch (size) { \
  case 1: __put_user_asm(x, ptr, retval, "stb"); break; \
  case 2: __put_user_asm(x, ptr, retval, "sth"); break; \
@@ -149,17 +153,26 @@ do {  
\
  case 8: __put_user_asm2(x, ptr, retval); break;   \
  default: __put_user_bad();\
}   \
+} while (0)
+
+#define __put_user_size(x, ptr, size, retval)  \
+do {   \
+   allow_write_to_user(ptr, size); \
+   __put_user_size_allowed(x, ptr, size, retval);  \
prevent_write_to_user(ptr, size);   \
 } while (0)
 
-#define __put_user_nocheck(x, ptr, size)   \
+#define __put_user_nocheck(x, ptr, size, do_allow) \
 ({ \
long __pu_err;  \
__typeof__(*(ptr)) __user *__pu_addr = (ptr);   \
if (!is_kernel_addr((unsigned long)__pu_addr))  \
might_fault();  \
__chk_user_ptr(ptr);\
-   __put_user_size((x), __pu_addr, (size), __pu_err);  \
+	if (do_allow)								\
+		__put_user_size((x), __pu_addr, (size), __pu_err);		\
+	else									\
+		__put_user_size_allowed((x), __pu_addr, (size), __pu_err);	\
__pu_err;   \
 })
 
@@ -236,13 +249,12 @@ extern long __get_user_bad(void);
: "b" (addr), "i" (-EFAULT), "0" (err))
 #endif /* __powerpc64__ */
 
-#define __get_user_size(x, ptr, size, retval)  \
+#define __get_user_size_allowed(x, ptr, size, retval)  \
 do {   \
retval = 0; \
__chk_user_ptr(ptr);\
if (size > sizeof(x))   \
(x) = __get_user_bad(); \
-   allow_read_from_user(ptr, size);\
switch (size) { \
case 1: __get_user_asm(x, ptr, retval, "lbz"); break;   \
case 2: 

[PATCH v4 5/7] powerpc/32s: prepare prevent_user_access() for user_access_end()

2020-01-24 Thread Christophe Leroy
In preparation for implementing user_access_begin and friends
on powerpc, the book3s/32 version of prevent_user_access() needs
to be prepared for user_access_end().

user_access_end() doesn't provide the address and size which
were passed to user_access_begin(), required by prevent_user_access()
to know which segment to modify.

The list of segments which were unprotected by allow_user_access()
is available in current->kuap. But we don't want prevent_user_access()
to read this all the time, especially every time it is 0 (for instance
because the access was not a write access).

Implement a special direction named KUAP_CURRENT. In this case only,
the addr and end are retrieved from current->kuap.
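
A worked example of that encoding, using the code below: opening the
user range 0x10000000-0x2fffffff stores

	(0x10000000 & 0xf0000000) | ((((0x30000000 - 1) >> 28) + 1) & 0xf)
		== 0x10000003

in current->thread.kuap, and KUAP_CURRENT recovers
addr = kuap & 0xf0000000 == 0x10000000 and end = kuap << 28 == 0x30000000.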

Signed-off-by: Christophe Leroy 
---
v2: No change
v3: This patch was not in v3
v4: Added build protection against bad use of KUAP_CURRENT.
---
 arch/powerpc/include/asm/book3s/32/kup.h  | 23 +++
 .../powerpc/include/asm/book3s/64/kup-radix.h |  4 +++-
 arch/powerpc/include/asm/kup.h| 11 +
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index de29fb752cca..17e069291c72 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -108,6 +108,8 @@ static __always_inline void allow_user_access(void __user *to, const void __user
u32 addr, end;
 
BUILD_BUG_ON(!__builtin_constant_p(dir));
+   BUILD_BUG_ON(dir == KUAP_CURRENT);
+
if (!(dir & KUAP_WRITE))
return;
 
@@ -117,6 +119,7 @@ static __always_inline void allow_user_access(void __user *to, const void __user
return;
 
end = min(addr + size, TASK_SIZE);
+
+	current->thread.kuap = (addr & 0xf0000000) | ((((end - 1) >> 28) + 1) & 0xf);
kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
 }
@@ -127,15 +130,25 @@ static __always_inline void prevent_user_access(void __user *to, const void __us
u32 addr, end;
 
BUILD_BUG_ON(!__builtin_constant_p(dir));
-   if (!(dir & KUAP_WRITE))
-   return;
 
-   addr = (__force u32)to;
+   if (dir == KUAP_CURRENT) {
+   u32 kuap = current->thread.kuap;
 
-   if (unlikely(addr >= TASK_SIZE || !size))
+   if (unlikely(!kuap))
+   return;
+
+   addr = kuap & 0xf0000000;
+   end = kuap << 28;
+   } else if (dir & KUAP_WRITE) {
+   addr = (__force u32)to;
+   end = min(addr + size, TASK_SIZE);
+
+   if (unlikely(addr >= TASK_SIZE || !size))
+   return;
+   } else {
return;
+   }
 
-   end = min(addr + size, TASK_SIZE);
current->thread.kuap = 0;
kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);/* set Ks */
 }
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index c8d1076e0ebb..a0263e94df33 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -86,8 +86,10 @@ static __always_inline void allow_user_access(void __user *to, const void __user
set_kuap(AMR_KUAP_BLOCK_WRITE);
else if (dir == KUAP_WRITE)
set_kuap(AMR_KUAP_BLOCK_READ);
-   else
+   else if (dir == KUAP_READ_WRITE)
set_kuap(0);
+   else
+   BUILD_BUG();
 }
 
 static inline void prevent_user_access(void __user *to, const void __user *from,
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 94f24928916a..c3ce7e8ae9ea 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -5,6 +5,12 @@
 #define KUAP_READ  1
 #define KUAP_WRITE 2
 #define KUAP_READ_WRITE(KUAP_READ | KUAP_WRITE)
+/*
+ * For prevent_user_access() only.
+ * Use the current saved situation instead of the to/from/size params.
+ * Used on book3s/32
+ */
+#define KUAP_CURRENT   4
 
 #ifdef CONFIG_PPC64
 #include 
@@ -88,6 +94,11 @@ static inline void prevent_read_write_user(void __user *to, const void __user *f
prevent_user_access(to, from, size, KUAP_READ_WRITE);
 }
 
+static inline void prevent_current_access_user(void)
+{
+   prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_KUAP_H_ */
-- 
2.25.0



[PATCH v4 4/7] powerpc/32s: Drop NULL addr verification

2020-01-24 Thread Christophe Leroy
NULL addr is a user address. Don't waste time checking it. If
someone tries to access it, it will SIGFAULT the same way as for
address 1, so no need to make it special.

The special case is when not doing a write; in that case we want
to drop the entire function. This is now handled by the 'dir' param
and not by the nullity of 'to' anymore.

Also make the beginning of prevent_user_access() similar
to the beginning of allow_user_access(), and tell the compiler
that writing to kernel space or with a 0 length is unlikely.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/d79cb9f680f4e971e05262303103a4b94baa91ce.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/merge-test
---
 arch/powerpc/include/asm/book3s/32/kup.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 91c8f1d9bcee..de29fb752cca 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -113,7 +113,7 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 
addr = (__force u32)to;
 
-   if (!addr || addr >= TASK_SIZE || !size)
+   if (unlikely(addr >= TASK_SIZE || !size))
return;
 
end = min(addr + size, TASK_SIZE);
@@ -124,16 +124,18 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 static __always_inline void prevent_user_access(void __user *to, const void __user *from,
u32 size, unsigned long dir)
 {
-   u32 addr = (__force u32)to;
-   u32 end = min(addr + size, TASK_SIZE);
+   u32 addr, end;
 
BUILD_BUG_ON(!__builtin_constant_p(dir));
if (!(dir & KUAP_WRITE))
return;
 
-   if (!addr || addr >= TASK_SIZE || !size)
+   addr = (__force u32)to;
+
+   if (unlikely(addr >= TASK_SIZE || !size))
return;
 
+   end = min(addr + size, TASK_SIZE);
current->thread.kuap = 0;
kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);/* set Ks */
 }
-- 
2.25.0



[PATCH v4 3/7] powerpc/kuap: Fix set direction in allow/prevent_user_access()

2020-01-24 Thread Christophe Leroy
__builtin_constant_p() always returns 0 for pointers, so on RADIX
we always end up opening both directions (by writing 0 in SPR29):

  0170 <._copy_to_user>:
  ...
   1b0: 4c 00 01 2c isync
   1b4: 39 20 00 00 li  r9,0
   1b8: 7d 3d 03 a6 mtspr   29,r9
   1bc: 4c 00 01 2c isync
   1c0: 48 00 00 01 bl  1c0 <._copy_to_user+0x50>
1c0: R_PPC64_REL24  .__copy_tofrom_user
  ...
  0220 <._copy_from_user>:
  ...
   2ac: 4c 00 01 2c isync
   2b0: 39 20 00 00 li  r9,0
   2b4: 7d 3d 03 a6 mtspr   29,r9
   2b8: 4c 00 01 2c isync
   2bc: 7f c5 f3 78 mr  r5,r30
   2c0: 7f 83 e3 78 mr  r3,r28
   2c4: 48 00 00 01 bl  2c4 <._copy_from_user+0xa4>
2c4: R_PPC64_REL24  .__copy_tofrom_user
  ...

Use an explicit parameter for direction selection, so that GCC
is able to see it is a constant:

  01b0 <._copy_to_user>:
  ...
   1f0: 4c 00 01 2c isync
   1f4: 3d 20 40 00 lis r9,16384
   1f8: 79 29 07 c6 rldicr  r9,r9,32,31
   1fc: 7d 3d 03 a6 mtspr   29,r9
   200: 4c 00 01 2c isync
   204: 48 00 00 01 bl  204 <._copy_to_user+0x54>
204: R_PPC64_REL24  .__copy_tofrom_user
  ...
  0260 <._copy_from_user>:
  ...
   2ec: 4c 00 01 2c isync
   2f0: 39 20 ff ff li  r9,-1
   2f4: 79 29 00 04 rldicr  r9,r9,0,0
   2f8: 7d 3d 03 a6 mtspr   29,r9
   2fc: 4c 00 01 2c isync
   300: 7f c5 f3 78 mr  r5,r30
   304: 7f 83 e3 78 mr  r3,r28
   308: 48 00 00 01 bl  308 <._copy_from_user+0xa8>
308: R_PPC64_REL24  .__copy_tofrom_user
  ...

Signed-off-by: Christophe Leroy 
[mpe: Spell out the directions, s/KUAP_R/KUAP_READ/ etc.]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/56743b700c56e5d341468151f0e95ff0c46663cd.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/next-test
---
 arch/powerpc/include/asm/book3s/32/kup.h  | 13 ++--
 .../powerpc/include/asm/book3s/64/kup-radix.h | 11 +++
 arch/powerpc/include/asm/kup.h| 30 ++-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |  4 +--
 arch/powerpc/include/asm/uaccess.h|  4 +--
 5 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index d88008c8eb85..91c8f1d9bcee 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -102,11 +102,13 @@ static inline void kuap_update_sr(u32 sr, u32 addr, u32 end)
isync();/* Context sync required after mtsrin() */
 }
 
-static inline void allow_user_access(void __user *to, const void __user *from, u32 size)
+static __always_inline void allow_user_access(void __user *to, const void __user *from,
+ u32 size, unsigned long dir)
 {
u32 addr, end;
 
-   if (__builtin_constant_p(to) && to == NULL)
+   BUILD_BUG_ON(!__builtin_constant_p(dir));
+   if (!(dir & KUAP_WRITE))
return;
 
addr = (__force u32)to;
@@ -119,11 +121,16 @@ static inline void allow_user_access(void __user *to, const void __user *from, u
kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
 }
 
-static inline void prevent_user_access(void __user *to, const void __user *from, u32 size)
+static __always_inline void prevent_user_access(void __user *to, const void __user *from,
+   u32 size, unsigned long dir)
 {
u32 addr = (__force u32)to;
u32 end = min(addr + size, TASK_SIZE);
 
+   BUILD_BUG_ON(!__builtin_constant_p(dir));
+   if (!(dir & KUAP_WRITE))
+   return;
+
if (!addr || addr >= TASK_SIZE || !size)
return;
 
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index dbbd22cb80f5..c8d1076e0ebb 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -77,20 +77,21 @@ static inline void set_kuap(unsigned long value)
isync();
 }
 
-static inline void allow_user_access(void __user *to, const void __user *from,
-				     unsigned long size)
+static __always_inline void allow_user_access(void __user *to, const void __user *from,
+					      unsigned long size, unsigned long dir)
 {
// This is written so we can resolve to a single case at build time
-   if (__builtin_constant_p(to) && to == NULL)
+   BUILD_BUG_ON(!__builtin_constant_p(dir));
+   if (dir == KUAP_READ)
set_kuap(AMR_KUAP_BLOCK_WRITE);
-   else if (__builtin_constant_p(from) && from == NULL)
+   else if (dir == KUAP_WRITE)

[PATCH v4 2/7] powerpc/32s: Fix bad_kuap_fault()

2020-01-24 Thread Christophe Leroy
At the moment, bad_kuap_fault() reports a fault only if a bad access
to userspace occurred while access to userspace was not granted.

But if a fault occurs for a write outside the allowed userspace
segment(s) that have been unlocked, bad_kuap_fault() fails to
detect it and the kernel loops forever in do_page_fault().

Fix it by checking that the accessed address is within the allowed
range.

Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access 
Protection")
Cc: sta...@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/1e07c7de4ffdd9cda35d1ffe8258af75579d3e91.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/next-test
---
 arch/powerpc/include/asm/book3s/32/kup.h   | 9 +++--
 arch/powerpc/include/asm/book3s/64/kup-radix.h | 3 ++-
 arch/powerpc/include/asm/kup.h | 6 +-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h   | 3 ++-
 arch/powerpc/mm/fault.c| 2 +-
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index f9dc597b0b86..d88008c8eb85 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -131,12 +131,17 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);/* set Ks */
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
+   unsigned long begin = regs->kuap & 0xf0000000;
+   unsigned long end = regs->kuap << 28;
+
if (!is_write)
return false;
 
-	return WARN(!regs->kuap, "Bug: write fault blocked by segment registers !");
+   return WARN(address < begin || address >= end,
+   "Bug: write fault blocked by segment registers !");
 }
 
 #endif /* CONFIG_PPC_KUAP */
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index f254de956d6a..dbbd22cb80f5 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -95,7 +95,8 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
set_kuap(AMR_KUAP_BLOCKED);
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 5b5e39643a27..812e66f31934 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -45,7 +45,11 @@ static inline void allow_user_access(void __user *to, const void __user *from,
 				     unsigned long size) { }
 static inline void prevent_user_access(void __user *to, const void __user *from,
 				       unsigned long size) { }
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write) { return false; }
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+{
+   return false;
+}
 #endif /* CONFIG_PPC_KUAP */
 
 static inline void allow_read_from_user(const void __user *from, unsigned long size)
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 1006a427e99c..f2fea603b929 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -46,7 +46,8 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
mtspr(SPRN_MD_AP, MD_APG_KUAP);
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
	return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf0000000),
"Bug: fault blocked by AP register !");
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b5047f9b5dec..1baeb045f7f4 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -233,7 +233,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
 
// Read/write fault in a valid region (the exception table search passed
// above), but blocked by KUAP is bad, it can never succeed.
-   if (bad_kuap_fault(regs, is_write))
+   if (bad_kuap_fault(regs, address, is_write))
return true;
 
// What's left? Kernel fault on user in well defined regions (extable
-- 
2.25.0



[PATCH v4 1/7] readdir: make user_access_begin() use the real access range

2020-01-24 Thread Christophe Leroy
From: Linus Torvalds 

In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.

That made them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.

However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.

It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one.  So we end up backfilling the offset in the previous entry as we
walk along.

But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:

/*
 * Note! This range-checks 'previous' (which may be NULL).
 * The real range was checked in getdents
 */
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;

and it all worked fine.

But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".

This is a complete rewrite of Christophe's original patch.

By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field.  No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.

Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to 
unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.le...@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.le...@c-s.fr/
Reported-and-tested-by: Christophe Leroy 
Signed-off-by: Linus Torvalds 
Signed-off-by: Christophe Leroy 
---
v4: taken from Linus' tree
---
 fs/readdir.c | 73 +---
 1 file changed, 35 insertions(+), 38 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index d26d5ea4de7b..d5ee72280c82 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -206,7 +206,7 @@ struct linux_dirent {
 struct getdents_callback {
struct dir_context ctx;
struct linux_dirent __user * current_dir;
-   struct linux_dirent __user * previous;
+   int prev_reclen;
int count;
int error;
 };
@@ -214,12 +214,13 @@ struct getdents_callback {
 static int filldir(struct dir_context *ctx, const char *name, int namlen,
   loff_t offset, u64 ino, unsigned int d_type)
 {
-   struct linux_dirent __user * dirent;
+   struct linux_dirent __user *dirent, *prev;
struct getdents_callback *buf =
container_of(ctx, struct getdents_callback, ctx);
unsigned long d_ino;
int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2,
sizeof(long));
+   int prev_reclen;
 
buf->error = verify_dirent_name(name, namlen);
if (unlikely(buf->error))
@@ -232,28 +233,24 @@ static int filldir(struct dir_context *ctx, const char *name, int namlen,
buf->error = -EOVERFLOW;
return -EOVERFLOW;
}
-   dirent = buf->previous;
-   if (dirent && signal_pending(current))
+   prev_reclen = buf->prev_reclen;
+   if (prev_reclen && signal_pending(current))
return -EINTR;
-
-   /*
-* Note! This range-checks 'previous' (which may be NULL).
-* The real range was checked in getdents
-*/
-   if (!user_access_begin(dirent, sizeof(*dirent)))
-   goto efault;
-   if (dirent)
-		unsafe_put_user(offset, &dirent->d_off, efault_end);
dirent = buf->current_dir;
+   prev = (void __user *) dirent - prev_reclen;
+   if (!user_access_begin(prev, reclen + prev_reclen))
+   goto efault;
+
+   /* This might be 'dirent->d_off', but if so it will get overwritten */
+	unsafe_put_user(offset, &prev->d_off, efault_end);
 	unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
 	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
 	unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
 

Re: [FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates

2020-01-24 Thread Michael Ellerman
Ulf Hansson  writes:
> On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky  
> wrote:
>>
>> Hi All,
>>
>> We still need the attached patch for our onboard SD card interface
>> [1,2]. Could you please add this patch to the tree?
>
> No, because according to previous discussion that isn't the correct
> solution and more importantly it will break other archs (if I recall
> correctly).
>
> Looks like someone from the ppc community needs to pick up the ball.

That's a pretty small community these days :) :/

Christian can you test this please? I think I got the polarity of all
the tests right, but it's Friday night so maybe I'm wrong :)

cheers


From 975ba6e8b52d6f5358e93c1f5a47adc4a0b5fb70 Mon Sep 17 00:00:00 2001
From: Michael Ellerman 
Date: Fri, 24 Jan 2020 22:26:59 +1100
Subject: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc

There's an OF helper called of_dma_is_coherent(), which checks if a
device has a "dma-coherent" property to see if the device is coherent
for DMA.

But on some platforms devices are coherent by default, and on some
platforms it's not possible to update existing device trees to add the
"dma-coherent" property.

So add a Kconfig symbol to allow arch code to tell
of_dma_is_coherent() that devices are coherent by default, regardless
of the presence of the property.

Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
when the system has a coherent cache.

Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
Cc: sta...@vger.kernel.org # v3.16+
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 1 +
 drivers/of/Kconfig   | 4 
 drivers/of/address.c | 6 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 62752c3bfabf..460678ab2375 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
select NEED_DMA_MAP_STATE   if PPC64 || NOT_COHERENT_CACHE
select NEED_SG_DMA_LENGTH
select OF
+   select OF_DMA_DEFAULT_COHERENT  if !NOT_COHERENT_CACHE
select OF_EARLY_FLATTREE
select OLD_SIGACTIONif PPC32
select OLD_SIGSUSPEND
diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index 37c2ccbefecd..d91618641be6 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -103,4 +103,8 @@ config OF_OVERLAY
 config OF_NUMA
bool
 
+config OF_DMA_DEFAULT_COHERENT
+   # arches should select this if DMA is coherent by default for OF devices
+   bool
+
 endif # OF
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 99c1b8058559..e8a39c3ec4d4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -995,12 +995,16 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
  * @np:device node
  *
  * It returns true if "dma-coherent" property was found
- * for this device in DT.
+ * for this device in the DT, or if DMA is coherent by
+ * default for OF devices on the current platform.
  */
 bool of_dma_is_coherent(struct device_node *np)
 {
struct device_node *node = of_node_get(np);
 
+   if (IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT))
+   return true;
+
while (node) {
if (of_property_read_bool(node, "dma-coherent")) {
of_node_put(node);
-- 
2.21.1




Re: [PATCH v2 6/6] powerpc: Implement user_access_begin and friends

2020-01-24 Thread Christophe Leroy

On 23/01/2020 at 13:31, Michael Ellerman wrote:

Michael Ellerman  writes:

Christophe Leroy  writes:

Today, when a function like strncpy_from_user() is called,
the userspace access protection is de-activated and re-activated
for every word read.

By implementing user_access_begin and friends, the protection
is de-activated at the beginning of the copy and re-activated at the
end.

Implement user_access_begin(), user_access_end() and
unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()

For the time being, we keep user_access_save() and
user_access_restore() as nops.


That means we will run with user access enabled in a few more places, but
it's only used sparingly AFAICS:

   kernel/trace/trace_branch.c:unsigned long flags = user_access_save();
   lib/ubsan.c:unsigned long flags = user_access_save();
   lib/ubsan.c:unsigned long ua_flags = user_access_save();
   mm/kasan/common.c:  unsigned long flags = user_access_save();

And we don't have objtool checking that user access enablement isn't
leaking in the first place, so I guess it's OK for us not to implement
these to begin with?


It looks like we can implement them on on all three KUAP
implementations.

For radix and 8xx we just return/set the relevant SPR.

For book3s/32/kup.h I think we'd just need to add a KUAP_CURRENT case to
allow_user_access()?


Can't do that, we don't want to keep the info in current->thread.kuap 
after user_access_save(), otherwise we might unexpectedly re-open access 
through an interrupt.


And if we use the KUAP_CURRENT case of prevent_user_access(), it means
we'll read current->thread.kuap twice.


So, just regenerate addr and end from the flags, and use 
allow_user_access() and prevent_user_access() as usual.


I'll have it in v4

Christophe


[PATCH 4.19 447/639] perf/ioctl: Add check for the sample_period value

2020-01-24 Thread Greg Kroah-Hartman
From: Ravi Bangoria 

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, negative sample_period could cause a recursive
PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 751888cbed5c0..16af86ab24c42 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5012,6 +5012,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
	event_function_call(event, __perf_event_period, &value);
 
return 0;
-- 
2.20.1





Re: vmlinux ELF header sometimes corrupt

2020-01-24 Thread Rasmus Villemoes
On 24/01/2020 11.50, Michael Ellerman wrote:
> Rasmus Villemoes  writes:
>> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
>> very hard to debug problem that maybe someone else has encountered. This
>> doesn't happen always, perhaps 1 in 8 times or something like that.
>>
>> The issue is that when the build gets to do "${CROSS}objcopy -O binary
>> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
>> that fails with
>>
>>   powerpc-oe-linux-objcopy:vmlinux: file format not recognized
>>
>>
>> Any ideas?
> 
> Not really sorry. Haven't seen or heard of that before.
> 
> Are you doing a parallel make? If so does -j 1 fix it?

Hard to say, I'll have to try that a number of times to see if it can be
reproduced with that setting.

> If it seems like sortextable is at fault then strace'ing it would be my
> next step.

I don't think sortextable is at fault, that was just my first "I know
that at least pokes around in the ELF file". I do "cp vmlinux
vmlinux.before_sort" and "cp vmlinux vmlinux.after_sort", and both of
those copies are proper ELF files - and the .after_sort is identical to
the corrupt vmlinux apart from vmlinux ending up with its ELF header wiped.

So it's something that happens during some later build step (Yocto has a
lot of steps), perhaps "make modules" or "make modules_install" or
something ends up somehow deciding "hey, vmlinux isn't quite uptodate,
let's nuke it". I'm not even sure it's a Kbuild problem, but I've seen
the same thing happen using another meta-build system called oe-lite,
which is why I'm not primarily suspecting the Yocto logic.

Rasmus


Re: vmlinux ELF header sometimes corrupt

2020-01-24 Thread Michael Ellerman
Rasmus Villemoes  writes:
> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
> very hard to debug problem that maybe someone else has encountered. This
> doesn't happen always, perhaps 1 in 8 times or something like that.
>
> The issue is that when the build gets to do "${CROSS}objcopy -O binary
> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
> that fails with
>
>   powerpc-oe-linux-objcopy:vmlinux: file format not recognized
>
> So I hacked link-vmlinux.sh to stash copies of vmlinux before and after
> sortextable vmlinux. Both of those are proper ELF files, and comparing
> the corrupted vmlinux to vmlinux.after_sort they are identical after the
> first 52 bytes; in vmlinux, those first 52 bytes are all 0.
>
> I also saved stat(1) info to see if vmlinux is being replaced or
> modified in-place.
>
> $ cat vmlinux.stat.after_sort
>   File: 'vmlinux'
>   Size: 8608456 Blocks: 16696  IO Block: 4096   regular file
> Device: 811h/2065d  Inode: 21919132Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 10:52:38.946703081 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> $ stat vmlinux
>   File: 'vmlinux'
>   Size: 8608456 Blocks: 16688  IO Block: 4096   regular file
> Device: 811h/2065d  Inode: 21919132Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 17:20:00.650379057 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.
>
> Any ideas?

Not really sorry. Haven't seen or heard of that before.

Are you doing a parallel make? If so does -j 1 fix it?

If it seems like sortextable is at fault then strace'ing it would be my
next step.

cheers


Re: [PATCH v2 1/6] fs/readdir: Fix filldir() and filldir64() use of user_access_begin()

2020-01-24 Thread Michael Ellerman
Linus Torvalds  writes:
> On Thu, Jan 23, 2020 at 4:00 AM Michael Ellerman  wrote:
>>
>> So I guess I'll wait and see what happens with patch 1.
>
> I've committed my fixes to filldir[64]() directly - they really were
> fixing me being lazy about the range, and the name length checking
> really is a theoretical "access wrong user space pointer" issue with
> corrupted filesystems regardless (even though I suspect it's entirely
> theoretical - even a corrupt filesystem hopefully won't be passing in
> negative directory entry lengths or something like that).

Great, thanks.

> The "pass in read/write" part I'm not entirely convinced about.
> Honestly, if this is just for ppc32 and nobody else really needs it,
> make the ppc32s thing always just enable both user space reads and
> writes. That's the semantics for x86 and arm as is, I'm not convinced
> that we should complicate this for a legacy platform.

We can use the read/write info on Power9 too. That's a niche platform
but hopefully not legacy status yet :P

But it's entirely optional, as you say we can just enable read/write if
we aren't passed the read/write info from the upper-level API.

I think our priority should be getting objtool going on powerpc to check
our user access regions are well contained. Once we have that working
maybe then we can look at plumbing the direction through
user_access_begin() etc.

cheers


[PATCH 4.14 243/343] perf/ioctl: Add check for the sample_period value

2020-01-24 Thread Greg Kroah-Hartman
From: Ravi Bangoria 

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, negative sample_period could cause a recursive
PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea4f3f7a0c6f3..2ac73b4cb8a93 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4762,6 +4762,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
	event_function_call(event, __perf_event_period, &value);
 
return 0;
-- 
2.20.1





Re: [PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()

2020-01-24 Thread Segher Boessenkool
On Fri, Jan 24, 2020 at 07:03:36AM +, Christophe Leroy wrote:
> >On 24/01/2020 at 06:46, Michael Ellerman wrote:
> >>
> >>If I do this it seems to work, but feels a little dicey:
> >>
> >>asm ("" : "=r" (r1));
> >>sp = r1 & (THREAD_SIZE - 1);
> >
> >
> >Or we could do add in asm/reg.h what we have in boot/reg.h:
> >
> >register void *__stack_pointer asm("r1");
> >#define get_sp()    (__stack_pointer)
> >
> >And use get_sp()
> >
> 
> It works, and I guess doing it this way is acceptable as it's exactly 
> what's done for current in asm/current.h with register r2.

That is a *global* register variable.  That works.  We still need to
document a bit better what it does exactly, but this is the expected
use case, so that will work.
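
A minimal sketch of the supported form (assuming GCC targeting
powerpc; the THREAD_SIZE value is purely illustrative):

	#define THREAD_SIZE 16384	/* illustrative only */

	/* File-scope register variable pinned to r1.  Unlike the
	 * local-variable form, this use is documented and supported. */
	register unsigned long current_r1 asm("r1");

	static inline unsigned long stack_offset(void)
	{
		return current_r1 & (THREAD_SIZE - 1);
	}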

> Now I (still) get:
> 
>   sp = get_sp() & (THREAD_SIZE - 1);
>  b9c: 54 24 04 fe clrlwi  r4,r1,19
>   if (unlikely(sp < 2048)) {
>  ba4: 2f 84 07 ff cmpwi   cr7,r4,2047
> 
Although GCC 8.1 was doing exactly the same with the form clang doesn't like:
> 
>   register unsigned long r1 asm("r1");
>   long sp = r1 & (THREAD_SIZE - 1);
>  b84: 54 24 04 fe clrlwi  r4,r1,19
>   if (unlikely(sp < 2048)) {
>  b8c: 2f 84 07 ff cmpwi   cr7,r4,2047

Sure, if it did what you expected, things will usually work out fine ;-)

(Pity that the compiler didn't come up with
rlwinm. r4,r1,0,19,20
bne bad
Or are the low bits of r4 used later again?)


Segher


Re: [PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()

2020-01-24 Thread Segher Boessenkool
Hi!

On Fri, Jan 24, 2020 at 04:46:24PM +1100, Michael Ellerman wrote:
> Christophe Leroy  writes:
> >  static inline void check_stack_overflow(void)
> >  {
> >  #ifdef CONFIG_DEBUG_STACKOVERFLOW
> > -   long sp;
> > -
> > -   sp = current_stack_pointer() & (THREAD_SIZE-1);
> > +   register unsigned long r1 asm("r1");
> > +   long sp = r1 & (THREAD_SIZE - 1);
> 
> This appears to work but seems to be "unsupported" by GCC, and clang
> actually complains about it:
> 
>   /linux/arch/powerpc/kernel/irq.c:603:12: error: variable 'r1' is 
> uninitialized when used here [-Werror,-Wuninitialized]
>   long sp = r1 & (THREAD_SIZE - 1);
> ^~
> 
> The GCC docs say:
> 
>   The only supported use for this feature is to specify registers for
>   input and output operands when calling Extended asm (see Extended
>   Asm).
> 
> https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Local-Register-Variables.html#Local-Register-Variables

Yes.  Don't use local register variables any other way.  It *will* break.

> If I do this it seems to work, but feels a little dicey:
> 
>   asm ("" : "=r" (r1));
>   sp = r1 & (THREAD_SIZE - 1);

The only thing dicey about that is that you are writing to r1.  Heh.
Well that certainly is bad enough, the compiler does not know how to
handle that at all...  Of course you aren't *actually* changing
anything, so it might just work.


Segher