Re: [PATCH v3 2/7] uaccess: Tell user_access_begin() if it's for a write or not
On Thu, Jan 23, 2020 at 10:03 AM Linus Torvalds wrote:
> We used to have a read/write argument to the old "verify_area()" and
> "access_ok()" model, and it was a mistake. It was due to odd i386 user
> access issues. We got rid of it. I'm not convinced this is any better
> - it looks very similar and for odd ppc access issues.

If the mode (read or write) were made visible to the trap handler, I'd find that useful for machine check recovery. If I'm in the middle of a copy_from_user() and I get a machine check reading poison from a user address ... then I could try to recover in the same way as for the user accessing the poison (offline the page, SIGBUS the task). But if the poison is in kernel memory and we are doing a copy_to_user(), then we are hosed (or would need some more complex recovery plan).

[Note that we only get recoverable machine checks on loads... writes are posted, so if something goes wrong it isn't synchronous with the store instruction that initiated it]

-Tony
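The recovery policy Tony describes can be sketched in a few lines of userspace C. This is purely illustrative; none of these names exist in the kernel, and the real decision involves much more state than a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical illustration of the policy above: a poisoned *load* from
 * a user page during copy_from_user() can be handled like a user-mode
 * consumption of the poison (offline the page, SIGBUS the task); a
 * poisoned load from kernel memory while doing copy_to_user() cannot
 * be recovered this way. Writes never reach this path at all, since
 * stores are posted and fault asynchronously. */
enum mce_outcome { MCE_OFFLINE_AND_SIGBUS, MCE_FATAL };

static enum mce_outcome mce_copy_outcome(bool load_was_from_user)
{
	return load_was_from_user ? MCE_OFFLINE_AND_SIGBUS : MCE_FATAL;
}
```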
Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
On Fri, Jan 24, 2020 at 9:07 AM Aneesh Kumar K.V wrote:
>
> On 1/24/20 10:15 PM, Dan Williams wrote:
> > On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
> > wrote:
> >>
> >> On 1/24/20 11:27 AM, Dan Williams wrote:
> >>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
> >>>
> >>
> >>
> >>
> +unsigned long arch_namespace_map_size(void)
> +{
> +	return PAGE_SIZE;
> +}
> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
> +
> +
>  static void __cpa_flush_all(void *arg)
>  {
>  	unsigned long cache = (unsigned long)arg;
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 9df091bd30ba..a3476dbd2656 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr,
> size_t size)
>  }
>  #endif
>
> +unsigned long arch_namespace_map_size(void);
> >>>
> >>> This property is more generic than the nvdimm namespace mapping size,
> >>> it's more the fundamental remap granularity that the architecture
> >>> supports. So I would expect this to be defined in core header files.
> >>> Something like:
> >>>
> >>> diff --git a/include/linux/io.h b/include/linux/io.h
> >>> index a59834bc0a11..58b3b2091dbb 100644
> >>> --- a/include/linux/io.h
> >>> +++ b/include/linux/io.h
> >>> @@ -155,6 +155,13 @@ enum {
> >>>	void *memremap(resource_size_t offset, size_t size, unsigned long
> >>> flags);
> >>>	void memunmap(void *addr);
> >>>
> >>> +#ifndef memremap_min_align
> >>> +static inline unsigned int memremap_min_align(void)
> >>> +{
> >>> +	return PAGE_SIZE;
> >>> +}
> >>> +#endif
> >>> +
> >>
> >>
> >> Should that be memremap_pages_min_align()?
> >
> > No, and on second look it needs to be a common value that results in
> > properly aligned / sized namespaces across architectures.
> >
> > What would it take for Power to make it's minimum mapping granularity
> > SUBSECTION_SIZE? The minute that the minimum alignment changes across
> > architectures we lose compatibility.
> >
> > The namespaces need to be sized such that the mode can be changed freely.
> >
>
> Linux on ppc64 with hash translation use just one page size for direct
> mapping and that is 16MB.

Ok, I think this means that the dream of SUBSECTION_SIZE being the minimum compat alignment is dead, or at least a dream deferred.

Let's do this, change the name of this function to:

    memremap_compat_align()

...and define it to be the max() of all the alignment constraints that the arch may require through either memremap(), or memremap_pages(). Then, teach ndctl to make its default alignment compatible by default, 16MiB, with an override to allow namespace creation with the current architecture's memremap_compat_align(), exported via sysfs, if it happens to be less than 16MiB.

Finally, cross our fingers and hope that Power remains the only arch that violates the SUBSECTION_SIZE minimum value for memremap_compat_align().
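A minimal sketch of the shape Dan proposes. The function name comes from the thread; the `#ifndef` override mechanism mirrors the `memremap_min_align()` snippet quoted earlier, and the concrete values here are our assumptions, not a real kernel interface.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL		/* assumed value for this sketch */
#define SZ_16M    (16UL * 1024 * 1024)

/* Core default: most architectures can remap at PAGE_SIZE granularity.
 * An arch like ppc64 with hash translation, whose direct map uses 16MB
 * pages, would supply its own memremap_compat_align() returning SZ_16M,
 * and ndctl would default namespace alignment to 16MiB so namespaces
 * stay movable between architectures. */
#ifndef memremap_compat_align
static unsigned long memremap_compat_align(void)
{
	return PAGE_SIZE;
}
#endif
```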
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-6 tag
The pull request you sent on Fri, 24 Jan 2020 23:13:30 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.5-6

has been merged into torvalds/linux.git:

https://git.kernel.org/torvalds/c/3c45d7510cf563be2ebc04f6864b70bc69f68cb9

Thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
On 1/24/20 10:15 PM, Dan Williams wrote:

On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V wrote:

On 1/24/20 11:27 AM, Dan Williams wrote:

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V

+unsigned long arch_namespace_map_size(void)
+{
+	return PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(arch_namespace_map_size);
+
+
 static void __cpa_flush_all(void *arg)
 {
 	unsigned long cache = (unsigned long)arg;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 9df091bd30ba..a3476dbd2656 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif

+unsigned long arch_namespace_map_size(void);

This property is more generic than the nvdimm namespace mapping size, it's more the fundamental remap granularity that the architecture supports. So I would expect this to be defined in core header files. Something like:

diff --git a/include/linux/io.h b/include/linux/io.h
index a59834bc0a11..58b3b2091dbb 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -155,6 +155,13 @@ enum {
 void *memremap(resource_size_t offset, size_t size, unsigned long flags);
 void memunmap(void *addr);

+#ifndef memremap_min_align
+static inline unsigned int memremap_min_align(void)
+{
+	return PAGE_SIZE;
+}
+#endif
+

Should that be memremap_pages_min_align()?

No, and on second look it needs to be a common value that results in properly aligned / sized namespaces across architectures.

What would it take for Power to make it's minimum mapping granularity SUBSECTION_SIZE? The minute that the minimum alignment changes across architectures we lose compatibility.

The namespaces need to be sized such that the mode can be changed freely.

Linux on ppc64 with hash translation use just one page size for direct mapping and that is 16MB.

-aneesh
[PATCH v2] powerpc/32: Warn and return ENOSYS on syscalls from kernel
Since commit b86fb88855ea ("powerpc/32: implement fast entry for syscalls on non BOOKE") and commit 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE"), syscalls from the kernel are unexpected and can have catastrophic consequences, as they will destroy the kernel stack.

Test MSR_PR on syscall entry. If the syscall comes from the kernel, emit a warning and return an ENOSYS error.

Signed-off-by: Christophe Leroy
---
v2: Rebased on powerpc/next-test, ie on top of VMAP_STACK series
---
 arch/powerpc/kernel/entry_32.S   | 26 ++
 arch/powerpc/kernel/head_32.h    | 18 +++---
 arch/powerpc/kernel/head_booke.h |  5 -
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3795654d15d1..73b80143ffac 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -574,6 +574,32 @@ syscall_exit_work:
 	bl	do_syscall_trace_leave
 	b	ret_from_except_full
 
+	/*
+	 * System call was called from kernel. We get here with SRR1 in r9.
+	 * Mark the exception as recoverable once we have retrieved SRR0,
+	 * trap a warning and return ENOSYS with CR[SO] set.
+	 */
+	.globl	ret_from_kernel_syscall
+ret_from_kernel_syscall:
+	mfspr	r11, SPRN_SRR0
+#if !defined(CONFIG_4xx) && !defined(CONFIG_BOOKE)
+	LOAD_REG_IMMEDIATE(r12, MSR_KERNEL & ~(MSR_IR|MSR_DR))
+	MTMSRD(r12)
+#endif
+
+0:	trap
+	EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+
+	li	r3, ENOSYS
+	crset	so
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
+	mtspr	SPRN_NRI, r0
+#endif
+	mtspr	SPRN_SRR1, r9
+	mtspr	SPRN_SRR0, r11
+	SYNC
+	RFI
+
 /*
  * The fork/clone functions need to copy the full register set into
  * the child process. Therefore we need to save all the nonvolatile
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index a6a5fbbf8504..5f3cfc9ef1b6 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -112,13 +112,17 @@
 .macro SYSCALL_ENTRY trapno
 	mfspr	r12,SPRN_SPRG_THREAD
 #ifdef CONFIG_VMAP_STACK
-	mfspr	r9, SPRN_SRR0
-	mfspr	r11, SPRN_SRR1
-	stw	r9, SRR0(r12)
-	stw	r11, SRR1(r12)
+	mfspr	r11, SPRN_SRR0
+	mfspr	r9, SPRN_SRR1
+	stw	r11, SRR0(r12)
+	stw	r9, SRR1(r12)
+#else
+	mfspr	r9, SPRN_SRR1
 #endif
 	mfcr	r10
+	andi.	r11, r9, MSR_PR
 	lwz	r11,TASK_STACK-THREAD(r12)
+	beq-	99f
 	rlwinm	r10,r10,0,4,2	/* Clear SO bit in CR */
 	addi	r11, r11, THREAD_SIZE - INT_FRAME_SIZE
 #ifdef CONFIG_VMAP_STACK
@@ -128,15 +132,14 @@
 #endif
 	tovirt_vmstack r12, r12
 	tophys_novmstack r11, r11
-	mflr	r9
 	stw	r10,_CCR(r11)		/* save registers */
-	stw	r9, _LINK(r11)
+	mflr	r10
+	stw	r10, _LINK(r11)
#ifdef CONFIG_VMAP_STACK
 	lwz	r10, SRR0(r12)
 	lwz	r9, SRR1(r12)
 #else
 	mfspr	r10,SPRN_SRR0
-	mfspr	r9,SPRN_SRR1
 #endif
 	stw	r1,GPR1(r11)
 	stw	r1,0(r11)
@@ -209,6 +212,7 @@
 	mtspr	SPRN_SRR0,r11
 	SYNC
 	RFI				/* jump to handler, enable MMU */
+99:	b	ret_from_kernel_syscall
 .endm
 
 .macro save_dar_dsisr_on_stack reg1, reg2, sp
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 37fc84ed90e3..bd2e5ed8dd50 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -104,16 +104,18 @@ FTR_SECTION_ELSE
 #ifdef CONFIG_KVM_BOOKE_HV
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
+	mfspr	r9, SPRN_SRR1
 	BOOKE_CLEAR_BTB(r11)
+	andi.	r11, r9, MSR_PR
 	lwz	r11, TASK_STACK - THREAD(r10)
 	rlwinm	r12,r12,0,4,2	/* Clear SO bit in CR */
+	beq-	99f
 	ALLOC_STACK_FRAME(r11, THREAD_SIZE - INT_FRAME_SIZE)
 	stw	r12, _CCR(r11)		/* save various registers */
 	mflr	r12
 	stw	r12,_LINK(r11)
 	mfspr	r12,SPRN_SRR0
 	stw	r1, GPR1(r11)
-	mfspr	r9,SPRN_SRR1
 	stw	r1, 0(r11)
 	mr	r1, r11
 	stw	r12,_NIP(r11)
@@ -176,6 +178,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 	mtspr	SPRN_SRR0,r11
 	SYNC
 	RFI				/* jump to handler, enable MMU */
+99:	b	ret_from_kernel_syscall
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
-- 
2.25.0
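The heart of the patch is a single bit test, expressed in C terms below. MSR_PR's bit position is taken from the powerpc architecture (it is the "problem state" bit, set when running user code); the helper name is ours.

```c
#include <assert.h>

#define MSR_PR (1UL << 14)	/* "problem state": set when the MSR describes user mode */

/* Mirrors the "andi. r11, r9, MSR_PR ; beq- 99f" sequence above: SRR1
 * holds the MSR at the time of the sc instruction, so a clear MSR_PR
 * means the syscall came from the kernel and must be diverted to
 * ret_from_kernel_syscall (warn, return ENOSYS with CR[SO] set). */
static int syscall_from_kernel(unsigned long srr1)
{
	return (srr1 & MSR_PR) == 0;
}
```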
[PATCH] powerpc/32: Add missing context synchronisation with CONFIG_VMAP_STACK
After reactivating data translation by modifying MSR[DR], an isync is required to ensure the translation is effective.

Signed-off-by: Christophe Leroy
---
Rebased on powerpc/merge-test

@mpe: If not too late:
- change to head_32.h should be squashed into "powerpc/32: prepare for CONFIG_VMAP_STACK"
- change to head_32.S should be squashed into "powerpc/32s: Enable CONFIG_VMAP_STACK"

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/head_32.S | 1 +
 arch/powerpc/kernel/head_32.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index cb7864091641..0493fcac6409 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -277,6 +277,7 @@ MachineCheck:
 #ifdef CONFIG_VMAP_STACK
 	li	r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
 	mtmsr	r11
+	isync
 #endif
 #ifdef CONFIG_PPC_CHRP
 	mfspr	r11, SPRN_SPRG_THREAD
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 73a035b40dbf..a6a5fbbf8504 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -43,6 +43,7 @@
 .ifeq	\for_rtas
 	li	r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
 	mtmsr	r11
+	isync
 .endif
 	subi	r11, r1, INT_FRAME_SIZE	/* use r1 if kernel */
 #else
@@ -123,6 +124,7 @@
 #ifdef CONFIG_VMAP_STACK
 	li	r9, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
 	mtmsr	r9
+	isync
 #endif
 	tovirt_vmstack r12, r12
 	tophys_novmstack r11, r11
-- 
2.25.0
Re: vmlinux ELF header sometimes corrupt
On Jan 22 2020, Rasmus Villemoes wrote:
> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.

Note that the size of the ELF header (Elf32_Ehdr) is 52 bytes.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
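Andreas's observation is easy to verify on a Linux system: a 32-bit ELF header is e_ident[16] plus thirteen fixed-size fields, 52 bytes in total, exactly the length of the wiped region. This snippet assumes glibc's elf.h.

```c
#include <elf.h>	/* Elf32_Ehdr; assumes a Linux/glibc toolchain */

/* Returns the size of a 32-bit ELF file header: 16 identification
 * bytes + 2+2+4 + 4+4+4 + 4 + 2*6 field bytes = 52. A corruption that
 * wipes exactly the first 52 bytes of vmlinux wipes exactly this
 * header and nothing else. */
static unsigned long elf32_ehdr_size(void)
{
	return (unsigned long)sizeof(Elf32_Ehdr);
}
```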
[GIT PULL] Please pull powerpc/linux.git powerpc-5.5-6 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.5:

The following changes since commit 6da3eced8c5f3b03340b0c395bacd552c4d52411:

  powerpc/spinlocks: Include correct header for static key (2019-12-30 21:20:41 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.5-6

for you to fetch changes up to 5d2e5dd5849b4ef5e8ec35e812cdb732c13cd27e:

  powerpc/mm/hash: Fix sharing context ids between kernel & userspace (2020-01-23 21:26:20 +1100)

- ----------------------------------------------------------------
powerpc fixes for 5.5 #6

Fix our hash MMU code to avoid having overlapping ids between user and kernel, which isn't as bad as it sounds but led to crashes on some machines.

A fix for the Power9 XIVE interrupt code, which could return the wrong interrupt state in obscure error conditions.

A minor Kconfig fix for the recently added CONFIG_PPC_UV code.

Thanks to: Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic Barrat.

- ----------------------------------------------------------------
Aneesh Kumar K.V (1):
      powerpc/mm/hash: Fix sharing context ids between kernel & userspace

Bharata B Rao (1):
      powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV

Frederic Barrat (1):
      powerpc/xive: Discard ESB load value when interrupt is invalid

 arch/powerpc/Kconfig                          |  6 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++++-
 arch/powerpc/include/asm/xive-regs.h          |  1 +
 arch/powerpc/sysdev/xive/common.c             | 15 ---
 4 files changed, 18 insertions(+), 9 deletions(-)

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl4q30AACgkQUevqPMjh
pYCwSQ//Xbc8BofjbbiGambesHUG3Bkho8K38HSXfj9wdFdDS1f5PnUnTVNtNsch
GAXYEnreN33I5MZnOQO3Zaa6Ub+DPyDIn5dq1alo2uv/sJY4zyC/lwCeLNOAbC93
lj1s7laC7SQG6phf7hVT0xFMYA/j0GhQG8w4RIxiyJEXnm+Vb9SGUgbrArxmYRwF
dI7wHxEHdixMBgdA1q00inv51UIqmNJS/nPyBJUxBbKp1Kkzy0fOTLhgFCj4um6g
kX1AEzEkiNNpWDG30Hu6qrapa5181tet07ABgSxdyTB0pElbsigoFR5mRgeTWuAk
mP14couPhOx3NkW90yvuI9AvLAQIlvMH4rGbB61rqNgiLnfTGiYxy1Hvq2ihbXQy
TVdlLdjW2pa3vqz04vZa09NOjz9CHY1/llunpkJvUhd/ddXa+Cieu9fzDDC0fheF
ftZD4o9LtloGbWpa+re81uuQ/V6MhEOPbP44PJI70bVG+OWY0GPTuvqgKS/yckTv
Xxs5lTsU9qKDGsNOFzqwjTCLd3o4hPurfo6xrzyyMin2LRm4t8sTWLRm8SVoL+Wh
KRydvI8oE9qXNNCwwDnMYtiDmisZPWpWWRyPG+e34TXjc0SSEoeZUoVV+B/u5pVu
BIbzhRbbKY80E/m1virbaB+7Z92omfMnenIUdt3ZDTarh/INMdA=
=coVk
-----END PGP SIGNATURE-----
[PATCH v4 7/7] powerpc: Implement user_access_save() and user_access_restore()
Implement user_access_save() and user_access_restore().

On 8xx and radix:
- On save, get the value of the associated special register then prevent user access.
- On restore, set back the saved value to the associated special register.

On book3s/32:
- On save, get the value stored in current->thread.kuap and prevent user access.
- On restore, regenerate the address range from the stored value and reopen read/write access for that range.

Signed-off-by: Christophe Leroy
---
v4: new
---
 arch/powerpc/include/asm/book3s/32/kup.h      | 23 +++
 .../powerpc/include/asm/book3s/64/kup-radix.h | 22 ++
 arch/powerpc/include/asm/kup.h                |  2 ++
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  | 14 +++
 arch/powerpc/include/asm/uaccess.h            |  5 ++--
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 17e069291c72..3c0ba22dc360 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -153,6 +153,29 @@ static __always_inline void prevent_user_access(void __user *to, const void __us
 	kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);	/* set Ks */
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+	unsigned long flags = current->thread.kuap;
+	unsigned long addr = flags & 0xf0000000;
+	unsigned long end = flags << 28;
+	void __user *to = (__force void __user *)addr;
+
+	if (flags)
+		prevent_user_access(to, to, end - addr, KUAP_READ_WRITE);
+
+	return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+	unsigned long addr = flags & 0xf0000000;
+	unsigned long end = flags << 28;
+	void __user *to = (__force void __user *)addr;
+
+	if (flags)
+		allow_user_access(to, to, end - addr, KUAP_READ_WRITE);
+}
+
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index a0263e94df33..90dd3a3fc8c7 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -63,6 +63,14 @@
  * because that would require an expensive read/modify write of the AMR.
  */
 
+static inline unsigned long get_kuap(void)
+{
+	if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+		return 0;
+
+	return mfspr(SPRN_AMR);
+}
+
 static inline void set_kuap(unsigned long value)
 {
 	if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -98,6 +106,20 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
 	set_kuap(AMR_KUAP_BLOCKED);
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+	unsigned long flags = get_kuap();
+
+	set_kuap(AMR_KUAP_BLOCKED);
+
+	return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+	set_kuap(flags);
+}
+
 static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index c3ce7e8ae9ea..92bcd1a26d73 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -55,6 +55,8 @@ static inline void allow_user_access(void __user *to, const void __user *from,
 				     unsigned long size, unsigned long dir) { }
 static inline void prevent_user_access(void __user *to, const void __user *from,
 				       unsigned long size, unsigned long dir) { }
+static inline unsigned long prevent_user_access_return(void) { return 0UL; }
+static inline void restore_user_access(unsigned long flags) { }
 static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 1d70c80366fd..85ed2390fb99 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -46,6 +46,20 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
 	mtspr(SPRN_MD_AP, MD_APG_KUAP);
 }
 
+static inline unsigned long prevent_user_access_return(void)
+{
+	unsigned long flags = mfspr(SPRN_MD_AP);
+
+	mtspr(SPRN_MD_AP, MD_APG_KUAP);
+
+	return flags;
+}
+
+static inline void restore_user_access(unsigned long flags)
+{
+	mtspr(SPRN_MD_AP, flags);
+}
+
 static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index af905d7fc1df..2f500debae21 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -465,9 +465,8 @@
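The save/restore semantics added here can be mocked in plain userspace C. A fake register stands in for the AMR/MD_AP special register; every name below is ours, not kernel API. The point of the pair is that a nested region can block and later reinstate user access without knowing whether access was open beforehand.

```c
#include <stdint.h>

#define MOCK_BLOCKED 0xffffffffu	/* stand-in for AMR_KUAP_BLOCKED */

static uint32_t mock_spr;		/* stand-in for the KUAP special register */

/* save: capture the current protection state, then block all user access */
static uint32_t mock_user_access_save(void)
{
	uint32_t flags = mock_spr;

	mock_spr = MOCK_BLOCKED;
	return flags;
}

/* restore: put back exactly the state that save() captured */
static void mock_user_access_restore(uint32_t flags)
{
	mock_spr = flags;
}

/* roundtrip check: access is blocked inside the window and the original
 * state (open or closed) reappears afterwards */
static int mock_roundtrip(uint32_t initial)
{
	uint32_t flags;
	int blocked;

	mock_spr = initial;
	flags = mock_user_access_save();
	blocked = (mock_spr == MOCK_BLOCKED);
	mock_user_access_restore(flags);
	return blocked && mock_spr == initial;
}
```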
[PATCH v4 6/7] powerpc: Implement user_access_begin and friends
Today, when a function like strncpy_from_user() is called, the userspace access protection is de-activated and re-activated for every word read. By implementing user_access_begin and friends, the protection is de-activated at the beginning of the copy and re-activated at the end.

Implement user_access_begin(), user_access_end(), unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user().

For the time being, we keep user_access_save() and user_access_restore() as nops.

Signed-off-by: Christophe Leroy
---
v2: no change
v3: adapted to the new format of user_access_begin/end()
v4: back to original format of user_access_begin/end() ; No duplication of raw_copy_to_user()
---
 arch/powerpc/include/asm/uaccess.h | 85 +++---
 1 file changed, 66 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index cafad1960e76..af905d7fc1df 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -91,9 +91,14 @@ static inline int __access_ok(unsigned long addr, unsigned long size,
 	__put_user_check((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
 #define __get_user(x, ptr) \
-	__get_user_nocheck((x), (ptr), sizeof(*(ptr)))
+	__get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)
 #define __put_user(x, ptr) \
-	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
+	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), true)
+
+#define __get_user_allowed(x, ptr) \
+	__get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
+#define __put_user_allowed(x, ptr) \
+	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), false)
 
 #define __get_user_inatomic(x, ptr) \
 	__get_user_nosleep((x), (ptr), sizeof(*(ptr)))
@@ -138,10 +143,9 @@ extern long __put_user_bad(void);
 	: "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
 #endif /* __powerpc64__ */
 
-#define __put_user_size(x, ptr, size, retval)			\
+#define __put_user_size_allowed(x, ptr, size, retval)		\
 do {								\
 	retval = 0;						\
-	allow_write_to_user(ptr, size);				\
 	switch (size) {						\
 	case 1: __put_user_asm(x, ptr, retval, "stb"); break;	\
 	case 2: __put_user_asm(x, ptr, retval, "sth"); break;	\
@@ -149,17 +153,26 @@ do {								\
 	case 8: __put_user_asm2(x, ptr, retval); break;		\
 	default: __put_user_bad();				\
 	}							\
+} while (0)
+
+#define __put_user_size(x, ptr, size, retval)			\
+do {								\
+	allow_write_to_user(ptr, size);				\
+	__put_user_size_allowed(x, ptr, size, retval);		\
 	prevent_write_to_user(ptr, size);			\
 } while (0)
 
-#define __put_user_nocheck(x, ptr, size)			\
+#define __put_user_nocheck(x, ptr, size, do_allow)		\
 ({								\
 	long __pu_err;						\
 	__typeof__(*(ptr)) __user *__pu_addr = (ptr);		\
 	if (!is_kernel_addr((unsigned long)__pu_addr))		\
 		might_fault();					\
 	__chk_user_ptr(ptr);					\
-	__put_user_size((x), __pu_addr, (size), __pu_err);	\
+	if (do_allow)							\
+		__put_user_size((x), __pu_addr, (size), __pu_err);	\
+	else								\
+		__put_user_size_allowed((x), __pu_addr, (size), __pu_err); \
 	__pu_err;						\
 })
 
@@ -236,13 +249,12 @@ extern long __get_user_bad(void);
 	: "b" (addr), "i" (-EFAULT), "0" (err))
 #endif /* __powerpc64__ */
 
-#define __get_user_size(x, ptr, size, retval)			\
+#define __get_user_size_allowed(x, ptr, size, retval)		\
 do {								\
 	retval = 0;						\
 	__chk_user_ptr(ptr);					\
 	if (size > sizeof(x))					\
 		(x) = __get_user_bad();				\
-	allow_read_from_user(ptr, size);			\
 	switch (size) {						\
 	case 1: __get_user_asm(x, ptr, retval, "lbz"); break;	\
 	case 2:
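The win described in the changelog can be illustrated with counters standing in for the expensive protection toggles (SPR writes plus isync). All names below are ours; this is a userspace mock, not kernel code.

```c
#include <stddef.h>

static int spr_writes;	/* counts mtspr-style protection toggles */

static void mock_allow(void)   { spr_writes++; }
static void mock_prevent(void) { spr_writes++; }

/* Old scheme: __get_user()/__put_user() toggle protection around every
 * single access, so copying n words costs 2*n toggles. */
static int toggles_per_access(size_t n)
{
	spr_writes = 0;
	for (size_t i = 0; i < n; i++) {
		mock_allow();
		/* ... one word copied ... */
		mock_prevent();
	}
	return spr_writes;
}

/* New scheme: user_access_begin()/user_access_end() bracket the whole
 * loop, and the unsafe_*_user() accessors inside do no toggling, so
 * the cost is 2 toggles regardless of n. */
static int toggles_batched(size_t n)
{
	spr_writes = 0;
	mock_allow();
	for (size_t i = 0; i < n; i++) {
		/* ... unsafe_get_user()/unsafe_put_user() ... */
	}
	mock_prevent();
	return spr_writes;
}
```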
[PATCH v4 5/7] powerpc/32s: prepare prevent_user_access() for user_access_end()
In preparation of implementing user_access_begin and friends on powerpc, the book3s/32 version of prevent_user_access() needs to be prepared for user_access_end().

user_access_end() doesn't provide the address and size which were passed to user_access_begin(), required by prevent_user_access() to know which segment to modify.

The list of segments which were unprotected by allow_user_access() is available in current->kuap. But we don't want prevent_user_access() to read this all the time, especially every time it is 0 (for instance because the access was not a write access).

Implement a special direction named KUAP_CURRENT. In this case only, the addr and end are retrieved from current->kuap.

Signed-off-by: Christophe Leroy
---
v2: No change
v3: This patch was not in v3
v4: Added build protection against bad use of KUAP_CURRENT.
---
 arch/powerpc/include/asm/book3s/32/kup.h      | 23 +++
 .../powerpc/include/asm/book3s/64/kup-radix.h |  4 +++-
 arch/powerpc/include/asm/kup.h                | 11 +
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index de29fb752cca..17e069291c72 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -108,6 +108,8 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 	u32 addr, end;
 
 	BUILD_BUG_ON(!__builtin_constant_p(dir));
+	BUILD_BUG_ON(dir == KUAP_CURRENT);
+
 	if (!(dir & KUAP_WRITE))
 		return;
 
@@ -117,6 +119,7 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 		return;
 
 	end = min(addr + size, TASK_SIZE);
+	current->thread.kuap = (addr & 0xf0000000) | ((((end - 1) >> 28) + 1) & 0xf);
 	kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);	/* Clear Ks */
 }
 
@@ -127,15 +130,25 @@ static __always_inline void prevent_user_access(void __user *to, const void __us
 	u32 addr, end;
 
 	BUILD_BUG_ON(!__builtin_constant_p(dir));
 
-	if (!(dir & KUAP_WRITE))
-		return;
-
-	addr = (__force u32)to;
+	if (dir == KUAP_CURRENT) {
+		u32 kuap = current->thread.kuap;
 
-	if (unlikely(addr >= TASK_SIZE || !size))
+		if (unlikely(!kuap))
+			return;
+
+		addr = kuap & 0xf0000000;
+		end = kuap << 28;
+	} else if (dir & KUAP_WRITE) {
+		addr = (__force u32)to;
+		end = min(addr + size, TASK_SIZE);
+
+		if (unlikely(addr >= TASK_SIZE || !size))
+			return;
+	} else {
 		return;
+	}
 
-	end = min(addr + size, TASK_SIZE);
 	current->thread.kuap = 0;
 	kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);	/* set Ks */
 }
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index c8d1076e0ebb..a0263e94df33 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -86,8 +86,10 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 		set_kuap(AMR_KUAP_BLOCK_WRITE);
 	else if (dir == KUAP_WRITE)
 		set_kuap(AMR_KUAP_BLOCK_READ);
-	else
+	else if (dir == KUAP_READ_WRITE)
 		set_kuap(0);
+	else
+		BUILD_BUG();
 }
 
 static inline void prevent_user_access(void __user *to, const void __user *from,
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 94f24928916a..c3ce7e8ae9ea 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -5,6 +5,12 @@
 #define KUAP_READ	1
 #define KUAP_WRITE	2
 #define KUAP_READ_WRITE	(KUAP_READ | KUAP_WRITE)
+/*
+ * For prevent_user_access() only.
+ * Use the current saved situation instead of the to/from/size params.
+ * Used on book3s/32
+ */
+#define KUAP_CURRENT	4
 
 #ifdef CONFIG_PPC64
 #include
@@ -88,6 +94,11 @@ static inline void prevent_read_write_user(void __user *to, const void __user *f
 	prevent_user_access(to, from, size, KUAP_READ_WRITE);
 }
 
+static inline void prevent_current_access_user(void)
+{
+	prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_KUAP_H_ */
-- 
2.25.0
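The encoding used by current->thread.kuap on book3s/32 packs the whole unlocked user range into one word: the top nibble is the base of the first 256MiB segment, the low nibble is one past the last segment. A standalone sketch (helper names are ours):

```c
#include <stdint.h>

/* Pack an unlocked user range [addr, end) into one word, as
 * allow_user_access() does on book3s/32: top 4 bits = first segment's
 * base address, low 4 bits = number of the segment one past the end. */
static uint32_t kuap_pack(uint32_t addr, uint32_t end)
{
	return (addr & 0xf0000000) | ((((end - 1) >> 28) + 1) & 0xf);
}

/* The two decode operations KUAP_CURRENT relies on: shifting the low
 * nibble back up by 28 bits recovers the end address, masking recovers
 * the start. */
static uint32_t kuap_begin(uint32_t kuap) { return kuap & 0xf0000000; }
static uint32_t kuap_end(uint32_t kuap)   { return kuap << 28; }
```

For example, a range from 0x10000000 to 0x20000000 packs to 0x10000002 (base segment 1, end segment 2), and both bounds round-trip through the decode helpers.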
[PATCH v4 4/7] powerpc/32s: Drop NULL addr verification
NULL addr is a user address. Don't waste time checking it. If someone tries to access it, it will SIGFAULT the same way as for address 1, so no need to make it special.

The special case is when not doing a write, in that case we want to drop the entire function. This is now handled by the 'dir' param and not by the nullity of 'to' anymore.

Also make the beginning of prevent_user_access() similar to the beginning of allow_user_access(), and tell the compiler that writing in kernel space or with a 0 length is unlikely.

Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/d79cb9f680f4e971e05262303103a4b94baa91ce.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/merge-test
---
 arch/powerpc/include/asm/book3s/32/kup.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 91c8f1d9bcee..de29fb752cca 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -113,7 +113,7 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 
 	addr = (__force u32)to;
 
-	if (!addr || addr >= TASK_SIZE || !size)
+	if (unlikely(addr >= TASK_SIZE || !size))
 		return;
 
 	end = min(addr + size, TASK_SIZE);
@@ -124,16 +124,18 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 static __always_inline void prevent_user_access(void __user *to, const void __user *from,
 						u32 size, unsigned long dir)
 {
-	u32 addr = (__force u32)to;
-	u32 end = min(addr + size, TASK_SIZE);
+	u32 addr, end;
 
 	BUILD_BUG_ON(!__builtin_constant_p(dir));
 	if (!(dir & KUAP_WRITE))
 		return;
 
-	if (!addr || addr >= TASK_SIZE || !size)
+	addr = (__force u32)to;
+
+	if (unlikely(addr >= TASK_SIZE || !size))
 		return;
+	end = min(addr + size, TASK_SIZE);
 
 	current->thread.kuap = 0;
 	kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);	/* set Ks */
 }
-- 
2.25.0
[PATCH v4 3/7] powerpc/kuap: Fix set direction in allow/prevent_user_access()
__builtin_constant_p() always returns 0 for pointers, so on RADIX we always end up opening both directions (by writing 0 in SPR29):

0170 <._copy_to_user>:
...
 1b0:	4c 00 01 2c 	isync
 1b4:	39 20 00 00 	li	r9,0
 1b8:	7d 3d 03 a6 	mtspr	29,r9
 1bc:	4c 00 01 2c 	isync
 1c0:	48 00 00 01 	bl	1c0 <._copy_to_user+0x50>
			1c0: R_PPC64_REL24	.__copy_tofrom_user
...

0220 <._copy_from_user>:
...
 2ac:	4c 00 01 2c 	isync
 2b0:	39 20 00 00 	li	r9,0
 2b4:	7d 3d 03 a6 	mtspr	29,r9
 2b8:	4c 00 01 2c 	isync
 2bc:	7f c5 f3 78 	mr	r5,r30
 2c0:	7f 83 e3 78 	mr	r3,r28
 2c4:	48 00 00 01 	bl	2c4 <._copy_from_user+0xa4>
			2c4: R_PPC64_REL24	.__copy_tofrom_user
...

Use an explicit parameter for direction selection, so that GCC is able to see it is a constant:

01b0 <._copy_to_user>:
...
 1f0:	4c 00 01 2c 	isync
 1f4:	3d 20 40 00 	lis	r9,16384
 1f8:	79 29 07 c6 	rldicr	r9,r9,32,31
 1fc:	7d 3d 03 a6 	mtspr	29,r9
 200:	4c 00 01 2c 	isync
 204:	48 00 00 01 	bl	204 <._copy_to_user+0x54>
			204: R_PPC64_REL24	.__copy_tofrom_user
...

0260 <._copy_from_user>:
...
 2ec:	4c 00 01 2c 	isync
 2f0:	39 20 ff ff 	li	r9,-1
 2f4:	79 29 00 04 	rldicr	r9,r9,0,0
 2f8:	7d 3d 03 a6 	mtspr	29,r9
 2fc:	4c 00 01 2c 	isync
 300:	7f c5 f3 78 	mr	r5,r30
 304:	7f 83 e3 78 	mr	r3,r28
 308:	48 00 00 01 	bl	308 <._copy_from_user+0xa8>
			308: R_PPC64_REL24	.__copy_tofrom_user
...

Signed-off-by: Christophe Leroy
[mpe: Spell out the directions, s/KUAP_R/KUAP_READ/ etc.]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/56743b700c56e5d341468151f0e95ff0c46663cd.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/next-test
---
 arch/powerpc/include/asm/book3s/32/kup.h      | 13 ++--
 .../powerpc/include/asm/book3s/64/kup-radix.h | 11 +++
 arch/powerpc/include/asm/kup.h                | 30 ++-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |  4 +--
 arch/powerpc/include/asm/uaccess.h            |  4 +--
 5 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index d88008c8eb85..91c8f1d9bcee 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -102,11 +102,13 @@ static inline void kuap_update_sr(u32 sr, u32 addr, u32 end)
 	isync();	/* Context sync required after mtsrin() */
 }
 
-static inline void allow_user_access(void __user *to, const void __user *from, u32 size)
+static __always_inline void allow_user_access(void __user *to, const void __user *from,
+					      u32 size, unsigned long dir)
 {
 	u32 addr, end;
 
-	if (__builtin_constant_p(to) && to == NULL)
+	BUILD_BUG_ON(!__builtin_constant_p(dir));
+	if (!(dir & KUAP_WRITE))
 		return;
 
 	addr = (__force u32)to;
@@ -119,11 +121,16 @@ static inline void allow_user_access(void __user *to, const void __user *from, u
 	kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);	/* Clear Ks */
 }
 
-static inline void prevent_user_access(void __user *to, const void __user *from, u32 size)
+static __always_inline void prevent_user_access(void __user *to, const void __user *from,
+						u32 size, unsigned long dir)
 {
 	u32 addr = (__force u32)to;
 	u32 end = min(addr + size, TASK_SIZE);
 
+	BUILD_BUG_ON(!__builtin_constant_p(dir));
+	if (!(dir & KUAP_WRITE))
+		return;
+
 	if (!addr || addr >= TASK_SIZE || !size)
 		return;
 
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index dbbd22cb80f5..c8d1076e0ebb 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -77,20 +77,21 @@ static inline void set_kuap(unsigned long value)
 	isync();
 }
 
-static inline void allow_user_access(void __user *to, const void __user *from,
-				     unsigned long size)
+static __always_inline void allow_user_access(void __user *to, const void __user *from,
+					      unsigned long size, unsigned long dir)
 {
 	// This is written so we can resolve to a single case at build time
-	if (__builtin_constant_p(to) && to == NULL)
+	BUILD_BUG_ON(!__builtin_constant_p(dir));
+	if (dir == KUAP_READ)
 		set_kuap(AMR_KUAP_BLOCK_WRITE);
-	else if (__builtin_constant_p(from) && from == NULL)
+	else if (dir == KUAP_WRITE)
[PATCH v4 2/7] powerpc/32s: Fix bad_kuap_fault()
At the moment, bad_kuap_fault() reports a fault only if a bad access to userspace occurred while access to userspace was not granted. But if a fault occurs for a write outside the allowed userspace segment(s) that have been unlocked, bad_kuap_fault() fails to detect it and the kernel loops forever in do_page_fault().

Fix it by checking that the accessed address is within the allowed range.

Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Cc: sta...@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/1e07c7de4ffdd9cda35d1ffe8258af75579d3e91.1579715466.git.christophe.le...@c-s.fr
---
v4: taken from powerpc/next-test
---
 arch/powerpc/include/asm/book3s/32/kup.h       | 9 +++--
 arch/powerpc/include/asm/book3s/64/kup-radix.h | 3 ++-
 arch/powerpc/include/asm/kup.h                 | 6 +-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h   | 3 ++-
 arch/powerpc/mm/fault.c                        | 2 +-
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index f9dc597b0b86..d88008c8eb85 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -131,12 +131,17 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
 	kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);	/* set Ks */
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
+	unsigned long begin = regs->kuap & 0xf0000000;
+	unsigned long end = regs->kuap << 28;
+
 	if (!is_write)
 		return false;
 
-	return WARN(!regs->kuap, "Bug: write fault blocked by segment registers !");
+	return WARN(address < begin || address >= end,
+		    "Bug: write fault blocked by segment registers !");
 }
 
 #endif /* CONFIG_PPC_KUAP */

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index f254de956d6a..dbbd22cb80f5 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -95,7 +95,8 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
 	set_kuap(AMR_KUAP_BLOCKED);
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
 	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
 		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),

diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 5b5e39643a27..812e66f31934 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -45,7 +45,11 @@ static inline void allow_user_access(void __user *to, const void __user *from,
 				     unsigned long size) { }
 static inline void prevent_user_access(void __user *to, const void __user *from,
 				       unsigned long size) { }
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write) { return false; }
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+{
+	return false;
+}
 #endif /* CONFIG_PPC_KUAP */
 
 static inline void allow_read_from_user(const void __user *from, unsigned long size)

diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 1006a427e99c..f2fea603b929 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -46,7 +46,8 @@ static inline void prevent_user_access(void __user *to, const void __user *from,
 	mtspr(SPRN_MD_AP, MD_APG_KUAP);
 }
 
-static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+static inline bool
+bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
 	return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf0000000),
 		    "Bug: fault blocked by AP register !");

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b5047f9b5dec..1baeb045f7f4 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -233,7 +233,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
 	// Read/write fault in a valid region (the exception table search passed
 	// above), but blocked by KUAP is bad, it can never succeed.
-	if (bad_kuap_fault(regs, is_write))
+	if (bad_kuap_fault(regs, address, is_write))
 		return true;
 
 	// What's left? Kernel fault on user in well defined regions (extable
-- 
2.25.0
[PATCH v4 1/7] readdir: make user_access_begin() use the real access range
From: Linus Torvalds

In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I changed filldir to not do individual __put_user() accesses, but instead use unsafe_put_user() surrounded by the proper user_access_begin/end() pair.

That makes them enormously faster on modern x86, where the STAC/CLAC games make individual user accesses fairly heavy-weight.

However, the user_access_begin() range was not really the exact right one, since filldir() has the unfortunate problem that it needs to not only fill out the new directory entry, it also needs to fix up the previous one to contain the proper file offset.

It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the file offset of the directory entry itself - it's the offset of the next one. So we end up backfilling the offset in the previous entry as we walk along.

But since x86 didn't really care about the exact range, and used to be the only architecture that did anything fancy in user_access_begin() to begin with, the filldir[64]() changes did something lazy, and even commented on it:

	/*
	 * Note! This range-checks 'previous' (which may be NULL).
	 * The real range was checked in getdents
	 */
	if (!user_access_begin(dirent, sizeof(*dirent)))
		goto efault;

and it all worked fine.

But now 32-bit ppc is starting to also implement user_access_begin(), and the fact that we faked the range to only be the (possibly not even valid) previous directory entry becomes a problem, because ppc32 will actually be using the range that is passed in for more than just "check that it's user space".

This is a complete rewrite of Christophe's original patch.

By saving off the record length of the previous entry instead of a pointer to it in the filldir data structures, we can simplify the range check and the writing of the previous entry d_off field.

No need for any conditionals in the user accesses themselves, although we retain the conditional EINTR checking for the "was this the first directory entry" signal handling latency logic.

Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.le...@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.le...@c-s.fr/
Reported-and-tested-by: Christophe Leroy
Signed-off-by: Linus Torvalds
Signed-off-by: Christophe Leroy
---
v4: taken from Linus' tree
---
 fs/readdir.c | 73 +---
 1 file changed, 35 insertions(+), 38 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index d26d5ea4de7b..d5ee72280c82 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -206,7 +206,7 @@ struct linux_dirent {
 struct getdents_callback {
 	struct dir_context ctx;
 	struct linux_dirent __user * current_dir;
-	struct linux_dirent __user * previous;
+	int prev_reclen;
 	int count;
 	int error;
 };
@@ -214,12 +214,13 @@ struct getdents_callback {
 static int filldir(struct dir_context *ctx, const char *name, int namlen,
 		   loff_t offset, u64 ino, unsigned int d_type)
 {
-	struct linux_dirent __user * dirent;
+	struct linux_dirent __user *dirent, *prev;
 	struct getdents_callback *buf =
 		container_of(ctx, struct getdents_callback, ctx);
 	unsigned long d_ino;
 	int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2,
 		sizeof(long));
+	int prev_reclen;
 
 	buf->error = verify_dirent_name(name, namlen);
 	if (unlikely(buf->error))
@@ -232,28 +233,24 @@ static int filldir(struct dir_context *ctx, const char *name, int namlen,
 		buf->error = -EOVERFLOW;
 		return -EOVERFLOW;
 	}
-	dirent = buf->previous;
-	if (dirent && signal_pending(current))
+	prev_reclen = buf->prev_reclen;
+	if (prev_reclen && signal_pending(current))
 		return -EINTR;
-
-	/*
-	 * Note! This range-checks 'previous' (which may be NULL).
-	 * The real range was checked in getdents
-	 */
-	if (!user_access_begin(dirent, sizeof(*dirent)))
-		goto efault;
-	if (dirent)
-		unsafe_put_user(offset, &dirent->d_off, efault_end);
 	dirent = buf->current_dir;
+	prev = (void __user *) dirent - prev_reclen;
+	if (!user_access_begin(prev, reclen + prev_reclen))
+		goto efault;
+
+	/* This might be 'dirent->d_off', but if so it will get overwritten */
+	unsafe_put_user(offset, &prev->d_off, efault_end);
 	unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
 	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
 	unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
 	unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
Re: [FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates
Ulf Hansson writes:
> On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky wrote:
>>
>> Hi All,
>>
>> We still need the attached patch for our onboard SD card interface
>> [1,2]. Could you please add this patch to the tree?
>
> No, because according to previous discussion that isn't the correct
> solution and more importantly it will break other archs (if I recall
> correctly).
>
> Looks like someone from the ppc community needs to pick up the ball.

That's a pretty small community these days :) :/

Christian can you test this please? I think I got the polarity of all the tests right, but it's Friday night so maybe I'm wrong :)

cheers

From 975ba6e8b52d6f5358e93c1f5a47adc4a0b5fb70 Mon Sep 17 00:00:00 2001
From: Michael Ellerman
Date: Fri, 24 Jan 2020 22:26:59 +1100
Subject: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc

There's an OF helper called of_dma_is_coherent(), which checks if a device has a "dma-coherent" property to see if the device is coherent for DMA.

But on some platforms devices are coherent by default, and on some platforms it's not possible to update existing device trees to add the "dma-coherent" property.

So add a Kconfig symbol to allow arch code to tell of_dma_is_coherent() that devices are coherent by default, regardless of the presence of the property.

Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. when the system has a coherent cache.

Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
Cc: sta...@vger.kernel.org # v3.16+
Signed-off-by: Michael Ellerman
---
 arch/powerpc/Kconfig | 1 +
 drivers/of/Kconfig   | 4
 drivers/of/address.c | 6 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 62752c3bfabf..460678ab2375 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
 	select NEED_DMA_MAP_STATE		if PPC64 || NOT_COHERENT_CACHE
 	select NEED_SG_DMA_LENGTH
 	select OF
+	select OF_DMA_DEFAULT_COHERENT		if !NOT_COHERENT_CACHE
 	select OF_EARLY_FLATTREE
 	select OLD_SIGACTION			if PPC32
 	select OLD_SIGSUSPEND

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index 37c2ccbefecd..d91618641be6 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -103,4 +103,8 @@ config OF_OVERLAY
 config OF_NUMA
 	bool
 
+config OF_DMA_DEFAULT_COHERENT
+	# arches should select this if DMA is coherent by default for OF devices
+	bool
+
 endif # OF

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 99c1b8058559..e8a39c3ec4d4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -995,12 +995,16 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
  * @np:	device node
  *
  * It returns true if "dma-coherent" property was found
- * for this device in DT.
+ * for this device in the DT, or if DMA is coherent by
+ * default for OF devices on the current platform.
  */
 bool of_dma_is_coherent(struct device_node *np)
 {
 	struct device_node *node = of_node_get(np);
 
+	if (IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT))
+		return true;
+
 	while (node) {
 		if (of_property_read_bool(node, "dma-coherent")) {
 			of_node_put(node);
-- 
2.21.1
Re: [PATCH v2 6/6] powerpc: Implement user_access_begin and friends
On 23/01/2020 at 13:31, Michael Ellerman wrote:
> Michael Ellerman writes:
>> Christophe Leroy writes:
>>> Today, when a function like strncpy_from_user() is called, the
>>> userspace access protection is de-activated and re-activated for
>>> every word read. By implementing user_access_begin and friends, the
>>> protection is de-activated at the beginning of the copy and
>>> re-activated at the end.
>>>
>>> Implement user_access_begin(), user_access_end() and
>>> unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()
>>>
>>> For the time being, we keep user_access_save() and
>>> user_access_restore() as nops.
>>
>> That means we will run with user access enabled in a few more places,
>> but it's only used sparingly AFAICS:
>>
>>   kernel/trace/trace_branch.c:  unsigned long flags = user_access_save();
>>   lib/ubsan.c:                  unsigned long flags = user_access_save();
>>   lib/ubsan.c:                  unsigned long ua_flags = user_access_save();
>>   mm/kasan/common.c:            unsigned long flags = user_access_save();
>>
>> And we don't have objtool checking that user access enablement isn't
>> leaking in the first place, so I guess it's OK for us not to implement
>> these to begin with?
>
> It looks like we can implement them on all three KUAP implementations.
>
> For radix and 8xx we just return/set the relevant SPR.
>
> For book3s/32/kup.h I think we'd just need to add a KUAP_CURRENT case
> to allow_user_access()?

Can't do that, we don't want to keep the info in current->thread.kuap after user_access_save(), otherwise we might unexpectedly re-open access through an interrupt.

And if we use KUAP_CURRENT case of prevent_user_access(), it means we'll read current->thread.kuap twice. So, just regenerate addr and end from the flags, and use allow_user_access() and prevent_user_access() as usual.

I'll have it in v4

Christophe
[PATCH 4.19 447/639] perf/ioctl: Add check for the sample_period value
From: Ravi Bangoria

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, a negative sample_period could cause recursive PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 751888cbed5c0..16af86ab24c42 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5012,6 +5012,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	if (perf_event_check_period(event, value))
 		return -EINVAL;
 
+	if (!event->attr.freq && (value & (1ULL << 63)))
+		return -EINVAL;
+
 	event_function_call(event, __perf_event_period, &value);
 
 	return 0;
-- 
2.20.1
Re: vmlinux ELF header sometimes corrupt
On 24/01/2020 11.50, Michael Ellerman wrote:
> Rasmus Villemoes writes:
>> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
>> very hard to debug problem that maybe someone else has encountered. This
>> doesn't happen always, perhaps 1 in 8 times or something like that.
>>
>> The issue is that when the build gets to do "${CROSS}objcopy -O binary
>> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
>> that fails with
>>
>> powerpc-oe-linux-objcopy:vmlinux: file format not recognized
>>
>> Any ideas?
>
> Not really sorry. Haven't seen or heard of that before.
>
> Are you doing a parallel make? If so does -j 1 fix it?

Hard to say, I'll have to try that a number of times to see if it can be reproduced with that setting.

> If it seems like sortextable is at fault then strace'ing it would be my
> next step.

I don't think sortextable is at fault, that was just my first "I know that at least pokes around in the ELF file". I do "cp vmlinux vmlinux.before_sort" and "cp vmlinux vmlinux.after_sort", and both of those copies are proper ELF files - and the .after_sort is identical to the corrupt vmlinux apart from vmlinux ending up with its ELF header wiped.

So it's something that happens during some later build step (Yocto has a lot of steps), perhaps "make modules" or "make modules_install" or something ends up somehow deciding "hey, vmlinux isn't quite uptodate, let's nuke it". I'm not even sure it's a Kbuild problem, but I've seen the same thing happen using another meta-build system called oe-lite, which is why I'm not primarily suspecting the Yocto logic.

Rasmus
Re: vmlinux ELF header sometimes corrupt
Rasmus Villemoes writes:
> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
> very hard to debug problem that maybe someone else has encountered. This
> doesn't happen always, perhaps 1 in 8 times or something like that.
>
> The issue is that when the build gets to do "${CROSS}objcopy -O binary
> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
> that fails with
>
> powerpc-oe-linux-objcopy:vmlinux: file format not recognized
>
> So I hacked link-vmlinux.sh to stash copies of vmlinux before and after
> sortextable vmlinux. Both of those are proper ELF files, and comparing
> the corrupted vmlinux to vmlinux.after_sort they are identical after the
> first 52 bytes; in vmlinux, those first 52 bytes are all 0.
>
> I also saved stat(1) info to see if vmlinux is being replaced or
> modified in-place.
>
> $ cat vmlinux.stat.after_sort
>   File: 'vmlinux'
>   Size: 8608456   Blocks: 16696   IO Block: 4096   regular file
> Device: 811h/2065d   Inode: 21919132   Links: 1
> Access: (0755/-rwxr-xr-x)   Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 10:52:38.946703081 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> $ stat vmlinux
>   File: 'vmlinux'
>   Size: 8608456   Blocks: 16688   IO Block: 4096   regular file
> Device: 811h/2065d   Inode: 21919132   Links: 1
> Access: (0755/-rwxr-xr-x)   Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 17:20:00.650379057 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.
>
> Any ideas?

Not really sorry. Haven't seen or heard of that before.

Are you doing a parallel make? If so does -j 1 fix it?

If it seems like sortextable is at fault then strace'ing it would be my next step.

cheers
Re: [PATCH v2 1/6] fs/readdir: Fix filldir() and filldir64() use of user_access_begin()
Linus Torvalds writes:
> On Thu, Jan 23, 2020 at 4:00 AM Michael Ellerman wrote:
>>
>> So I guess I'll wait and see what happens with patch 1.
>
> I've committed my fixes to filldir[64]() directly - they really were
> fixing me being lazy about the range, and the name length checking
> really is a theoretical "access wrong user space pointer" issue with
> corrupted filesystems regardless (even though I suspect it's entirely
> theoretical - even a corrupt filesystem hopefully won't be passing in
> negative directory entry lengths or something like that).

Great, thanks.

> The "pass in read/write" part I'm not entirely convinced about.
> Honestly, if this is just for ppc32 and nobody else really needs it,
> make the ppc32s thing always just enable both user space reads and
> writes. That's the semantics for x86 and arm as is, I'm not convinced
> that we should complicate this for a legacy platform.

We can use the read/write info on Power9 too. That's a niche platform but hopefully not legacy status yet :P

But it's entirely optional, as you say we can just enable read/write if we aren't passed the read/write info from the upper-level API.

I think our priority should be getting objtool going on powerpc to check our user access regions are well contained. Once we have that working maybe then we can look at plumbing the direction through user_access_begin() etc.

cheers
[PATCH 4.14 243/343] perf/ioctl: Add check for the sample_period value
From: Ravi Bangoria

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, a negative sample_period could cause recursive PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea4f3f7a0c6f3..2ac73b4cb8a93 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4762,6 +4762,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	if (perf_event_check_period(event, value))
 		return -EINVAL;
 
+	if (!event->attr.freq && (value & (1ULL << 63)))
+		return -EINVAL;
+
 	event_function_call(event, __perf_event_period, &value);
 
 	return 0;
-- 
2.20.1
Re: [PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()
On Fri, Jan 24, 2020 at 07:03:36AM +, Christophe Leroy wrote:
> > On 24/01/2020 at 06:46, Michael Ellerman wrote:
> > > If I do this it seems to work, but feels a little dicey:
> > >
> > >   asm ("" : "=r" (r1));
> > >   sp = r1 & (THREAD_SIZE - 1);
> >
> > Or we could add in asm/reg.h what we have in boot/reg.h:
> >
> >   register void *__stack_pointer asm("r1");
> >   #define get_sp() (__stack_pointer)
> >
> > And use get_sp()
>
> It works, and I guess doing it this way is acceptable as it's exactly
> what's done for current in asm/current.h with register r2.

That is a *global* register variable. That works. We still need to document a bit better what it does exactly, but this is the expected use case, so that will work.

> Now I (still) get:
>
> 	sp = get_sp() & (THREAD_SIZE - 1);
>  b9c:	54 24 04 fe 	clrlwi  r4,r1,19
> 	if (unlikely(sp < 2048)) {
>  ba4:	2f 84 07 ff 	cmpwi   cr7,r4,2047
>
> Although GCC 8.1 was doing exactly the same with the form Clang doesn't
> like:
>
> 	register unsigned long r1 asm("r1");
> 	long sp = r1 & (THREAD_SIZE - 1);
>  b84:	54 24 04 fe 	clrlwi  r4,r1,19
> 	if (unlikely(sp < 2048)) {
>  b8c:	2f 84 07 ff 	cmpwi   cr7,r4,2047

Sure, if it did what you expected, things will usually work out fine ;-)

(Pity that the compiler didn't come up with

	rlwinm. r4,r1,0,19,20
	bne bad

Or are the low bits of r4 used later again?)

Segher
Re: [PATCH 1/2] powerpc/irq: don't use current_stack_pointer() in check_stack_overflow()
Hi!

On Fri, Jan 24, 2020 at 04:46:24PM +1100, Michael Ellerman wrote:
> Christophe Leroy writes:
>>  static inline void check_stack_overflow(void)
>>  {
>>  #ifdef CONFIG_DEBUG_STACKOVERFLOW
>> -	long sp;
>> -
>> -	sp = current_stack_pointer() & (THREAD_SIZE-1);
>> +	register unsigned long r1 asm("r1");
>> +	long sp = r1 & (THREAD_SIZE - 1);
>
> This appears to work but seems to be "unsupported" by GCC, and clang
> actually complains about it:
>
>   /linux/arch/powerpc/kernel/irq.c:603:12: error: variable 'r1' is
>   uninitialized when used here [-Werror,-Wuninitialized]
>           long sp = r1 & (THREAD_SIZE - 1);
>                     ^~
>
> The GCC docs say:
>
>   The only supported use for this feature is to specify registers for
>   input and output operands when calling Extended asm (see Extended
>   Asm).
>
>   https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Local-Register-Variables.html#Local-Register-Variables

Yes. Don't use local register variables any other way. It *will* break.

> If I do this it seems to work, but feels a little dicey:
>
>   asm ("" : "=r" (r1));
>   sp = r1 & (THREAD_SIZE - 1);

The only thing dicey about that is that you are writing to r1. Heh.

Well that certainly is bad enough, the compiler does not know how to handle that at all... Of course you aren't *actually* changing anything, so it might just work.

Segher