Re: [RFC PATCH] skiboot machine check handler
Mahesh J Salgaonkar's on January 16, 2020 5:03 pm: > On 2019-12-11 20:01:18 Wed, Nicholas Piggin wrote: >> Provide facilities to decode machine checks into human readable >> strings, with only sufficient information required to deal with >> them sanely. >> >> The old machine check stuff was over engineered. The philosophy >> here is that OPAL should correct anything it possibly can, what >> it can't handle but the OS might be able to do something with >> (e.g., uncorrected memory error or SLB multi-hit), it passes back >> to Linux. Anything else, the OS doesn't care. It doesn't want a >> huge struct of severities and levels and originators etc that it >> can't do anything with -- just provide human readable strings >> for what happened and what was done with it. >> >> A Linux driver for this will be able to cope with new processors. >> >> This also uses the same facility to decode machine checks in OPAL >> boot. >> >> The code is a bit in flux because it's sitting on top of a few >> other RFC patches and not quite complete, just wanted opinions >> about it. > > opal_handle_mce() may have to be treated as special opal call. For MCE > that occurs in OPAL context, Linux making opal call will clobber > original opal call stack which hit MCE. Same is true with nested MCE in > OPAL. Should it just continue using same r1 to avoid clobbering or have > a separate stack for mce opal call ? Ah, it wasn't clear in my message, sorry: this would only be made available to kernels which use the new calling convention where the kernel provides its own stack for OPAL to use. That may be controversial itself, that's another RFC but if we went ahead with that approach, then handling re-entrant interrupts like this becomes easy because Linux does all the hard work with NMI/MCE stacks etc. Thanks, Nick
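As a rough illustration of the "human readable strings" philosophy in the quoted patch description, the sketch below maps a machine check cause to two strings: what happened and what was done with it. Every name here (mce_cause, mce_desc, mce_describe) and every string is hypothetical and is not taken from the skiboot patch or the OPAL API; the real code would decode hardware registers rather than an enum.

```c
#include <stddef.h>

/* Hypothetical causes, for illustration only. */
enum mce_cause {
	MCE_CAUSE_SLB_MULTIHIT,
	MCE_CAUSE_UE_MEMORY,
	MCE_CAUSE_UNKNOWN,
};

struct mce_desc {
	const char *what;	/* what happened */
	const char *action;	/* what was done with it */
};

static const struct mce_desc mce_table[] = {
	[MCE_CAUSE_SLB_MULTIHIT] = { "SLB multi-hit", "passed to OS" },
	[MCE_CAUSE_UE_MEMORY]    = { "Uncorrected memory error", "passed to OS" },
	[MCE_CAUSE_UNKNOWN]      = { "Unknown machine check", "not recovered" },
};

/* Return the description for a cause, falling back to the unknown entry. */
static const struct mce_desc *mce_describe(enum mce_cause cause)
{
	size_t n = sizeof(mce_table) / sizeof(mce_table[0]);

	if ((size_t)cause >= n)
		cause = MCE_CAUSE_UNKNOWN;
	return &mce_table[cause];
}
```

With a table like this, a consumer only has to print the two strings; it does not need to understand severities or originators, and new causes can be added without changing the consumer.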
Re: [PATCH -next] powerpc/maple: fix comparing pointer to 0
On Mon, Jan 20, 2020 at 05:52:15PM -0800, Joe Perches wrote: > On Tue, 2020-01-21 at 09:31 +0800, Chen Zhou wrote: > > Fixes coccicheck warning: > > ./arch/powerpc/platforms/maple/setup.c:232:15-16: > > WARNING comparing pointer to 0 > > Does anyone have or use these powerpc maple boards anymore? > > Maybe the whole codebase should just be deleted instead. This is used for *all* non-Apple 970 systems (not running virtualized), not just actual Maple. Segher
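For readers unfamiliar with the coccicheck warning quoted above: it flags pointer tests written against the integer literal 0. A minimal userspace sketch of the flagged pattern and the usual replacement follows. The function names are made up for illustration and do not come from arch/powerpc/platforms/maple/setup.c.

```c
#include <stddef.h>

/* Flagged style: compares a pointer against the integer literal 0. */
static int is_missing_flagged(const void *p)
{
	return p == 0;
}

/* Preferred style: test the pointer directly (or compare against NULL). */
static int is_missing_preferred(const void *p)
{
	return !p;
}
```

Both functions behave identically; the change is purely about making it obvious that the operand is a pointer.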
Re:Re: [PATCH] powerpc/sysdev: fix compile errors
From: Andrew Donnellan
Date: 2020-01-21 14:13:07
To: wangwenhu, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Kate Stewart, Greg Kroah-Hartman, Richard Fontana, Thomas Gleixner, linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org
Cc: triv...@kernel.org, loneh...@hotmail.com, wenhu.w...@vivo.com
Subject: Re: [PATCH] powerpc/sysdev: fix compile errors

>On 21/1/20 4:31 pm, wangwenhu wrote:
>> From: wangwenhu
>>
>> Include arch/powerpc/include/asm/io.h into fsl_85xx_cache_sram.c to
>> fix the implicit declaration compile errors when building Cache-Sram.
>>
>> arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function
>> ‘instantiate_cache_sram’:
>> arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration
>> of function ‘ioremap_coherent’; did you mean ‘bitmap_complement’?
>> [-Werror=implicit-function-declaration]
>>cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
>>^~~~
>>bitmap_complement
>> arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes
>> pointer from integer without a cast [-Werror=int-conversion]
>>cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
>> ^
>> arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration
>> of function ‘iounmap’; did you mean ‘roundup’?
>> [-Werror=implicit-function-declaration]
>>iounmap(cache_sram->base_virt);
>>^~~
>>roundup
>> cc1: all warnings being treated as errors
>>
>> Signed-off-by: wangwenhu
>
>How long has this code been broken for?

It has been broken for almost 15 months, since the commit below:

"commit aa91796ec46339f2ed53da311bd3ea77a3e4dfe1
Author: Christophe Leroy
Date: Tue Oct 9 13:51:41 2018 +

powerpc: don't use ioremap_prot() nor __ioremap() unless really needed."

We are now working on it for further development.

>
>> ---
>> arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
>> b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
>> index f6c665dac725..29b6868eff7d 100644
>> --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
>> +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
>> @@ -17,6 +17,7 @@
>> #include
>> #include
>> #include
>> +#include
>>
>> #include "fsl_85xx_cache_ctlr.h"
>>
>
>--
>Andrew Donnellan OzLabs, ADL Canberra
>a...@linux.ibm.com IBM Australia Limited
>

Wenhu
Re: [PATCH v2 05/27] powerpc: Map & release OpenCAPI LPC memory
On 3/12/19 2:46 pm, Alastair D'Silva wrote: From: Alastair D'Silva This patch adds platform support to map & release LPC memory. Might want to explain what LPC is. Otherwise: Reviewed-by: Andrew Donnellan Signed-off-by: Alastair D'Silva --- arch/powerpc/include/asm/pnv-ocxl.h | 2 ++ arch/powerpc/platforms/powernv/ocxl.c | 42 +++ 2 files changed, 44 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index 7de82647e761..f8f8ffb48aa8 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle) extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr); extern void pnv_ocxl_free_xive_irq(u32 irq); +extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size); +extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev); nit: I don't think these need to be extern? -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
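On the "extern" nit above: in C, a function declaration at file scope has external linkage by default, so the keyword is redundant on prototypes. A small sketch, with hypothetical function names rather than the pnv_ocxl ones:

```c
/* Both prototypes declare functions with external linkage. */
extern int scale_with_extern(int pages);
int scale_without_extern(int pages);

int scale_with_extern(int pages)
{
	return pages * 4096;
}

int scale_without_extern(int pages)
{
	return pages * 4096;
}
```

Either prototype could live in a header and be called from another translation unit; dropping "extern" only shortens the line.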
Re:Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
From: Scott Wood
Date: 2020-01-21 13:49:59
To: "王文虎"
Cc: wangwenhu, Kumar Gala, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org, triv...@kernel.org, Rai Harninder
Subject: Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable

>On Tue, 2020-01-21 at 13:20 +0800, 王文虎 wrote:
>> From: Scott Wood
>> Date: 2020-01-21 11:25:25
>> To: wangwenhu, Kumar Gala, Benjamin Herrenschmidt, Paul Mackerras <pau...@samba.org>,
>> Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org
>> Cc: triv...@kernel.org, wenhu.w...@vivo.com, Rai Harninder <harninder@nxp.com>
>> Subject: Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
>>
>> >On Mon, 2020-01-20 at 06:43 -0800, wangwenhu wrote:
>> > > From: wangwenhu
>> > >
>> > > When generating .config file with menuconfig on Freescale BOOKE
>> > > SOC, FSL_85XX_CACHE_SRAM is not configurable for the lack of
>> > > description in the Kconfig field, which makes it impossible
>> > > to support L2Cache-Sram driver. Add a description to make it
>> > > configurable.
>> > >
>> > > Signed-off-by: wangwenhu
>> >
>> > The intent was that drivers using the SRAM API would select the
>> > symbol. What is the use case for selecting it manually?
>> >
>>
>> With a repository of multiple products(meaning different defconfigs) and
>> multiple developers, the Kconfigs of the Kernel Source Tree change
>> frequently. So the "make menuconfig" process is needed for defconfigs'
>> re-generating or updating for the complexity of dependencies
>> between different features defined in the Kconfigs.
>
>That doesn't answer my question of how the SRAM code would be useful other
>than to some other driver that uses the API (which would use "select"). There
>is no userspace API. You could use the kernel command line to configure the
>SRAM but you need to get the address of it for it to be useful.
>

As you asked below: via /dev/mem, or by calling the API directly within the kernel. Neither user has been submitted yet; they are under development.

>> > Since this code was added almost ten years ago and there are still no
>> > (in-tree?) users of the API, we should just remove the sram code (unless
>> > this prods someone to submit such a user very soon).
>> >
>>
>> Yes, pretty long a time. But we DO really use the API now for
>> PPCE500/Freescale SoC.
>
>I do not see any users in the kernel tree. Are you talking about out-of-tree
>code, or something that you've submitted or will submit soon? Or are you
>accessing it via /dev/mem?
>

Both, but not submitted yet, and partly under development.

>> Like sometimes we need to reset the whole RAM, then the L2-Cache would be
>> used as SRAM for backup using. Since it is useful for us now, a
>> re-consideration is recommended.
>
>Where is the code that would do this?
>

Currently under development, and not submitted yet.

>-Scott
>>
>

Wenhu
Re: [PATCH] powerpc/sysdev: fix compile errors
On 21/1/20 4:31 pm, wangwenhu wrote: From: wangwenhu Include arch/powerpc/include/asm/io.h into fsl_85xx_cache_sram.c to fix the implicit declaration compile errors when building Cache-Sram. arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’: arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? [-Werror=implicit-function-declaration] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^~~~ bitmap_complement arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^ arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of function ‘iounmap’; did you mean ‘roundup’? [-Werror=implicit-function-declaration] iounmap(cache_sram->base_virt); ^~~ roundup cc1: all warnings being treated as errors Signed-off-by: wangwenhu How long has this code been broken for? --- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c index f6c665dac725..29b6868eff7d 100644 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "fsl_85xx_cache_ctlr.h" -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH] powerpc/sysdev: fix compile errors
Le 21/01/2020 à 06:31, wangwenhu a écrit : From: wangwenhu Include arch/powerpc/include/asm/io.h into fsl_85xx_cache_sram.c to fix the implicit declaration compile errors when building Cache-Sram. It is usually better to include instead of Christophe
Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
On Tue, 2020-01-21 at 13:20 +0800, 王文虎 wrote: > From: Scott Wood > Date: 2020-01-21 11:25:25 > To: wangwenhu ,Kumar Gala , > Benjamin Herrenschmidt ,Paul Mackerras < > pau...@samba.org>,Michael Ellerman , > linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org > Cc: triv...@kernel.org,wenhu.w...@vivo.com,Rai Harninder < > harninder@nxp.com> > Subject: Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM > configurable>On Mon, 2020-01-20 at 06:43 -0800, wangwenhu wrote: > > > From: wangwenhu > > > > > > When generating .config file with menuconfig on Freescale BOOKE > > > SOC, FSL_85XX_CACHE_SRAM is not configurable for the lack of > > > description in the Kconfig field, which makes it impossible > > > to support L2Cache-Sram driver. Add a description to make it > > > configurable. > > > > > > Signed-off-by: wangwenhu > > > > The intent was that drivers using the SRAM API would select the > > symbol. What > > is the use case for selecting it manually? > > > > With a repository of multiple products(meaning different defconfigs) and > multiple > developers, the Kconfigs of the Kernel Source Tree change frequently. So the > "make menuconfig" > process is needed for defconfigs' re-generating or updating for the > complexity of dependencies > between different features defined in the Kconfigs. That doesn't answer my question of how the SRAM code would be useful other than to some other driver that uses the API (which would use "select"). There is no userspace API. You could use the kernel command line to configure the SRAM but you need to get the address of it for it to be useful. > > Since this code was added almost ten years ago and there are still no (in- > > tree?) users of the API, we should just remove the sram code (unless this > > prods someone to submit such a user very soon). > > > > Yes, pretty long a time. But we DO really use the API now for > PPCE500/Freescale SoC. I do not see any users in the kernel tree. 
Are you talking about out-of-tree code, or something that you've submitted or will submit soon? Or are you accessing it via /dev/mem? > Like sometimes we need to reset the whole RAM, then the L2-Cache would be > used as > SRAM for backup using. Since it is useful for us now, a re-consideration is > recommended. Where is the code that would do this? -Scott >
Re:[PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
From: Scott Wood
Date: 2020-01-21 11:25:25
To: wangwenhu, Kumar Gala, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org
Cc: triv...@kernel.org, wenhu.w...@vivo.com, Rai Harninder
Subject: Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable

>On Mon, 2020-01-20 at 06:43 -0800, wangwenhu wrote:
>> From: wangwenhu
>>
>> When generating .config file with menuconfig on Freescale BOOKE
>> SOC, FSL_85XX_CACHE_SRAM is not configurable for the lack of
>> description in the Kconfig field, which makes it impossible
>> to support L2Cache-Sram driver. Add a description to make it
>> configurable.
>>
>> Signed-off-by: wangwenhu
>
>The intent was that drivers using the SRAM API would select the symbol. What
>is the use case for selecting it manually?
>

With a repository covering multiple products (meaning different defconfigs) and multiple developers, the Kconfigs of the kernel source tree change frequently. So the "make menuconfig" process is needed to regenerate or update the defconfigs, given the complexity of the dependencies between the features defined in the Kconfigs.

>Since this code was added almost ten years ago and there are still no
>(in-tree?) users of the API, we should just remove the sram code (unless this
>prods someone to submit such a user very soon).
>

Yes, that is a pretty long time. But we really DO use the API now for PPCE500/Freescale SoC. For example, sometimes we need to reset the whole RAM, and the L2 cache is then used as SRAM for backup. Since it is useful for us now, a reconsideration is recommended.

>-Scott
>
>

--
Wenhu
vivo
[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #19 from Christophe Leroy (christophe.le...@c-s.fr) ---
Can you tell exactly where it stops during the boot? Or take a photo of the screen?

In parallel, could you try (without VMAP_STACK) increasing CONFIG_THREAD_SHIFT to 14? It will double the size of the stacks.

--
You are receiving this mail because: You are watching the assignee of the bug.
[PATCH] powerpc/sysdev: fix compile errors
From: wangwenhu Include arch/powerpc/include/asm/io.h into fsl_85xx_cache_sram.c to fix the implicit declaration compile errors when building Cache-Sram. arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’: arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? [-Werror=implicit-function-declaration] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^~~~ bitmap_complement arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^ arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of function ‘iounmap’; did you mean ‘roundup’? [-Werror=implicit-function-declaration] iounmap(cache_sram->base_virt); ^~~ roundup cc1: all warnings being treated as errors Signed-off-by: wangwenhu --- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c index f6c665dac725..29b6868eff7d 100644 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "fsl_85xx_cache_ctlr.h" -- 2.17.1
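A note on why the compiler output above pairs an "implicit declaration" error with "assignment makes pointer from integer": with no prototype in scope, older C rules treat the callee as returning int, so assigning its result to a pointer is then flagged as well. The userspace sketch below shows the fixed shape, with a prototype visible before the call. The names map_region, struct sram and sram_init are stand-ins invented for this example; they are not the kernel's ioremap_coherent() or the driver's real structures.

```c
#include <stddef.h>

/*
 * Prototype in scope before the first call. In the kernel this would come
 * from a header; here it is written inline so the file stands alone.
 */
void *map_region(unsigned long phys, size_t len);

struct sram {
	unsigned long base_phys;
	void *base_virt;
};

static int sram_init(struct sram *s, size_t len)
{
	/* With the prototype visible, the compiler knows this returns void *. */
	s->base_virt = map_region(s->base_phys, len);
	return s->base_virt ? 0 : -1;
}

/* Stand-in implementation: a static buffer plays the role of the mapping. */
void *map_region(unsigned long phys, size_t len)
{
	static unsigned char backing[64];

	(void)phys;
	return len <= sizeof(backing) ? backing : NULL;
}
```

Removing the prototype at the top reproduces the same class of diagnostics as in the patch description on compilers that still accept implicit declarations.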
[PATCH v2 10/10] powerpc/configs/skiroot: Enable CONFIG_PRINTK_CALLER
This adds the CPU or thread number to printk messages. This can help decipher concurrent oopses that have been interleaved. Example output, of PID1 (T1) triggering a warning: [1.581678][T1] WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:539 pkcs1pad_verify+0x38/0x140 [1.581681][T1] Modules linked in: [1.581693][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty #1515 [1.581700][T1] NIP: c0207d64 LR: c0207d3c CTR: c0207d2c [1.581708][T1] REGS: c000fd2e7560 TRAP: 0700 Not tainted (5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty) [1.581712][T1] MSR: 90029033 CR: 44000222 XER: 0004 Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 1 + 1 file changed, 1 insertion(+) v2: New. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index ca6f1842aa29..ae1d7137a84e 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -294,6 +294,7 @@ CONFIG_LIBCRC32C=y # CONFIG_XZ_DEC_ARMTHUMB is not set # CONFIG_XZ_DEC_SPARC is not set CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y CONFIG_MAGIC_SYSRQ=y CONFIG_SLUB_DEBUG_ON=y CONFIG_SCHED_STACK_END_CHECK=y -- 2.21.1
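To make the prefix in the example output concrete, here is a userspace sketch of how such a caller tag could be formatted: 'T' plus a thread id when running in task context, 'C' plus a CPU number otherwise. The function name format_caller and the exact padding width are assumptions for illustration; this is not the kernel's printk code.

```c
#include <stdio.h>

/* Format a "[    T1]"-style tag into buf; returns snprintf()'s result. */
static int format_caller(char *buf, size_t len, unsigned int id, int in_task)
{
	char caller[12];	/* one letter, up to 10 digits, NUL */

	snprintf(caller, sizeof(caller), "%c%u", in_task ? 'T' : 'C', id);
	return snprintf(buf, len, "[%6s]", caller);
}
```

A fixed-width tag keeps interleaved messages from different contexts aligned, which is what makes concurrent oopses easier to untangle.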
[PATCH v2 09/10] powerpc/configs/skiroot: Enable some more hardening options
Enable more hardening options.

Note BUG_ON_DATA_CORRUPTION selects DEBUG_LIST and is essentially just a synonym for it.

DEBUG_SG, DEBUG_NOTIFIERS, DEBUG_LIST, DEBUG_CREDENTIALS and SCHED_STACK_END_CHECK should all be low overhead and just add a few extra checks.

SLAB_FREELIST_RANDOM and SLUB_DEBUG_ON will add some overhead to the SLAB allocator, but nothing that should be meaningful for skiroot.

Unselecting SLAB_MERGE_DEFAULT causes the SLAB to use more memory, but the skiroot kernel shouldn't be memory constrained on any of our systems, all it does is run a small bootloader. Disabling merging has some security/robustness benefit as it means a use-after-free or overflow will be limited to the objects in that slab, rather than potentially affecting objects from unrelated slabs that have been merged.

Note also that slab merging is disabled anyway by enabling SLUB_DEBUG_ON, because of the SLAB_NEVER_MERGE mask.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/configs/skiroot_defconfig | 8
 1 file changed, 8 insertions(+)

v2: Add more explanation about slab merging.
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 28cfd68e8b16..ca6f1842aa29 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -23,6 +23,8 @@ CONFIG_EXPERT=y # CONFIG_AIO is not set CONFIG_PERF_EVENTS=y # CONFIG_COMPAT_BRK is not set +# CONFIG_SLAB_MERGE_DEFAULT is not set +CONFIG_SLAB_FREELIST_RANDOM=y CONFIG_SLAB_FREELIST_HARDENED=y CONFIG_PPC64=y CONFIG_ALTIVEC=y @@ -293,6 +295,8 @@ CONFIG_LIBCRC32C=y # CONFIG_XZ_DEC_SPARC is not set CONFIG_PRINTK_TIME=y CONFIG_MAGIC_SYSRQ=y +CONFIG_SLUB_DEBUG_ON=y +CONFIG_SCHED_STACK_END_CHECK=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_PANIC_ON_OOPS=y CONFIG_SOFTLOCKUP_DETECTOR=y @@ -301,6 +305,10 @@ CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y CONFIG_WQ_WATCHDOG=y # CONFIG_SCHED_DEBUG is not set +CONFIG_DEBUG_SG=y +CONFIG_DEBUG_NOTIFIERS=y +CONFIG_BUG_ON_DATA_CORRUPTION=y +CONFIG_DEBUG_CREDENTIALS=y # CONFIG_FTRACE is not set CONFIG_XMON=y # CONFIG_RUNTIME_TESTING_MENU is not set -- 2.21.1
[PATCH v2 08/10] powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
If the skiroot kernel crashes we don't want it sitting at an xmon prompt forever. Instead it's more helpful to reboot and bring the boot loader back up, and if the crash was transient we can then boot successfully. Similarly if we panic we should reboot, with a short timeout in case someone is watching the console. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) v2: No change. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 93b478436a2b..28cfd68e8b16 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -29,6 +29,7 @@ CONFIG_ALTIVEC=y CONFIG_VSX=y CONFIG_NR_CPUS=2048 CONFIG_CPU_LITTLE_ENDIAN=y +CONFIG_PANIC_TIMEOUT=30 # CONFIG_PPC_VAS is not set # CONFIG_PPC_PSERIES is not set # CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set @@ -293,6 +294,7 @@ CONFIG_LIBCRC32C=y CONFIG_PRINTK_TIME=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_STACKOVERFLOW=y +CONFIG_PANIC_ON_OOPS=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y CONFIG_HARDLOCKUP_DETECTOR=y @@ -301,5 +303,4 @@ CONFIG_WQ_WATCHDOG=y # CONFIG_SCHED_DEBUG is not set # CONFIG_FTRACE is not set CONFIG_XMON=y -CONFIG_XMON_DEFAULT=y # CONFIG_RUNTIME_TESTING_MENU is not set -- 2.21.1
[PATCH v2 07/10] powerpc/configs/skiroot: Enable security features
From: Joel Stanley This turns on HARDENED_USERCOPY with HARDENED_USERCOPY_PAGESPAN, and FORTIFY_SOURCE. It also enables SECURITY_LOCKDOWN_LSM with _EARLY and LOCK_DOWN_KERNEL_FORCE_INTEGRITY options enabled. This still allows xmon to be used in read-only mode. MODULE_SIG is selected by lockdown, so it is still enabled. Signed-off-by: Joel Stanley [mpe: Switch to lockdown integrity mode per oohal] Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) v2: Switch to lockdown integrity mode rather than confidentiality as noticed by dja and discussed with jms and oohal. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 24a210fe0049..93b478436a2b 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -49,7 +49,6 @@ CONFIG_JUMP_LABEL=y CONFIG_STRICT_KERNEL_RWX=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y -CONFIG_MODULE_SIG=y CONFIG_MODULE_SIG_FORCE=y CONFIG_MODULE_SIG_SHA512=y CONFIG_PARTITION_ADVANCED=y @@ -272,6 +271,16 @@ CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y CONFIG_NLS_UTF8=y CONFIG_ENCRYPTED_KEYS=y +CONFIG_SECURITY=y +CONFIG_HARDENED_USERCOPY=y +# CONFIG_HARDENED_USERCOPY_FALLBACK is not set +CONFIG_HARDENED_USERCOPY_PAGESPAN=y +CONFIG_FORTIFY_SOURCE=y +CONFIG_SECURITY_LOCKDOWN_LSM=y +CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y +CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y +# CONFIG_INTEGRITY is not set +CONFIG_LSM="yama,loadpin,safesetid,integrity" # CONFIG_CRYPTO_HW is not set CONFIG_CRC16=y CONFIG_CRC_ITU_T=y -- 2.21.1
[PATCH v2 06/10] powerpc/configs/skiroot: Update for symbol movement only
Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 42 +- 1 file changed, 21 insertions(+), 21 deletions(-) v2: No change. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 0aa060eef06c..24a210fe0049 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -1,8 +1,3 @@ -CONFIG_PPC64=y -CONFIG_ALTIVEC=y -CONFIG_VSX=y -CONFIG_NR_CPUS=2048 -CONFIG_CPU_LITTLE_ENDIAN=y CONFIG_KERNEL_XZ=y # CONFIG_SWAP is not set CONFIG_SYSVIPC=y @@ -29,16 +24,11 @@ CONFIG_EXPERT=y CONFIG_PERF_EVENTS=y # CONFIG_COMPAT_BRK is not set CONFIG_SLAB_FREELIST_HARDENED=y -CONFIG_JUMP_LABEL=y -CONFIG_STRICT_KERNEL_RWX=y -CONFIG_MODULES=y -CONFIG_MODULE_UNLOAD=y -CONFIG_MODULE_SIG=y -CONFIG_MODULE_SIG_FORCE=y -CONFIG_MODULE_SIG_SHA512=y -CONFIG_PARTITION_ADVANCED=y -# CONFIG_MQ_IOSCHED_DEADLINE is not set -# CONFIG_MQ_IOSCHED_KYBER is not set +CONFIG_PPC64=y +CONFIG_ALTIVEC=y +CONFIG_VSX=y +CONFIG_NR_CPUS=2048 +CONFIG_CPU_LITTLE_ENDIAN=y # CONFIG_PPC_VAS is not set # CONFIG_PPC_PSERIES is not set # CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set @@ -49,14 +39,24 @@ CONFIG_KEXEC=y CONFIG_PRESERVE_FA_DUMP=y CONFIG_IRQ_ALL_CPUS=y CONFIG_NUMA=y -# CONFIG_COMPACTION is not set -# CONFIG_MIGRATION is not set CONFIG_PPC_64K_PAGES=y CONFIG_SCHED_SMT=y CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="console=tty0 console=hvc0 ipr.fast_reboot=1 quiet" # CONFIG_SECCOMP is not set # CONFIG_PPC_MEM_KEYS is not set +CONFIG_JUMP_LABEL=y +CONFIG_STRICT_KERNEL_RWX=y +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y +CONFIG_MODULE_SIG=y +CONFIG_MODULE_SIG_FORCE=y +CONFIG_MODULE_SIG_SHA512=y +CONFIG_PARTITION_ADVANCED=y +# CONFIG_MQ_IOSCHED_DEADLINE is not set +# CONFIG_MQ_IOSCHED_KYBER is not set +# CONFIG_COMPACTION is not set +# CONFIG_MIGRATION is not set CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y @@ -153,7 +153,6 @@ CONFIG_IGB=m CONFIG_IXGB=m CONFIG_IXGBE=m CONFIG_I40E=m -CONFIG_S2IO=m # CONFIG_NET_VENDOR_MARVELL is not 
set CONFIG_MLX4_EN=m # CONFIG_MLX4_CORE_GEN2 is not set @@ -164,6 +163,7 @@ CONFIG_MLX5_CORE_EN=y # CONFIG_NET_VENDOR_MICROSEMI is not set CONFIG_MYRI10GE=m # CONFIG_NET_VENDOR_NATSEMI is not set +CONFIG_S2IO=m # CONFIG_NET_VENDOR_NETRONOME is not set # CONFIG_NET_VENDOR_NI is not set # CONFIG_NET_VENDOR_NVIDIA is not set @@ -271,6 +271,8 @@ CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y CONFIG_NLS_UTF8=y +CONFIG_ENCRYPTED_KEYS=y +# CONFIG_CRYPTO_HW is not set CONFIG_CRC16=y CONFIG_CRC_ITU_T=y CONFIG_LIBCRC32C=y @@ -289,8 +291,6 @@ CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y CONFIG_WQ_WATCHDOG=y # CONFIG_SCHED_DEBUG is not set # CONFIG_FTRACE is not set -# CONFIG_RUNTIME_TESTING_MENU is not set CONFIG_XMON=y CONFIG_XMON_DEFAULT=y -CONFIG_ENCRYPTED_KEYS=y -# CONFIG_CRYPTO_HW is not set +# CONFIG_RUNTIME_TESTING_MENU is not set -- 2.21.1
[PATCH v2 05/10] powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
It's default n so we don't need to disable it. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 1 - 1 file changed, 1 deletion(-) v2: No change. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 74cffb854c0f..0aa060eef06c 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -293,5 +293,4 @@ CONFIG_WQ_WATCHDOG=y CONFIG_XMON=y CONFIG_XMON_DEFAULT=y CONFIG_ENCRYPTED_KEYS=y -# CONFIG_CRYPTO_ECHAINIV is not set # CONFIG_CRYPTO_HW is not set -- 2.21.1
[PATCH v2 04/10] powerpc/configs/skiroot: Drop HID_LOGITECH
Commit bdd08fff4915 ("HID: logitech: Add depends on LEDS_CLASS to Logitech Kconfig entry") made HID_LOGITECH depend on LEDS_CLASS which we do not enable, meaning we are not actually enabling those drivers any more.

The Kconfig help text suggests USB HID compliant Logitech devices will continue to work without HID_LOGITECH, so just drop it.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/configs/skiroot_defconfig | 1 -
 1 file changed, 1 deletion(-)

v2: No change.

diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index 3eee39c50941..74cffb854c0f 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -235,7 +235,6 @@ CONFIG_HID_CYPRESS=y
 CONFIG_HID_EZKEY=y
 CONFIG_HID_ITE=y
 CONFIG_HID_KENSINGTON=y
-CONFIG_HID_LOGITECH=y
 CONFIG_HID_MICROSOFT=y
 CONFIG_HID_MONTEREY=y
 CONFIG_USB_HIDDEV=y
--
2.21.1
[PATCH v2 03/10] powerpc/configs: Drop NET_VENDOR_HP which moved to staging
The HP network driver moved to staging in commit 52340b82cf1a ("hp100: Move 100BaseVG AnyLAN driver to staging") meaning we don't need to disable it any more in our defconfigs. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/44x/akebono_defconfig | 1 - arch/powerpc/configs/skiroot_defconfig | 1 - 2 files changed, 2 deletions(-) v2: No change. diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig index f0c8a07cc274..7705a5c3f4ea 100644 --- a/arch/powerpc/configs/44x/akebono_defconfig +++ b/arch/powerpc/configs/44x/akebono_defconfig @@ -59,7 +59,6 @@ CONFIG_BLK_DEV_SD=y # CONFIG_NET_VENDOR_DLINK is not set # CONFIG_NET_VENDOR_EMULEX is not set # CONFIG_NET_VENDOR_EXAR is not set -# CONFIG_NET_VENDOR_HP is not set CONFIG_IBM_EMAC=y # CONFIG_NET_VENDOR_MARVELL is not set # CONFIG_NET_VENDOR_MELLANOX is not set diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index eaaffe9ae8b9..3eee39c50941 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -146,7 +146,6 @@ CONFIG_CHELSIO_T1=m # CONFIG_NET_VENDOR_DLINK is not set CONFIG_BE2NET=m # CONFIG_NET_VENDOR_EZCHIP is not set -# CONFIG_NET_VENDOR_HP is not set # CONFIG_NET_VENDOR_HUAWEI is not set CONFIG_E1000=m CONFIG_E1000E=m -- 2.21.1
[PATCH v2 01/10] powerpc/configs: Drop CONFIG_QLGE which moved to staging
The QLGE driver moved to staging in commit 955315b0dc8c ("qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/"), meaning our defconfigs that enable it have no effect as we don't enable CONFIG_STAGING. It sounds like the device is obsolete, so drop the driver. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/powernv_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig | 1 - arch/powerpc/configs/ppc6xx_defconfig | 1 - arch/powerpc/configs/pseries_defconfig | 1 - arch/powerpc/configs/skiroot_defconfig | 1 - 5 files changed, 5 deletions(-) v2: No change. diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index 32841456a573..71749377d164 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -181,7 +181,6 @@ CONFIG_MLX5_FPGA=y CONFIG_MLX5_CORE_EN=y CONFIG_MLX5_CORE_IPOIB=y CONFIG_MYRI10GE=m -CONFIG_QLGE=m CONFIG_NETXEN_NIC=m CONFIG_USB_NET_DRIVERS=m # CONFIG_WLAN is not set diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index b250e6f5a7ca..7e68cb222c7b 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -189,7 +189,6 @@ CONFIG_MLX4_EN=m CONFIG_MYRI10GE=m CONFIG_S2IO=m CONFIG_PASEMI_MAC=y -CONFIG_QLGE=m CONFIG_NETXEN_NIC=m CONFIG_SUNGEM=y CONFIG_GELIC_NET=m diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig index 7e28919041cf..3e2f44f38ac5 100644 --- a/arch/powerpc/configs/ppc6xx_defconfig +++ b/arch/powerpc/configs/ppc6xx_defconfig @@ -507,7 +507,6 @@ CONFIG_FORCEDETH=m CONFIG_HAMACHI=m CONFIG_YELLOWFIN=m CONFIG_QLA3XXX=m -CONFIG_QLGE=m CONFIG_NETXEN_NIC=m CONFIG_8139CP=m CONFIG_8139TOO=m diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 26126b4d4de3..6b68109e248f 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -169,7 +169,6 @@ 
CONFIG_IXGBE=m CONFIG_I40E=m CONFIG_MLX4_EN=m CONFIG_MYRI10GE=m -CONFIG_QLGE=m CONFIG_NETXEN_NIC=m CONFIG_PPP=m CONFIG_PPP_BSDCOMP=m diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 069f67f12731..7ff1ff1ddc28 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -171,7 +171,6 @@ CONFIG_MYRI10GE=m # CONFIG_NET_VENDOR_NVIDIA is not set # CONFIG_NET_VENDOR_OKI is not set # CONFIG_NET_VENDOR_PACKET_ENGINES is not set -CONFIG_QLGE=m CONFIG_NETXEN_NIC=m CONFIG_QED=m CONFIG_QEDE=m -- 2.21.1
[PATCH v2 02/10] powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
The NET_CADENCE symbol was renamed to NET_VENDOR_CADENCE, so we don't need to disable the former, see commit 0df5f81c481e ("net: ethernet: Add missing VENDOR to Cadence and Packet Engines symbols"). Signed-off-by: Michael Ellerman --- arch/powerpc/configs/skiroot_defconfig | 1 - 1 file changed, 1 deletion(-) v2: No change. diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 7ff1ff1ddc28..eaaffe9ae8b9 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -138,7 +138,6 @@ CONFIG_TIGON3=m CONFIG_BNX2X=m # CONFIG_NET_VENDOR_BROCADE is not set # CONFIG_NET_VENDOR_CADENCE is not set -# CONFIG_NET_CADENCE is not set # CONFIG_NET_VENDOR_CAVIUM is not set CONFIG_CHELSIO_T1=m # CONFIG_NET_VENDOR_CISCO is not set -- 2.21.1
Re: [RFC PATCH 9/9] powerpc/configs/skiroot: Enable some more hardening options
Joel Stanley writes:
> On Thu, 16 Jan 2020 at 01:48, Michael Ellerman wrote:
>>
>> Enable more hardening options.
>>
>> Note BUG_ON_DATA_CORRUPTION selects DEBUG_LIST and is essentially just
>> a synonym for it.
>>
>> DEBUG_SG, DEBUG_NOTIFIERS, DEBUG_LIST, DEBUG_CREDENTIALS and
>> SCHED_STACK_END_CHECK should all be low overhead and just add a few
>> extra checks.
>>
>> Unselecting SLAB_MERGE_DEFAULT causes the SLAB to use more memory, but
>> the skiroot kernel shouldn't be memory constrained on any of our
>> systems, all it does is run a small bootloader.
>
> Why do we unselect it?

The help text pretty much explains it:

  config SLAB_MERGE_DEFAULT
          bool "Allow slab caches to be merged"
          default y
          help
            For reduced kernel memory fragmentation, slab caches can be
            merged when they share the same size and other characteristics.
            This carries a risk of kernel heap overflows being able to
            overwrite objects from merged caches (and more easily control
            cache layout), which makes such heap attacks easier to exploit
            by attackers. By keeping caches unmerged, these kinds of exploits
            can usually only damage objects in the same cache. To disable
            merging at runtime, "slab_nomerge" can be passed on the kernel
            command line.

So unselecting it uses a bit more memory but has some security/robustness
benefit.

I should probably also mention that it essentially has no effect because
we're also enabling SLUB_DEBUG_ON, and that causes some of the flags in
SLAB_NEVER_MERGE to be set, which also disables merging.

cheers
Re: [PATCH] powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning
On 21/01/2020 09:10, Tyrel Datwyler wrote: > From: Tyrel Datwyler > > Commit e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to > iommu_table") missed an iommu_table allocation in the pseries vio code. > The iommu_table is allocated with kzalloc and as a result the associated > kref gets a value of zero. This has the side effect that during a DLPAR > remove of the associated virtual IOA the iommu_tce_table_put() triggers > a use-after-free underflow warning. > > Call Trace: > [c002879e39f0] [c071ecb4] refcount_warn_saturate+0x184/0x190 > (unreliable) > [c002879e3a50] [c00500ac] iommu_tce_table_put+0x9c/0xb0 > [c002879e3a70] [c00f54e4] vio_dev_release+0x34/0x70 > [c002879e3aa0] [c087cfa4] device_release+0x54/0xf0 > [c002879e3b10] [c0d64c84] kobject_cleanup+0xa4/0x240 > [c002879e3b90] [c087d358] put_device+0x28/0x40 > [c002879e3bb0] [c07a328c] dlpar_remove_slot+0x15c/0x250 > [c002879e3c50] [c07a348c] remove_slot_store+0xac/0xf0 > [c002879e3cd0] [c0d64220] kobj_attr_store+0x30/0x60 > [c002879e3cf0] [c04ff13c] sysfs_kf_write+0x6c/0xa0 > [c002879e3d10] [c04fde4c] kernfs_fop_write+0x18c/0x260 > [c002879e3d60] [c0410f3c] __vfs_write+0x3c/0x70 > [c002879e3d80] [c0415408] vfs_write+0xc8/0x250 > [c002879e3dd0] [c04157dc] ksys_write+0x7c/0x120 > [c002879e3e20] [c000b278] system_call+0x5c/0x68 > > Further, since the refcount was always zero the iommu_tce_table_put() > fails to call the iommu_table release function resulting in a leak. > > Fix this issue by initializing the iommu_table kref immediately after > allocation. 
> > Fixes: e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to > iommu_table") > Signed-off-by: Tyrel Datwyler Reviewed-by: Alexey Kardashevskiy > --- > arch/powerpc/platforms/pseries/vio.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/powerpc/platforms/pseries/vio.c > b/arch/powerpc/platforms/pseries/vio.c > index 79e2287..f682b7b 100644 > --- a/arch/powerpc/platforms/pseries/vio.c > +++ b/arch/powerpc/platforms/pseries/vio.c > @@ -1176,6 +1176,8 @@ static struct iommu_table *vio_build_iommu_table(struct > vio_dev *dev) > if (tbl == NULL) > return NULL; > > + kref_init(&tbl->it_kref); > + > of_parse_dma_window(dev->dev.of_node, dma_window, > &tbl->it_index, &offset, &size); > > -- Alexey
Re: [PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
On Mon, 2020-01-20 at 06:43 -0800, wangwenhu wrote: > From: wangwenhu > > When generating .config file with menuconfig on Freescale BOOKE > SOC, FSL_85XX_CACHE_SRAM is not configurable for the lack of > description in the Kconfig field, which makes it impossible > to support L2Cache-Sram driver. Add a description to make it > configurable. > > Signed-off-by: wangwenhu The intent was that drivers using the SRAM API would select the symbol. What is the use case for selecting it manually? Since this code was added almost ten years ago and there are still no (in-tree?) users of the API, we should just remove the sram code (unless this prods someone to submit such a user very soon). -Scott
Re: [PATCH -next] powerpc/maple: fix comparing pointer to 0
On Tue, 2020-01-21 at 09:31 +0800, Chen Zhou wrote: > Fixes coccicheck warning: > ./arch/powerpc/platforms/maple/setup.c:232:15-16: > WARNING comparing pointer to 0 Does anyone have or use these powerpc maple boards anymore? Maybe the whole codebase should just be deleted instead. If not, setup.c has an unused DBG macro that could be removed too. --- arch/powerpc/platforms/maple/setup.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c index 47f7310..d6a083c 100644 --- a/arch/powerpc/platforms/maple/setup.c +++ b/arch/powerpc/platforms/maple/setup.c @@ -57,12 +57,6 @@ #include "maple.h" -#ifdef DEBUG -#define DBG(fmt...) udbg_printf(fmt) -#else -#define DBG(fmt...) -#endif - static unsigned long maple_find_nvram_base(void) { struct device_node *rtcs;
[PATCH -next] powerpc/maple: fix comparing pointer to 0
Fixes coccicheck warning: ./arch/powerpc/platforms/maple/setup.c:232:15-16: WARNING comparing pointer to 0 Compare pointer-typed values to NULL rather than 0. Signed-off-by: Chen Zhou --- arch/powerpc/platforms/maple/setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c index 47f7310..00a0780 100644 --- a/arch/powerpc/platforms/maple/setup.c +++ b/arch/powerpc/platforms/maple/setup.c @@ -229,7 +229,7 @@ static void __init maple_init_IRQ(void) root = of_find_node_by_path("/"); naddr = of_n_addr_cells(root); opprop = of_get_property(root, "platform-open-pic", &opplen); - if (opprop != 0) { + if (opprop) { openpic_addr = of_read_number(opprop, naddr); has_isus = (opplen > naddr); printk(KERN_DEBUG "OpenPIC addr: %lx, has ISUs: %d\n", -- 2.7.4
[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d
https://bugzilla.kernel.org/show_bug.cgi?id=205099 --- Comment #18 from Erhard F. (erhar...@mailbox.org) --- (In reply to Christophe Leroy from comment #17) > Created attachment 286907 [details] > Patch to fix kasan with KASAN_VMALLOC and VMAP_STACK > > Please try the attached patch, it fixes the setup of the kasan early hash > table when VMAP_STACK is enabled. Sorry, but still the same situation with this patch applied. -- You are receiving this mail because: You are watching the assignee of the bug.
[PATCH] powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning
From: Tyrel Datwyler Commit e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to iommu_table") missed an iommu_table allocation in the pseries vio code. The iommu_table is allocated with kzalloc and as a result the associated kref gets a value of zero. This has the side effect that during a DLPAR remove of the associated virtual IOA the iommu_tce_table_put() triggers a use-after-free underflow warning. Call Trace: [c002879e39f0] [c071ecb4] refcount_warn_saturate+0x184/0x190 (unreliable) [c002879e3a50] [c00500ac] iommu_tce_table_put+0x9c/0xb0 [c002879e3a70] [c00f54e4] vio_dev_release+0x34/0x70 [c002879e3aa0] [c087cfa4] device_release+0x54/0xf0 [c002879e3b10] [c0d64c84] kobject_cleanup+0xa4/0x240 [c002879e3b90] [c087d358] put_device+0x28/0x40 [c002879e3bb0] [c07a328c] dlpar_remove_slot+0x15c/0x250 [c002879e3c50] [c07a348c] remove_slot_store+0xac/0xf0 [c002879e3cd0] [c0d64220] kobj_attr_store+0x30/0x60 [c002879e3cf0] [c04ff13c] sysfs_kf_write+0x6c/0xa0 [c002879e3d10] [c04fde4c] kernfs_fop_write+0x18c/0x260 [c002879e3d60] [c0410f3c] __vfs_write+0x3c/0x70 [c002879e3d80] [c0415408] vfs_write+0xc8/0x250 [c002879e3dd0] [c04157dc] ksys_write+0x7c/0x120 [c002879e3e20] [c000b278] system_call+0x5c/0x68 Further, since the refcount was always zero the iommu_tce_table_put() fails to call the iommu_table release function resulting in a leak. Fix this issue by initializing the iommu_table kref immediately after allocation. 
Fixes: e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to iommu_table") Signed-off-by: Tyrel Datwyler --- arch/powerpc/platforms/pseries/vio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index 79e2287..f682b7b 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -1176,6 +1176,8 @@ static struct iommu_table *vio_build_iommu_table(struct vio_dev *dev) if (tbl == NULL) return NULL; + kref_init(&tbl->it_kref); + of_parse_dma_window(dev->dev.of_node, dma_window, &tbl->it_index, &offset, &size); -- 1.8.3.1
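The failure mode described in this patch can be illustrated outside the kernel. The following is only a user-space sketch, not the kernel's kref implementation: `model_kref`, `model_table` and the helper functions are hypothetical stand-ins, and `calloc()` plays the role of `kzalloc()`. It shows that a zero-filled reference count makes the first put an underflow and means the release path is never requested, while an initialised count of one behaves as intended.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for struct kref and struct iommu_table. */
struct model_kref {
	unsigned int refcount;
};

struct model_table {
	struct model_kref it_kref;
	bool released;	/* set where a real release function would free the table */
};

/* Models kref_init(): a newly created object starts with one reference. */
static void model_kref_init(struct model_kref *k)
{
	k->refcount = 1;
}

/*
 * Models kref_put(): returns true when the last reference was dropped.
 * A put on a count of zero is an underflow; it is reported through
 * *underflow and never asks the caller to release the object.
 */
static bool model_kref_put(struct model_kref *k, bool *underflow)
{
	if (k->refcount == 0) {
		*underflow = true;
		return false;
	}
	*underflow = false;
	return --k->refcount == 0;
}

/* Allocates a zero-filled table, optionally initialising its refcount. */
static struct model_table *model_table_alloc(bool init_kref)
{
	struct model_table *tbl = calloc(1, sizeof(*tbl));

	if (tbl && init_kref)
		model_kref_init(&tbl->it_kref);
	return tbl;
}

/* Drops one reference; returns true if the put underflowed. */
static bool model_table_put(struct model_table *tbl)
{
	bool underflow;

	if (model_kref_put(&tbl->it_kref, &underflow))
		tbl->released = true;
	return underflow;
}
```

Calling `model_table_put()` on a table from `model_table_alloc(false)` reports an underflow and leaves `released` unset, which corresponds to the warning and the leak in the commit message. With `model_table_alloc(true)` the same put marks the table as released.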
[PATCH] powerpc/Kconfig: Make FSL_85XX_CACHE_SRAM configurable
From: wangwenhu When generating .config file with menuconfig on Freescale BOOKE SOC, FSL_85XX_CACHE_SRAM is not configurable for the lack of description in the Kconfig field, which makes it impossible to support L2Cache-Sram driver. Add a description to make it configurable. Signed-off-by: wangwenhu --- arch/powerpc/platforms/85xx/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig index fa3d29dcb57e..ee5ba10b98cb 100644 --- a/arch/powerpc/platforms/85xx/Kconfig +++ b/arch/powerpc/platforms/85xx/Kconfig @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE if PPC32 config FSL_85XX_CACHE_SRAM - bool + bool "Freescale Cache-Sram" select PPC_LIB_RHEAP help When selected, this option enables cache-sram support -- 2.23.0
Re: [PATCH v2 00/10] Improvements for random.h/archrandom.h
On Mon, Jan 20, 2020 at 05:26:27PM +0000, Mark Brown wrote: > I think the important thing here is that *someone* takes the patches. > We've now got Ted and Borislav both saying they're OK applying the > patches, an additional proposal that Andrew takes the patches, nobody > saying anything negative about applying the patches and yet the patches > are not applied. The random tree sounds like a sensible enough tree to > take this so if Ted picks them up perhaps that's most sensible? Yes, Ted, pls pick them up so that we're done with this. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
On Mon, Jan 20, 2020 at 06:08:23PM +0100, Christophe Leroy wrote: > Not easy I think. > > First we have the unavoidable ASM entry function that can't be dropped > because of the CR[SO] bit the set on error or clear on no error and that > can't be done in C. Yup. > In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts > are generic and read from the VDSO data. Does that cost more than just a few cycles? > And there is still some funny code generated by GCC (8.1), like: > > 620: 7d 29 3c 30 srw r9,r9,r7 > 624: 21 87 00 20 subfic r12,r7,32 > 628: 7d 07 3c 31 srw.r7,r8,r7 > 62c: 7d 08 60 30 slw r8,r8,r12 > 630: 7d 0b 4b 78 or r11,r8,r9 (This can be done cheaper for fixed shifts, you can use rlwimi then). > 634: 39 40 00 00 li r10,0 > 638: 40 82 00 84 bne 6bc <__c_kernel_clock_gettime+0x114> > 63c: 81 23 00 24 lwz r9,36(r3) > 640: 81 05 00 00 lwz r8,0(r5) > ... > 6bc: 7d 69 5b 78 mr r9,r11 > 6c0: 7c ea 3b 78 mr r10,r7 > 6c4: 7d 2b 4b 78 mr r11,r9 > 6c8: 4b ff ff 74 b 63c <__c_kernel_clock_gettime+0x94> > > This branch to 6bc is totally useless: > - copying r11 into r9 is pointless as r9 is overwritten in 63c > - copying back r9 into r11 is pointless as r11 has not been modified > inbetween. Yeah, huh, how did that happen. > - loading r10 with 0 then overwritting r10 with r7 when r7 is not 0 is > pointless as well, could have directly put the result of srw. in r10. This may be harder to make the compiler do. But the r9/r11 thing suggests you are preventing optimisation somewhere, maybe with some asm? Do you have some small testcase I can compile? Segher
Re: [PATCH v2 00/10] Improvements for random.h/archrandom.h
On Fri, Jan 10, 2020 at 12:05:59PM -0500, Theodore Y. Ts'o wrote: > On Fri, Jan 10, 2020 at 04:51:53PM +0100, Borislav Petkov wrote: > > On Fri, Jan 10, 2020 at 02:54:12PM +0000, Mark Brown wrote: > > > This is a resend of a series from Richard Henderson last posted back in > > > November: > > > > > > https://lore.kernel.org/linux-arm-kernel/20191106141308.30535-1-...@twiddle.net/ > > > Back then Borislav said they looked good and asked if he should take > > > them through the tip tree but things seem to have got lost since then. > > Or, alternatively, akpm could take them. In any case, if someone else > > ends up doing that, for the x86 bits: > Or I can take them through the random.git tree, since we have a lot of > changes this cycle going to Linus anyway. Any objections? I think the important thing here is that *someone* takes the patches. We've now got Ted and Borislav both saying they're OK applying the patches, an additional proposal that Andrew takes the patches, nobody saying anything negative about applying the patches and yet the patches are not applied. The random tree sounds like a sensible enough tree to take this so if Ted picks them up perhaps that's most sensible?
Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
On 20/01/2020 at 16:19, Segher Boessenkool wrote:
> On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote:
>>> Nice! Much better.
>>>
>>> It should be tested on more representative hardware, too, but this looks
>>> promising alright :-)
>>
>> mpc832x (e300c2 core) at 333 MHz:
>>
>> Before:
>>
>> gettimeofday:vdso: 235 nsec/call
>> clock-gettime-realtime:vdso: 244 nsec/call
>>
>> With the series:
>>
>> gettimeofday:vdso: 271 nsec/call
>> clock-gettime-realtime:vdso: 281 nsec/call
>
> Those are important, and degrade ~15%. That is acceptable IMO, but do you
> see a way to optimise this (later)?

Not easy I think.

First we have the unavoidable ASM entry function that can't be dropped
because of the CR[SO] bit, which is set on error and cleared on success,
and that can't be done in C.

In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts
are generic and read from the VDSO data.

And there is still some funny code generated by GCC (8.1), like:

 620: 7d 29 3c 30 srw r9,r9,r7
 624: 21 87 00 20 subfic r12,r7,32
 628: 7d 07 3c 31 srw. r7,r8,r7
 62c: 7d 08 60 30 slw r8,r8,r12
 630: 7d 0b 4b 78 or r11,r8,r9
 634: 39 40 00 00 li r10,0
 638: 40 82 00 84 bne 6bc <__c_kernel_clock_gettime+0x114>
 63c: 81 23 00 24 lwz r9,36(r3)
 640: 81 05 00 00 lwz r8,0(r5)
 ...
 6bc: 7d 69 5b 78 mr r9,r11
 6c0: 7c ea 3b 78 mr r10,r7
 6c4: 7d 2b 4b 78 mr r11,r9
 6c8: 4b ff ff 74 b 63c <__c_kernel_clock_gettime+0x94>

This branch to 6bc is totally useless:
- copying r11 into r9 is pointless as r9 is overwritten in 63c
- copying back r9 into r11 is pointless as r11 has not been modified
  in between.
- loading r10 with 0 then overwriting r10 with r7 when r7 is not 0 is
  pointless as well, could have directly put the result of srw. in r10.

Christophe
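For readers following the disassembly, the first five instructions look like a 64-bit right shift by a run-time amount, built from 32-bit operations. The sketch below restates that reading in C. It is an interpretation, not code from the series: the function names are hypothetical, and the assignment of r8 to the high word and r9 to the low word is an assumption drawn from the instruction sequence.

```c
#include <stdint.h>

/*
 * Low 32 bits of (hi:lo >> n) for 0 <= n < 32, using 32-bit operations only.
 * The comments name the instruction each step is assumed to correspond to.
 */
static uint32_t shr64_lo(uint32_t hi, uint32_t lo, unsigned int n)
{
	uint32_t low_part = lo >> n;			/* srw r9,r9,r7 */
	/*
	 * subfic r12,r7,32 then slw r8,r8,r12: on PowerPC a slw by 32
	 * yields 0, whereas in C a 32-bit shift by 32 is undefined, so the
	 * n == 0 case is handled explicitly here.
	 */
	uint32_t carried = n ? hi << (32 - n) : 0;

	return carried | low_part;			/* or r11,r8,r9 */
}

/* High 32 bits of (hi:lo >> n) for 0 <= n < 32. */
static uint32_t shr64_hi(uint32_t hi, unsigned int n)
{
	return hi >> n;					/* srw. r7,r8,r7 */
}
```

With a shift amount read from the VDSO data, as in the generic C code, each of these steps has to be computed at run time. With a shift known at build time, the compiler has more freedom to fold them together, which is the difference from the ASM VDSO that the message points at.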
Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote: > >Nice! Much better. > > > >It should be tested on more representative hardware, too, but this looks > >promising alright :-) > > mpc832x (e300c2 core) at 333 MHz: > > Before: > > gettimeofday:vdso: 235 nsec/call > clock-gettime-realtime:vdso: 244 nsec/call > > With the series: > > gettimeofday:vdso: 271 nsec/call > clock-gettime-realtime:vdso: 281 nsec/call Those are important, and degrade ~15%. That is acceptable IMO, but do you see a way to optimise this (later)? Anyway, excellent results, thanks for your persistence! Segher
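The "~15%" figure can be checked from the quoted measurements. The helper below is a hypothetical illustration, not part of the thread, and simply computes the relative slowdown between two per-call timings.

```c
/* Relative slowdown, in percent, when a call goes from before_ns to after_ns. */
static double slowdown_percent(double before_ns, double after_ns)
{
	return (after_ns - before_ns) / before_ns * 100.0;
}
```

For gettimeofday the timings are 235 and 271 nsec/call, and for clock-gettime-realtime they are 244 and 281 nsec/call. Both work out to a little over 15%, in line with the estimate above.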
Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
Hi

On 01/17/2020 08:58 AM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Jan 16, 2020 at 05:58:24PM +0000, Christophe Leroy wrote:
>> On a powerpc8xx, with current powerpc/32 ASM VDSO:
>>
>> gettimeofday:vdso: 907 nsec/call
>> clock-getres-realtime:vdso: 484 nsec/call
>> clock-gettime-realtime:vdso: 899 nsec/call
>>
>> The first patch adds VDSO generic C support without any changes to
>> common code. Performance is as follows:
>>
>> gettimeofday:vdso: 1211 nsec/call
>> clock-getres-realtime:vdso: 722 nsec/call
>> clock-gettime-realtime:vdso: 1216 nsec/call
>>
>> Then a few changes in the common code have allowed performance
>> improvement. At the end of the series we have:
>>
>> gettimeofday:vdso: 974 nsec/call
>> clock-getres-realtime:vdso: 545 nsec/call
>> clock-gettime-realtime:vdso: 941 nsec/call
>>
>> The final result is rather close to pure ASM VDSO:
>> * 7% more on gettimeofday (9 cycles)
>> * 5% more on clock-gettime-realtime (6 cycles)
>> * 12% more on clock-getres-realtime (8 cycles)
>
> Nice! Much better.
>
> It should be tested on more representative hardware, too, but this looks
> promising alright :-)

mpc832x (e300c2 core) at 333 MHz:

Before:

gettimeofday:vdso: 235 nsec/call
clock-getres-realtime-coarse:vdso: 1668 nsec/call
clock-gettime-realtime-coarse:vdso: 1338 nsec/call
clock-getres-realtime:vdso: 135 nsec/call
clock-gettime-realtime:vdso: 244 nsec/call
clock-getres-boottime:vdso: 1232 nsec/call
clock-gettime-boottime:vdso: 1935 nsec/call
clock-getres-tai:vdso: 1257 nsec/call
clock-gettime-tai:vdso: 1898 nsec/call
clock-getres-monotonic-raw:vdso: 1229 nsec/call
clock-gettime-monotonic-raw:vdso: 1541 nsec/call
clock-getres-monotonic-coarse:vdso: 1699 nsec/call
clock-gettime-monotonic-coarse:vdso: 1477 nsec/call
clock-getres-monotonic:vdso: 135 nsec/call
clock-gettime-monotonic:vdso: 283 nsec/call

With the series:

gettimeofday:vdso: 271 nsec/call
clock-getres-realtime-coarse:vdso: 159 nsec/call
clock-gettime-realtime-coarse:vdso: 184 nsec/call
clock-getres-realtime:vdso: 163 nsec/call
clock-gettime-realtime:vdso: 281 nsec/call
clock-getres-boottime:vdso: 169 nsec/call
clock-gettime-boottime:vdso: 274 nsec/call
clock-getres-tai:vdso: 163 nsec/call
clock-gettime-tai:vdso: 277 nsec/call
clock-getres-monotonic-raw:vdso: 166 nsec/call
clock-gettime-monotonic-raw:vdso: 302 nsec/call
clock-getres-monotonic-coarse:vdso: 159 nsec/call
clock-gettime-monotonic-coarse:vdso: 184 nsec/call
clock-getres-monotonic:vdso: 166 nsec/call
clock-gettime-monotonic:vdso: 274 nsec/call

Christophe
Re: [PATCH v2] selftests: vm: Fix 64-bit test builds for powerpc64le
On 1/20/20 7:29 PM, Sandipan Das wrote: > Some tests are built only for 64-bit systems. This makes > sure that these tests are built for both big and little > endian variants of powerpc64. > > Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit > arch") > Signed-off-by: Sandipan Das I was about to point out the change to the run_vmtests script that was missing in your v1. Reviewed-by: Kamalesh Babulal -- Kamalesh
[PATCH v2] selftests: vm: Fix 64-bit test builds for powerpc64le
Some tests are built only for 64-bit systems. This makes sure that these tests are built for both big and little endian variants of powerpc64. Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch") Signed-off-by: Sandipan Das --- Changelog: v2: - Added required changes in run_vmtests. --- tools/testing/selftests/vm/Makefile| 2 +- tools/testing/selftests/vm/run_vmtests | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..f3d11f4fca38 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd -ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) +ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range endif diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index a692ea828317..db8e0d1c7b39 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -59,7 +59,7 @@ else fi #filter 64bit architectures -ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64" +ARCH64STR="arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64" if [ -z $ARCH ]; then ARCH=`uname -m 2>/dev/null | sed -e 's/aarch64.*/arm64/'` fi -- 2.17.1
[PATCH] selftests: vm: Fix 64-bit test builds for powerpc64le
Some tests are built only for 64-bit systems. This makes sure that these tests are built for both big and little endian variants of powerpc64. Fixes: 7549b3364201 ("selftests: vm: Build/Run 64bit tests only on 64bit arch") Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..f3d11f4fca38 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -19,7 +19,7 @@ TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd -ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) +ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range endif -- 2.17.1
Re: [PATCH] ide: remove set but not used variable 'hwif'
From: Wang Hai Date: Sat, 26 Oct 2019 09:57:38 +0800 > Fix the following gcc warning: > > drivers/ide/pmac.c: In function pmac_ide_setup_device: > drivers/ide/pmac.c:1027:14: warning: variable hwif set but not used > [-Wunused-but-set-variable] > > Fixes: d58b0c39e32f ("powerpc/macio: Rework hotplug media bay support") > Reported-by: Hulk Robot > Signed-off-by: Wang Hai Applied.
[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d
https://bugzilla.kernel.org/show_bug.cgi?id=205099 --- Comment #17 from Christophe Leroy (christophe.le...@c-s.fr) --- Created attachment 286907 --> https://bugzilla.kernel.org/attachment.cgi?id=286907&action=edit Patch to fix kasan with KASAN_VMALLOC and VMAP_STACK Please try the attached patch, it fixes the setup of the kasan early hash table when VMAP_STACK is enabled. -- You are receiving this mail because: You are watching the assignee of the bug.
[PATCH v5 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process
Open access to monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operations more secure. Signed-off-by: Alexey Budankov --- drivers/oprofile/event_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c index 12ea4a4ad607..6c9edc8bbc95 100644 --- a/drivers/oprofile/event_buffer.c +++ b/drivers/oprofile/event_buffer.c @@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct file *file) { int err = -EPERM; - if (!capable(CAP_SYS_ADMIN)) + if (!perfmon_capable()) return -EPERM; if (test_and_set_bit_lock(0, &buffer_opened)) -- 2.20.1
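The patches in this series all replace a `capable(CAP_SYS_ADMIN)` test with `perfmon_capable()`. The definition of `perfmon_capable()` is not part of the messages shown here, so the following is a user-space model of the behaviour the commit messages describe, on the assumption that either capability is accepted. The `model_` names and the bitmask representation of a capability set are hypothetical. CAP_SYS_ADMIN is capability 21 in Linux, and 38 is the value the perf tool patch in this series defines for CAP_PERFMON.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers used by the model (see the note above). */
#define MODEL_CAP_SYS_ADMIN	21
#define MODEL_CAP_PERFMON	38

/* Tests one capability bit in a hypothetical 64-bit effective set. */
static bool model_capable(uint64_t effective, int cap)
{
	return (effective >> cap) & 1;
}

/*
 * Assumed behaviour of perfmon_capable(): CAP_PERFMON is sufficient, and
 * CAP_SYS_ADMIN is still accepted for backward compatibility.
 */
static bool model_perfmon_capable(uint64_t effective)
{
	return model_capable(effective, MODEL_CAP_PERFMON) ||
	       model_capable(effective, MODEL_CAP_SYS_ADMIN);
}
```

Under this model a process holding only CAP_PERFMON passes the check without the rest of what CAP_SYS_ADMIN grants, which is the narrowing of privilege that the commit messages argue for.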
[PATCH v5 09/10] drivers/perf: open access for CAP_PERFMON privileged process
Open access to monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operations more secure. Signed-off-by: Alexey Budankov --- drivers/perf/arm_spe_pmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 4e4984a55cd1..5dff81bc3324 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event) if (!attr->exclude_kernel) reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN)) + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); return reg; @@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event) return -EOPNOTSUPP; reg = arm_spe_event_to_pmscr(event); - if (!capable(CAP_SYS_ADMIN) && + if (!perfmon_capable() && (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | BIT(SYS_PMSCR_EL1_CX_SHIFT) | BIT(SYS_PMSCR_EL1_PCT_SHIFT -- 2.20.1
[PATCH v5 08/10] parisc/perf: open access for CAP_PERFMON privileged process
Open access to monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operations more secure. Signed-off-by: Alexey Budankov --- arch/parisc/kernel/perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c index 676683641d00..c4208d027794 100644 --- a/arch/parisc/kernel/perf.c +++ b/arch/parisc/kernel/perf.c @@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf, else return -EFAULT; - if (!capable(CAP_SYS_ADMIN)) + if (!perfmon_capable()) return -EACCES; if (count != sizeof(uint32_t)) -- 2.20.1
[PATCH v5 07/10] powerpc/perf: open access for CAP_PERFMON privileged process
Open access to monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operations more secure. Signed-off-by: Alexey Budankov --- arch/powerpc/perf/imc-pmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index cb50a9e1fd2d..e837717492e4 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event) if (event->attr.type != event->pmu->type) return -ENOENT; - if (!capable(CAP_SYS_ADMIN)) + if (!perfmon_capable()) return -EACCES; /* Sampling not supported */ @@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event) if (event->attr.type != event->pmu->type) return -ENOENT; - if (!capable(CAP_SYS_ADMIN)) + if (!perfmon_capable()) return -EACCES; /* Return if this is a couting event */ -- 2.20.1
[PATCH v5 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process
Open access to bpf_trace monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to bpf_trace monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operations more secure. Signed-off-by: Alexey Budankov --- kernel/trace/bpf_trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index e5ef4ae9edb5..334f1d71ebb1 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info) u32 *ids, prog_cnt, ids_len; int ret; - if (!capable(CAP_SYS_ADMIN)) + if (!perfmon_capable()) return -EPERM; if (event->attr.type != PERF_TYPE_TRACEPOINT) return -EINVAL; -- 2.20.1
[PATCH v5 05/10] drm/i915/perf: open access for CAP_PERFMON privileged process
Open access to i915_perf monitoring for CAP_PERFMON privileged processes. For backward compatibility reasons access to i915_perf subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure i915_perf monitoring is discouraged with respect to CAP_PERFMON capability. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operations more secure. Signed-off-by: Alexey Budankov --- drivers/gpu/drm/i915/i915_perf.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2ae14bc14931..d89347861b7d 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3375,10 +3375,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, /* Similar to perf's kernel.perf_paranoid_cpu sysctl option * we check a dev.i915.perf_stream_paranoid sysctl option * to determine if it's ok to access system wide OA counters -* without CAP_SYS_ADMIN privileges. +* without CAP_PERFMON or CAP_SYS_ADMIN privileges. 
*/ if (privileged_op && - i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { + i915_perf_stream_paranoid && !perfmon_capable()) { DRM_DEBUG("Insufficient privileges to open i915 perf stream\n"); ret = -EACCES; goto err_ctx; @@ -3571,9 +3571,8 @@ static int read_properties_unlocked(struct i915_perf *perf, } else oa_freq_hz = 0; - if (oa_freq_hz > i915_oa_max_sample_rate && - !capable(CAP_SYS_ADMIN)) { - DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n", + if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) { + DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n", i915_oa_max_sample_rate); return -EACCES; } @@ -3994,7 +3993,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { + if (i915_perf_stream_paranoid && !perfmon_capable()) { DRM_DEBUG("Insufficient privileges to add i915 OA config\n"); return -EACCES; } @@ -4141,7 +4140,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, return -ENOTSUPP; } - if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { + if (i915_perf_stream_paranoid && !perfmon_capable()) { DRM_DEBUG("Insufficient privileges to remove i915 OA config\n"); return -EACCES; } -- 2.20.1
[PATCH v5 04/10] perf tool: extend Perf tool with CAP_PERFMON capability support
Extend error messages to mention CAP_PERFMON capability as an option to
substitute CAP_SYS_ADMIN capability for secure system performance
monitoring and observability operations. Make perf_event_paranoid_check()
and __cmd_ftrace() aware of CAP_PERFMON capability.

Signed-off-by: Alexey Budankov
---
 tools/perf/builtin-ftrace.c | 5 +++--
 tools/perf/design.txt | 3 ++-
 tools/perf/util/cap.h | 4
 tools/perf/util/evsel.c | 10 +-
 tools/perf/util/util.c | 1 +
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d5adc417a4ca..55eda54240fb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -284,10 +284,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
 		.events = POLLIN,
 	};
 
-	if (!perf_cap__capable(CAP_SYS_ADMIN)) {
+	if (!(perf_cap__capable(CAP_PERFMON) ||
+	      perf_cap__capable(CAP_SYS_ADMIN))) {
 		pr_err("ftrace only works for %s!\n",
 #ifdef HAVE_LIBCAP_SUPPORT
-		       "users with the SYS_ADMIN capability"
+		       "users with the CAP_PERFMON or CAP_SYS_ADMIN capability"
 #else
 		       "root"
 #endif
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..a42fab308ff6 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to.
 Per task counters can be created by any user, for
 their own tasks.
 
 A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
+privilege.
 
 The 'flags' parameter is currently unused and must be zero.
diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..ae52878c0b2e 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
 #define CAP_SYSLOG	34
 #endif
 
+#ifndef CAP_PERFMON
+#define CAP_PERFMON	38
+#endif
+
 #endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..a35f17723dd3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2491,14 +2491,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target,
 		 "You may not have permission to collect %sstats.\n\n"
 		 "Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
 		 "which controls use of the performance events system by\n"
-		 "unprivileged users (without CAP_SYS_ADMIN).\n\n"
+		 "unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).\n\n"
 		 "The current value is %d:\n\n"
 		 " -1: Allow use of (almost) all events by all users\n"
 		 " Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK\n"
-		 ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN\n"
-		 " Disallow raw tracepoint access by users without CAP_SYS_ADMIN\n"
-		 ">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
-		 ">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
+		 ">= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+		 " Disallow raw tracepoint access by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+		 ">= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN\n\n"
 		 "To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
 		 " kernel.perf_event_paranoid = -1\n" ,
 		 target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..51cf3071db74 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
 bool perf_event_paranoid_check(int max_level)
 {
 	return perf_cap__capable(CAP_SYS_ADMIN) ||
+			perf_cap__capable(CAP_PERFMON) ||
 			perf_event_paranoid() <= max_level;
 }
 
-- 
2.20.1
[PATCH v5 03/10] perf/core: open access to anon probes for CAP_PERFMON privileged process
Open access to anon kprobes, uprobes and eBPF tracing for CAP_PERFMON
privileged processes. For backward compatibility reasons access remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure monitoring is discouraged with respect to CAP_PERFMON capability.
Providing the access under CAP_PERFMON capability singly, without the rest
of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
and makes operations more secure.

Anon kprobes and uprobes are used by ftrace and eBPF. perf probe uses
ftrace to define new kprobe events, and those events are treated as
tracepoint events. eBPF defines new probes via perf_event_open syscall
and then the probes are used in eBPF tracing.

Signed-off-by: Alexey Budankov
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1fcbbe24849..8a6c0b08451d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9088,7 +9088,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_kprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
@@ -9148,7 +9148,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_uprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
-- 
2.20.1
[PATCH v5 02/10] perf/core: open access to the core for CAP_PERFMON privileged process
Open access to monitoring of kernel code, system, tracepoints and
namespaces data for a CAP_PERFMON privileged process. For backward
compatibility reasons access to perf_events subsystem remains open for
CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
perf_events monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse
the credentials and makes operation more secure.

Signed-off-by: Alexey Budankov
---
 include/linux/perf_event.h | 6 +++---
 kernel/events/core.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..730469babcc2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
 
 static inline int perf_allow_kernel(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)
 
 static inline int perf_allow_cpu(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)
 
 static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
 		return -EPERM;
 
 	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..b1fcbbe24849 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11186,7 +11186,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (attr.namespaces) {
-		if (!capable(CAP_SYS_ADMIN))
+		if (!perfmon_capable())
 			return -EACCES;
 	}
 
-- 
2.20.1
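The three perf_allow_*() helpers changed above form a small decision table over the perf_event_paranoid level. A minimal user-space sketch of that table follows, assuming nothing beyond the thresholds and error codes visible in the diff. The sketch_* names are hypothetical, the privileged flag stands in for the result of perfmon_capable(), and the follow-up security_perf_event_open() LSM hook is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the three access classes gated in the patch. */
enum sketch_access { SKETCH_TRACEPOINT, SKETCH_CPU, SKETCH_KERNEL };

/*
 * Returns 0 when access would be allowed, or the negative errno the
 * corresponding perf_allow_*() helper returns in the diff above.
 * 'privileged' models perfmon_capable(): CAP_PERFMON or CAP_SYS_ADMIN.
 */
static int sketch_perf_allow(enum sketch_access what, int paranoid,
			     bool privileged)
{
	switch (what) {
	case SKETCH_KERNEL:	/* perf_allow_kernel(): paranoid > 1 */
		return (paranoid > 1 && !privileged) ? -EACCES : 0;
	case SKETCH_CPU:	/* perf_allow_cpu(): paranoid > 0 */
		return (paranoid > 0 && !privileged) ? -EACCES : 0;
	case SKETCH_TRACEPOINT:	/* perf_allow_tracepoint(): paranoid > -1 */
		return (paranoid > -1 && !privileged) ? -EPERM : 0;
	}
	return -EINVAL;
}

/* Small self-check showing how the table reads at paranoid level 2. */
static void sketch_perf_allow_demo(void)
{
	assert(sketch_perf_allow(SKETCH_KERNEL, 2, false) == -EACCES);
	assert(sketch_perf_allow(SKETCH_KERNEL, 2, true) == 0);
}
```

Note that tracepoint access fails with -EPERM while the other two fail with -EACCES; the sketch keeps that asymmetry because it is what the patch context shows.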
[PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
Introduce CAP_PERFMON capability designed to secure system performance monitoring and observability operations so that CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf and other performance monitoring and observability subsystems. CAP_PERFMON intends to harden system security and integrity during system performance monitoring and observability operations by decreasing attack surface that is available to a CAP_SYS_ADMIN privileged process [1]. Providing access to system performance monitoring and observability operations under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to system performance monitoring and observability operations and balance amount of CAP_SYS_ADMIN credentials following the recommendations in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel developers, below." Although the software running under CAP_PERFMON can not ensure avoidance of related hardware issues, the software can still mitigate these issues following the official embargoed hardware issues mitigation procedure [2]. The bugs in the software itself could be fixed following the standard kernel development process [3] to maintain and harden security of system performance monitoring and observability operations. 
[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov
---
 include/linux/capability.h | 12
 include/uapi/linux/capability.h | 8 +++-
 security/selinux/include/classmap.h | 4 ++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..8784969d91e1 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+	struct user_namespace *ns = &init_user_ns;
+
+	if (ns_capable_noaudit(ns, CAP_PERFMON))
+		return ns_capable(ns, CAP_PERFMON);
+
+	if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
+		return ns_capable(ns, CAP_SYS_ADMIN);
+
+	return false;
+}
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..8b416e5f3afa 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_LAST_CAP         CAP_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..c599b0c2b0e7 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon"
 
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_PERFMON
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
-- 
2.20.1
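For readers who want to see the fallback order of perfmon_capable() outside the kernel, here is a rough user-space sketch. It is not the kernel implementation: it only models "CAP_PERFMON first, CAP_SYS_ADMIN as a backward-compatible fallback" on a plain 64-bit effective-capability mask, such as the CapEff value in /proc/<pid>/status. All sketch_* names are hypothetical; the bit numbers are 21 for CAP_SYS_ADMIN (existing uapi header) and 38 for CAP_PERFMON (as proposed in this patch). The audit/noaudit distinction of ns_capable() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CAP_SYS_ADMIN	21	/* existing capability number */
#define SKETCH_CAP_PERFMON	38	/* number proposed by this patch */

/* Test a single capability bit in an effective-capability mask. */
static bool sketch_cap_raised(uint64_t cap_eff, int cap)
{
	return (cap_eff >> cap) & 1ULL;
}

/*
 * Same decision order as perfmon_capable() in the patch: prefer
 * CAP_PERFMON, fall back to CAP_SYS_ADMIN, otherwise deny.
 */
static bool sketch_perfmon_capable(uint64_t cap_eff)
{
	if (sketch_cap_raised(cap_eff, SKETCH_CAP_PERFMON))
		return true;
	if (sketch_cap_raised(cap_eff, SKETCH_CAP_SYS_ADMIN))
		return true;
	return false;
}

/*
 * Parse a "CapEff:" line in the /proc/<pid>/status format.
 * Returns false if the line is not a CapEff line.
 */
static bool sketch_parse_cap_eff(const char *line, uint64_t *cap_eff)
{
	unsigned long long value;

	if (sscanf(line, "CapEff: %llx", &value) != 1)
		return false;
	*cap_eff = value;
	return true;
}

/* Small self-check: only bit 38 set is enough to pass. */
static void sketch_perfmon_demo(void)
{
	uint64_t mask = 0;

	assert(sketch_parse_cap_eff("CapEff:\t0000004000000000", &mask));
	assert(sketch_perfmon_capable(mask));
}
```

A process holding only bit 38 passes the check without carrying the rest of the CAP_SYS_ADMIN privileges, which is the point the commit message makes.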
Re: [FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates
On Mon, 20 Jan 2020 at 10:17, Christian Zigotzky wrote:
>
> On 16.01.20 at 16:46, Ulf Hansson wrote:
> > On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky
> > wrote:
> >> Hi All,
> >>
> >> We still need the attached patch for our onboard SD card interface
> >> [1,2]. Could you please add this patch to the tree?
> > No, because according to previous discussion that isn't the correct
> > solution and more importantly it will break other archs (if I recall
> > correctly).
> >
> > Looks like someone from the ppc community needs to pick up the ball.
> I am not sure if the ppc community have to fix this issue because your
> updates (mmc-v5.4-2) are responsible for this issue. If nobody wants to
> fix this issue then we will lost the onboard SD card support in the
> future. PLEASE check the 'mmc-v5.4-2' updates again.

Applying your suggested fix breaks other archs/boards.

It's really not a good situation, but I will not take a step back when
it's quite easy to take a step forward instead.

Someone just needs to care and send a patch. It doesn't look that hard to
me, but maybe I am wrong.

Apologies if this isn't the answer you wanted, but that's all I can do
for now, sorry.

Kind regards
Uffe
[PATCH v5 0/10] Introduce CAP_PERFMON to secure system performance monitoring and observability
Currently access to perf_events, i915_perf and other performance
monitoring and observability subsystems of the kernel is open for a
privileged process [1] with CAP_SYS_ADMIN capability enabled in the
process effective set [2].

This patch set introduces CAP_PERFMON capability designed to secure
system performance monitoring and observability operations so that
CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
for perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel.

CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
system performance monitoring and observability operations and balance
amount of CAP_SYS_ADMIN credentials following the recommendations in the
capabilities man page [2] for CAP_SYS_ADMIN: "Note: this capability is
overloaded; see Notes to kernel developers, below."

CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack
surface that is available to a CAP_SYS_ADMIN privileged process [2].
Providing the access to system performance monitoring and observability
operations under CAP_PERFMON capability singly, without the rest of
CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
makes the operation more secure.

For backward compatibility reasons access to system performance
monitoring and observability subsystems of the kernel remains open for
CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability usage for
secure system performance monitoring and observability operations is
discouraged with respect to the designed CAP_PERFMON capability.

CAP_PERFMON intends to meet the demand to secure system performance
monitoring and observability operations in security sensitive,
restricted, multiuser production environments (e.g. HPC clusters, cloud
and virtual compute environments) where root or CAP_SYS_ADMIN credentials
are not available to mass users of a system because of security
considerations.

Possible alternative solution to this capabilities balancing, system
security hardening task could be to use the existing CAP_SYS_PTRACE
capability to govern system performance monitoring and observability
operations. However CAP_SYS_PTRACE capability still provides users with
more credentials than are required for secure performance monitoring and
observability operations and this excess is avoided by the designed
CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate those issues
following the official embargoed hardware issues mitigation procedure
[3]. The bugs in the software itself could be fixed following the
standard kernel development process [4] to maintain and harden security
of system performance monitoring and observability operations. After all,
the patch set is shaped in the way that simplifies procedure for
backtracking of possible issues and bugs [5] as much as possible.
The patch set is for tip perf/core repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
  sha1: 5738891229a25e9e678122a843cbf0466a456d0c

---
Changes in v5:
- renamed CAP_SYS_PERFMON to CAP_PERFMON
- extended perfmon_capable() with noaudit checks
Changes in v4:
- converted perfmon_capable() into an inline function
- made perf_events kprobes, uprobes, hw breakpoints and namespaces data
  available to CAP_SYS_PERFMON privileged processes
- applied perfmon_capable() to drivers/perf and drivers/oprofile
- extended __cmd_ftrace() with support of CAP_SYS_PERFMON
Changes in v3:
- implemented perfmon_capable() macros aggregating required capabilities
  checks
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged
  processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to
  CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc
  system performance monitoring and observability related subsystems

---
Alexey Budankov (10):
  capabilities: introduce CAP_PERFMON to kernel and user space
  perf/core: open access to the core for CAP_PERFMON privileged process
  perf/core: open access to anon probes for CAP_PERFMON privileged process
  perf tool: extend Perf tool with CAP_PERFMON capability support
  drm/i915/perf: open access for CAP_PERFMON privileged process
  trace/bpf_trace: open access for CAP_PERFMON privileged process
  powerpc/perf: open access for CAP_PERFMON privileged process
  parisc/perf: open access for CAP_PERFMON privileged process
  drivers/perf: open access for CAP_PERFMON privileged process
  drivers/oprofile: open access for CAP_PERFMON privileged process

 arch/parisc/kernel/perf.c | 2 +-
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 drivers/gpu/drm/i915/i915_perf.c| 13 ++---
 drivers/oprofile/event_buffer.c | 2 +-
 drivers/perf/arm_spe_pmu.c | 4 ++--
[PATCH 2/2] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
IMC (In-Memory Collection counters) does performance monitoring in two
different modes, i.e. accumulation mode (core-imc and thread-imc events)
and trace mode (trace-imc events). A CPU thread can be in either
accumulation mode or trace mode at a time, and the mode is selected via
the LDBAR register in the POWER architecture. The current design does not
address the races between thread-imc and trace-imc events.

This patch implements a global id and lock to avoid the races between
core, trace and thread imc events. With this global id-lock
implementation, the system can run only one of core, thread or trace imc
events at a time; i.e., to run any core-imc events, thread/trace imc
events should not be enabled/monitored.

Signed-off-by: Anju T Sudhakar
---
 arch/powerpc/perf/imc-pmu.c | 177 +++-
 1 file changed, 153 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..2e220f199530 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -44,6 +44,16 @@ static DEFINE_PER_CPU(u64 *, trace_imc_mem);
 static struct imc_pmu_ref *trace_imc_refc;
 static int trace_imc_mem_size;
 
+/*
+ * Global data structure used to avoid races between thread,
+ * core and trace-imc
+ */
+static struct imc_pmu_ref imc_global_refc = {
+	.lock = __MUTEX_INITIALIZER(imc_global_refc.lock),
+	.id = 0,
+	.refc = 0,
+};
+
 static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
 	return container_of(event->pmu, struct imc_pmu, pmu);
@@ -759,6 +769,20 @@ static void core_imc_counters_release(struct perf_event *event)
 		ref->refc = 0;
 	}
 	mutex_unlock(&ref->lock);
+
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == IMC_DOMAIN_CORE) {
+		imc_global_refc.refc--;
+		/*
+		 * If no other thread is running any core-imc
+		 * event, set the global id to zero.
+		 */
+		if (imc_global_refc.refc <= 0) {
+			imc_global_refc.refc = 0;
+			imc_global_refc.id = 0;
+		}
+	}
+	mutex_unlock(&imc_global_refc.lock);
 }
 
 static int core_imc_event_init(struct perf_event *event)
@@ -779,6 +803,22 @@ static int core_imc_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
+	/*
+	 * Take the global lock, and make sure
+	 * no other thread is running any trace OR thread imc event
+	 */
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == 0) {
+		imc_global_refc.id = IMC_DOMAIN_CORE;
+		imc_global_refc.refc++;
+	} else if (imc_global_refc.id == IMC_DOMAIN_CORE) {
+		imc_global_refc.refc++;
+	} else {
+		mutex_unlock(&imc_global_refc.lock);
+		return -EBUSY;
+	}
+	mutex_unlock(&imc_global_refc.lock);
+
 	event->hw.idx = -1;
 	pmu = imc_event_to_pmu(event);
 
@@ -877,7 +917,16 @@ static int ppc_thread_imc_cpu_online(unsigned int cpu)
 
 static int ppc_thread_imc_cpu_offline(unsigned int cpu)
 {
-	mtspr(SPRN_LDBAR, 0);
+	/*
+	 * Toggle the bit 0 of LDBAR.
+	 *
+	 * If bit 0 of LDBAR is unset, it will stop posting
+	 * the counter data to memory.
+	 * For thread-imc, bit 0 of LDBAR will be set to 1 in the
+	 * event_add function. So toggle this bit here, to stop the updates
+	 * to memory in the cpu_offline path.
+	 */
+	mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) ^ (1UL << 63)));
 	return 0;
 }
 
@@ -889,6 +938,24 @@ static int thread_imc_cpu_init(void)
 			  ppc_thread_imc_cpu_offline);
 }
 
+static void thread_imc_counters_release(struct perf_event *event)
+{
+
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == IMC_DOMAIN_THREAD) {
+		imc_global_refc.refc--;
+		/*
+		 * If no other thread is running any thread-imc
+		 * event, set the global id to zero.
+		 */
+		if (imc_global_refc.refc <= 0) {
+			imc_global_refc.refc = 0;
+			imc_global_refc.id = 0;
+		}
+	}
+	mutex_unlock(&imc_global_refc.lock);
+}
+
 static int thread_imc_event_init(struct perf_event *event)
 {
 	u32 config = event->attr.config;
@@ -905,6 +972,27 @@ static int thread_imc_event_init(struct perf_event *event)
 	if (event->hw.sample_period)
 		return -EINVAL;
 
+	mutex_lock(&imc_global_refc.lock);
+	/*
+	 * Check if any other thread is running
+	 * core-engine, if not set the global id to
+	 * thread-imc.
+	 */
+	if (imc_global_refc.id == 0) {
+		imc_global_refc.id = IMC_DOMAIN_THREAD;
+		imc_global_refc.refc++;
+	} else if (imc_global_refc.id == IMC_DOMAIN_THREAD)
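The global id-lock pattern used in the hunks above (a single owner domain plus a reference count, both protected by one mutex) can be sketched in user space as follows. This is only an illustration of the pattern, assuming a pthread mutex in place of the kernel mutex. The sketch_* names and the enum values are hypothetical and are not the kernel's IMC_DOMAIN_* constants; as in the patch, an id of 0 means that no domain currently owns the counters.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical domain ids; 0 means "no domain active", as in the patch. */
enum sketch_domain {
	SKETCH_NONE = 0,
	SKETCH_CORE,
	SKETCH_THREAD,
	SKETCH_TRACE,
};

/* One global owner id plus a reference count, guarded by a mutex. */
static struct {
	pthread_mutex_t lock;
	int id;
	int refc;
} sketch_global = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.id = SKETCH_NONE,
	.refc = 0,
};

/* event_init side: claim the domain, or fail if another one owns it. */
static int sketch_domain_get(enum sketch_domain d)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_global.lock);
	if (sketch_global.id == SKETCH_NONE) {
		sketch_global.id = d;
		sketch_global.refc = 1;
	} else if (sketch_global.id == (int)d) {
		sketch_global.refc++;
	} else {
		ret = -EBUSY;
	}
	pthread_mutex_unlock(&sketch_global.lock);
	return ret;
}

/* release side: drop one reference, free the id on the last put. */
static void sketch_domain_put(enum sketch_domain d)
{
	pthread_mutex_lock(&sketch_global.lock);
	if (sketch_global.id == (int)d) {
		if (--sketch_global.refc <= 0) {
			sketch_global.refc = 0;
			sketch_global.id = SKETCH_NONE;
		}
	}
	pthread_mutex_unlock(&sketch_global.lock);
}

/* Small self-check: a second domain is refused until the first is gone. */
static void sketch_domain_demo(void)
{
	assert(sketch_domain_get(SKETCH_CORE) == 0);
	assert(sketch_domain_get(SKETCH_TRACE) == -EBUSY);
	sketch_domain_put(SKETCH_CORE);
	assert(sketch_domain_get(SKETCH_TRACE) == 0);
	sketch_domain_put(SKETCH_TRACE);
}
```

The design choice worth noting is that a put for a domain that does not currently own the id is ignored, which matches the `if (imc_global_refc.id == IMC_DOMAIN_...)` guard in the release paths of the patch.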
[PATCH 1/2] powerpc/powernv: Re-enable imc trace-mode in kernel
Commit 249fad734a25 ("powerpc/perf: Disable trace_imc pmu") disables IMC
(In-Memory Collection) trace-mode in the kernel, since frequent mode
switching between accumulation mode and trace mode via the LDBAR SPR in
the hardware can trigger a checkstop (system crash).

This patch re-enables imc-trace mode in the kernel.

The following patch in this series will address the mode switching issue
by implementing a global lock, and will restrict usage to either
accumulation mode or trace mode at a time.

Signed-off-by: Anju T Sudhakar
---
 arch/powerpc/platforms/powernv/opal-imc.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 000b350d4060..3b4518f4b643 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -278,14 +278,7 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 			domain = IMC_DOMAIN_THREAD;
 			break;
 		case IMC_TYPE_TRACE:
-			/*
-			 * FIXME. Using trace_imc events to monitor application
-			 * or KVM thread performance can cause a checkstop
-			 * (system crash).
-			 * Disable it for now.
-			 */
-			pr_info_once("IMC: disabling trace_imc PMU\n");
-			domain = -1;
+			domain = IMC_DOMAIN_TRACE;
 			break;
 		default:
 			pr_warn("IMC Unknown Device type \n");
-- 
2.20.1
Re: [PATCH RFC v1] mm: is_mem_section_removable() overhaul
On 20.01.20 10:14, David Hildenbrand wrote:
> On 20.01.20 08:48, Michal Hocko wrote:
>> On Fri 17-01-20 08:57:51, Dan Williams wrote:
>> [...]
>>> Unless the user is willing to hold the device_hotplug_lock over the
>>> evaluation then the result is unreliable.
>>
>> Do we want to hold the device_hotplug_lock from this user readable file
>> in the first place? My book says that this just waits to become a
>> problem.
>
> It was the "big hammer" solution for this RFC.
>
> I think we could do with a try_lock() on the device_lock() paired with a
> device->removed flag. The latter is helpful for properly catching zombie
> devices on the onlining/offlining path either way (and on my todo list).

We do have dev->p->dead which could come in handy.

-- 
Thanks,

David / dhildenb
[FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates
On 16.01.20 at 16:46, Ulf Hansson wrote:
> On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky wrote:
>> Hi All,
>>
>> We still need the attached patch for our onboard SD card interface
>> [1,2]. Could you please add this patch to the tree?
> No, because according to previous discussion that isn't the correct
> solution and more importantly it will break other archs (if I recall
> correctly).
>
> Looks like someone from the ppc community needs to pick up the ball.

I am not sure if the ppc community have to fix this issue because your
updates (mmc-v5.4-2) are responsible for this issue. If nobody wants to
fix this issue then we will lost the onboard SD card support in the
future. PLEASE check the 'mmc-v5.4-2' updates again.

Thanks,
Christian

>> [1] https://www.spinics.net/lists/linux-mmc/msg56211.html
> I think this discussion even suggested some viable solutions, so it just
> be a matter of sending a patch :-)
>
>> [2] http://forum.hyperion-entertainment.com/viewtopic.php?f=58=4349=20#p49012
> Kind regards
> Uffe
Re: [PATCH RFC v1] mm: is_mem_section_removable() overhaul
On 20.01.20 08:48, Michal Hocko wrote:
> On Fri 17-01-20 08:57:51, Dan Williams wrote:
> [...]
>> Unless the user is willing to hold the device_hotplug_lock over the
>> evaluation then the result is unreliable.
>
> Do we want to hold the device_hotplug_lock from this user readable file
> in the first place? My book says that this just waits to become a
> problem.

It was the "big hammer" solution for this RFC.

I think we could do with a try_lock() on the device_lock() paired with a
device->removed flag. The latter is helpful for properly catching zombie
devices on the onlining/offlining path either way (and on my todo list).

>
> Really, the interface is flawed and should have never been merged in the
> first place. We cannot simply remove it altogether I am afraid so let's
> at least remove the bogus code and pretend that the world is a better
> place where everything is removable except the reality sucks...

As I expressed already, the interface works as designed/documented and
has been used like that for years. I tend to agree that it never should
have been merged like that.

We have (at least) two places that are racy (with concurrent memory
hotplug):

1. /sys/.../memoryX/removable
- a) make it always return yes and make the interface useless
- b) add proper locking and keep it running as is (e.g., so David can
     identify offlineable memory blocks :) ).

2. /sys/.../memoryX/valid_zones
- a) always return "none" if the memory is online
- b) add proper locking and keep it running as is
- c) cache the result ("zone") when a block is onlined (e.g., in
     mem->zone. If it is NULL, either mixed zones or unknown)

At least 2. already screams for proper device_lock() locking as the
mem->state is not stable across the function call.

1a and 2a are the easiest solutions but remove all ways to identify if a
memory block could theoretically be offlined - without trying
(especially, also to identify the MOVABLE zone).

I tend to prefer 1b) and 2c), paired with proper device_lock() locking.
We don't affect existing use cases but are able to simplify the code +
fix the races.

What's your opinion? Any alternatives?

-- 
Thanks,

David / dhildenb
[PATCH v17 16/24] selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
From: Ram Pai

Some pkeys which are valid on the hardware are reserved and not available
for application use. These keys cannot be allocated.

test_pkey_alloc_exhaust() tries to account for these and has an assertion
which validates that all available pkeys have been exhaustively
allocated. However, the expression that is currently used is only valid
for x86. On powerpc, one more pkey is reserved than on x86. Hence, the
assertion is made to use an arch-specific helper to get the correct count
of reserved pkeys.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index e6de078a9196..5fcbbc525364 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1153,6 +1153,7 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
 	dprintf3("%s()::%d\n", __func__, __LINE__);
 
 	/*
+	 * On x86:
 	 * There are 16 pkeys supported in hardware. Three are
 	 * allocated by the time we get here:
 	 * 1. The default key (0)
@@ -1160,8 +1161,16 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
 	 * 3. One allocated by the test code and passed in via
 	 * 'pkey' to this function.
 	 * Ensure that we can allocate at least another 13 (16-3).
+	 *
+	 * On powerpc:
+	 * There are either 5, 28, 29 or 32 pkeys supported in
+	 * hardware depending on the page size (4K or 64K) and
+	 * platform (powernv or powervm). Four are allocated by
+	 * the time we get here. These include pkey-0, pkey-1,
+	 * exec-only pkey and the one allocated by the test code.
+	 * Ensure that we can allocate the remaining.
 	 */
-	pkey_assert(i >= NR_PKEYS-3);
+	pkey_assert(i >= (NR_PKEYS - get_arch_reserved_keys() - 1));
 
 	for (i = 0; i < nr_allocated_pkeys; i++) {
 		err = sys_pkey_free(allocated_pkeys[i]);
-- 
2.17.1
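The arithmetic behind the new assertion can be checked with a tiny sketch. The helper name below is hypothetical, and the reserved-key counts passed to it are inferred from the code comment in the patch (x86: three keys in use out of 16, one of which is the test's own key; powerpc: four in use, one of which is the test's own key). The real get_arch_reserved_keys() on powerpc may return other values depending on page size and platform, so the powerpc numbers here are an assumption for the 32-key case only.

```c
#include <assert.h>

/*
 * Lower bound on how many further pkeys the test must manage to
 * allocate: all hardware keys, minus the arch-reserved ones, minus the
 * single key the test harness already allocated and passed in.
 */
static int sketch_min_allocatable(int nr_pkeys, int arch_reserved)
{
	return nr_pkeys - arch_reserved - 1;
}

/* Small self-check reproducing the "13 (16-3)" figure from the comment. */
static void sketch_min_allocatable_demo(void)
{
	/* x86: 16 keys, 2 reserved (inferred), 1 held by the test. */
	assert(sketch_min_allocatable(16, 2) == 13);
}
```

With the old expression `NR_PKEYS-3`, a 32-key powerpc configuration with three reserved keys would demand 29 successful allocations where only 28 are possible, which is the failure the patch addresses.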
[PATCH v17 17/24] selftests/vm/pkeys: Improve checks to determine pkey support
From: Ram Pai

For the pkeys subsystem to work, both the CPU and the kernel need to have
support. So, additionally check if the kernel supports pkeys apart from
the CPU feature checks.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/pkey-helpers.h| 30
 tools/testing/selftests/vm/pkey-powerpc.h| 3 +-
 tools/testing/selftests/vm/pkey-x86.h| 2 +-
 tools/testing/selftests/vm/protection_keys.c | 7 +++--
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 2f4b1eb3a680..59ccdff18214 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -76,6 +76,8 @@ extern void abort_hooks(void);
 
 __attribute__((noinline)) int read_ptr(int *ptr);
 void expected_pkey_fault(int pkey);
+int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
+int sys_pkey_free(unsigned long pkey);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
@@ -186,4 +188,32 @@ static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
 #endif
 }
 
+static inline int kernel_has_pkeys(void)
+{
+	/* try allocating a key and see if it succeeds */
+	int ret = sys_pkey_alloc(0, 0);
+	if (ret <= 0) {
+		return 0;
+	}
+	sys_pkey_free(ret);
+	return 1;
+}
+
+static inline int is_pkeys_supported(void)
+{
+	/* check if the cpu supports pkeys */
+	if (!cpu_has_pkeys()) {
+		dprintf1("SKIP: %s: no CPU support\n", __func__);
+		return 0;
+	}
+
+	/* check if the kernel supports pkeys */
+	if (!kernel_has_pkeys()) {
+		dprintf1("SKIP: %s: no kernel support\n", __func__);
+		return 0;
+	}
+
+	return 1;
+}
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h
index 319673bbab0b..7d7c3ffafdd9 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -63,8 +63,9 @@ static inline void __write_pkey_reg(u64 pkey_reg)
 			__func__, __read_pkey_reg(), pkey_reg);
 }
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
+	/* No simple way to determine this */
 	return 1;
 }
 
diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
index a0c59d4f7af2..6421b846aa16 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -97,7 +97,7 @@ static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
 #define X86_FEATURE_PKU        (1<<3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE      (1<<4) /* OS Protection Keys Enable */
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
 	unsigned int eax;
 	unsigned int ebx;
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 5fcbbc525364..95f173049f43 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1378,7 +1378,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
 	int size = PAGE_SIZE;
 	int sret;
 
-	if (cpu_has_pku()) {
+	if (cpu_has_pkeys()) {
 		dprintf1("SKIP: %s: no CPU support\n", __func__);
 		return;
 	}
@@ -1447,12 +1447,13 @@ void pkey_setup_shadow(void)
 int main(void)
 {
 	int nr_iterations = 22;
+	int pkeys_supported = is_pkeys_supported();
 
 	setup_handlers();
 
-	printf("has pku: %d\n", cpu_has_pku());
+	printf("has pkeys: %d\n", pkeys_supported);
 
-	if (!cpu_has_pku()) {
+	if (!pkeys_supported) {
 		int size = PAGE_SIZE;
 		int *ptr;
 
-- 
2.17.1
[PATCH v17 20/24] selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page
From: Ram Pai Detect write-violation on a page to which access-disabled key is associated much after the page is mapped. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 13 + 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index cb31a5cdf6d9..8bb4de103874 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1027,6 +1027,18 @@ void test_write_of_access_disabled_region(int *ptr, u16 pkey) *ptr = __LINE__; expected_pkey_fault(pkey); } + +void test_write_of_access_disabled_region_with_page_already_mapped(int *ptr, + u16 pkey) +{ + *ptr = __LINE__; + dprintf1("disabling access; after accessing the page, " + " to PKEY[%02d], doing write\n", pkey); + pkey_access_deny(pkey); + *ptr = __LINE__; + expected_pkey_fault(pkey); +} + void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey) { int ret; @@ -1423,6 +1435,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_write_of_write_disabled_region, test_write_of_write_disabled_region_with_page_already_mapped, test_write_of_access_disabled_region, + test_write_of_access_disabled_region_with_page_already_mapped, test_kernel_write_of_access_disabled_region, test_kernel_write_of_write_disabled_region, test_kernel_gup_of_access_disabled_region, -- 2.17.1
[PATCH v17 05/24] selftests/vm/pkeys: Move some definitions to arch-specific header
From: Thiago Jung Bauermann In preparation for multi-arch support, move definitions which have arch-specific values to x86-specific header. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 111 + tools/testing/selftests/vm/pkey-x86.h| 156 +++ tools/testing/selftests/vm/protection_keys.c | 47 -- 3 files changed, 162 insertions(+), 152 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-x86.h diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 6ad1bd54ef94..3ed2f021bf7a 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -21,9 +21,6 @@ #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) -#define NR_PKEYS 16 -#define PKEY_BITS_PER_PKEY 2 - #ifndef DEBUG_LEVEL #define DEBUG_LEVEL 0 #endif @@ -73,19 +70,13 @@ extern void abort_hooks(void); } \ } while (0) +#if defined(__i386__) || defined(__x86_64__) /* arch */ +#include "pkey-x86.h" +#else /* arch */ +#error Architecture not supported +#endif /* arch */ + extern unsigned int shadow_pkey_reg; -static inline unsigned int __read_pkey_reg(void) -{ - unsigned int eax, edx; - unsigned int ecx = 0; - unsigned int pkey_reg; - - asm volatile(".byte 0x0f,0x01,0xee\n\t" -: "=a" (eax), "=d" (edx) -: "c" (ecx)); - pkey_reg = eax; - return pkey_reg; -} static inline unsigned int _read_pkey_reg(int line) { @@ -100,19 +91,6 @@ static inline unsigned int _read_pkey_reg(int line) #define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void __write_pkey_reg(unsigned int pkey_reg) -{ - unsigned int eax = pkey_reg; - unsigned int ecx = 0; - unsigned int edx = 0; - - dprintf4("%s() changing %08x to %08x\n", __func__, - __read_pkey_reg(), pkey_reg); - asm volatile(".byte 0x0f,0x01,0xef\n\t" -: : "a" (eax), "c" (ecx), "d" (edx)); - assert(pkey_reg == __read_pkey_reg()); -} - static 
inline void write_pkey_reg(unsigned int pkey_reg) { dprintf4("%s() changing %08x to %08x\n", __func__, @@ -157,83 +135,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) dprintf4("pkey_reg now: %08x\n", read_pkey_reg()); } -#define PAGE_SIZE 4096 -#define MB (1<<20) - -static inline void __cpuid(unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - /* ecx is often an input as well as an output. */ - asm volatile( - "cpuid;" - : "=a" (*eax), - "=b" (*ebx), - "=c" (*ecx), - "=d" (*edx) - : "0" (*eax), "2" (*ecx)); -} - -/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */ -#define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */ -#define X86_FEATURE_OSPKE (1<<4) /* OS Protection Keys Enable */ - -static inline int cpu_has_pku(void) -{ - unsigned int eax; - unsigned int ebx; - unsigned int ecx; - unsigned int edx; - - eax = 0x7; - ecx = 0x0; - __cpuid(, , , ); - - if (!(ecx & X86_FEATURE_PKU)) { - dprintf2("cpu does not have PKU\n"); - return 0; - } - if (!(ecx & X86_FEATURE_OSPKE)) { - dprintf2("cpu does not have OSPKE\n"); - return 0; - } - return 1; -} - -#define XSTATE_PKEY_BIT(9) -#define XSTATE_PKEY0x200 - -int pkey_reg_xstate_offset(void) -{ - unsigned int eax; - unsigned int ebx; - unsigned int ecx; - unsigned int edx; - int xstate_offset; - int xstate_size; - unsigned long XSTATE_CPUID = 0xd; - int leaf; - - /* assume that XSTATE_PKEY is set in XCR0 */ - leaf = XSTATE_PKEY_BIT; - { - eax = XSTATE_CPUID; - ecx = leaf; - __cpuid(, , , ); - - if (leaf == XSTATE_PKEY_BIT) { - xstate_offset = ebx; - xstate_size = eax; - } - } - - if (xstate_size == 0) { - printf("could not find size/offset of PKEY in xsave state\n"); - return 0; - } - - return xstate_offset; -} - #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) #define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) #define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h new file mode 100644 index ..2f04ade8ca9c --- /dev/null +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -0,0 +1,156 @@ +/*
[PATCH v17 11/24] selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
From: Ram Pai

alloc_random_pkey() was allocating the same pkey every time.
Not all pkeys were getting tested. This fixes it.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Acked-by: Dave Hansen
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 7fd52d5c4bfd..9cc82b65f828 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -25,6 +25,7 @@
 #define __SANE_USERSPACE_TYPES__
 #include
 #include
+#include
 #include
 #include
 #include
@@ -546,10 +547,10 @@ int alloc_random_pkey(void)
 	int nr_alloced = 0;
 	int random_index;
 	memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
+	srand((unsigned int)time(NULL));
 
 	/* allocate every possible key and make a note of which ones we got */
 	max_nr_pkey_allocs = NR_PKEYS;
-	max_nr_pkey_allocs = 1;
 	for (i = 0; i < max_nr_pkey_allocs; i++) {
 		int new_pkey = alloc_pkey();
 		if (new_pkey < 0)
-- 
2.17.1
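The selection step can be sketched on its own as follows. This is a variation and not what the patch does: the patch calls srand() on every alloc_random_pkey() call, while this sketch seeds only once, because two calls within the same second would otherwise reseed with the same time() value and restart the same sequence. The helper names seed_once() and pick_random_key() are made up for illustration.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: seed the PRNG on the first call only. */
static void seed_once(void)
{
	static int seeded;

	if (!seeded) {
		srand((unsigned int)time(NULL));
		seeded = 1;
	}
}

/*
 * Pick one entry from the keys that were successfully allocated.
 * Returns -1 when nothing was allocated.
 */
static int pick_random_key(const int *alloced, int nr_alloced)
{
	if (nr_alloced <= 0)
		return -1;

	seed_once();
	return alloced[rand() % nr_alloced];
}
```

With max_nr_pkey_allocs no longer forced to 1, nr_alloced can be larger than one, so the index actually varies between runs.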
[PATCH v17 13/24] selftests/vm/pkeys: Introduce generic pkey abstractions
From: Ram Pai This introduces some generic abstractions and provides the corresponding architecture-specific implementations for these abstractions. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 12 tools/testing/selftests/vm/pkey-x86.h| 15 +++ tools/testing/selftests/vm/protection_keys.c | 8 ++-- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 0e3da7c8d628..621fb2a0a5ef 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -74,6 +74,9 @@ extern void abort_hooks(void); } \ } while (0) +__attribute__((noinline)) int read_ptr(int *ptr); +void expected_pkey_fault(int pkey); + #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" #else /* arch */ @@ -172,4 +175,13 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) #define __stringify_1(x...) #x #define __stringify(x...) 
__stringify_1(x) +static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si) +{ +#ifdef si_pkey + return &si->si_pkey; +#else + return (u32 *)(((u8 *)si) + si_pkey_offset); +#endif +} + #endif /* _PKEYS_HELPER_H */ diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index def2a1bcf6a5..a0c59d4f7af2 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -42,6 +42,7 @@ #endif #define NR_PKEYS 16 +#define NR_RESERVED_PKEYS 2 /* pkey-0 and exec-only-pkey */ #define PKEY_BITS_PER_PKEY 2 #define HPAGE_SIZE (1UL<<21) #define PAGE_SIZE 4096 @@ -158,4 +159,18 @@ int pkey_reg_xstate_offset(void) return xstate_offset; } +static inline int get_arch_reserved_keys(void) +{ + return NR_RESERVED_PKEYS; +} + +void expect_fault_on_read_execonly_key(void *p1, int pkey) +{ + int ptr_contents; + + ptr_contents = read_ptr(p1); + dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents); + expected_pkey_fault(pkey); +} + #endif /* _PKEYS_X86_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 535e464e27e9..57c71056c93d 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1307,9 +1307,7 @@ void test_executing_on_unreadable_memory(int *ptr, u16 pkey) madvise(p1, PAGE_SIZE, MADV_DONTNEED); lots_o_noops_around_write(); do_not_expect_pkey_fault("executing on PROT_EXEC memory"); - ptr_contents = read_ptr(p1); - dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents); - expected_pkey_fault(pkey); + expect_fault_on_read_execonly_key(p1, pkey); } void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey) @@ -1336,9 +1334,7 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey) madvise(p1, PAGE_SIZE, MADV_DONTNEED); lots_o_noops_around_write(); do_not_expect_pkey_fault("executing on PROT_EXEC memory"); - ptr_contents = read_ptr(p1); - dprintf2("ptr (%p) 
contents@%d: %x\n", p1, __LINE__, ptr_contents); - expected_pkey_fault(UNKNOWN_PKEY); + expect_fault_on_read_execonly_key(p1, UNKNOWN_PKEY); /* * Put the memory back to non-PROT_EXEC. Should clear the -- 2.17.1
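The offset-based fallback in siginfo_get_pkey_ptr() may be easier to follow in isolation. The sketch below is illustrative only: struct fake_siginfo is a made-up stand-in for siginfo_t, laid out so that its key field sits at byte offset 0x20 (the si_pkey_offset value the powerpc header in this series uses), and read_u32_at() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for siginfo_t; only the field offset matters here. */
struct fake_siginfo {
	char pad[0x20];
	uint32_t pkey;
};

/*
 * Hypothetical helper: read a 32-bit field located 'offset' bytes past
 * 'base'. memcpy() is used so that the access is valid whatever the
 * alignment of the computed address.
 */
static uint32_t read_u32_at(const void *base, size_t offset)
{
	uint32_t value;

	memcpy(&value, (const uint8_t *)base + offset, sizeof(value));
	return value;
}
```

The selftest returns a pointer instead of a value, but the address computation (base pointer plus a per-architecture byte offset) is the same idea.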
[PATCH v17 07/24] selftests: vm: pkeys: Use sane types for pkey register
The size of the pkey register can vary across architectures. This converts the data type of all its references to u64 in preparation for multi-arch support. To keep the definition of the u64 type consistent and remove format specifier related warnings, __SANE_USERSPACE_TYPES__ is defined as suggested by Michael Ellerman. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 31 +++ tools/testing/selftests/vm/pkey-x86.h| 8 +- tools/testing/selftests/vm/protection_keys.c | 86 3 files changed, 72 insertions(+), 53 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 7f18a82e54fc..dfbce49269ce 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -14,10 +14,10 @@ #include /* Define some kernel-like types */ -#define u8 uint8_t -#define u16 uint16_t -#define u32 uint32_t -#define u64 uint64_t +#define u8 __u8 +#define u16 __u16 +#define u32 __u32 +#define u64 __u64 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) @@ -80,13 +80,14 @@ extern void abort_hooks(void); #error Architecture not supported #endif /* arch */ -extern unsigned int shadow_pkey_reg; +extern u64 shadow_pkey_reg; -static inline unsigned int _read_pkey_reg(int line) +static inline u64 _read_pkey_reg(int line) { - unsigned int pkey_reg = __read_pkey_reg(); + u64 pkey_reg = __read_pkey_reg(); - dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n", + dprintf4("read_pkey_reg(line=%d) pkey_reg: %016llx" + " shadow: %016llx\n", line, pkey_reg, shadow_pkey_reg); assert(pkey_reg == shadow_pkey_reg); @@ -95,15 +96,15 @@ static inline unsigned int _read_pkey_reg(int line) #define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void write_pkey_reg(unsigned int pkey_reg) +static inline void write_pkey_reg(u64 pkey_reg) { - dprintf4("%s() changing %08x to %08x\n", __func__, + dprintf4("%s() changing %016llx to %016llx\n", __func__, __read_pkey_reg(), pkey_reg); /* will do 
the shadow check for us: */ read_pkey_reg(); __write_pkey_reg(pkey_reg); shadow_pkey_reg = pkey_reg; - dprintf4("%s(%08x) pkey_reg: %08x\n", __func__, + dprintf4("%s(%016llx) pkey_reg: %016llx\n", __func__, pkey_reg, __read_pkey_reg()); } @@ -113,7 +114,7 @@ static inline void write_pkey_reg(unsigned int pkey_reg) */ static inline void __pkey_access_allow(int pkey, int do_allow) { - unsigned int pkey_reg = read_pkey_reg(); + u64 pkey_reg = read_pkey_reg(); int bit = pkey * 2; if (do_allow) @@ -121,13 +122,13 @@ static inline void __pkey_access_allow(int pkey, int do_allow) else pkey_reg |= (1< #include #include @@ -48,7 +49,7 @@ int iteration_nr = 1; int test_nr; -unsigned int shadow_pkey_reg; +u64 shadow_pkey_reg; int dprint_in_signal; char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; @@ -163,7 +164,7 @@ void dump_mem(void *dumpme, int len_bytes) for (i = 0; i < len_bytes; i += sizeof(u64)) { u64 *ptr = (u64 *)(c + i); - dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr); + dprintf1("dump[%03d][@%p]: %016llx\n", i, ptr, *ptr); } } @@ -205,7 +206,8 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) dprint_in_signal = 1; dprintf1("===SIGSEGV\n"); - dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__, + dprintf1("%s()::%d, pkey_reg: 0x%016llx shadow: %016llx\n", + __func__, __LINE__, __read_pkey_reg(), shadow_pkey_reg); trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO]; @@ -213,8 +215,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) fpregset = uctxt->uc_mcontext.fpregs; fpregs = (void *)fpregset; - dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__, - trapno, ip, si_code_str(si->si_code), si->si_code); + dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n", + __func__, trapno, ip, si_code_str(si->si_code), + si->si_code); #ifdef __i386__ /* * 32-bit has some extra padding so that userspace can tell whether @@ -256,8 +259,9 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext) * need __read_pkey_reg() version so we do not do shadow_pkey_reg * checking */ - dprintf1("signal pkey_reg from pkey_reg: %08x\n", __read_pkey_reg()); - dprintf1("pkey from siginfo: %jx\n", siginfo_pkey); + dprintf1("signal pkey_reg from pkey_reg: %016llx\n", + __read_pkey_reg());
[PATCH v17 22/24] selftests/vm/pkeys: Test correct behaviour of pkey-0
From: Ram Pai Ensure that pkey-0 is allocated on start and that it can be attached dynamically in various modes, without failures. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 53 1 file changed, 53 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index d4952b57cc90..a1cb9a71e77c 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -964,6 +964,58 @@ __attribute__((noinline)) int read_ptr(int *ptr) return *ptr; } +void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey) +{ + int i, err; + int max_nr_pkey_allocs; + int alloced_pkeys[NR_PKEYS]; + int nr_alloced = 0; + long size; + + pkey_assert(pkey_last_malloc_record); + size = pkey_last_malloc_record->size; + /* +* This is a bit of a hack. But mprotect() requires +* huge-page-aligned sizes when operating on hugetlbfs. +* So, make sure that we use something that's a multiple +* of a huge page when we can. 
+*/ + if (size >= HPAGE_SIZE) + size = HPAGE_SIZE; + + /* allocate every possible key and make sure key-0 never got allocated */ + max_nr_pkey_allocs = NR_PKEYS; + for (i = 0; i < max_nr_pkey_allocs; i++) { + int new_pkey = alloc_pkey(); + pkey_assert(new_pkey != 0); + + if (new_pkey < 0) + break; + alloced_pkeys[nr_alloced++] = new_pkey; + } + /* free all the allocated keys */ + for (i = 0; i < nr_alloced; i++) { + int free_ret; + + if (!alloced_pkeys[i]) + continue; + free_ret = sys_pkey_free(alloced_pkeys[i]); + pkey_assert(!free_ret); + } + + /* attach key-0 in various modes */ + err = sys_mprotect_pkey(ptr, size, PROT_READ, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_WRITE, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_EXEC, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE, 0); + pkey_assert(!err); + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE|PROT_EXEC, 0); + pkey_assert(!err); +} + void test_read_of_write_disabled_region(int *ptr, u16 pkey) { int ptr_contents; @@ -1448,6 +1500,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_pkey_syscalls_on_non_allocated_pkey, test_pkey_syscalls_bad_args, test_pkey_alloc_exhaust, + test_pkey_alloc_free_attach_pkey0, }; void run_tests_once(void) -- 2.17.1
[PATCH v17 23/24] selftests/vm/pkeys: Override access right definitions on powerpc
From: Ram Pai Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE such as those in: /usr/include/bits/mman-shared.h. This overrides the definitions with correct values for powerpc. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-powerpc.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index d31665c48f5e..02bd4dd7d467 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -16,11 +16,13 @@ #define fpregs fp_regs #define si_pkey_offset 0x20 -#ifndef PKEY_DISABLE_ACCESS +#ifdef PKEY_DISABLE_ACCESS +#undef PKEY_DISABLE_ACCESS # define PKEY_DISABLE_ACCESS 0x3 /* disable read and write */ #endif -#ifndef PKEY_DISABLE_WRITE +#ifdef PKEY_DISABLE_WRITE +#undef PKEY_DISABLE_WRITE # define PKEY_DISABLE_WRITE0x2 #endif -- 2.17.1
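Note that with the #ifdef guard in the hunk above, the macros are only redefined when some header has already defined them. One alternative, sketched here under that observation, is an unconditional #undef followed by #define; #undef is valid C even when the name was never defined. The first two definitions are stand-ins for the x86 values a libc header might supply, and powerpc_pkey_mask() is a made-up name for illustration.

```c
/* Stand-ins for the x86 values a libc header might supply. */
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

/* Unconditional override: works whether or not the name was defined. */
#undef PKEY_DISABLE_ACCESS
#define PKEY_DISABLE_ACCESS 0x3 /* powerpc: disable read and write */

#undef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE  0x2 /* powerpc: disable write */

/* Hypothetical helper: both disable bits as seen after the override. */
static int powerpc_pkey_mask(void)
{
	return PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE;
}
```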
[PATCH v17 19/24] selftests/vm/pkeys: Associate key on a mapped page and detect write violation
From: Ram Pai Detect write-violation on a page to which write-disabled key is associated much after the page is mapped. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/protection_keys.c | 12 1 file changed, 12 insertions(+) diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index f65d384ef6a0..cb31a5cdf6d9 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -1002,6 +1002,17 @@ void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr, expected_pkey_fault(pkey); } +void test_write_of_write_disabled_region_with_page_already_mapped(int *ptr, + u16 pkey) +{ + *ptr = __LINE__; + dprintf1("disabling write access; after accessing the page, " + "to PKEY[%02d], doing write\n", pkey); + pkey_write_deny(pkey); + *ptr = __LINE__; + expected_pkey_fault(pkey); +} + void test_write_of_write_disabled_region(int *ptr, u16 pkey) { dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey); @@ -1410,6 +1421,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = { test_read_of_access_disabled_region, test_read_of_access_disabled_region_with_page_already_mapped, test_write_of_write_disabled_region, + test_write_of_write_disabled_region_with_page_already_mapped, test_write_of_access_disabled_region, test_kernel_write_of_access_disabled_region, test_kernel_write_of_write_disabled_region, -- 2.17.1
[PATCH v17 06/24] selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()
From: Thiago Jung Bauermann This will help us ensure we print pkey_reg_t values correctly in different architectures. Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h | 4 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 3ed2f021bf7a..7f18a82e54fc 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -27,6 +27,10 @@ #define DPRINT_IN_SIGNAL_BUF_SIZE 4096 extern int dprint_in_signal; extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; + +#ifdef __GNUC__ +__attribute__((format(printf, 1, 2))) +#endif static inline void sigsafe_printf(const char *format, ...) { va_list ap; -- 2.17.1
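To illustrate what the attribute buys, here is a small sketch with a made-up variadic wrapper, checked_snprintf(). Because the wrapper takes a buffer and a length first, the format string is argument 3 and the variadic arguments start at 4, whereas sigsafe_printf() in the patch uses (1, 2). With the attribute in place, gcc's -Wformat can flag a call that passes, say, an int for a %llx conversion.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical wrapper, not from the patch. The attribute tells the
 * compiler to check the variadic arguments against the format string.
 */
#ifdef __GNUC__
__attribute__((format(printf, 3, 4)))
#endif
static int checked_snprintf(char *buf, size_t len, const char *format, ...)
{
	va_list ap;
	int ret;

	va_start(ap, format);
	ret = vsnprintf(buf, len, format, ap);
	va_end(ap);

	return ret;
}
```

A call such as checked_snprintf(buf, sizeof(buf), "pkey_reg: %016llx", reg) compiles cleanly when reg is an unsigned long long.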
[PATCH v17 08/24] selftests: vm: pkeys: Add helpers for pkey bits
This introduces some functions that help with setting or clearing bits of a particular pkey. This also adds an abstraction for getting a pkey's bit position in the pkey register as this may vary across architectures. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 22 ++ tools/testing/selftests/vm/pkey-x86.h| 5 +++ tools/testing/selftests/vm/protection_keys.c | 32 ++-- 3 files changed, 36 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index dfbce49269ce..0e3da7c8d628 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -80,6 +80,28 @@ extern void abort_hooks(void); #error Architecture not supported #endif /* arch */ +#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + +static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags) +{ + u32 shift = pkey_bit_position(pkey); + /* mask out bits from pkey in old value */ + reg &= ~((u64)PKEY_MASK << shift); + /* OR in new bits for pkey */ + reg |= (flags & PKEY_MASK) << shift; + return reg; +} + +static inline u64 get_pkey_bits(u64 reg, int pkey) +{ + u32 shift = pkey_bit_position(pkey); + /* +* shift down the relevant bits to the lowest two, then +* mask off all the other higher bits +*/ + return ((reg >> shift) & PKEY_MASK); +} + extern u64 shadow_pkey_reg; static inline u64 _read_pkey_reg(int line) diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index 6ffea27e2d2d..def2a1bcf6a5 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -118,6 +118,11 @@ static inline int cpu_has_pku(void) return 1; } +static inline u32 pkey_bit_position(int pkey) +{ + return pkey * PKEY_BITS_PER_PKEY; +} + #define XSTATE_PKEY_BIT(9) #define XSTATE_PKEY0x200 diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 
efa35cc6f6b9..bed9d4de12b4 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -334,25 +334,13 @@ pid_t fork_lazy_child(void) static u32 hw_pkey_get(int pkey, unsigned long flags) { - u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); u64 pkey_reg = __read_pkey_reg(); - u64 shifted_pkey_reg; - u32 masked_pkey_reg; dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n", __func__, pkey, flags, 0, 0); dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg); - shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY)); - dprintf2("%s() shifted_pkey_reg: %016llx\n", __func__, - shifted_pkey_reg); - masked_pkey_reg = shifted_pkey_reg & mask; - dprintf2("%s() masked pkey_reg: %x\n", __func__, masked_pkey_reg); - /* -* shift down the relevant bits to the lowest two, then -* mask off all the other high bits. -*/ - return masked_pkey_reg; + return (u32) get_pkey_bits(pkey_reg, pkey); } static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) @@ -364,12 +352,8 @@ static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) /* make sure that 'rights' only contains the bits we expect: */ assert(!(rights & ~mask)); - /* copy old pkey_reg */ - new_pkey_reg = old_pkey_reg; - /* mask out bits from pkey in old value: */ - new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY)); - /* OR in new bits for pkey: */ - new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY)); + /* modify bits accordingly in old pkey_reg and assign it */ + new_pkey_reg = set_pkey_bits(old_pkey_reg, pkey, rights); __write_pkey_reg(new_pkey_reg); @@ -403,7 +387,7 @@ void pkey_disable_set(int pkey, int flags) ret = hw_pkey_set(pkey, pkey_rights, syscall_flags); assert(!ret); /* pkey_reg and flags have the same format */ - shadow_pkey_reg |= flags << (pkey * 2); + shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights); dprintf1("%s(%d) shadow: 0x%016llx\n", __func__, pkey, shadow_pkey_reg); @@ -437,7 +421,7 
@@ void pkey_disable_clear(int pkey, int flags) pkey_rights |= flags; ret = hw_pkey_set(pkey, pkey_rights, 0); - shadow_pkey_reg &= ~(flags << (pkey * 2)); + shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights); pkey_assert(ret >= 0); pkey_rights = hw_pkey_get(pkey, syscall_flags); @@ -513,7 +497,8 @@ int alloc_pkey(void) shadow_pkey_reg); if (ret) { /* clear both the bits: */ - shadow_pkey_reg &= ~(0x3
[PATCH v17 09/24] selftests/vm/pkeys: Fix pkey_disable_clear()
From: Ram Pai

Currently, pkey_disable_clear() sets the specified bits instead of
clearing them. This has been dead code up to now because its only
callers, i.e. pkey_access/write_allow(), are also unused.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Acked-by: Dave Hansen
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index bed9d4de12b4..4b1ddb526228 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -418,7 +418,7 @@ void pkey_disable_clear(int pkey, int flags)
 			pkey, pkey, pkey_rights);
 	pkey_assert(pkey_rights >= 0);
 
-	pkey_rights |= flags;
+	pkey_rights &= ~flags;
 
 	ret = hw_pkey_set(pkey, pkey_rights, 0);
 	shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
 	dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
 			pkey, read_pkey_reg());
 	if (flags)
-		assert(read_pkey_reg() > orig_pkey_reg);
+		assert(read_pkey_reg() < orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1
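The difference between the old and new statement can be shown with plain bit arithmetic. This is a minimal sketch using the x86 flag values; the function names rights_clear_buggy() and rights_clear_fixed() are made up for illustration and do not exist in the selftest.

```c
#include <stdint.h>

#define PKEY_DISABLE_ACCESS 0x1 /* x86 value */
#define PKEY_DISABLE_WRITE  0x2 /* x86 value */

/* Old behaviour: ORing the flags in sets the disable bits. */
static uint32_t rights_clear_buggy(uint32_t rights, uint32_t flags)
{
	return rights | flags;
}

/* New behaviour: masking with the complement clears the disable bits. */
static uint32_t rights_clear_fixed(uint32_t rights, uint32_t flags)
{
	return rights & ~flags;
}
```

Starting from rights 0x3 (both bits set), clearing PKEY_DISABLE_WRITE gives 0x1 with the fixed variant, while the old variant would leave 0x3 untouched and would even turn 0x1 into 0x3.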
[PATCH v17 14/24] selftests/vm/pkeys: Introduce powerpc support
From: Ram Pai This makes use of the abstractions added earlier and introduces support for powerpc. For powerpc, after receiving the SIGSEGV, the signal handler must explicitly restore access permissions for the faulting pkey to allow the test to continue. As this makes use of pkey_access_allow(), all of its dependencies and other similar functions have been moved ahead of the signal handler. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 2 + tools/testing/selftests/vm/pkey-powerpc.h| 90 +++ tools/testing/selftests/vm/protection_keys.c | 269 ++- 3 files changed, 233 insertions(+), 128 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 621fb2a0a5ef..2f4b1eb3a680 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -79,6 +79,8 @@ void expected_pkey_fault(int pkey); #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" +#elif defined(__powerpc64__) /* arch */ +#include "pkey-powerpc.h" #else /* arch */ #error Architecture not supported #endif /* arch */ diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h new file mode 100644 index ..c79f4160a6a0 --- /dev/null +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -0,0 +1,90 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _PKEYS_POWERPC_H +#define _PKEYS_POWERPC_H + +#ifndef SYS_mprotect_key +# define SYS_mprotect_key 386 +#endif +#ifndef SYS_pkey_alloc +# define SYS_pkey_alloc384 +# define SYS_pkey_free 385 +#endif +#define REG_IP_IDX PT_NIP +#define REG_TRAPNO PT_TRAP +#define gregs gp_regs +#define fpregs fp_regs +#define si_pkey_offset 0x20 + +#ifndef PKEY_DISABLE_ACCESS +# define PKEY_DISABLE_ACCESS 0x3 /* disable read and write */ +#endif + +#ifndef PKEY_DISABLE_WRITE +# define 
PKEY_DISABLE_WRITE0x2 +#endif + +#define NR_PKEYS 32 +#define NR_RESERVED_PKEYS_4K 27 /* pkey-0, pkey-1, exec-only-pkey + and 24 other keys that cannot be + represented in the PTE */ +#define NR_RESERVED_PKEYS_64K 3 /* pkey-0, pkey-1 and exec-only-pkey */ +#define PKEY_BITS_PER_PKEY 2 +#define HPAGE_SIZE (1UL << 24) +#define PAGE_SIZE (1UL << 16) + +static inline u32 pkey_bit_position(int pkey) +{ + return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY; +} + +static inline u64 __read_pkey_reg(void) +{ + u64 pkey_reg; + + asm volatile("mfspr %0, 0xd" : "=r" (pkey_reg)); + + return pkey_reg; +} + +static inline void __write_pkey_reg(u64 pkey_reg) +{ + u64 amr = pkey_reg; + + dprintf4("%s() changing %016llx to %016llx\n", +__func__, __read_pkey_reg(), pkey_reg); + + asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(amr)) : "memory"); + + dprintf4("%s() pkey register after changing %016llx to %016llx\n", + __func__, __read_pkey_reg(), pkey_reg); +} + +static inline int cpu_has_pku(void) +{ + return 1; +} + +static inline int get_arch_reserved_keys(void) +{ + if (sysconf(_SC_PAGESIZE) == 4096) + return NR_RESERVED_PKEYS_4K; + else + return NR_RESERVED_PKEYS_64K; +} + +void expect_fault_on_read_execonly_key(void *p1, int pkey) +{ + /* +* powerpc does not allow userspace to change permissions of exec-only +* keys since those keys are not allocated by userspace. The signal +* handler wont be able to reset the permissions, which means the code +* will infinitely continue to segfault here. 
+*/ + return; +} + +/* 4-byte instructions * 16384 = 64K page */ +#define __page_o_noops() asm(".rept 16384 ; nop; .endr") + +#endif /* _PKEYS_POWERPC_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 57c71056c93d..e6de078a9196 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -169,6 +169,125 @@ void dump_mem(void *dumpme, int len_bytes) } } +static u32 hw_pkey_get(int pkey, unsigned long flags) +{ + u64 pkey_reg = __read_pkey_reg(); + + dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n", + __func__, pkey, flags, 0, 0); + dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg); + + return (u32) get_pkey_bits(pkey_reg, pkey); +} + +static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags) +{ + u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); + u64 old_pkey_reg = __read_pkey_reg(); +
[PATCH v17 24/24] selftests: vm: pkeys: Use the correct page size on powerpc
Both 4K and 64K pages are supported on powerpc. Parts of the selftest code perform alignment computations based on the PAGE_SIZE macro which is currently hardcoded to 64K for powerpc. This causes some test failures on kernels configured with 4K page size. In some cases, we need to enforce function alignment on page size. Since this can only be done at build time, 64K is used as the alignment factor as that also ensures 4K alignment. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-powerpc.h| 2 +- tools/testing/selftests/vm/protection_keys.c | 5 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index 02bd4dd7d467..3a761e51a587 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -36,7 +36,7 @@ pkey-31 and exec-only key */ #define PKEY_BITS_PER_PKEY 2 #define HPAGE_SIZE (1UL << 24) -#define PAGE_SIZE (1UL << 16) +#define PAGE_SIZE sysconf(_SC_PAGESIZE) static inline u32 pkey_bit_position(int pkey) { diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index a1cb9a71e77c..fc19addcb5c8 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -146,7 +146,12 @@ void abort_hooks(void) * will then fault, which makes sure that the fault code handles * execute-only memory properly. */ +#ifdef __powerpc64__ +/* This way, both 4K and 64K alignment are maintained */ +__attribute__((__aligned__(65536))) +#else __attribute__((__aligned__(PAGE_SIZE))) +#endif void lots_o_noops_around_write(int *write_to_me) { dprintf3("running %s()\n", __func__); -- 2.17.1
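The alignment reasoning can be checked with a few lines of arithmetic. This is a sketch only: is_aligned() and align_up_to_page() are made-up helper names, and BUILD_TIME_ALIGN simply restates the 64K constant used in the patch. The run-time page size comes from sysconf(_SC_PAGESIZE), as in the new PAGE_SIZE definition.

```c
#include <stdint.h>
#include <unistd.h>

#define BUILD_TIME_ALIGN 65536UL /* 64K, the build-time factor from the patch */

/* Returns 1 if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uintptr_t addr, unsigned long align)
{
	return (addr & (align - 1)) == 0;
}

/* Round x up to the page size reported by the running kernel. */
static unsigned long align_up_to_page(unsigned long x)
{
	unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);

	return (x + page - 1) & ~(page - 1);
}
```

Any address that is a multiple of 64K is also a multiple of 4K, which is why a single build-time alignment of 65536 covers both kernel configurations, while sizes computed at run time follow the actual page size.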
[PATCH v17 15/24] selftests/vm/pkeys: Fix number of reserved powerpc pkeys
From: "Desnes A. Nunes do Rosario"

The number of reserved pkeys in a PowerNV environment is
different from that on PowerVM or KVM.

Tested on PowerVM and PowerNV environments.

Signed-off-by: "Desnes A. Nunes do Rosario"
Signed-off-by: Ram Pai
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/pkey-powerpc.h | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h
index c79f4160a6a0..319673bbab0b 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -28,7 +28,10 @@
 #define NR_RESERVED_PKEYS_4K	27 /* pkey-0, pkey-1, exec-only-pkey
 				      and 24 other keys that cannot be
 				      represented in the PTE */
-#define NR_RESERVED_PKEYS_64K	3  /* pkey-0, pkey-1 and exec-only-pkey */
+#define NR_RESERVED_PKEYS_64K_3KEYS	3 /* PowerNV and KVM: pkey-0,
+					     pkey-1 and exec-only key */
+#define NR_RESERVED_PKEYS_64K_4KEYS	4 /* PowerVM: pkey-0, pkey-1,
+					     pkey-31 and exec-only key */
 #define PKEY_BITS_PER_PKEY	2
 #define HPAGE_SIZE		(1UL << 24)
 #define PAGE_SIZE		(1UL << 16)
@@ -65,12 +68,27 @@ static inline int cpu_has_pku(void)
 	return 1;
 }
 
+static inline bool arch_is_powervm()
+{
+	struct stat buf;
+
+	if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) &&
+	    (stat("/sys/firmware/devicetree/base/hmc-managed?", &buf) == 0) &&
+	    (stat("/sys/firmware/devicetree/base/chosen/qemu,graphic-width", &buf) == -1) )
+		return true;
+
+	return false;
+}
+
 static inline int get_arch_reserved_keys(void)
 {
 	if (sysconf(_SC_PAGESIZE) == 4096)
 		return NR_RESERVED_PKEYS_4K;
 	else
-		return NR_RESERVED_PKEYS_64K;
+		if (arch_is_powervm())
+			return NR_RESERVED_PKEYS_64K_4KEYS;
+		else
+			return NR_RESERVED_PKEYS_64K_3KEYS;
 }
 
 void expect_fault_on_read_execonly_key(void *p1, int pkey)
-- 
2.17.1
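
The selection logic in this patch can be looked at separately from the device-tree probing. The sketch below is only an illustration under stated assumptions: path_exists() and reserved_keys() are hypothetical helpers that do not appear in the series, and the platform check is passed in as a plain flag so the decision table can be exercised on any machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

#define NR_RESERVED_PKEYS_4K        27
#define NR_RESERVED_PKEYS_64K_3KEYS 3	/* PowerNV and KVM */
#define NR_RESERVED_PKEYS_64K_4KEYS 4	/* PowerVM */

/* True when the path exists; stat() must be given a buffer to fill in. */
static bool path_exists(const char *path)
{
	struct stat buf;

	return stat(path, &buf) == 0;
}

/*
 * Hypothetical pure helper: same decision as get_arch_reserved_keys(),
 * with the page size and the PowerVM check supplied by the caller.
 */
static int reserved_keys(long page_size, bool is_powervm)
{
	assert(page_size > 0);
	if (page_size == 4096)
		return NR_RESERVED_PKEYS_4K;
	return is_powervm ? NR_RESERVED_PKEYS_64K_4KEYS
			  : NR_RESERVED_PKEYS_64K_3KEYS;
}
```

A caller would combine the two, for example reserved_keys(sysconf(_SC_PAGESIZE), is_powervm) with is_powervm derived from path_exists() on the three device-tree paths used in the patch.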
[PATCH v17 10/24] selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
From: Ram Pai

In some cases, a pkey's bits need not necessarily change
in a way that the value of the pkey register increases
when performing a pkey_disable_set() or decreases when
performing a pkey_disable_clear().

For example, on powerpc, if a pkey's current state is
PKEY_DISABLE_ACCESS and we perform a pkey_write_disable()
on it, the bits still remain the same. We will observe
something similar when the pkey's current state is 0 and
a pkey_access_enable() is performed on it.

Either case would cause some assertions to fail. This
fixes the problem.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 4b1ddb526228..7fd52d5c4bfd 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -400,7 +400,7 @@ void pkey_disable_set(int pkey, int flags)
 	dprintf1("%s(%d) pkey_reg: 0x%016llx\n",
 		__func__, pkey, read_pkey_reg());
 	if (flags)
-		pkey_assert(read_pkey_reg() > orig_pkey_reg);
+		pkey_assert(read_pkey_reg() >= orig_pkey_reg);
 	dprintf1("END<---%s(%d, 0x%x)\n", __func__,
 		pkey, flags);
 }
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
 	dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
 			pkey, read_pkey_reg());
 	if (flags)
-		assert(read_pkey_reg() < orig_pkey_reg);
+		assert(read_pkey_reg() <= orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1
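
The two cases described in the commit message can be worked through with plain integer arithmetic. The sketch below is a simplified model, not the selftest code: it assumes the powerpc encodings PKEY_DISABLE_ACCESS = 0x3 and PKEY_DISABLE_WRITE = 0x2 (defined elsewhere in the series, not in this patch), and disable_set()/disable_clear() are hypothetical helpers that operate on a register value passed by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed powerpc encodings; they are not part of this patch. */
#define PPC_PKEY_DISABLE_ACCESS 0x3u
#define PPC_PKEY_DISABLE_WRITE  0x2u

/* OR the given rights into the key's 2-bit field at bit offset shift. */
static uint64_t disable_set(uint64_t reg, unsigned int shift, uint64_t flags)
{
	assert(shift < 63);
	return reg | (flags << shift);
}

/* Clear the given rights from the key's 2-bit field at bit offset shift. */
static uint64_t disable_clear(uint64_t reg, unsigned int shift, uint64_t flags)
{
	assert(shift < 63);
	return reg & ~(flags << shift);
}
```

With the key's field already at 0x3, disable_set(0x30, 4, PPC_PKEY_DISABLE_WRITE) returns 0x30 again, so a strict ">" comparison against the old value fails while ">=" holds. Likewise disable_clear(0, 4, PPC_PKEY_DISABLE_ACCESS) leaves the register at 0, which only "<=" accepts.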
[PATCH v17 21/24] selftests/vm/pkeys: Introduce a sub-page allocator
From: Ram Pai This introduces a new allocator that allocates 4K hardware pages to back 64K linux pages. This allocator is available only on powerpc. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 6 + tools/testing/selftests/vm/pkey-powerpc.h| 25 tools/testing/selftests/vm/pkey-x86.h| 5 tools/testing/selftests/vm/protection_keys.c | 1 + 4 files changed, 37 insertions(+) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 59ccdff18214..622a85848f61 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -28,6 +28,9 @@ extern int dprint_in_signal; extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; +extern int test_nr; +extern int iteration_nr; + #ifdef __GNUC__ __attribute__((format(printf, 1, 2))) #endif @@ -78,6 +81,9 @@ __attribute__((noinline)) int read_ptr(int *ptr); void expected_pkey_fault(int pkey); int sys_pkey_alloc(unsigned long flags, unsigned long init_val); int sys_pkey_free(unsigned long pkey); +int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot, + unsigned long pkey); +void record_pkey_malloc(void *ptr, long size, int prot); #if defined(__i386__) || defined(__x86_64__) /* arch */ #include "pkey-x86.h" diff --git a/tools/testing/selftests/vm/pkey-powerpc.h b/tools/testing/selftests/vm/pkey-powerpc.h index 7d7c3ffafdd9..d31665c48f5e 100644 --- a/tools/testing/selftests/vm/pkey-powerpc.h +++ b/tools/testing/selftests/vm/pkey-powerpc.h @@ -106,4 +106,29 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey) /* 4-byte instructions * 16384 = 64K page */ #define __page_o_noops() asm(".rept 16384 ; nop; .endr") +void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey) +{ + void *ptr; + int ret; + + dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__, + size, prot, pkey); + 
pkey_assert(pkey < NR_PKEYS); + ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + pkey_assert(ptr != (void *)-1); + + ret = syscall(__NR_subpage_prot, ptr, size, NULL); + if (ret) { + perror("subpage_perm"); + return PTR_ERR_ENOTSUP; + } + + ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey); + pkey_assert(!ret); + record_pkey_malloc(ptr, size, prot); + + dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr); + return ptr; +} + #endif /* _PKEYS_POWERPC_H */ diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h index 6421b846aa16..3be20f5d5275 100644 --- a/tools/testing/selftests/vm/pkey-x86.h +++ b/tools/testing/selftests/vm/pkey-x86.h @@ -173,4 +173,9 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey) expected_pkey_fault(pkey); } +void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey) +{ + return PTR_ERR_ENOTSUP; +} + #endif /* _PKEYS_X86_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 8bb4de103874..d4952b57cc90 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -845,6 +845,7 @@ void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey) void *(*pkey_malloc[])(long size, int prot, u16 pkey) = { malloc_pkey_with_mprotect, + malloc_pkey_with_mprotect_subpage, malloc_pkey_anon_huge, malloc_pkey_hugetlb /* can not do direct with the pkey_mprotect() API: -- 2.17.1
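
The patch above adds one more entry to a table of allocator callbacks, and the x86 stub reports that the allocator is unavailable by returning PTR_ERR_ENOTSUP. A minimal sketch of that pattern follows. It is an illustration only: alloc_fn, alloc_plain(), alloc_unsupported() and count_supported() are hypothetical names, and the way the caller skips unsupported entries is an assumption about how such a table is typically consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sentinel pointer meaning "this allocator is not available here". */
#define PTR_ERR_ENOTSUP ((void *)(intptr_t)-ENOTSUP)

typedef void *(*alloc_fn)(long size);

/* Hypothetical allocator that always works. */
static void *alloc_plain(long size)
{
	return malloc((size_t)size);
}

/* Hypothetical allocator that is unsupported on this architecture. */
static void *alloc_unsupported(long size)
{
	(void)size;
	return PTR_ERR_ENOTSUP;
}

static alloc_fn allocators[] = {
	alloc_plain,
	alloc_unsupported,
};

/* Try every allocator once and count those that returned usable memory. */
static int count_supported(long size)
{
	int supported = 0;
	size_t i;

	for (i = 0; i < sizeof(allocators) / sizeof(allocators[0]); i++) {
		void *p = allocators[i](size);

		if (p == PTR_ERR_ENOTSUP)
			continue;	/* skip allocators this arch lacks */
		assert(p != NULL);
		free(p);
		supported++;
	}
	return supported;
}
```

Using a sentinel pointer keeps the callback signature identical on every architecture, so the shared table does not need any #ifdef around individual entries.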
[PATCH v17 18/24] selftests/vm/pkeys: Associate key on a mapped page and detect access violation
From: Ram Pai

Detect an access violation on a page with which an
access-disabled key is associated long after the page
is mapped.

cc: Dave Hansen
cc: Florian Weimer
Signed-off-by: Ram Pai
Acked-by: Dave Hansen
Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 95f173049f43..f65d384ef6a0 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -984,6 +984,24 @@ void test_read_of_access_disabled_region(int *ptr, u16 pkey)
 	dprintf1("*ptr: %d\n", ptr_contents);
 	expected_pkey_fault(pkey);
 }
+
+void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
+		u16 pkey)
+{
+	int ptr_contents;
+
+	dprintf1("disabling access to PKEY[%02d], doing read @ %p\n",
+				pkey, ptr);
+	ptr_contents = read_ptr(ptr);
+	dprintf1("reading ptr before disabling the read : %d\n",
+			ptr_contents);
+	read_pkey_reg();
+	pkey_access_deny(pkey);
+	ptr_contents = read_ptr(ptr);
+	dprintf1("*ptr: %d\n", ptr_contents);
+	expected_pkey_fault(pkey);
+}
+
 void test_write_of_write_disabled_region(int *ptr, u16 pkey)
 {
 	dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1390,6 +1408,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
 void (*pkey_tests[])(int *ptr, u16 pkey) = {
 	test_read_of_write_disabled_region,
 	test_read_of_access_disabled_region,
+	test_read_of_access_disabled_region_with_page_already_mapped,
 	test_write_of_write_disabled_region,
 	test_write_of_access_disabled_region,
 	test_kernel_write_of_access_disabled_region,
-- 
2.17.1
[PATCH v17 03/24] selftests/vm/pkeys: Rename all references to pkru to a generic name
From: Ram Pai This renames PKRU references to "pkey_reg" or "pkey" based on the usage. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Reviewed-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 85 +++ tools/testing/selftests/vm/protection_keys.c | 240 ++- 2 files changed, 170 insertions(+), 155 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index 254e5436bdd9..d5779be4793f 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -14,7 +14,7 @@ #include #define NR_PKEYS 16 -#define PKRU_BITS_PER_PKEY 2 +#define PKEY_BITS_PER_PKEY 2 #ifndef DEBUG_LEVEL #define DEBUG_LEVEL 0 @@ -53,85 +53,88 @@ static inline void sigsafe_printf(const char *format, ...) #define dprintf3(args...) dprintf_level(3, args) #define dprintf4(args...) dprintf_level(4, args) -extern unsigned int shadow_pkru; -static inline unsigned int __rdpkru(void) +extern unsigned int shadow_pkey_reg; +static inline unsigned int __read_pkey_reg(void) { unsigned int eax, edx; unsigned int ecx = 0; - unsigned int pkru; + unsigned int pkey_reg; asm volatile(".byte 0x0f,0x01,0xee\n\t" : "=a" (eax), "=d" (edx) : "c" (ecx)); - pkru = eax; - return pkru; + pkey_reg = eax; + return pkey_reg; } -static inline unsigned int _rdpkru(int line) +static inline unsigned int _read_pkey_reg(int line) { - unsigned int pkru = __rdpkru(); + unsigned int pkey_reg = __read_pkey_reg(); - dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n", - line, pkru, shadow_pkru); - assert(pkru == shadow_pkru); + dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n", + line, pkey_reg, shadow_pkey_reg); + assert(pkey_reg == shadow_pkey_reg); - return pkru; + return pkey_reg; } -#define rdpkru() _rdpkru(__LINE__) +#define read_pkey_reg() _read_pkey_reg(__LINE__) -static inline void __wrpkru(unsigned int pkru) +static inline void 
__write_pkey_reg(unsigned int pkey_reg) { - unsigned int eax = pkru; + unsigned int eax = pkey_reg; unsigned int ecx = 0; unsigned int edx = 0; - dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru); + dprintf4("%s() changing %08x to %08x\n", __func__, + __read_pkey_reg(), pkey_reg); asm volatile(".byte 0x0f,0x01,0xef\n\t" : : "a" (eax), "c" (ecx), "d" (edx)); - assert(pkru == __rdpkru()); + assert(pkey_reg == __read_pkey_reg()); } -static inline void wrpkru(unsigned int pkru) +static inline void write_pkey_reg(unsigned int pkey_reg) { - dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru); + dprintf4("%s() changing %08x to %08x\n", __func__, + __read_pkey_reg(), pkey_reg); /* will do the shadow check for us: */ - rdpkru(); - __wrpkru(pkru); - shadow_pkru = pkru; - dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru()); + read_pkey_reg(); + __write_pkey_reg(pkey_reg); + shadow_pkey_reg = pkey_reg; + dprintf4("%s(%08x) pkey_reg: %08x\n", __func__, + pkey_reg, __read_pkey_reg()); } /* * These are technically racy. since something could - * change PKRU between the read and the write. + * change PKEY register between the read and the write. 
*/ static inline void __pkey_access_allow(int pkey, int do_allow) { - unsigned int pkru = rdpkru(); + unsigned int pkey_reg = read_pkey_reg(); int bit = pkey * 2; if (do_allow) - pkru &= (1<>>>===SIGSEGV\n"); - dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__, - __rdpkru(), shadow_pkru); + dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__, + __read_pkey_reg(), shadow_pkey_reg); trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO]; ip = uctxt->uc_mcontext.gregs[REG_IP_IDX]; @@ -289,19 +289,19 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext) */ fpregs += 0x70; #endif - pkru_offset = pkru_xstate_offset(); - pkru_ptr = (void *)([pkru_offset]); + pkey_reg_offset = pkey_reg_xstate_offset(); + pkey_reg_ptr = (void *)([pkey_reg_offset]); dprintf1("siginfo: %p\n", si); dprintf1(" fpregs: %p\n", fpregs); /* -* If we got a PKRU fault, we *HAVE* to have at least one bit set in +* If we got a PKEY fault, we *HAVE* to have at least one bit set in * here. */ - dprintf1("pkru_xstate_offset: %d\n",
[PATCH v17 12/24] selftests: vm: pkeys: Use the correct huge page size
The huge page size can vary across architectures. This will
ensure that the correct huge page size is used when accessing
the hugetlb controls under sysfs. Instead of using a hardcoded
page size (i.e. 2MB), this now uses the HPAGE_SIZE macro which
is arch-specific.

Signed-off-by: Sandipan Das
---
 tools/testing/selftests/vm/protection_keys.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 9cc82b65f828..535e464e27e9 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -739,12 +739,15 @@ void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
 }
 
 int hugetlb_setup_ok;
+#define SYSFS_FMT_NR_HUGE_PAGES "/sys/kernel/mm/hugepages/hugepages-%ldkB/nr_hugepages"
 #define GET_NR_HUGE_PAGES 10
 void setup_hugetlbfs(void)
 {
 	int err;
 	int fd;
-	char buf[] = "123";
+	char buf[256];
+	long hpagesz_kb;
+	long hpagesz_mb;
 
 	if (geteuid() != 0) {
 		fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n");
@@ -755,11 +758,16 @@ void setup_hugetlbfs(void)
 
 	/*
 	 * Now go make sure that we got the pages and that they
-	 * are 2M pages. Someone might have made 1G the default.
+	 * are PMD-level pages. Someone might have made PUD-level
+	 * pages the default.
 	 */
-	fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
+	hpagesz_kb = HPAGE_SIZE / 1024;
+	hpagesz_mb = hpagesz_kb / 1024;
+	sprintf(buf, SYSFS_FMT_NR_HUGE_PAGES, hpagesz_kb);
+	fd = open(buf, O_RDONLY);
 	if (fd < 0) {
-		perror("opening sysfs 2M hugetlb config");
+		fprintf(stderr, "opening sysfs %ldM hugetlb config: %s\n",
+			hpagesz_mb, strerror(errno));
 		return;
 	}
 
@@ -767,13 +775,14 @@ void setup_hugetlbfs(void)
 	err = read(fd, buf, sizeof(buf)-1);
 	close(fd);
 	if (err <= 0) {
-		perror("reading sysfs 2M hugetlb config");
+		fprintf(stderr, "reading sysfs %ldM hugetlb config: %s\n",
+			hpagesz_mb, strerror(errno));
 		return;
 	}
 
 	if (atoi(buf) != GET_NR_HUGE_PAGES) {
-		fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n",
-			buf, GET_NR_HUGE_PAGES);
+		fprintf(stderr, "could not confirm %ldM pages, got: '%s' expected %d\n",
+			hpagesz_mb, buf, GET_NR_HUGE_PAGES);
 		return;
 	}
 
-- 
2.17.1
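
The path construction in this patch can be tried on its own. The sketch below reuses the SYSFS_FMT_NR_HUGE_PAGES format string from the diff; hugepages_sysfs_path() is a hypothetical helper introduced here, and it uses snprintf() with an explicit length instead of the sprintf() call in the patch so that truncation can be reported.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SYSFS_FMT_NR_HUGE_PAGES "/sys/kernel/mm/hugepages/hugepages-%ldkB/nr_hugepages"

/*
 * Hypothetical helper: format the nr_hugepages control path for a huge
 * page size given in bytes. Returns 0 on success, -1 if buf is too small.
 */
static int hugepages_sysfs_path(char *buf, size_t len, unsigned long hpage_size)
{
	long hpagesz_kb = (long)(hpage_size / 1024);
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, SYSFS_FMT_NR_HUGE_PAGES, hpagesz_kb);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the x86 value of HPAGE_SIZE (1UL << 21) this yields the hugepages-2048kB directory that was previously hardcoded, and for the powerpc value (1UL << 24) it yields hugepages-16384kB.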
[PATCH v17 04/24] selftests/vm/pkeys: Move generic definitions to header file
From: Ram Pai Moved all the generic definition and helper functions to the header file. cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/pkey-helpers.h| 35 +--- tools/testing/selftests/vm/protection_keys.c | 27 --- 2 files changed, 30 insertions(+), 32 deletions(-) diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h index d5779be4793f..6ad1bd54ef94 100644 --- a/tools/testing/selftests/vm/pkey-helpers.h +++ b/tools/testing/selftests/vm/pkey-helpers.h @@ -13,6 +13,14 @@ #include #include +/* Define some kernel-like types */ +#define u8 uint8_t +#define u16 uint16_t +#define u32 uint32_t +#define u64 uint64_t + +#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) + #define NR_PKEYS 16 #define PKEY_BITS_PER_PKEY 2 @@ -53,6 +61,18 @@ static inline void sigsafe_printf(const char *format, ...) #define dprintf3(args...) dprintf_level(3, args) #define dprintf4(args...) 
dprintf_level(4, args) +extern void abort_hooks(void); +#define pkey_assert(condition) do {\ + if (!(condition)) { \ + dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \ + __FILE__, __LINE__, \ + test_nr, iteration_nr); \ + dprintf0("errno at assert: %d", errno); \ + abort_hooks(); \ + exit(__LINE__); \ + } \ +} while (0) + extern unsigned int shadow_pkey_reg; static inline unsigned int __read_pkey_reg(void) { @@ -137,11 +157,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write) dprintf4("pkey_reg now: %08x\n", read_pkey_reg()); } -#define PROT_PKEY0 0x10/* protection key value (bit 0) */ -#define PROT_PKEY1 0x20/* protection key value (bit 1) */ -#define PROT_PKEY2 0x40/* protection key value (bit 2) */ -#define PROT_PKEY3 0x80/* protection key value (bit 3) */ - #define PAGE_SIZE 4096 #define MB (1<<20) @@ -219,4 +234,14 @@ int pkey_reg_xstate_offset(void) return xstate_offset; } +#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) +#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) +#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) +#define ALIGN_PTR_UP(p, ptr_align_to) \ + ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to)) +#define ALIGN_PTR_DOWN(p, ptr_align_to)\ + ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to)) +#define __stringify_1(x...) #x +#define __stringify(x...) 
__stringify_1(x) + #endif /* _PKEYS_HELPER_H */ diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c index 2f4ab81c570d..42ffb58810f2 100644 --- a/tools/testing/selftests/vm/protection_keys.c +++ b/tools/testing/selftests/vm/protection_keys.c @@ -51,31 +51,10 @@ int test_nr; unsigned int shadow_pkey_reg; #define HPAGE_SIZE (1UL<<21) -#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) -#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1)) -#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1)) -#define ALIGN_PTR_UP(p, ptr_align_to) ((typeof(p))ALIGN_UP((unsigned long)(p),ptr_align_to)) -#define ALIGN_PTR_DOWN(p, ptr_align_to) ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to)) -#define __stringify_1(x...) #x -#define __stringify(x...) __stringify_1(x) - -#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP) int dprint_in_signal; char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE]; -extern void abort_hooks(void); -#define pkey_assert(condition) do {\ - if (!(condition)) { \ - dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \ - __FILE__, __LINE__, \ - test_nr, iteration_nr); \ - dprintf0("errno at assert: %d", errno); \ - abort_hooks(); \ - exit(__LINE__); \ - } \ -} while (0) - void cat_into_file(char *str, char *file) { int fd = open(file, O_RDWR); @@ -186,12 +165,6 @@ void lots_o_noops_around_write(int *write_to_me) dprintf3("%s() done\n", __func__); } -/* Define some kernel-like types */ -#define u8 uint8_t -#define u16 uint16_t -#define u32 uint32_t -#define u64 uint64_t - #ifdef __i386__ #ifndef SYS_mprotect_key -- 2.17.1
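
The ALIGN_UP and ALIGN_DOWN macros moved by this patch round a value to a power-of-two boundary with a mask. A short worked example follows; the two macros are copied from the diff, while page_round_up() and page_round_down() are hypothetical wrappers added here only to check the power-of-two precondition.

```c
#include <assert.h>

#define ALIGN_UP(x, align_to)	(((x) + ((align_to)-1)) & ~((align_to)-1))
#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))

/* Round addr up to the next multiple of page_size (a power of two). */
static unsigned long page_round_up(unsigned long addr, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ALIGN_UP(addr, page_size);
}

/* Round addr down to the previous multiple of page_size (a power of two). */
static unsigned long page_round_down(unsigned long addr, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ALIGN_DOWN(addr, page_size);
}
```

For instance, with a 4096-byte page, address 4097 rounds up to 8192 and address 8191 rounds down to 4096, while an already aligned address is returned unchanged by both.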
[PATCH v17 02/24] selftests: vm: pkeys: Fix multilib builds for x86
This ensures that both 32-bit and 64-bit binaries are generated when this is built on a x86_64 system. Most of the changes have been borrowed from tools/testing/selftests/x86/Makefile. Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/Makefile | 49 + 1 file changed, 49 insertions(+) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 4e9c741be6af..7fa0adf11f6a 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,7 +18,56 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd + +ifeq ($(ARCH), x86_64) +CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32) +CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_64bit_program.c) +CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_program.c -no-pie) + +TARGETS := protection_keys +BINARIES_32 := $(TARGETS:%=%_32) +BINARIES_64 := $(TARGETS:%=%_64) + +.PHONY: $(TARGETS) + +ifeq ($(CAN_BUILD_WITH_NOPIE),1) +CFLAGS += -no-pie +endif + +ifeq ($(CAN_BUILD_I386),1) +TEST_GEN_FILES += $(BINARIES_32) +$(TARGETS): $(BINARIES_32) +$(BINARIES_32): %_32: %.c + $(CC) -m32 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl -lm +endif + +ifeq ($(CAN_BUILD_X86_64),1) +TEST_GEN_FILES += $(BINARIES_64) +$(TARGETS): $(BINARIES_64) +$(BINARIES_64): %_64: %.c + $(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl +endif + +# x86_64 users should be encouraged to install 32-bit libraries +ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01) +$(TARGETS): warn_32bit_failure + +warn_32bit_failure: + @echo "Warning: you seem to have a broken 32-bit build" 2>&1; \ + echo "environment. This will reduce test coverage of 64-bit" 2>&1; \ + echo "kernels. 
If you are using a Debian-like distribution," 2>&1; \ + echo "try:"; 2>&1; \ + echo ""; \ + echo " apt-get install gcc-multilib libc6-i386 libc6-dev-i386"; \ + echo ""; \ + echo "If you are using a Fedora-like distribution, try:"; \ + echo ""; \ + echo " yum install glibc-devel.*i686"; \ + exit 0; +endif +else TEST_GEN_FILES += protection_keys +endif ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch -- 2.17.1
[PATCH v17 01/24] selftests/x86/pkeys: Move selftests to arch-neutral directory
From: Ram Pai cc: Dave Hansen cc: Florian Weimer Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann Acked-by: Ingo Molnar Acked-by: Dave Hansen Signed-off-by: Sandipan Das --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/{x86 => vm}/pkey-helpers.h| 0 tools/testing/selftests/{x86 => vm}/protection_keys.c | 0 tools/testing/selftests/x86/.gitignore| 1 - tools/testing/selftests/x86/Makefile | 2 +- 6 files changed, 3 insertions(+), 2 deletions(-) rename tools/testing/selftests/{x86 => vm}/pkey-helpers.h (100%) rename tools/testing/selftests/{x86 => vm}/protection_keys.c (100%) diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31b3c98b6d34..c55837bf39fa 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -14,3 +14,4 @@ virtual_address_range gup_benchmark va_128TBswitch map_fixed_noreplace +protection_keys diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..4e9c741be6af 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd +TEST_GEN_FILES += protection_keys ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch diff --git a/tools/testing/selftests/x86/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h similarity index 100% rename from tools/testing/selftests/x86/pkey-helpers.h rename to tools/testing/selftests/vm/pkey-helpers.h diff --git a/tools/testing/selftests/x86/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c similarity index 100% rename from tools/testing/selftests/x86/protection_keys.c rename to tools/testing/selftests/vm/protection_keys.c diff --git 
a/tools/testing/selftests/x86/.gitignore b/tools/testing/selftests/x86/.gitignore index 7757f73ff9a3..eb30ffd83876 100644 --- a/tools/testing/selftests/x86/.gitignore +++ b/tools/testing/selftests/x86/.gitignore @@ -11,5 +11,4 @@ ldt_gdt iopl mpx-mini-test ioperm -protection_keys test_vdso diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 5d49bfec1e9a..5f16821c7f63 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ - protection_keys test_vdso test_vsyscall mov_ss_trap \ + test_vdso test_vsyscall mov_ss_trap \ syscall_arg_fault TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ -- 2.17.1
[PATCH v17 00/24] selftests, powerpc, x86: Memory Protection Keys
Memory protection keys enable an application to protect its address
space from inadvertent access by its own code.

This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to an arch-neutral directory
and enhance their test coverage.

Tested on powerpc64 and x86_64 (Skylake-SP).

Link to development branch:
https://github.com/sandip4n/linux/tree/pkey-selftests

Changelog
---------
Link to previous version (v16):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=153824

v17:
	(1) Fixed issues with i386 builds when running on x86_64
	    based on feedback from Dave.
	(2) Replaced patch 6 from previous version with patch 7.
	    This addresses u64 format specifier related concerns
	    that Michael had raised in v15.

v16:
	(1) Rebased on top of latest master.
	(2) Switched to u64 instead of using an arch-dependent
	    pkey_reg_t type for references to the pkey register
	    based on suggestions from Dave, Michal and Michael.
	(3) Removed build time determination of page size based
	    on suggestion from Michael.
	(4) Fixed comment before the definition of __page_o_noops()
	    from patch 13 ("selftests/vm/pkeys: Introduce powerpc
	    support").

v15:
	(1) Rebased on top of latest master.
	(2) Addressed review comments from Dave Hansen.
	(3) Moved code for getting or setting pkey bits to new
	    helpers. These changes replace patch 7 of v14.
	(4) Added a fix which ensures that the correct count of
	    reserved keys is used across different platforms.
	(5) Added a fix which ensures that the correct page size
	    is used as powerpc supports both 4K and 64K pages.

v14:
	(1) Incorporated another round of comments from Dave Hansen.

v13:
	(1) Incorporated comments from Dave Hansen.
	(2) Added one more test for correct pkey-0 behavior.

v12:
	(1) Fixed the offset of the pkey field in the siginfo structure
	    for x86_64 and powerpc. The tests also try to use the actual
	    field if the headers have it defined.

v11:
	(1) Fixed a deadlock in the ptrace testcase.

v10 and prior: (1) Moved the testcase to arch neutral directory. (2) Split the changes into incremental patches. Desnes A. Nunes do Rosario (1): selftests/vm/pkeys: Fix number of reserved powerpc pkeys Ram Pai (16): selftests/x86/pkeys: Move selftests to arch-neutral directory selftests/vm/pkeys: Rename all references to pkru to a generic name selftests/vm/pkeys: Move generic definitions to header file selftests/vm/pkeys: Fix pkey_disable_clear() selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random selftests/vm/pkeys: Introduce generic pkey abstractions selftests/vm/pkeys: Introduce powerpc support selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: Improve checks to determine pkey support selftests/vm/pkeys: Associate key on a mapped page and detect access violation selftests/vm/pkeys: Associate key on a mapped page and detect write violation selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page selftests/vm/pkeys: Introduce a sub-page allocator selftests/vm/pkeys: Test correct behaviour of pkey-0 selftests/vm/pkeys: Override access right definitions on powerpc Sandipan Das (5): selftests: vm: pkeys: Fix multilib builds for x86 selftests: vm: pkeys: Use sane types for pkey register selftests: vm: pkeys: Add helpers for pkey bits selftests: vm: pkeys: Use the correct huge page size selftests: vm: pkeys: Use the correct page size on powerpc Thiago Jung Bauermann (2): selftests/vm/pkeys: Move some definitions to arch-specific header selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf() tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 50 ++ tools/testing/selftests/vm/pkey-helpers.h | 225 ++ tools/testing/selftests/vm/pkey-powerpc.h | 136 tools/testing/selftests/vm/pkey-x86.h | 181 + .../selftests/{x86 => vm}/protection_keys.c | 696 ++ tools/testing/selftests/x86/.gitignore| 1 - 
tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/pkey-helpers.h| 219 -- 9 files changed, 979 insertions(+), 532 deletions(-) create mode 100644 tools/testing/selftests/vm/pkey-helpers.h create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h create mode 100644 tools/testing/selftests/vm/pkey-x86.h rename tools/testing/selftests/{x86 => vm}/protection_keys.c (74%) delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h -- 2.17.1
Re: [PATCH v6 1/5] powerpc/mm: Implement set_memory() routines
On 24/12/2019 at 06:55, Russell Currey wrote:
> The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX,
> and are generally useful primitives to have. This implementation is
> designed to be completely generic across powerpc's many MMUs.
>
> It's possible that this could be optimised to be faster for specific
> MMUs, but the focus is on having a generic and safe implementation for
> now.
>
> This implementation does not handle cases where the caller is attempting
> to change the mapping of the page it is executing from, or if another
> CPU is concurrently using the page being altered. These cases likely
> shouldn't happen, but a more complex implementation with MMU-specific code
> could safely handle them, so that is left as a TODO for now.
>
> Signed-off-by: Russell Currey
> ---
>  arch/powerpc/Kconfig                  |  1 +
>  arch/powerpc/include/asm/set_memory.h | 32 +++
>  arch/powerpc/mm/Makefile              |  1 +
>  arch/powerpc/mm/pageattr.c            | 83 +++
>  4 files changed, 117 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/set_memory.h
>  create mode 100644 arch/powerpc/mm/pageattr.c
>
> +static int __change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> +{
> +	int action = *((int *)data);
> +	pte_t pte_val;

pte_val is really not a good name, because pte_val() is already a
function which returns the value of a pte_t variable.

Here you should name it 'pte' as usual.

Christophe

> +
> +	// invalidate the PTE so it's safe to modify
> +	pte_val = ptep_get_and_clear(&init_mm, addr, ptep);
> +	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +
> +	// modify the PTE bits as desired, then apply
> +	switch (action) {
> +	case SET_MEMORY_RO:
> +		pte_val = pte_wrprotect(pte_val);
> +		break;
> +	case SET_MEMORY_RW:
> +		pte_val = pte_mkwrite(pte_val);
> +		break;
> +	case SET_MEMORY_NX:
> +		pte_val = pte_exprotect(pte_val);
> +		break;
> +	case SET_MEMORY_X:
> +		pte_val = pte_mkexec(pte_val);
> +		break;
> +	default:
> +		WARN_ON(true);
> +		return -EINVAL;
> +	}
> +
> +	set_pte_at(&init_mm, addr, ptep, pte_val);
> +
> +	return 0;
> +}
> +
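
The switch in the quoted function maps each action onto a single permission-bit change. The following user-space toy model is only meant to make that mapping concrete; it is not kernel code. The TOY_PTE_* bit values, the numeric values of the SET_MEMORY_* constants and toy_apply_action() are all hypothetical, and the model deliberately leaves out the PTE invalidation and TLB flush that the real function performs first. The local variable is called 'pte', following the naming suggested in the review comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action codes; the real values are defined by the patch. */
enum { SET_MEMORY_RO, SET_MEMORY_RW, SET_MEMORY_NX, SET_MEMORY_X };

/* Hypothetical permission bits of a toy page table entry. */
#define TOY_PTE_WRITE	0x1u
#define TOY_PTE_EXEC	0x2u
#define TOY_PTE_INVALID	UINT32_MAX	/* returned for an unknown action */

/* Return the toy PTE with the requested permission change applied. */
static uint32_t toy_apply_action(uint32_t pte, int action)
{
	switch (action) {
	case SET_MEMORY_RO:
		return pte & ~TOY_PTE_WRITE;
	case SET_MEMORY_RW:
		return pte | TOY_PTE_WRITE;
	case SET_MEMORY_NX:
		return pte & ~TOY_PTE_EXEC;
	case SET_MEMORY_X:
		return pte | TOY_PTE_EXEC;
	default:
		return TOY_PTE_INVALID;
	}
}
```

Each action touches exactly one bit and leaves the other untouched, so for example making a writable and executable entry read-only keeps it executable.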