Re: [PATCH 2/9] dma: Convert from tasklet to BH workqueue
On 28-03-24, 12:39, Allen wrote:
> > I think that is very great idea. having this wrapped in dma_chan would
> > be very good way as well
> >
> > Am not sure if Allen is up for it :-)
>
> Thanks Arnd, I know we did speak about this at LPC. I did start
> working on using completion. I dropped it as I thought it would
> be easier to move to workqueues.
>
> Vinod, I would like to give this a shot and put out a RFC, I would
> really appreciate review and feedback.

Sounds like a good plan to me

--
~Vinod
Re: [PATCH 2/9] dma: Convert from tasklet to BH workqueue
On 28-03-24, 13:01, Allen wrote:
> > >> > Since almost every driver associates the tasklet with the
> > >> > dma_chan, we could go one step further and add the
> > >> > work_queue structure directly into struct dma_chan,
> > >> > with the wrapper operating on the dma_chan rather than
> > >> > the work_queue.
> > >>
> > >> I think that is very great idea. having this wrapped in dma_chan would
> > >> be very good way as well
> > >>
> > >> Am not sure if Allen is up for it :-)
> > >
> > > Thanks Arnd, I know we did speak about this at LPC. I did start
> > > working on using completion. I dropped it as I thought it would
> > > be easier to move to workqueues.
> >
> > It's definitely easier to do the workqueue conversion as a first
> > step, and I agree adding support for the completion right away is
> > probably too much. Moving the work_struct into the dma_chan
> > is probably not too hard though, if you leave your current
> > approach for the cases where the tasklet is part of the
> > dma_dev rather than the dma_chan.
> >
>
> Alright, I will work on moving work_struct into the dma_chan and
> leave the dma_dev as is (using bh workqueues) and post a RFC.
> Once reviewed, I could move to the next step.

That might be better from a performance pov but the current design is a
global tasklet and not a per chan one... We would need to carefully
review and test this for sure

--
~Vinod
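To make the proposal above concrete, here is a minimal userspace sketch of a channel that embeds its own work item, with wrappers that take the channel rather than the work item. It is not kernel code: `struct work_item`, `struct chan`, `chan_work_init()` and `chan_work_schedule()` are hypothetical stand-ins, and scheduling simply runs the callback inline instead of queueing on a BH workqueue.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's work_struct; not the real type. */
struct work_item {
	void (*func)(struct work_item *work);
};

/* Hypothetical channel with the work item embedded in it. */
struct chan {
	int id;
	int completed;		/* completions recorded by the callback */
	struct work_item work;
};

/* Same idea as the kernel's container_of(): member pointer -> parent. */
#define chan_from_work(w) \
	((struct chan *)((char *)(w) - offsetof(struct chan, work)))

/* Hypothetical wrapper: drivers pass the channel, never the work item. */
static void chan_work_init(struct chan *c, void (*func)(struct work_item *))
{
	c->work.func = func;
}

/* Stand-in for queueing the work; here the callback just runs inline. */
static void chan_work_schedule(struct chan *c)
{
	c->work.func(&c->work);
}

/* Example bottom half: recover the channel and record one completion. */
static void chan_complete(struct work_item *work)
{
	struct chan *c = chan_from_work(work);

	c->completed++;
}
```

The point of the wrapper is that a driver only ever names its channel; whether the deferred work is a tasklet, a BH work item or something else stays an internal detail of the channel structure.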
[PATCH] serial: pmac_zilog: Drop usage of platform_driver_probe()
There are considerations to drop platform_driver_probe() as a concept
that isn't relevant any more today. It comes with an added complexity
that makes many users hold it wrong. (E.g. this driver should have marked
the driver struct with __refdata to prevent the below mentioned false
positive section mismatch warning.)

This fixes a W=1 build warning:

	WARNING: modpost: drivers/tty/serial/pmac_zilog: section mismatch in reference: pmz_driver+0x8 (section: .data) -> pmz_detach (section: .exit.text)

Signed-off-by: Uwe Kleine-König
---
 drivers/tty/serial/pmac_zilog.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/serial/pmac_zilog.c b/drivers/tty/serial/pmac_zilog.c
index 05d97e89511e..e44621218248 100644
--- a/drivers/tty/serial/pmac_zilog.c
+++ b/drivers/tty/serial/pmac_zilog.c
@@ -1695,7 +1695,7 @@ static void pmz_dispose_port(struct uart_pmac_port *uap)
 	memset(uap, 0, sizeof(struct uart_pmac_port));
 }
 
-static int __init pmz_attach(struct platform_device *pdev)
+static int pmz_attach(struct platform_device *pdev)
 {
 	struct uart_pmac_port *uap;
 	int i;
@@ -1714,7 +1714,7 @@ static int __init pmz_attach(struct platform_device *pdev)
 	return uart_add_one_port(&pmz_uart_reg, &uap->port);
 }
 
-static void __exit pmz_detach(struct platform_device *pdev)
+static void pmz_detach(struct platform_device *pdev)
 {
 	struct uart_pmac_port *uap = platform_get_drvdata(pdev);
 
@@ -1789,7 +1789,8 @@ static struct macio_driver pmz_driver = {
 #else
 
 static struct platform_driver pmz_driver = {
-	.remove_new	= __exit_p(pmz_detach),
+	.probe		= pmz_attach,
+	.remove_new	= pmz_detach,
 	.driver		= {
 		.name		= "scc",
 	},
@@ -1837,7 +1838,7 @@ static int __init init_pmz(void)
 #ifdef CONFIG_PPC_PMAC
 	return macio_register_driver(&pmz_driver);
 #else
-	return platform_driver_probe(&pmz_driver, pmz_attach);
+	return platform_driver_register(&pmz_driver);
 #endif
 }
 

base-commit: a6bd6c997f5a0e2667d4d82fef8c970108f2
--
2.43.0
[PATCH v3 1/3] arch: Select fbdev helpers with CONFIG_VIDEO
Various Kconfig options selected the per-architecture helpers for
fbdev. But none of the contained code depends on fbdev. Standardize
on CONFIG_VIDEO, which will allow to add more general helpers for
video functionality.

CONFIG_VIDEO protects each architecture's video/ directory. This
allows for the use of more fine-grained control for each directory's
files, such as the use of CONFIG_STI_CORE on parisc.

v2:
- sparc: rebased onto Makefile changes

Signed-off-by: Thomas Zimmermann
Reviewed-by: Sam Ravnborg
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: "David S. Miller"
Cc: Andreas Larsson
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x...@kernel.org
Cc: "H. Peter Anvin"
---
 arch/parisc/Makefile      | 2 +-
 arch/sparc/Makefile       | 4 ++--
 arch/sparc/video/Makefile | 2 +-
 arch/x86/Makefile         | 2 +-
 arch/x86/video/Makefile   | 3 ++-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 316f84f1d15c8..21b8166a68839 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -119,7 +119,7 @@ export LIBGCC
 
 libs-y += arch/parisc/lib/ $(LIBGCC)
 
-drivers-y += arch/parisc/video/
+drivers-$(CONFIG_VIDEO) += arch/parisc/video/
 
 boot := arch/parisc/boot
 
diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index 2a03daa68f285..757451c3ea1df 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -59,8 +59,8 @@ endif
 libs-y += arch/sparc/prom/
 libs-y += arch/sparc/lib/
 
-drivers-$(CONFIG_PM) += arch/sparc/power/
-drivers-$(CONFIG_FB_CORE) += arch/sparc/video/
+drivers-$(CONFIG_PM)+= arch/sparc/power/
+drivers-$(CONFIG_VIDEO) += arch/sparc/video/
 
 boot := arch/sparc/boot
 
diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile
index d4d83f1702c61..9dd82880a027a 100644
--- a/arch/sparc/video/Makefile
+++ b/arch/sparc/video/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-obj-$(CONFIG_FB_CORE) += fbdev.o
+obj-y += fbdev.o
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 662d9d4033e6b..b80d15c29ecc6 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -260,7 +260,7 @@ drivers-$(CONFIG_PCI)+= arch/x86/pci/
 # suspend and hibernation support
 drivers-$(CONFIG_PM) += arch/x86/power/
 
-drivers-$(CONFIG_FB_CORE) += arch/x86/video/
+drivers-$(CONFIG_VIDEO) += arch/x86/video/
 
 # boot loader support. Several targets are kept for legacy purposes
 
diff --git a/arch/x86/video/Makefile b/arch/x86/video/Makefile
index 5ebe48752ffc4..9dd82880a027a 100644
--- a/arch/x86/video/Makefile
+++ b/arch/x86/video/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_FB_CORE) += fbdev.o
+
+obj-y += fbdev.o
--
2.44.0
[PATCH v3 3/3] arch: Rename fbdev header and source files
The per-architecture fbdev code has no dependencies on fbdev and can be used for any video-related subsystem. Rename the files to 'video'. Use video-sti.c on parisc as the source file depends on CONFIG_STI_CORE. On arc, arm, arm64, sh, and um the asm header file is an empty wrapper around the file in asm-generic. Let Kbuild generate the file. The build system does this automatically. Only um needs to generate video.h explicitly, so that it overrides the host architecture's header. The latter would otherwise interfere with the build. Further update all includes statements, include guards, and Makefiles. Also update a few strings and comments to refer to video instead of fbdev. v3: - arc, arm, arm64, sh: generate asm header via build system (Sam, Helge, Arnd) - um: rename fb.h to video.h - fix typo in commit message (Sam) Signed-off-by: Thomas Zimmermann Reviewed-by: Sam Ravnborg Cc: Vineet Gupta Cc: Catalin Marinas Cc: Will Deacon Cc: Huacai Chen Cc: WANG Xuerui Cc: Geert Uytterhoeven Cc: Thomas Bogendoerfer Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Yoshinori Sato Cc: Rich Felker Cc: John Paul Adrian Glaubitz Cc: "David S. Miller" Cc: Andreas Larsson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. 
Peter Anvin" --- arch/arc/include/asm/fb.h| 8 arch/arm/include/asm/fb.h| 6 -- arch/arm64/include/asm/fb.h | 10 -- arch/loongarch/include/asm/{fb.h => video.h} | 8 arch/m68k/include/asm/{fb.h => video.h} | 8 arch/mips/include/asm/{fb.h => video.h} | 12 ++-- arch/parisc/include/asm/{fb.h => video.h}| 8 arch/parisc/video/Makefile | 2 +- arch/parisc/video/{fbdev.c => video-sti.c} | 2 +- arch/powerpc/include/asm/{fb.h => video.h} | 8 arch/powerpc/kernel/pci-common.c | 2 +- arch/sh/include/asm/fb.h | 7 --- arch/sparc/include/asm/{fb.h => video.h} | 8 arch/sparc/video/Makefile| 2 +- arch/sparc/video/{fbdev.c => video.c}| 4 ++-- arch/um/include/asm/Kbuild | 2 +- arch/x86/include/asm/{fb.h => video.h} | 8 arch/x86/video/Makefile | 2 +- arch/x86/video/{fbdev.c => video.c} | 3 ++- include/asm-generic/Kbuild | 2 +- include/asm-generic/{fb.h => video.h}| 6 +++--- include/linux/fb.h | 2 +- 22 files changed, 45 insertions(+), 75 deletions(-) delete mode 100644 arch/arc/include/asm/fb.h delete mode 100644 arch/arm/include/asm/fb.h delete mode 100644 arch/arm64/include/asm/fb.h rename arch/loongarch/include/asm/{fb.h => video.h} (86%) rename arch/m68k/include/asm/{fb.h => video.h} (86%) rename arch/mips/include/asm/{fb.h => video.h} (76%) rename arch/parisc/include/asm/{fb.h => video.h} (68%) rename arch/parisc/video/{fbdev.c => video-sti.c} (96%) rename arch/powerpc/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/sh/include/asm/fb.h rename arch/sparc/include/asm/{fb.h => video.h} (89%) rename arch/sparc/video/{fbdev.c => video.c} (86%) rename arch/x86/include/asm/{fb.h => video.h} (77%) rename arch/x86/video/{fbdev.c => video.c} (97%) rename include/asm-generic/{fb.h => video.h} (96%) diff --git a/arch/arc/include/asm/fb.h b/arch/arc/include/asm/fb.h deleted file mode 100644 index 9c2383d29cbb9..0 --- a/arch/arc/include/asm/fb.h +++ /dev/null @@ -1,8 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ - -#ifndef _ASM_FB_H_ -#define _ASM_FB_H_ - -#include - -#endif /* 
_ASM_FB_H_ */ diff --git a/arch/arm/include/asm/fb.h b/arch/arm/include/asm/fb.h deleted file mode 100644 index ce20a43c30339..0 --- a/arch/arm/include/asm/fb.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_FB_H_ -#define _ASM_FB_H_ - -#include - -#endif /* _ASM_FB_H_ */ diff --git a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h deleted file mode 100644 index 1a495d8fb2ce0..0 --- a/arch/arm64/include/asm/fb.h +++ /dev/null @@ -1,10 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2012 ARM Ltd. - */ -#ifndef __ASM_FB_H_ -#define __ASM_FB_H_ - -#include - -#endif /* __ASM_FB_H_ */ diff --git a/arch/loongarch/include/asm/fb.h b/arch/loongarch/include/asm/video.h similarity index 86% rename from arch/loongarch/include/asm/fb.h rename to arch/loongarch/include/asm/video.h index 0b218b10a9ec3..9f76845f2d4fd 100644 --- a/arch/loongarch/include/asm/fb.h +++ b/arch/loongarch/include/asm/video.h @@ -2,8 +2,8 @@ /* * Copyright (C) 2020-2022 Loongson Technology Corporation Limited */ -#ifndef _ASM_FB_H_ -#define _ASM_FB_H_ +#ifndef _ASM_VIDEO_H_ +#define _ASM_VIDEO_H_ #include #include @@ -26,6 +26,6 @@ static inline void fb_memset_io(volatile void __iomem
[PATCH v3 2/3] arch: Remove struct fb_info from video helpers
The per-architecture video helpers do not depend on struct fb_info or anything else from fbdev. Remove it from the interface and replace fb_is_primary_device() with video_is_primary_device(). The new helper is similar in functionality, but can operate on non-fbdev devices. Signed-off-by: Thomas Zimmermann Reviewed-by: Sam Ravnborg Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: "David S. Miller" Cc: Andreas Larsson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. Peter Anvin" --- arch/parisc/include/asm/fb.h | 8 +--- arch/parisc/video/fbdev.c| 9 + arch/sparc/include/asm/fb.h | 7 --- arch/sparc/video/fbdev.c | 17 - arch/x86/include/asm/fb.h| 8 +--- arch/x86/video/fbdev.c | 18 +++--- drivers/video/fbdev/core/fbcon.c | 2 +- include/asm-generic/fb.h | 11 ++- 8 files changed, 41 insertions(+), 39 deletions(-) diff --git a/arch/parisc/include/asm/fb.h b/arch/parisc/include/asm/fb.h index 658a8a7dc5312..ed2a195a3e762 100644 --- a/arch/parisc/include/asm/fb.h +++ b/arch/parisc/include/asm/fb.h @@ -2,11 +2,13 @@ #ifndef _ASM_FB_H_ #define _ASM_FB_H_ -struct fb_info; +#include + +struct device; #if defined(CONFIG_STI_CORE) -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device #endif #include diff --git a/arch/parisc/video/fbdev.c b/arch/parisc/video/fbdev.c index e4f8ac99fc9e0..540fa0c919d59 100644 --- a/arch/parisc/video/fbdev.c +++ b/arch/parisc/video/fbdev.c @@ -5,12 +5,13 @@ * Copyright (C) 2001-2002 Thomas Bogendoerfer */ -#include #include #include -int fb_is_primary_device(struct fb_info *info) +#include + +bool video_is_primary_device(struct device *dev) { struct sti_struct *sti; @@ -21,6 +22,6 @@ int fb_is_primary_device(struct fb_info *info) return true; /* return true if it's the default built-in framebuffer driver */ - return (sti->dev == info->device); + return 
(sti->dev == dev); } -EXPORT_SYMBOL(fb_is_primary_device); +EXPORT_SYMBOL(video_is_primary_device); diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h index 24440c0fda490..07f0325d6921c 100644 --- a/arch/sparc/include/asm/fb.h +++ b/arch/sparc/include/asm/fb.h @@ -3,10 +3,11 @@ #define _SPARC_FB_H_ #include +#include #include -struct fb_info; +struct device; #ifdef CONFIG_SPARC32 static inline pgprot_t pgprot_framebuffer(pgprot_t prot, @@ -18,8 +19,8 @@ static inline pgprot_t pgprot_framebuffer(pgprot_t prot, #define pgprot_framebuffer pgprot_framebuffer #endif -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n) { diff --git a/arch/sparc/video/fbdev.c b/arch/sparc/video/fbdev.c index bff66dd1909a4..e46f0499c2774 100644 --- a/arch/sparc/video/fbdev.c +++ b/arch/sparc/video/fbdev.c @@ -1,26 +1,25 @@ // SPDX-License-Identifier: GPL-2.0 #include -#include +#include #include +#include #include -int fb_is_primary_device(struct fb_info *info) +bool video_is_primary_device(struct device *dev) { - struct device *dev = info->device; - struct device_node *node; + struct device_node *node = dev->of_node; if (console_set_on_cmdline) - return 0; + return false; - node = dev->of_node; if (node && node == of_console_device) - return 1; + return true; - return 0; + return false; } -EXPORT_SYMBOL(fb_is_primary_device); +EXPORT_SYMBOL(video_is_primary_device); MODULE_DESCRIPTION("Sparc fbdev helpers"); MODULE_LICENSE("GPL"); diff --git a/arch/x86/include/asm/fb.h b/arch/x86/include/asm/fb.h index c3b9582de7efd..999db33792869 100644 --- a/arch/x86/include/asm/fb.h +++ b/arch/x86/include/asm/fb.h @@ -2,17 +2,19 @@ #ifndef _ASM_X86_FB_H #define _ASM_X86_FB_H +#include + #include -struct fb_info; +struct device; pgprot_t 
pgprot_framebuffer(pgprot_t prot, unsigned long vm_start, unsigned long vm_end, unsigned long offset); #define pgprot_framebuffer pgprot_framebuffer -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device #include diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c index 1dd6528cc947c..4d87ce8e257fe 100644 --- a/arch/x86/video/fbdev.c +++ b/arch/x86/video/fbdev.c @@ -7,7 +7,6 @@ * */
[PATCH v3 0/3] arch: Remove fbdev dependency from video helpers
Make architecture helpers for display functionality depend on general video functionality instead of fbdev. This avoids the dependency on fbdev and makes the functionality available for non-fbdev code. Patch 1 replaces the variety of Kconfig options that control the Makefiles with CONFIG_VIDEO. More fine-grained control of the build can then be done within each video/ directory; see parisc for an example. Patch 2 replaces fb_is_primary_device() with video_is_primary_device(), which has no dependencies on fbdev. The implementation remains identical on all affected platforms. There's one minor change in fbcon, which is the only caller of fb_is_primary_device(). Patch 3 renames the source and header files from fbdev to video. v3: - arc, arm, arm64, sh, um: generate asm/video.h (Sam, Helge, Arnd) - fix typos (Sam) v2: - improve cover letter - rebase onto v6.9-rc1 Thomas Zimmermann (3): arch: Select fbdev helpers with CONFIG_VIDEO arch: Remove struct fb_info from video helpers arch: Rename fbdev header and source files arch/arc/include/asm/fb.h| 8 -- arch/arm/include/asm/fb.h| 6 - arch/arm64/include/asm/fb.h | 10 arch/loongarch/include/asm/{fb.h => video.h} | 8 +++--- arch/m68k/include/asm/{fb.h => video.h} | 8 +++--- arch/mips/include/asm/{fb.h => video.h} | 12 - arch/parisc/Makefile | 2 +- arch/parisc/include/asm/fb.h | 14 --- arch/parisc/include/asm/video.h | 16 arch/parisc/video/Makefile | 2 +- arch/parisc/video/{fbdev.c => video-sti.c} | 9 --- arch/powerpc/include/asm/{fb.h => video.h} | 8 +++--- arch/powerpc/kernel/pci-common.c | 2 +- arch/sh/include/asm/fb.h | 7 -- arch/sparc/Makefile | 4 +-- arch/sparc/include/asm/{fb.h => video.h} | 15 +-- arch/sparc/video/Makefile| 2 +- arch/sparc/video/fbdev.c | 26 arch/sparc/video/video.c | 25 +++ arch/um/include/asm/Kbuild | 2 +- arch/x86/Makefile| 2 +- arch/x86/include/asm/fb.h| 19 -- arch/x86/include/asm/video.h | 21 arch/x86/video/Makefile | 3 ++- arch/x86/video/{fbdev.c => video.c} | 21 +++- 
drivers/video/fbdev/core/fbcon.c | 2 +- include/asm-generic/Kbuild | 2 +- include/asm-generic/{fb.h => video.h}| 17 +++-- include/linux/fb.h | 2 +- 29 files changed, 124 insertions(+), 151 deletions(-) delete mode 100644 arch/arc/include/asm/fb.h delete mode 100644 arch/arm/include/asm/fb.h delete mode 100644 arch/arm64/include/asm/fb.h rename arch/loongarch/include/asm/{fb.h => video.h} (86%) rename arch/m68k/include/asm/{fb.h => video.h} (86%) rename arch/mips/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/parisc/include/asm/fb.h create mode 100644 arch/parisc/include/asm/video.h rename arch/parisc/video/{fbdev.c => video-sti.c} (78%) rename arch/powerpc/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/sh/include/asm/fb.h rename arch/sparc/include/asm/{fb.h => video.h} (75%) delete mode 100644 arch/sparc/video/fbdev.c create mode 100644 arch/sparc/video/video.c delete mode 100644 arch/x86/include/asm/fb.h create mode 100644 arch/x86/include/asm/video.h rename arch/x86/video/{fbdev.c => video.c} (66%) rename include/asm-generic/{fb.h => video.h} (89%) -- 2.44.0
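As a rough illustration of the interface change in patch 2, the following userspace sketch shows a helper that takes a plain device pointer and an fbdev-side caller that adapts by passing its parent device. The struct definitions are minimal stand-ins for the kernel types, and `primary_dev` and `fb_info_is_primary()` are hypothetical names used only for this example; the real per-architecture checks differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; only the fields used here. */
struct device {
	const char *name;
};

struct fb_info {
	struct device *device;	/* parent device of the framebuffer */
};

/* Hypothetical record of which device the platform treats as primary. */
static struct device *primary_dev;

/* New-style helper: takes a plain device, no fbdev types involved. */
static bool video_is_primary_device(struct device *dev)
{
	return dev && dev == primary_dev;
}

/* How an fbdev caller could adapt: hand over the parent device. */
static bool fb_info_is_primary(const struct fb_info *info)
{
	return video_is_primary_device(info->device);
}
```

Because the helper no longer needs struct fb_info, a non-fbdev caller that only has a struct device can use it directly.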
Re: [PATCH 0/9] enabled -Wformat-truncation for clang
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski:

On Tue, 26 Mar 2024 23:37:59 +0100 you wrote:
> From: Arnd Bergmann
>
> With randconfig build testing, I found only eight files that produce
> warnings with clang when -Wformat-truncation is enabled. This means
> we can just turn it on by default rather than only enabling it for
> "make W=1".
>
> [...]

Here is the summary with links:
  - [2/9] enetc: avoid truncating error message
    https://git.kernel.org/netdev/net-next/c/9046d581ed58
  - [3/9] qed: avoid truncating work queue length
    https://git.kernel.org/netdev/net-next/c/954fd908f177
  - [4/9] mlx5: avoid truncating error message
    https://git.kernel.org/netdev/net-next/c/b324a960354b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
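For context on the kind of code -Wformat-truncation looks at: the warning concerns snprintf()-style calls whose formatted output may not fit the destination buffer. The sketch below is a generic userspace example, not taken from any of the applied patches; `format_name()` is a hypothetical helper that detects truncation through the return value of snprintf().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format "<prefix>-<index>" into buf. Returns true if the whole string
 * fit, false if snprintf() had to truncate it or reported an error.
 */
static bool format_name(char *buf, size_t size, const char *prefix, int index)
{
	int len = snprintf(buf, size, "%s-%d", prefix, index);

	/* snprintf() returns the length it would have written untruncated. */
	return len >= 0 && (size_t)len < size;
}
```

Callers can then either enlarge the buffer or handle the truncated case explicitly, rather than silently using a cut-off string.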
Re: [PATCH v12 8/8] PCI: endpoint: Remove "core_init_notifier" flag
On Wed, Mar 27, 2024 at 02:43:37PM +0530, Manivannan Sadhasivam wrote:
> "core_init_notifier" flag is set by the glue drivers requiring refclk from
> the host to complete the DWC core initialization. Also, those drivers will
> send a notification to the EPF drivers once the initialization is fully
> completed using the pci_epc_init_notify() API. Only then, the EPF drivers
> will start functioning.
>
> For the rest of the drivers generating refclk locally, EPF drivers will
> start functioning post binding with them. EPF drivers rely on the
> 'core_init_notifier' flag to differentiate between the drivers.
> Unfortunately, this creates two different flows for the EPF drivers.
>
> So to avoid that, let's get rid of the "core_init_notifier" flag and follow
> a single initialization flow for the EPF drivers. This is done by calling
> the dw_pcie_ep_init_notify() from all glue drivers after the completion of
> dw_pcie_ep_init_registers() API. This will allow all the glue drivers to
> send the notification to the EPF drivers once the initialization is fully
> completed.
>
> Only difference here is that, the drivers requiring refclk from host will
> send the notification once refclk is received, while others will send it
> during probe time itself.
>
> But this also requires the EPC core driver to deliver the notification
> after EPF driver bind. Because, the glue driver can send the notification
> before the EPF drivers bind() and in those cases the EPF drivers will miss
> the event. To accommodate this, EPC core is now caching the state of the
> EPC initialization in 'init_complete' flag and pci-ep-cfs driver sends the
> notification to EPF drivers based on that after each EPF driver bind.
> > Tested-by: Niklas Cassel Reviewed-by: Frank Li > Signed-off-by: Manivannan Sadhasivam > --- > drivers/pci/controller/cadence/pcie-cadence-ep.c | 2 ++ > drivers/pci/controller/dwc/pci-dra7xx.c | 2 ++ > drivers/pci/controller/dwc/pci-imx6.c | 2 ++ > drivers/pci/controller/dwc/pci-keystone.c | 2 ++ > drivers/pci/controller/dwc/pci-layerscape-ep.c| 2 ++ > drivers/pci/controller/dwc/pcie-artpec6.c | 2 ++ > drivers/pci/controller/dwc/pcie-designware-ep.c | 1 + > drivers/pci/controller/dwc/pcie-designware-plat.c | 2 ++ > drivers/pci/controller/dwc/pcie-keembay.c | 2 ++ > drivers/pci/controller/dwc/pcie-qcom-ep.c | 1 - > drivers/pci/controller/dwc/pcie-rcar-gen4.c | 2 ++ > drivers/pci/controller/dwc/pcie-tegra194.c| 1 - > drivers/pci/controller/dwc/pcie-uniphier-ep.c | 2 ++ > drivers/pci/controller/pcie-rcar-ep.c | 2 ++ > drivers/pci/controller/pcie-rockchip-ep.c | 2 ++ > drivers/pci/endpoint/functions/pci-epf-test.c | 18 +- > drivers/pci/endpoint/pci-ep-cfs.c | 9 + > drivers/pci/endpoint/pci-epc-core.c | 22 ++ > include/linux/pci-epc.h | 7 --- > 19 files changed, 65 insertions(+), 18 deletions(-) > > diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c > b/drivers/pci/controller/cadence/pcie-cadence-ep.c > index 81c50dc64da9..55c42ca2b777 100644 > --- a/drivers/pci/controller/cadence/pcie-cadence-ep.c > +++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c > @@ -746,6 +746,8 @@ int cdns_pcie_ep_setup(struct cdns_pcie_ep *ep) > > spin_lock_init(>lock); > > + pci_epc_init_notify(epc); > + > return 0; > > free_epc_mem: > diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c > b/drivers/pci/controller/dwc/pci-dra7xx.c > index 395042b29ffc..d2d17d37d3e0 100644 > --- a/drivers/pci/controller/dwc/pci-dra7xx.c > +++ b/drivers/pci/controller/dwc/pci-dra7xx.c > @@ -474,6 +474,8 @@ static int dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx, > return ret; > } > > + dw_pcie_ep_init_notify(ep); > + > return 0; > } > > diff --git a/drivers/pci/controller/dwc/pci-imx6.c > 
b/drivers/pci/controller/dwc/pci-imx6.c > index 8d28ecc381bc..917c69edee1d 100644 > --- a/drivers/pci/controller/dwc/pci-imx6.c > +++ b/drivers/pci/controller/dwc/pci-imx6.c > @@ -1131,6 +1131,8 @@ static int imx6_add_pcie_ep(struct imx6_pcie *imx6_pcie, > return ret; > } > > + dw_pcie_ep_init_notify(ep); > + > /* Start LTSSM. */ > imx6_pcie_ltssm_enable(dev); > > diff --git a/drivers/pci/controller/dwc/pci-keystone.c > b/drivers/pci/controller/dwc/pci-keystone.c > index 81ebac520650..d3a7d14ee685 100644 > --- a/drivers/pci/controller/dwc/pci-keystone.c > +++ b/drivers/pci/controller/dwc/pci-keystone.c > @@ -1293,6 +1293,8 @@ static int ks_pcie_probe(struct platform_device *pdev) > goto err_ep_init; > } > > + dw_pcie_ep_init_notify(>ep); > + > break; > default: > dev_err(dev, "INVALID device type %d\n", mode); > diff --git
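The ordering problem described in the commit message (the controller may signal "init complete" before a function driver has bound) can be modelled in a few lines of userspace C. This is only a sketch of the caching idea: `struct epc`, `struct epf`, `epc_init_notify()` and `epc_bind_epf()` are simplified, hypothetical stand-ins and do not mirror the real EPC core or pci-ep-cfs code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EPF 4

/* Stand-in for an endpoint function driver instance. */
struct epf {
	bool bound;
	int init_events;	/* how many init notifications it received */
};

/* Stand-in for the endpoint controller. */
struct epc {
	bool init_complete;	/* cached so late binders do not miss the event */
	struct epf *epfs[MAX_EPF];
	size_t nr_epfs;
};

static void epf_notify_init(struct epf *epf)
{
	epf->init_events++;
}

/* Called by the controller driver once its core is initialized. */
static void epc_init_notify(struct epc *epc)
{
	size_t i;

	epc->init_complete = true;
	for (i = 0; i < epc->nr_epfs; i++)
		epf_notify_init(epc->epfs[i]);
}

/* Called when a function driver binds; replays the event if needed. */
static int epc_bind_epf(struct epc *epc, struct epf *epf)
{
	if (epc->nr_epfs == MAX_EPF)
		return -1;

	epf->bound = true;
	epc->epfs[epc->nr_epfs++] = epf;
	if (epc->init_complete)
		epf_notify_init(epf);

	return 0;
}
```

With the flag cached, a function driver sees exactly one init event whether it binds before or after the controller finishes initialization, which is what allows a single flow on the EPF side.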
Re: [PATCH v12 7/8] PCI: dwc: ep: Call dw_pcie_ep_init_registers() API directly from all glue drivers
On Wed, Mar 27, 2024 at 02:43:36PM +0530, Manivannan Sadhasivam wrote:
> Currently, dw_pcie_ep_init_registers() API is directly called by the glue
> drivers requiring active refclk from host. But for the other drivers, it is
> getting called implicitly by dw_pcie_ep_init(). This is due to the fact
> that this API initializes DWC EP specific registers and that requires an
> active refclk (either from host or generated locally by endpoint itself).
>
> But, this causes a discrepancy among the glue drivers. So to avoid this
> confusion, let's call this API directly from all glue drivers irrespective
> of refclk dependency. Only difference here is that the drivers requiring
> refclk from host will call this API only after the refclk is received and
> other drivers without refclk dependency will call this API right after
> dw_pcie_ep_init().
>
> With this change, the check for 'core_init_notifier' flag can now be
> dropped from dw_pcie_ep_init() API. This will also allow us to remove the
> 'core_init_notifier' flag completely in the later commits.
> > Reviewed-by: Yoshihiro Shimoda > Reviewed-by: Niklas Cassel Reviewed-by: Frank Li > Signed-off-by: Manivannan Sadhasivam > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 7 +++ > drivers/pci/controller/dwc/pci-imx6.c | 8 > drivers/pci/controller/dwc/pci-keystone.c | 9 + > drivers/pci/controller/dwc/pci-layerscape-ep.c| 7 +++ > drivers/pci/controller/dwc/pcie-artpec6.c | 13 - > drivers/pci/controller/dwc/pcie-designware-ep.c | 22 -- > drivers/pci/controller/dwc/pcie-designware-plat.c | 9 + > drivers/pci/controller/dwc/pcie-keembay.c | 16 +++- > drivers/pci/controller/dwc/pcie-rcar-gen4.c | 12 +++- > drivers/pci/controller/dwc/pcie-uniphier-ep.c | 13 - > 10 files changed, 90 insertions(+), 26 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c > b/drivers/pci/controller/dwc/pci-dra7xx.c > index 0e406677060d..395042b29ffc 100644 > --- a/drivers/pci/controller/dwc/pci-dra7xx.c > +++ b/drivers/pci/controller/dwc/pci-dra7xx.c > @@ -467,6 +467,13 @@ static int dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx, > return ret; > } > > + ret = dw_pcie_ep_init_registers(ep); > + if (ret) { > + dev_err(dev, "Failed to initialize DWC endpoint registers\n"); > + dw_pcie_ep_deinit(ep); > + return ret; > + } > + > return 0; > } > > diff --git a/drivers/pci/controller/dwc/pci-imx6.c > b/drivers/pci/controller/dwc/pci-imx6.c > index 99a60270b26c..8d28ecc381bc 100644 > --- a/drivers/pci/controller/dwc/pci-imx6.c > +++ b/drivers/pci/controller/dwc/pci-imx6.c > @@ -1123,6 +1123,14 @@ static int imx6_add_pcie_ep(struct imx6_pcie > *imx6_pcie, > dev_err(dev, "failed to initialize endpoint\n"); > return ret; > } > + > + ret = dw_pcie_ep_init_registers(ep); > + if (ret) { > + dev_err(dev, "Failed to initialize DWC endpoint registers\n"); > + dw_pcie_ep_deinit(ep); > + return ret; > + } > + > /* Start LTSSM. 
*/ > imx6_pcie_ltssm_enable(dev); > > diff --git a/drivers/pci/controller/dwc/pci-keystone.c > b/drivers/pci/controller/dwc/pci-keystone.c > index 844de4418724..81ebac520650 100644 > --- a/drivers/pci/controller/dwc/pci-keystone.c > +++ b/drivers/pci/controller/dwc/pci-keystone.c > @@ -1286,6 +1286,13 @@ static int ks_pcie_probe(struct platform_device *pdev) > ret = dw_pcie_ep_init(>ep); > if (ret < 0) > goto err_get_sync; > + > + ret = dw_pcie_ep_init_registers(>ep); > + if (ret) { > + dev_err(dev, "Failed to initialize DWC endpoint > registers\n"); > + goto err_ep_init; > + } > + > break; > default: > dev_err(dev, "INVALID device type %d\n", mode); > @@ -1295,6 +1302,8 @@ static int ks_pcie_probe(struct platform_device *pdev) > > return 0; > > +err_ep_init: > + dw_pcie_ep_deinit(>ep); > err_get_sync: > pm_runtime_put(dev); > pm_runtime_disable(dev); > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c > b/drivers/pci/controller/dwc/pci-layerscape-ep.c > index 1f6ee1460ec2..9eb2233e3d7f 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c > @@ -279,6 +279,13 @@ static int __init ls_pcie_ep_probe(struct > platform_device *pdev) > if (ret) > return ret; > > + ret = dw_pcie_ep_init_registers(>ep); > + if (ret) { > + dev_err(dev, "Failed to initialize DWC endpoint registers\n"); > + dw_pcie_ep_deinit(>ep); > + return ret; > + } > + > return ls_pcie_ep_interrupt_init(pcie, pdev); > } >
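The hunks above all follow the same shape: set up the endpoint, then program the registers as a separate step, and undo the first step if the second fails. A userspace sketch of that two-step probe with unwinding might look as follows; `struct ep`, `ep_init()`, `ep_init_registers()`, `ep_deinit()` and `glue_probe()` are hypothetical stand-ins, and the `refclk_present` flag only simulates the "registers need an active refclk" condition.

```c
#include <stdbool.h>

/* Stand-in for the endpoint state tracked by a glue driver. */
struct ep {
	bool refclk_present;	/* simulated: is a reference clock running? */
	bool resources_ready;	/* set by the first init step */
	bool registers_ready;	/* set by the second init step */
};

/* Step 1: allocate and set up resources that do not need the refclk. */
static int ep_init(struct ep *ep)
{
	ep->resources_ready = true;
	return 0;
}

/* Undo step 1. */
static void ep_deinit(struct ep *ep)
{
	ep->resources_ready = false;
}

/* Step 2: touch the core registers, which requires an active refclk. */
static int ep_init_registers(struct ep *ep)
{
	if (!ep->refclk_present)
		return -1;

	ep->registers_ready = true;
	return 0;
}

/* Probe path of a driver that generates its refclk locally. */
static int glue_probe(struct ep *ep)
{
	int ret;

	ret = ep_init(ep);
	if (ret)
		return ret;

	ret = ep_init_registers(ep);
	if (ret) {
		ep_deinit(ep);	/* unwind step 1 on failure */
		return ret;
	}

	return 0;
}
```

A driver that depends on the host's refclk would call only the first step at probe time and defer the second until the clock is detected.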
Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
On 2024-03-29 12:28 PM, Dave Hansen wrote:
> On 3/29/24 00:18, Samuel Holland wrote:
>> +#
>> +# CFLAGS for compiling floating point code inside the kernel.
>> +#
>> +CC_FLAGS_FPU := -msse -msse2
>> +ifdef CONFIG_CC_IS_GCC
>> +# Stack alignment mismatch, proceed with caution.
>> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
>> +# (8B stack alignment).
>> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
>> +#
>> +# The "-msse" in the first argument is there so that the
>> +# -mpreferred-stack-boundary=3 build error:
>> +#
>> +#  -mpreferred-stack-boundary=3 is not between 4 and 12
>> +#
>> +# can be triggered. Otherwise gcc doesn't complain.
>> +CC_FLAGS_FPU += -mhard-float
>> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
>> +endif
>
> I was expecting to see this (now duplicate) hunk come _out_ of
> lib/Makefile somewhere in the series.
>
> Did I miss that, or is there something keeping the duplicate there?

This hunk is removed in patch 15/15, after the conversion of lib/test_fpu.c:

https://lore.kernel.org/linux-kernel/20240329072441.591471-16-samuel.holl...@sifive.com/

Regards,
Samuel
Re: [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard
On 3/29/24 00:18, Samuel Holland wrote:
> The include guard should match the filename, or it will conflict with
> the newly-added asm/fpu.h.

Acked-by: Dave Hansen
Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
On 3/29/24 00:18, Samuel Holland wrote:
> +#
> +# CFLAGS for compiling floating point code inside the kernel.
> +#
> +CC_FLAGS_FPU := -msse -msse2
> +ifdef CONFIG_CC_IS_GCC
> +# Stack alignment mismatch, proceed with caution.
> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> +# (8B stack alignment).
> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
> +#
> +# The "-msse" in the first argument is there so that the
> +# -mpreferred-stack-boundary=3 build error:
> +#
> +#  -mpreferred-stack-boundary=3 is not between 4 and 12
> +#
> +# can be triggered. Otherwise gcc doesn't complain.
> +CC_FLAGS_FPU += -mhard-float
> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
> +endif

I was expecting to see this (now duplicate) hunk come _out_ of
lib/Makefile somewhere in the series.

Did I miss that, or is there something keeping the duplicate there?
Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces
On Wed, Mar 27, 2024 at 07:39:20PM +, Simon Horman wrote:
[ ... ]
> > >
> > > Hi Guenter,
> > >
> > > a minor nit from my side: this change results in a Kernel doc warning.
> > >
> > > .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). Prototype was for HAVE_BUG_FUNCTION() instead
> > >
> > > Perhaps either the new code should be placed above the Kernel doc,
> > > or scripts/kernel-doc should be enhanced?
> > >
> >
> > Thanks a lot for the feedback.
> >
> > The definition block needs to be inside CONFIG_DEBUG_BUGVERBOSE,
> > so it would be a bit odd to move it above the documentation
> > just to make kerneldoc happy. I am not really sure what to do
> > about it.
>
> FWIW, I agree that would be odd.
> But perhaps the #ifdef could also move above the Kernel doc?
> Maybe not a great idea, but the best one I've had so far.
>

I did that for the next version of the patch series. It is a bit more
clumsy, so I left it as separate patch on top of this patch. I'd
still like to get input from others before making the change final.

Thanks,
Guenter
Re: [powerpc] WARN at drivers/scsi/sg.c:2236 (sg_remove_sfp_usercontext)
> Can you check the debug patch below and provide output? > When I'm right the warning should be gone and you should just get the > "Modification triggered" instead. When I'm wrong we should at least see, > how many references d_ref has left. > With the debug patch applied, code says d_ref value is 2 # ./ioctl_sg01 tst_test.c:1741: TINFO: LTP version: 20210524-2511-g00b497c47 tst_test.c:1625: TINFO: Timeout per run is 1h 00m 30s ioctl_sg01.c:83: TINFO: Found SCSI device /dev/sg0 [ 36.016630] [ cut here ] [ 36.016674] WARNING: CPU: 19 PID: 460 at drivers/scsi/sg.c:2238 sg_remove_sfp_usercontext+0x270/0x298 [sg] [ 36.016707] Modules linked in: rpadlpar_io rpaphp xsk_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls rfkill ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng vmx_crypto xfs libcrc32c sd_mod sr_mod t10_pi crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth fuse [ 36.016834] CPU: 19 PID: 460 Comm: kworker/19:1 Kdump: loaded Not tainted 6.9.0-rc1-next-20240328-dirty #3 [ 36.016849] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NH1060_018) hv:phyp pSeries [ 36.016868] Workqueue: events sg_remove_sfp_usercontext [sg] [ 36.016889] NIP: c00815cf4110 LR: c00815cf4000 CTR: c05393b0 [ 36.016903] REGS: c0009414fae0 TRAP: 0700 Not tainted (6.9.0-rc1-next-20240328-dirty) [ 36.016921] MSR: 8282b033 CR: 44000448 XER: [ 36.016962] CFAR: c00815cf400c IRQMASK: 0 [ 36.016962] GPR00: c00815cf4000 c0009414fd80 c00815d18900 [ 36.016962] GPR04: c000 0023 c8dee000 0022 [ 36.016962] GPR08: 00038a6d 0002 c00815cf8c10 [ 36.016962] GPR12: c05393b0 c0038ffe8b00 c01a2bac c8e4e980 [ 36.016962] GPR16: [ 36.016962] GPR20: c0038c993b00 c8dea030 c8dea000 c000a2712000 [ 36.016962] GPR24: cc3bd380 c45ab205 c8deb330 [ 36.016962] GPR28: c0038c993b00 c8dea080 c8deb328 cc3bd418 [ 
36.017107] NIP [c00815cf4110] sg_remove_sfp_usercontext+0x270/0x298 [sg]
[ 36.017129] LR [c00815cf4000] sg_remove_sfp_usercontext+0x160/0x298 [sg]
[ 36.017144] Call Trace:
[ 36.017148] [c0009414fd80] [c00815cf4000] sg_remove_sfp_usercontext+0x160/0x298 [sg] (unreliable)
[ 36.017169] [c0009414fe40] [c019337c] process_one_work+0x20c/0x4f4
[ 36.017189] [c0009414fef0] [c01942fc] worker_thread+0x378/0x544
[ 36.017208] [c0009414ff90] [c01a2cdc] kthread+0x138/0x140
[ 36.017225] [c0009414ffe0] [c000df98] start_kernel_thread+0x14/0x18
[ 36.017241] Code: 3bf90098 e8c98310 3d22 e8698010 48004509 e8410018 7ec3b378 48004b15 e8410018 81390098 2c090001 4182ff04 <0fe0> 80990098 3d22 78840020
[ 36.017289] ---[ end trace ]---
[ 36.017302] d_ref=2
ioctl_sg01.c:124: TPASS: Output buffer is empty, no data leaked
[ 44.707319] d_ref=2

Summary:
passed 1
failed 0
broken 0
skipped 0
warnings 0

-- Sachin
Re: FAILED: Patch "powerpc: xor_vmx: Add '-mhard-float' to CFLAGS" failed to apply to 5.10-stable tree
On Wed, Mar 27, 2024 at 08:16:13AM -0700, Nathan Chancellor wrote:
> On Wed, Mar 27, 2024 at 08:20:07AM -0400, Sasha Levin wrote:
> > The patch below does not apply to the 5.10-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to .
> ...
> > -- original commit in Linus's tree --
> >
> > From 35f20786c481d5ced9283ff42de5c69b65e5ed13 Mon Sep 17 00:00:00 2001
> > From: Nathan Chancellor
> > Date: Sat, 27 Jan 2024 11:07:43 -0700
> > Subject: [PATCH] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
>
> I have attached a backport that will work for 5.15 and earlier. I think
> you worked around this conflict in 5.15 by taking 04e85bbf71c9 but I am
> not sure that is a smart idea. I think it might just be better to drop
> that dependency and apply this version in 5.15.

I'll go drop it and take this version, thanks!

greg k-h
Re: [powerpc] WARN at drivers/scsi/sg.c:2236 (sg_remove_sfp_usercontext)
> Following WARN_ON_ONCE is triggered while running LTP tests > (specifically ioctl_sg01) on IBM Power booted with 6.9.0-rc1-next-20240328 > > [ 64.230233] [ cut here ] > [ 64.230269] WARNING: CPU: 10 PID: 452 at drivers/scsi/sg.c:2236 > sg_remove_sfp_usercontext+0x270/0x280 [sg] > [ 64.230302] Modules linked in: rpadlpar_io rpaphp xsk_diag nft_fib_inet > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 bonding tls rfkill ip_set nf_tables nfnetlink > sunrpc binfmt_misc pseries_rng vmx_crypto xfs libcrc32c sd_mod sr_mod t10_pi > crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth > scsi_transport_srp fuse > [ 64.230420] CPU: 10 PID: 452 Comm: kworker/10:1 Kdump: loaded Not tainted > 6.9.0-rc1-next-20240328 #2 > [ 64.230438] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 > of:IBM,FW1060.00 (NH1060_018) hv:phyp pSeries > [ 64.230449] Workqueue: events sg_remove_sfp_usercontext [sg] > [ 64.230468] NIP: c00815c34110 LR: c00815c33ffc CTR: > c05393b0 > [ 64.230485] REGS: cc1efae0 TRAP: 0700 Not tainted > (6.9.0-rc1-next-20240328) > [ 64.230498] MSR: 8282b033 CR: > 44000408 XER: > [ 64.230535] CFAR: c00815c3400c IRQMASK: 0 > [ 64.230535] GPR00: c00815c33ffc cc1efd80 c00815c58900 > cca8ae98 > [ 64.230535] GPR04: c000 0023 c7c2e000 > 0022 > [ 64.230535] GPR08: 00038a13 0002 > c00815c38bc0 > [ 64.230535] GPR12: c05393b0 c0038fff3f00 c01a2bac > c7c7a9c0 > [ 64.230535] GPR16: > > [ 64.230535] GPR20: c0038c3f3b00 c7c10030 c7c1 > c000901c > [ 64.230535] GPR24: cca8ae00 c45a5805 > c7c11330 > [ 64.230535] GPR28: c0038c3f3b00 c7c10080 c7c11328 > c2fdee54 > [ 64.230671] NIP [c00815c34110] sg_remove_sfp_usercontext+0x270/0x280 > [sg] > [ 64.230690] LR [c00815c33ffc] sg_remove_sfp_usercontext+0x15c/0x280 > [sg] > [ 64.230709] Call Trace: > [ 64.230716] [cc1efd80] [c00815c33ffc] > sg_remove_sfp_usercontext+0x15c/0x280 [sg] (unreliable) > [ 
64.230740] [cc1efe40] [c019337c] process_one_work+0x20c/0x4f4
> [ 64.230767] [cc1efef0] [c01942fc] worker_thread+0x378/0x544
> [ 64.230787] [cc1eff90] [c01a2cdc] kthread+0x138/0x140
> [ 64.230801] [cc1effe0] [c000df98] start_kernel_thread+0x14/0x18
> [ 64.230819] Code: e8c98310 3d22 e8698010 480044bd e8410018 7ec3b378 48004ac9 e8410018 38790098 81390098 2c090001 4182ff04 <0fe0> 4bfffefc 000247e0
> [ 64.230857] ---[ end trace ]---
>
> This WARN_ON was introduced with
> commit 27f58c04a8f438078583041468ec60597841284d
>     scsi: sg: Avoid sg device teardown race
>
> Reverting the patch avoids the warning. The test case passes irrespective of
> whether the patch is present or not.
>

The new WARN_ON_ONCE is only an additional logic check. When it triggers
it also should trigger when you undo the rest of the change. But when it
triggers something with the driver logic must be off. (Or my understanding
of the intent of the code is worse than assumed :-)

Looking into the d_ref logic I see two additional problems not addressed
by the original patch when sg_add_sfp() fails:

1) sg_open() is then also calling first scsi_device_put() and then
   sg_device_destroy() via kref_put(). That's the wrong order.
2) When sg_add_sfp() fails we never call kref_get(&sdp->d_ref).
   Thus we should not call kref_get() here at all.

Thus your warning above could be triggered by an error within
sg_add_sfp(): In that case d_ref would already be zero when the code gets
to the warning.

Can you check the debug patch below and provide output?
When I'm right the warning should be gone and you should just get the
"Modification triggered" instead. When I'm wrong we should at least see
how many references d_ref has left.
Alexander

---
 drivers/scsi/sg.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ff6894ce5404..1c27d5f8f384 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -373,7 +373,8 @@ sg_open(struct inode *inode, struct file *filp)
 	scsi_autopm_put_device(sdp->device);
 sdp_put:
 	scsi_device_put(sdp->device);
-	goto sg_put;
+	pr_warn("%s: Modification triggered\n", __func__);
+	return retval;
 }
 
 /* Release resources associated with a successful sg_open()
@@ -2233,7 +2234,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
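
The second point in the mail (an error path dropping a reference it never took) can be modelled in user space. The following is only a sketch of that hypothesis, not the sg driver: struct demo_dev, demo_get(), demo_put() and demo_open() are hypothetical names, and a plain int stands in for the kref.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted device; not the real sg structures. */
struct demo_dev {
	int refs;      /* plays the role of d_ref */
	bool released; /* set once the release step has run */
};

static void demo_get(struct demo_dev *d)
{
	assert(d->refs > 0);	/* taking a reference on a dead object is a bug */
	d->refs++;
}

static void demo_put(struct demo_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->released = true;	/* real code would free the device here */
}

/*
 * Models an open() whose "add file" step may fail. The reference is only
 * taken on success, so the error path must not drop one.
 */
static int demo_open(struct demo_dev *d, bool add_fails, bool buggy_error_path)
{
	if (add_fails) {
		if (buggy_error_path)
			demo_put(d);	/* unbalanced: nothing was taken above */
		return -1;
	}
	demo_get(d);
	return 0;
}
```

With one initial reference, a failed demo_open() on the balanced path leaves refs at 1, while the buggy path takes it to 0 and runs the release step early, which is the situation the mail suspects.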
[powerpc] WARN at drivers/scsi/sg.c:2236 (sg_remove_sfp_usercontext)
Following WARN_ON_ONCE is triggered while running LTP tests (specifically ioctl_sg01) on IBM Power booted with 6.9.0-rc1-next-20240328 [ 64.230233] [ cut here ] [ 64.230269] WARNING: CPU: 10 PID: 452 at drivers/scsi/sg.c:2236 sg_remove_sfp_usercontext+0x270/0x280 [sg] [ 64.230302] Modules linked in: rpadlpar_io rpaphp xsk_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls rfkill ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng vmx_crypto xfs libcrc32c sd_mod sr_mod t10_pi crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp fuse [ 64.230420] CPU: 10 PID: 452 Comm: kworker/10:1 Kdump: loaded Not tainted 6.9.0-rc1-next-20240328 #2 [ 64.230438] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NH1060_018) hv:phyp pSeries [ 64.230449] Workqueue: events sg_remove_sfp_usercontext [sg] [ 64.230468] NIP: c00815c34110 LR: c00815c33ffc CTR: c05393b0 [ 64.230485] REGS: cc1efae0 TRAP: 0700 Not tainted (6.9.0-rc1-next-20240328) [ 64.230498] MSR: 8282b033 CR: 44000408 XER: [ 64.230535] CFAR: c00815c3400c IRQMASK: 0 [ 64.230535] GPR00: c00815c33ffc cc1efd80 c00815c58900 cca8ae98 [ 64.230535] GPR04: c000 0023 c7c2e000 0022 [ 64.230535] GPR08: 00038a13 0002 c00815c38bc0 [ 64.230535] GPR12: c05393b0 c0038fff3f00 c01a2bac c7c7a9c0 [ 64.230535] GPR16: [ 64.230535] GPR20: c0038c3f3b00 c7c10030 c7c1 c000901c [ 64.230535] GPR24: cca8ae00 c45a5805 c7c11330 [ 64.230535] GPR28: c0038c3f3b00 c7c10080 c7c11328 c2fdee54 [ 64.230671] NIP [c00815c34110] sg_remove_sfp_usercontext+0x270/0x280 [sg] [ 64.230690] LR [c00815c33ffc] sg_remove_sfp_usercontext+0x15c/0x280 [sg] [ 64.230709] Call Trace: [ 64.230716] [cc1efd80] [c00815c33ffc] sg_remove_sfp_usercontext+0x15c/0x280 [sg] (unreliable) [ 64.230740] [cc1efe40] [c019337c] process_one_work+0x20c/0x4f4 [ 64.230767] [cc1efef0] [c01942fc] 
worker_thread+0x378/0x544
[ 64.230787] [cc1eff90] [c01a2cdc] kthread+0x138/0x140
[ 64.230801] [cc1effe0] [c000df98] start_kernel_thread+0x14/0x18
[ 64.230819] Code: e8c98310 3d22 e8698010 480044bd e8410018 7ec3b378 48004ac9 e8410018 38790098 81390098 2c090001 4182ff04 <0fe0> 4bfffefc 000247e0
[ 64.230857] ---[ end trace ]---

This WARN_ON was introduced with
commit 27f58c04a8f438078583041468ec60597841284d
    scsi: sg: Avoid sg device teardown race

Reverting the patch avoids the warning. The test case passes irrespective
of whether the patch is present or not.

-- Sachin
Re: [PATCH v11 00/11] Support page table check PowerPC
On 28/03/2024 at 08:57, Christophe Leroy wrote:
>
>
> On 28/03/2024 at 07:52, Christophe Leroy wrote:
>>
>>
>> On 28/03/2024 at 05:55, Rohan McLure wrote:
>>> Support page table check on all PowerPC platforms. This works by
>>> serialising assignments, reassignments and clears of page table
>>> entries at each level in order to ensure that anonymous mappings
>>> have at most one writable consumer, and likewise that file-backed
>>> mappings are not simultaneously also anonymous mappings.
>>>
>>> In order to support this infrastructure, a number of stubs must be
>>> defined for all powerpc platforms. Additionally, separate set_pte_at()
>>> and set_pte_at_unchecked(), to allow for internal, uninstrumented
>>> mappings.
>>
>> I gave it a try on QEMU e500 (64 bits), and get the following Oops.
>> What do I have to look for ?
>>
>> Freeing unused kernel image (initmem) memory: 2588K
>> This architecture does not have kernel memory protection.
>> Run /init as init process
>> [ cut here ]
>> kernel BUG at mm/page_table_check.c:119!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> BE PAGE_SIZE=4K SMP NR_CPUS=32 QEMU e500
>
> Same problem on my 8xx board:
>
> [ 7.358146] Freeing unused kernel image (initmem) memory: 448K
> [ 7.363957] Run /init as init process
> [ 7.370955] [ cut here ]
> [ 7.375411] kernel BUG at mm/page_table_check.c:119!
> [ 7.380393] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 7.385621] BE PAGE_SIZE=16K PREEMPT CMPC885

Both problems are fixed by the following change:

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 413d01a51e6f..5b932632a5d7 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -29,6 +29,8 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
 
 #ifndef __ASSEMBLY__
 
+#include
+
 extern int icache_44x_need_flush;
 
 /*
@@ -92,7 +94,11 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep)
 {
-	return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+	pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+	page_table_check_pte_clear(mm, addr, old_pte);
+
+	return old_pte;
 }
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
[PATCH v4 15/15] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and run floating-point code, this test is no longer x86-specific. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) lib/Kconfig.debug | 2 +- lib/Makefile| 25 ++--- lib/test_fpu_glue.c | 5 - 3 files changed, 7 insertions(+), 25 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c63a5fbf1f1c..f93e778e0405 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES config TEST_FPU tristate "Test floating point operations in kernel space" - depends on X86 && !KCOV_INSTRUMENT_ALL + depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL help Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu which will trigger a sequence of floating point operations. This is used diff --git a/lib/Makefile b/lib/Makefile index fcb35bf50979..e44ad11f77b5 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o -# -# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns -# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS -# get appended last to CFLAGS and thus override those previous compiler options. -# -FPU_CFLAGS := -msse -msse2 -ifdef CONFIG_CC_IS_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 -# -# The "-msse" in the first argument is there so that the -# -mpreferred-stack-boundary=3 build error: -# -# -mpreferred-stack-boundary=3 is not between 4 and 12 -# -# can be triggered. Otherwise gcc doesn't complain. 
-FPU_CFLAGS += -mhard-float -FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) -endif - obj-$(CONFIG_TEST_FPU) += test_fpu.o test_fpu-y := test_fpu_glue.o test_fpu_impl.o -CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) +CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c index 85963d7be826..eef282a2715f 100644 --- a/lib/test_fpu_glue.c +++ b/lib/test_fpu_glue.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include "test_fpu.h" @@ -38,6 +38,9 @@ static struct dentry *selftest_dir; static int __init test_fpu_init(void) { + if (!kernel_fpu_available()) + return -EINVAL; + selftest_dir = debugfs_create_dir("selftest_helpers", NULL); if (!selftest_dir) return -ENOMEM; -- 2.44.0
[PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear outside kernel_fpu_{begin,end}() sections, and some architectures enforce this separation. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Declare test_fpu() in a header lib/Makefile| 3 ++- lib/test_fpu.h | 8 +++ lib/{test_fpu.c => test_fpu_glue.c} | 32 + lib/test_fpu_impl.c | 37 + 4 files changed, 48 insertions(+), 32 deletions(-) create mode 100644 lib/test_fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c diff --git a/lib/Makefile b/lib/Makefile index ffc6b2341b45..fcb35bf50979 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st endif obj-$(CONFIG_TEST_FPU) += test_fpu.o -CFLAGS_test_fpu.o += $(FPU_CFLAGS) +test_fpu-y := test_fpu_glue.o test_fpu_impl.o +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu.h b/lib/test_fpu.h new file mode 100644 index ..4459807084bc --- /dev/null +++ b/lib/test_fpu.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _LIB_TEST_FPU_H +#define _LIB_TEST_FPU_H + +int test_fpu(void); + +#endif diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c similarity index 71% rename from lib/test_fpu.c rename to lib/test_fpu_glue.c index e82db19fed84..85963d7be826 100644 --- a/lib/test_fpu.c +++ b/lib/test_fpu_glue.c @@ -19,37 +19,7 @@ #include #include -static int test_fpu(void) -{ - /* -* This sequence of operations tests that rounding mode is -* to nearest and that denormal numbers are supported. -* Volatile variables are used to avoid compiler optimizing -* the calculations away. 
-*/ - volatile double a, b, c, d, e, f, g; - - a = 4.0; - b = 1e-15; - c = 1e-310; - - /* Sets precision flag */ - d = a + b; - - /* Result depends on rounding mode */ - e = a + b / 2; - - /* Denormal and very large values */ - f = b / c; - - /* Depends on denormal support */ - g = a + c * f; - - if (d > a && e > a && g > a) - return 0; - else - return -EINVAL; -} +#include "test_fpu.h" static int test_fpu_get(void *data, u64 *val) { diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c new file mode 100644 index ..777894dbbe86 --- /dev/null +++ b/lib/test_fpu_impl.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +#include "test_fpu.h" + +int test_fpu(void) +{ + /* +* This sequence of operations tests that rounding mode is +* to nearest and that denormal numbers are supported. +* Volatile variables are used to avoid compiler optimizing +* the calculations away. +*/ + volatile double a, b, c, d, e, f, g; + + a = 4.0; + b = 1e-15; + c = 1e-310; + + /* Sets precision flag */ + d = a + b; + + /* Result depends on rounding mode */ + e = a + b / 2; + + /* Denormal and very large values */ + f = b / c; + + /* Depends on denormal support */ + g = a + c * f; + + if (d > a && e > a && g > a) + return 0; + else + return -EINVAL; +} -- 2.44.0
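
For readers who want to see what the arithmetic in test_fpu() actually checks, the same sequence can be tried in user space. This is a sketch only: test_fpu_user() is a hypothetical name, and the comments assume IEEE 754 binary64 doubles, where one ulp of 4.0 is 2^-50 (about 8.9e-16).

```c
#include <errno.h>

/* User-space copy of the arithmetic from test_fpu(); the name is hypothetical. */
int test_fpu_user(void)
{
	/* volatile keeps the compiler from folding the calculations away */
	volatile double a, b, c, d, e, f, g;

	a = 4.0;
	b = 1e-15;	/* a little more than one ulp of 4.0 */
	c = 1e-310;	/* below DBL_MIN, so only representable as a denormal */

	d = a + b;	/* inexact result, lands one ulp above 4.0 */
	e = a + b / 2;	/* 5e-16 is just over half an ulp: e > a needs round-to-nearest (or upward) */
	f = b / c;	/* about 1e295 if c kept its denormal value */
	g = a + c * f;	/* c * f is about 1e-15 again, unless c was flushed to zero */

	return (d > a && e > a && g > a) ? 0 : -EINVAL;
}
```

If denormals were flushed to zero, c would read as 0, f would become infinite, c * f would be NaN, and the g > a comparison would fail, which is how the test notices missing denormal support.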
[PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead of the existing list of architectures. It can also take advantage of the common kernel-mode FPU API and method of adjusting CFLAGS. Acked-by: Alex Deucher Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Split altivec removal to a separate patch - Use linux/fpu.h instead of asm/fpu.h in consumers drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 27 ++ drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++- 4 files changed, 7 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 901d1961b739..5fcd4f778dc3 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG) help Choose this option if you want to use the new display engine support for AMDGPU. 
This adds required support for Vega and diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 0de16796466b..e46f8ce41d87 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -26,16 +26,7 @@ #include "dc_trace.h" -#if defined(CONFIG_X86) -#include -#elif defined(CONFIG_PPC64) -#include -#include -#elif defined(CONFIG_ARM64) -#include -#elif defined(CONFIG_LOONGARCH) -#include -#endif +#include /** * DOC: DC FPU manipulation overview @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line) WARN_ON_ONCE(!in_task()); preempt_disable(); depth = __this_cpu_inc_return(fpu_recursion_depth); - if (depth == 1) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) + BUG_ON(!kernel_fpu_available()); kernel_fpu_begin(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - enable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_begin(); -#endif } TRACE_DCN_FPU(true, function_name, line, depth); @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line) depth = __this_cpu_dec_return(fpu_recursion_depth); if (depth == 0) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - disable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_end(); -#endif } else { WARN_ON_ONCE(depth < 0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index 59d3972341d2..a94b6d546cd1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -25,40 +25,8 @@ # It provides the general basic services required by other DAL # subcomponents. 
-ifdef CONFIG_X86 -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml_ccflags := $(dml_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -endif - -ifdef CONFIG_ARM64 -dml_rcflags := -mgeneral-regs-only -endif - -ifdef CONFIG_LOONGARCH -dml_ccflags := -mfpu=64 -dml_rcflags := -msoft-float -endif - -ifdef CONFIG_CC_IS_GCC -ifneq ($(call gcc-min-version, 70100),y) -IS_OLD_GCC = 1 -endif -endif - -ifdef CONFIG_X86 -ifdef IS_OLD_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -dml_ccflags += -mpreferred-stack-boundary=4 -else -dml_ccflags += -msse2 -endif -endif +dml_ccflags := $(CC_FLAGS_FPU) +dml_rcflags := $(CC_FLAGS_NO_FPU) ifneq ($(CONFIG_FRAME_WARN),0) ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index 7b51364084b5..4f6c804a26ad 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -24,40 +24,8 @@ # # Makefile for dml2. -ifdef CONFIG_X86 -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml2_ccflags := $(dml2_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml2_ccflags :=
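
The nesting logic that dc_fpu_begin() and dc_fpu_end() keep after this patch can be sketched in user space as follows. All names here are hypothetical stand-ins: hw_fpu_begin() and hw_fpu_end() take the place of kernel_fpu_begin() and kernel_fpu_end(), and a single global counter replaces the per-CPU fpu_recursion_depth.

```c
#include <assert.h>

/* Hypothetical stand-ins: count how often the "real" begin/end would run. */
static int hw_begin_calls, hw_end_calls;
static int depth;	/* the driver keeps this per CPU; one global suffices here */

static void hw_fpu_begin(void) { hw_begin_calls++; }
static void hw_fpu_end(void)   { hw_end_calls++; }

void demo_fpu_begin(void)
{
	if (++depth == 1)	/* only the outermost caller enables the FPU */
		hw_fpu_begin();
}

void demo_fpu_end(void)
{
	--depth;
	assert(depth >= 0);	/* the driver uses WARN_ON_ONCE(depth < 0) instead */
	if (depth == 0)		/* only the outermost caller disables it again */
		hw_fpu_end();
}
```

Two nested demo_fpu_begin() calls followed by two demo_fpu_end() calls reach the stand-in hardware functions once each, which is the property that lets FPU-using helpers call each other freely.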
[PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc
From: Michael Ellerman The compiler flags enable altivec, but that is not required; hard-float is sufficient for the code to build and function. Drop altivec from the compiler flags and adjust the enable/disable code to only enable FPU use. Signed-off-by: Michael Ellerman Acked-by: Alex Deucher Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - New patch for v2 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++-- drivers/gpu/drm/amd/display/dc/dml/Makefile| 2 +- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 2 +- 3 files changed, 4 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 4ae4720535a5..0de16796466b 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line) #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_begin(); #elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - enable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - enable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) enable_kernel_fp(); #elif defined(CONFIG_ARM64) kernel_neon_begin(); @@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line) #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); #elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - disable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - disable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) disable_kernel_fp(); #elif defined(CONFIG_ARM64) kernel_neon_end(); diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index c4a5efd2dda5..59d3972341d2 100644 --- 
a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse endif ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -maltivec +dml_ccflags := -mhard-float endif ifdef CONFIG_ARM64 diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index acff3449b8d7..7b51364084b5 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse endif ifdef CONFIG_PPC64 -dml2_ccflags := -mhard-float -maltivec +dml2_ccflags := -mhard-float endif ifdef CONFIG_ARM64 -- 2.44.0
[PATCH v4 11/15] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point code to support recent hardware. That code is not performance-critical, so only provide a minimal non-preemptible implementation for now. Support is limited to riscv64 because riscv32 requires runtime (libgcc) assistance to convert between doubles and 64-bit integers. Acked-by: Palmer Dabbelt Reviewed-by: Palmer Dabbelt Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v3) Changes in v3: - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT Changes in v2: - Remove RISC-V architecture-specific preprocessor check arch/riscv/Kconfig | 1 + arch/riscv/Makefile | 3 +++ arch/riscv/include/asm/fpu.h| 16 arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 5 files changed, 49 insertions(+) create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index be09c8836d56..3bcd0d250810 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -27,6 +27,7 @@ config RISCV select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MMIOWB diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 252d63942f34..76ff4033c854 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i KBUILD_AFLAGS += -march=$(riscv-march-y) +# For C code built with floating-point support, exclude V but keep F and D. 
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/') + KBUILD_CFLAGS += -mno-save-restore KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET) diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h new file mode 100644 index ..91c04c244e12 --- /dev/null +++ b/arch/riscv/include/asm/fpu.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_RISCV_FPU_H +#define _ASM_RISCV_FPU_H + +#include + +#define kernel_fpu_available() has_fpu() + +void kernel_fpu_begin(void); +void kernel_fpu_end(void); + +#endif /* ! _ASM_RISCV_FPU_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 81d94a8ee10f..5b243d46f4b1 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)+= unaligned_access_speed.o obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o obj-$(CONFIG_FPU) += fpu.o +obj-$(CONFIG_FPU) += kernel_mode_fpu.o obj-$(CONFIG_RISCV_ISA_V) += vector.o obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o obj-$(CONFIG_SMP) += smpboot.o diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c new file mode 100644 index ..0ac8348876c4 --- /dev/null +++ b/arch/riscv/kernel/kernel_mode_fpu.c @@ -0,0 +1,28 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 SiFive + */ + +#include +#include + +#include +#include +#include +#include + +void kernel_fpu_begin(void) +{ + preempt_disable(); + fstate_save(current, task_pt_regs(current)); + csr_set(CSR_SSTATUS, SR_FS); +} +EXPORT_SYMBOL_GPL(kernel_fpu_begin); + +void kernel_fpu_end(void) +{ + csr_clear(CSR_SSTATUS, SR_FS); + fstate_restore(current, task_pt_regs(current)); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(kernel_fpu_end); -- 2.44.0
[PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a different header. Add a wrapper header, and export the CFLAGS adjustments as found in lib/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 arch/x86/include/asm/fpu.h | 13 + 3 files changed, 34 insertions(+) create mode 100644 arch/x86/include/asm/fpu.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 39886bab943a..7c9d032ee675 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -83,6 +83,7 @@ config X86 select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOVif X86_64 + select ARCH_HAS_KERNEL_FPU_SUPPORT select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 662d9d4033e6..5a5f5999c505 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 +# +# CFLAGS for compiling floating point code inside the kernel. +# +CC_FLAGS_FPU := -msse -msse2 +ifdef CONFIG_CC_IS_GCC +# Stack alignment mismatch, proceed with caution. +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 +# (8B stack alignment). +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 +# +# The "-msse" in the first argument is there so that the +# -mpreferred-stack-boundary=3 build error: +# +# -mpreferred-stack-boundary=3 is not between 4 and 12 +# +# can be triggered. Otherwise gcc doesn't complain. 
+CC_FLAGS_FPU += -mhard-float +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) +endif + ifeq ($(CONFIG_X86_KERNEL_IBT),y) # # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h new file mode 100644 index ..b2743fe19339 --- /dev/null +++ b/arch/x86/include/asm/fpu.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_X86_FPU_H +#define _ASM_X86_FPU_H + +#include + +#define kernel_fpu_available() true + +#endif /* ! _ASM_X86_FPU_H */ -- 2.44.0
[PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard
The include guard should match the filename, or it will conflict with the newly-added asm/fpu.h. Signed-off-by: Samuel Holland --- Changes in v4: - New patch for v4 arch/x86/include/asm/fpu/types.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index ace9aa3b78a3..eb17f31b06d2 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -2,8 +2,8 @@ /* * FPU data structures: */ -#ifndef _ASM_X86_FPU_H -#define _ASM_X86_FPU_H +#ifndef _ASM_X86_FPU_TYPES_H +#define _ASM_X86_FPU_TYPES_H #include @@ -596,4 +596,4 @@ struct fpu_state_config { /* FPU state configuration information */ extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg; -#endif /* _ASM_X86_FPU_H */ +#endif /* _ASM_X86_FPU_TYPES_H */ -- 2.44.0
[PATCH v4 08/15] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. The PowerPC API also requires a non-preemptible context. Add a wrapper header, and export the CFLAGS adjustments. Acked-by: Michael Ellerman (powerpc) Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 - arch/powerpc/include/asm/fpu.h | 28 3 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/fpu.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..c42a57b6839d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_HUGEPD if HUGETLB_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 65261cbe5bfd..93d89f055b70 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD)) CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata) +CC_FLAGS_FPU := $(call cc-option,-mhard-float) +CC_FLAGS_NO_FPU:= $(call cc-option,-msoft-float) + ifdef CONFIG_FUNCTION_TRACER ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY @@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1) KBUILD_CPPFLAGS+= -I $(srctree)/arch/powerpc $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) -KBUILD_CFLAGS += $(call cc-option,-msoft-float) +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) KBUILD_CFLAGS += $(CFLAGS-y) CPP= $(CC) -E $(KBUILD_CFLAGS) diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h new file mode 100644 index ..ca584e4bc40f --- /dev/null 
+++ b/arch/powerpc/include/asm/fpu.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_POWERPC_FPU_H +#define _ASM_POWERPC_FPU_H + +#include + +#include +#include + +#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + +static inline void kernel_fpu_begin(void) +{ + preempt_disable(); + enable_kernel_fp(); +} + +static inline void kernel_fpu_end(void) +{ + disable_kernel_fp(); + preempt_enable(); +} + +#endif /* ! _ASM_POWERPC_FPU_H */ -- 2.44.0
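The call order in the PowerPC wrappers above (disable preemption first, re-enable it last) can be illustrated with a small userspace model. This is only a sketch: the counters and the assertions inside enable_kernel_fp() and disable_kernel_fp() are assumptions made for illustration, modelling the commit message's statement that the PowerPC API requires a non-preemptible context. None of it is the real powerpc implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model state; none of this is kernel code. */
static int preempt_count;	/* > 0 means preemption is disabled */
static bool fp_enabled;		/* models FP being enabled for kernel use */

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { assert(preempt_count > 0); preempt_count--; }

/* Model assumption: FP may only be switched while preemption is disabled. */
static void enable_kernel_fp(void)  { assert(preempt_count > 0); fp_enabled = true; }
static void disable_kernel_fp(void) { assert(preempt_count > 0); fp_enabled = false; }

/* Same call order as the wrappers in the patch. */
static void kernel_fpu_begin(void)
{
	preempt_disable();
	enable_kernel_fp();
}

static void kernel_fpu_end(void)
{
	disable_kernel_fp();
	preempt_enable();
}
```

Swapping the two calls in either wrapper would trip the model's assertions, which is the point of the ordering.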
[PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in asm/fpu.h, so it only needs to add kernel_fpu_available() and export the CFLAGS adjustments. Acked-by: WANG Xuerui Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v3) Changes in v3: - Rebase on v6.9-rc1 arch/loongarch/Kconfig | 1 + arch/loongarch/Makefile | 5 - arch/loongarch/include/asm/fpu.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index a5f300ec6f28..2266c6c41c38 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -18,6 +18,7 @@ config LOONGARCH select ARCH_HAS_CURRENT_STACK_POINTER select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile index df6caf79537a..efb5440a43ec 100644 --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -26,6 +26,9 @@ endif 32bit-emul = elf32loongarch 64bit-emul = elf64loongarch +CC_FLAGS_FPU := -mfpu=64 +CC_FLAGS_NO_FPU:= -msoft-float + ifdef CONFIG_UNWINDER_ORC orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h orc_hash_sh := $(srctree)/scripts/orc_hash.sh @@ -59,7 +62,7 @@ ld-emul = $(64bit-emul) cflags-y += -mabi=lp64s endif -cflags-y += -pipe -msoft-float +cflags-y += -pipe $(CC_FLAGS_NO_FPU) LDFLAGS_vmlinux+= -static -n -nostdlib # When the assembler supports explicit relocation hint, we must use it. diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h index c2d8962fda00..3177674228f8 100644 --- a/arch/loongarch/include/asm/fpu.h +++ b/arch/loongarch/include/asm/fpu.h @@ -21,6 +21,7 @@ struct sigcontext; +#define kernel_fpu_available() cpu_has_fpu extern void kernel_fpu_begin(void); extern void kernel_fpu_end(void); -- 2.44.0
[PATCH v4 06/15] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v4: - Add missed CFLAGS changes for recov_neon_inner.c (fixes arm build failures) lib/raid6/Makefile | 33 ++--- 1 file changed, 10 insertions(+), 23 deletions(-) diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile index 385a94aa0b99..0e88bfe6445b 100644 --- a/lib/raid6/Makefile +++ b/lib/raid6/Makefile @@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float endif endif -# The GCC option -ffreestanding is required in order to compile code containing -# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) -ifeq ($(CONFIG_KERNEL_MODE_NEON),y) -NEON_FLAGS := -ffreestanding -# Enable -NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include) -ifeq ($(ARCH),arm) -NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon -endif -CFLAGS_recov_neon_inner.o += $(NEON_FLAGS) -ifeq ($(ARCH),arm64) -CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only -endif -endif - quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@ @@ -75,10 +56,16 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -CFLAGS_neon1.o += $(NEON_FLAGS) -CFLAGS_neon2.o += $(NEON_FLAGS) -CFLAGS_neon4.o += $(NEON_FLAGS) -CFLAGS_neon8.o += $(NEON_FLAGS) +CFLAGS_neon1.o += $(CC_FLAGS_FPU) +CFLAGS_neon2.o += $(CC_FLAGS_FPU) +CFLAGS_neon4.o += $(CC_FLAGS_FPU) +CFLAGS_neon8.o += $(CC_FLAGS_FPU) +CFLAGS_recov_neon_inner.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon4.o += 
$(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_recov_neon_inner.o += $(CC_FLAGS_NO_FPU) targets += neon1.c neon2.c neon4.c neon8.c $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -- 2.44.0
[PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - New patch for v2 arch/arm64/lib/Makefile | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index 29490be2546b..13e6a2829116 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \ ifeq ($(CONFIG_KERNEL_MODE_NEON), y) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o -CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only -CFLAGS_xor-neon.o += -ffreestanding -# Enable -CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include) +CFLAGS_xor-neon.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU) endif lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o -- 2.44.0
[PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 15 +++ 3 files changed, 24 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/fpu.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b11c98b3e84..67f0d3b5b7df 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -30,6 +30,7 @@ config ARM64 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 0e075d3c546b..3e863e5b0169 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y) $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif -KBUILD_CFLAGS += -mgeneral-regs-only \ +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_NO_FPU:= -mgeneral-regs-only + +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \ $(compat_vdso) $(cc_has_k_constraint) KBUILD_CFLAGS += $(call cc-disable-warning, psabi) KBUILD_AFLAGS += $(compat_vdso) diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm64/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 
2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.44.0
[PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/arm/lib/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index 650404be6768..0ca5aae1bcc3 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S ifeq ($(CONFIG_KERNEL_MODE_NEON),y) - NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon - CFLAGS_xor-neon.o+= $(NEON_FLAGS) + CFLAGS_xor-neon.o+= $(CC_FLAGS_FPU) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o endif -- 2.44.0
[PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API
This series unifies the kernel-mode FPU API across several architectures by wrapping the existing functions (where needed) in consistently-named functions placed in a consistent header location, with mostly the same semantics: they can be called from preemptible or non-preemptible task context, and are not assumed to be reentrant. Architectures are also expected to provide CFLAGS adjustments for compiling FPU-dependent code. For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at each FPU user. It then implements the common API on RISC-V, and converts a couple of users to the new API: the AMDGPU DRM driver, and the FPU self test.

The underlying goal of this series is to allow using newer AMD GPUs (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode FPU support.

Previous versions:
v3: https://lore.kernel.org/linux-kernel/20240327200157.1097089-1-samuel.holl...@sifive.com/
v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holl...@sifive.com/
v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/
v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/

Changes in v4:
 - Add missed CFLAGS changes for recov_neon_inner.c (fixes arm build failures)
 - Fix x86 include guard issue (fixes x86 build failures)

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement
 - Remove file name from header comment
 - Clean up arch/arm64/lib/Makefile, like for arch/arm
 - Remove RISC-V architecture-specific preprocessor check
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers
 - Declare test_fpu() in a header
Michael Ellerman (1):
  drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (14):
  arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: Fix asm/fpu/types.h include guard
  x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  riscv: Add support for kernel-mode FPU
  drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  selftests/fpu: Move FP code to a separate translation unit
  selftests/fpu: Allow building on other architectures

 Documentation/core-api/floating-point.rst     | 78 +++
 Documentation/core-api/index.rst              |  1 +
 Makefile                                      |  5 ++
 arch/Kconfig                                  |  6 ++
 arch/arm/Kconfig                              |  1 +
 arch/arm/Makefile                             |  7 ++
 arch/arm/include/asm/fpu.h                    | 15
 arch/arm/lib/Makefile                         |  3 +-
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/Makefile                           |  9 ++-
 arch/arm64/include/asm/fpu.h                  | 15
 arch/arm64/lib/Makefile                       |  6 +-
 arch/loongarch/Kconfig                        |  1 +
 arch/loongarch/Makefile                       |  5 +-
 arch/loongarch/include/asm/fpu.h              |  1 +
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/Makefile                         |  5 +-
 arch/powerpc/include/asm/fpu.h                | 28 +++
 arch/riscv/Kconfig                            |  1 +
 arch/riscv/Makefile                           |  3 +
 arch/riscv/include/asm/fpu.h                  | 16
 arch/riscv/kernel/Makefile                    |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c           | 28 +++
 arch/x86/Kconfig                              |  1 +
 arch/x86/Makefile                             | 20 +
 arch/x86/include/asm/fpu.h                    | 13
 arch/x86/include/asm/fpu/types.h              |  6 +-
 drivers/gpu/drm/amd/display/Kconfig           |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 35 +
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 +
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 +
 include/linux/fpu.h                           | 12 +++
 lib/Kconfig.debug                             |  2 +-
 lib/Makefile                                  | 26 +--
 lib/raid6/Makefile                            | 33 +++-
 lib/test_fpu.h                                |  8 ++
 lib/{test_fpu.c => test_fpu_glue.c}           | 37 ++---
 lib/test_fpu_impl.c                           | 37 +
 38 files changed, 348
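Since the cover letter says the begin/end functions are not assumed to be reentrant, a caller that wants to nest critical sections has to keep its own depth count. Below is a minimal single-threaded sketch of that idea. The names my_fp_begin(), my_fp_end(), fp_depth and hw_sections are hypothetical, and the kernel_fpu_*() functions are userspace stubs that trap on nesting. Where such a counter should live in a real driver (for example, per CPU) and how it is protected are outside the scope of this sketch.

```c
#include <assert.h>

static int hw_sections;		/* model: sections open in the underlying API */

/* Stand-ins for the non-reentrant API; they trap if nested directly. */
static void kernel_fpu_begin(void) { assert(hw_sections == 0); hw_sections++; }
static void kernel_fpu_end(void)   { assert(hw_sections == 1); hw_sections--; }

/* Hypothetical caller-side nesting depth. */
static int fp_depth;

static void my_fp_begin(void)
{
	if (fp_depth++ == 0)
		kernel_fpu_begin();	/* only the outermost call enters */
}

static void my_fp_end(void)
{
	assert(fp_depth > 0);		/* unbalanced end is a caller bug */
	if (--fp_depth == 0)
		kernel_fpu_end();	/* only the outermost call leaves */
}
```

With this wrapper, inner begin/end pairs only adjust the counter, so the underlying API sees exactly one section per outermost pair.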
[PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 +++ arch/arm/include/asm/fpu.h | 15 +++ 3 files changed, 23 insertions(+) create mode 100644 arch/arm/include/asm/fpu.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b14aed3a17ab..b1751c2cab87 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -15,6 +15,7 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE diff --git a/arch/arm/Makefile b/arch/arm/Makefile index d82908b1b1bb..71afdd98ddf2 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -130,6 +130,13 @@ endif # Accept old syntax despite ".syntax unified" AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W) +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon + ifeq ($(CONFIG_THUMB2_KERNEL),y) CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN) AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + 
+#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.44.0
[PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run floating-point SIMD code in kernel space. However, the function names, header locations, and semantics are inconsistent across architectures, and FPU support may be gated behind other Kconfig options. Provide a standard way for architectures to declare that kernel space FPU support is available. Architectures selecting this option must implement what is currently the most common API (kernel_fpu_begin() and kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide the appropriate CFLAGS for compiling floating-point C code. Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Add documentation explaining the build-time and runtime APIs - Add a linux/fpu.h header for generic isolation enforcement Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ include/linux/fpu.h | 12 5 files changed, 102 insertions(+) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 include/linux/fpu.h diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst new file mode 100644 index ..a8d0d4b05052 --- /dev/null +++ b/Documentation/core-api/floating-point.rst @@ -0,0 +1,78 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Floating-point API +== + +Kernel code is normally prohibited from using floating-point (FP) registers or +instructions, including the C float and double data types. This rule reduces +system call overhead, because the kernel does not need to save and restore the +userspace floating-point register state. + +However, occasionally drivers or library functions may need to include FP code. +This is supported by isolating the functions containing FP code to a separate +translation unit (a separate source file), and saving/restoring the FP register +state around calls to those functions. 
This creates "critical sections" of +floating-point usage. + +The reason for this isolation is to prevent the compiler from generating code +touching the FP registers outside these critical sections. Compilers sometimes +use FP registers to optimize inlined ``memcpy`` or variable assignment, as +floating-point registers may be wider than general-purpose registers. + +Usability of floating-point code within the kernel is architecture-specific. +Additionally, because a single kernel may be configured to support platforms +both with and without a floating-point unit, FPU availability must be checked +both at build time and at run time. + +Several architectures implement the generic kernel floating-point API from +``linux/fpu.h``, as described below. Some other architectures implement their +own unique APIs, which are documented separately. + +Build-time API +-- + +Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT`` +is enabled. For C code, such code must be placed in a separate file, and that +file must have its compilation flags adjusted using the following pattern:: + +CFLAGS_foo.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU) + +Architectures are expected to define one or both of these variables in their +top-level Makefile as needed. For example:: + +CC_FLAGS_FPU := -mhard-float + +or:: + +CC_FLAGS_NO_FPU := -msoft-float + +Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``. + +Runtime API +--- + +The runtime API is provided in ``linux/fpu.h``. This header cannot be included +from files implementing FP code (those with their compilation flags adjusted as +above). Instead, it must be included when defining the FP critical sections. + +.. c:function:: bool kernel_fpu_available( void ) + +This function reports if floating-point code can be used on this CPU or +platform. 
The value returned by this function is not expected to change +at runtime, so it only needs to be called once, not before every +critical section. + +.. c:function:: void kernel_fpu_begin( void ) +void kernel_fpu_end( void ) + +These functions create a floating-point critical section. It is only +valid to call ``kernel_fpu_begin()`` after a previous call to +``kernel_fpu_available()`` returned ``true``. These functions are only +guaranteed to be callable from (preemptible or non-preemptible) process +context. + +Preemption may be disabled inside critical sections, so their size +should be minimized. They are *not* required to be reentrant. If the +caller expects to nest critical sections, it must implement its own +reference counting. diff --git a/Documentation/core-api/index.rst