Re: [RFC PATCH 06/12] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
On 12/8/23 13:54, Samuel Holland wrote:
> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
> the CFLAGS adjustments.
>
> Signed-off-by: Samuel Holland
> ---
>  arch/loongarch/Kconfig           | 1 +
>  arch/loongarch/Makefile          | 5 ++++-
>  arch/loongarch/include/asm/fpu.h | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)

This is all intuitive wrapping, so:

Acked-by: WANG Xuerui

Thanks!

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
Re: [V3] drm/amdgpu/display: Enable DC_FP for LoongArch
On 2023/5/9 00:34, Sui Jingfeng wrote:
> I have tested glmark2 on ls3a5000 with this patch applied. I have also
> bought a better GPU (Vega 56), which is on the way; currently I only
> have an RX 550 at hand.
>
> I pasted the performance score here. How about this score? Does this
> look normal?
>
> [snip of frame-rates mostly in the 7000s for the best cases]

This is irrelevant, because the RX 550 isn't DCN, so the code path being
modified here doesn't get executed at all.

That said, the results look similar to what I've seen on my setup
(3A5000 + LS7A1000 + RX 6400), presumably because the write-combining
optimization cannot be used with current LS7A systems, meaning the
system is bottlenecked by all the MMIO accesses. I also see best-case
frame rates in the 7000s range.
Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch
On 2023/5/5 19:32, Huacai Chen wrote:
> Now LoongArch provides kernel_fpu_begin() and kernel_fpu_end() in commit
> 2b3bd32ea3a22ea2d ("LoongArch: Provide kernel fpu functions"), so we can
> enable DC_FP for DCN devices.

Some grammatical fixes and paraphrasing:

"LoongArch now provides kernel_fpu_{begin,end} that are used like the x86
counterparts in commit 2b3bd32ea3a22ea2d ("LoongArch: Provide kernel fpu
functions"), so we can now implement DRM_AMD_DC_FP on LoongArch for
supporting more DCN devices."

> Signed-off-by: WANG Xuerui
> Signed-off-by: Huacai Chen

I just finished my tests according to the link above and all seems fine.

* Board: A2101 (Loongson 3A5000 with LS7A1000 bridge) - with the firmware provided at [1]
* GPU: RX 6400 (PowerColor ITX RX6400 4GB GDDR6)
* Display: Dell P2317H (connected via DisplayPort)
* Kernel: next-20230505 with this patch (with the conflict resolved)
* Sysroot: up-to-date Gentoo/LoongArch

I've tested:

* Desktop sessions: Xfce4, Plasma Wayland
* Hot-plugging - at tty, at sddm, inside Plasma Wayland session, multiple times each
* Changing resolutions
* kms_flip tests: every non-skipped case passed (I can't test dual-monitor right now)

[1]: https://github.com/loongson/Firmware/tree/main/5000Series/PC/A2101

Hence it's:

Tested-by: WANG Xuerui

> ---
> V2: Update commit message to add the commit which provides kernel fpu
> functions.
>
>  drivers/gpu/drm/amd/display/Kconfig            | 2 +-
>  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 6 ++++--
>  drivers/gpu/drm/amd/display/dc/dml/Makefile    | 5 +++++
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
> index 2d8e55e29637..49df073962d5 100644
> --- a/drivers/gpu/drm/amd/display/Kconfig
> +++ b/drivers/gpu/drm/amd/display/Kconfig
> @@ -8,7 +8,7 @@ config DRM_AMD_DC
>  	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
>  	select SND_HDA_COMPONENT if SND_HDA_CORE
>  	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
> -	select DRM_AMD_DC_FP if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
> +	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
>  	help
>  	  Choose this option if you want to use the new display engine
>  	  support for AMDGPU. This adds required support for Vega and
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 1743ca0a3641..86f4c0e04654 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -33,6 +33,8 @@
>  #include
>  #elif defined(CONFIG_ARM64)
>  #include
> +#elif defined(CONFIG_LOONGARCH)
> +#include
>  #endif
>
>  /**
> @@ -88,7 +90,7 @@ void dc_fpu_begin(const char *function_name, const int line)
>  	*pcpu += 1;
>
>  	if (*pcpu == 1) {
> -#if defined(CONFIG_X86)
> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_begin();

And with the conflict here with linux-next resolved then we may be good
to go.

>  #elif defined(CONFIG_PPC64)
>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
> @@ -127,7 +129,7 @@ void dc_fpu_end(const char *function_name, const int line)
>  	pcpu = get_cpu_ptr(&fpu_recursion_depth);
>  	*pcpu -= 1;
>  	if (*pcpu <= 0) {
> -#if defined(CONFIG_X86)
> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_end();
>  #elif defined(CONFIG_PPC64)
>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index 01db035589c5..542962a93e8f 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -38,6 +38,11 @@ ifdef CONFIG_ARM64
>  dml_rcflags := -mgeneral-regs-only
>  endif
>
> +ifdef CONFIG_LOONGARCH
> +dml_ccflags := -mfpu=64
> +dml_rcflags := -msoft-float
> +endif
> +
>  ifdef CONFIG_CC_IS_GCC
>  ifneq ($(call gcc-min-version, 70100),y)
>  IS_OLD_GCC = 1
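For readers following the dc_fpu.c hunks: the guard that the patch extends to LoongArch is a nesting counter, so only the outermost dc_fpu_begin() enables the FPU and only the matching outermost dc_fpu_end() releases it. Below is a minimal user-space sketch in C, not the driver code. The names fake_kernel_fpu_begin(), fake_kernel_fpu_end(), sketch_dc_fpu_begin() and sketch_dc_fpu_end() are hypothetical stand-ins, and a single global counter replaces the per-CPU fpu_recursion_depth that the driver obtains via get_cpu_ptr().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end();
 * they only record what happened so the nesting can be observed. */
bool fpu_enabled;
int fpu_enable_calls;
int fpu_disable_calls;

void fake_kernel_fpu_begin(void)
{
	fpu_enabled = true;
	fpu_enable_calls++;
}

void fake_kernel_fpu_end(void)
{
	fpu_enabled = false;
	fpu_disable_calls++;
}

/* Assumption: one global counter. The driver keeps one counter per CPU. */
int fpu_recursion_depth;

/* Enable the FPU only when entering the outermost FPU section. */
void sketch_dc_fpu_begin(void)
{
	fpu_recursion_depth += 1;
	if (fpu_recursion_depth == 1)
		fake_kernel_fpu_begin();
}

/* Release the FPU only when leaving the outermost FPU section. */
void sketch_dc_fpu_end(void)
{
	fpu_recursion_depth -= 1;
	if (fpu_recursion_depth <= 0)
		fake_kernel_fpu_end();
}
```

With this sketch, two nested begin calls followed by two end calls invoke each stub exactly once, which is why the patch only needs to add LoongArch to the branch that already calls kernel_fpu_begin() and kernel_fpu_end() for x86.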
Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch
On 5/6/23 02:00, Alex Deucher wrote:
> On Fri, May 5, 2023 at 1:57 PM WANG Xuerui wrote:
>> On a side note, I had to modprobe amdgpu with runpm=0, otherwise my
>> dmesg gets flooded with PSP getting resumed every 8~10 seconds or so.
>> I currently have none of the connectors plugged in. I didn't notice
>> any similar reports on the Internet so I don't know if it's due to
>> platform quirks or not.
>
> That might just be part of the normal suspend/resume process. If it
> happens at regular intervals, it sounds like something is waking the
> GPU at a regular interval. We should probably remove that message to
> avoid it being too chatty, but you may want to check what is waking it
> so much as doing so sort of negates the value of runtime power
> management.

Ah. This is extremely helpful, as I'm immediately able to confirm it's
node_exporter trying to access the hwmon readings (I have a monitoring
infra for all my devboxes). Every sufficiently spaced read from, say,
temp1_input wakes up the GPU.

Not many people have their boxes working like this, I guess... but at
least we could probably reduce the log spam a bit if it's not feasible
to get GPU metrics without resuming it? (Currently it's 25 lines per
resume, mostly SMU resume logs and ring info.)

And of course this is not a big deal; I can always work around it
locally. Thanks for the hint again.
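One possible local workaround on the monitoring side, sketched in C under stated assumptions: check the device's power/runtime_status sysfs attribute and only read temp1_input when it reports "active". This is not something proposed in the thread, and the sketch assumes that reading runtime_status does not itself resume the device. The helper names status_allows_read() and gpu_is_awake() are hypothetical, and the path passed in is supplied by the caller.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True only when the status text is exactly "active" (a trailing
 * newline, as sysfs attributes usually have, is ignored). Any other
 * state means the hwmon read should be skipped. */
bool status_allows_read(const char *status)
{
	size_t len = strcspn(status, "\n");

	return len == strlen("active") && strncmp(status, "active", len) == 0;
}

/* Reads the runtime_status file at the caller-supplied path.
 * Returns false if the file cannot be read, so the caller skips
 * the hwmon read rather than risk waking the GPU. */
bool gpu_is_awake(const char *runtime_status_path)
{
	char buf[32] = "";
	FILE *f = fopen(runtime_status_path, "r");

	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return status_allows_read(buf);
}
```

A collector built this way would report no temperature while the GPU is suspended, trading missing samples for undisturbed runtime power management.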
Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch
Hi,

On 5/5/23 21:39, Hamza Mahfooz wrote:
> Hey Huacai,
>
> On 5/5/23 07:32, Huacai Chen wrote:
>> Now LoongArch provides kernel_fpu_begin() and kernel_fpu_end() in commit
>> 2b3bd32ea3a22ea2d ("LoongArch: Provide kernel fpu functions"), so we can
>> enable DC_FP for DCN devices.
>
> Have you had the chance to test how well this is working on actual
> hardware, or was it only compile tested? If it was only compile tested,
> it would be great if you could run some tests. Please see the following
> for more details:
> https://lore.kernel.org/amd-gfx/8eb69dfb-ae35-dbf2-3f82-e8cc00e53...@amd.com/

Thanks for the helpful link! I did test an earlier version of this patch
along with the arch/loongarch kernel FPU bits before that patch got
upstreamed, with a RX 6400 (BEIGE_GOBY) on a Loongson 3A5000 + LS7A1000
system (by far the most popular combination for LoongArch desktops).
Things like Plasma Wayland session or glmark2 work just fine, although I
didn't go for the more complete testing as detailed in the mail you
linked to. I'll try going through that procedure in the next 1~2 days
when I have time & get physical access to that box.

On a side note, I had to modprobe amdgpu with runpm=0, otherwise my dmesg
gets flooded with PSP getting resumed every 8~10 seconds or so. I
currently have none of the connectors plugged in. I didn't notice any
similar reports on the Internet so I don't know if it's due to platform
quirks or not.

>> Signed-off-by: WANG Xuerui
>> Signed-off-by: Huacai Chen
>> ---
>> V2: Update commit message to add the commit which provides kernel fpu
>> functions.
>>
>>  drivers/gpu/drm/amd/display/Kconfig            | 2 +-
>>  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 6 ++++--
>>  drivers/gpu/drm/amd/display/dc/dml/Makefile    | 5 +++++
>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
>> index 2d8e55e29637..49df073962d5 100644
>> --- a/drivers/gpu/drm/amd/display/Kconfig
>> +++ b/drivers/gpu/drm/amd/display/Kconfig
>> @@ -8,7 +8,7 @@ config DRM_AMD_DC
>>  	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
>>  	select SND_HDA_COMPONENT if SND_HDA_CORE
>>  	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
>> -	select DRM_AMD_DC_FP if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
>> +	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
>>  	help
>>  	  Choose this option if you want to use the new display engine
>>  	  support for AMDGPU. This adds required support for Vega and
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> index 1743ca0a3641..86f4c0e04654 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> @@ -33,6 +33,8 @@
>>  #include
>>  #elif defined(CONFIG_ARM64)
>>  #include
>> +#elif defined(CONFIG_LOONGARCH)
>> +#include
>>  #endif
>>
>>  /**
>> @@ -88,7 +90,7 @@ void dc_fpu_begin(const char *function_name, const int line)
>>  	*pcpu += 1;
>>
>>  	if (*pcpu == 1) {
>> -#if defined(CONFIG_X86)
>> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>>  		kernel_fpu_begin();

This is going to conflict with commit b1bcdd409d2d ("drm/amd/display:
Disable migration to ensure consistency of per-CPU variable"), which is
present in next-20230505. Resolution is trivial though.

>>  #elif defined(CONFIG_PPC64)
>>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
>> @@ -127,7 +129,7 @@ void dc_fpu_end(const char *function_name, const int line)
>>  	pcpu = get_cpu_ptr(&fpu_recursion_depth);
>>  	*pcpu -= 1;
>>  	if (*pcpu <= 0) {
>> -#if defined(CONFIG_X86)
>> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>>  		kernel_fpu_end();
>>  #elif defined(CONFIG_PPC64)
>>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
>> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
>> index 01db035589c5..542962a93e8f 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
>> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
>> @@ -38,6 +38,11 @@ ifdef CONFIG_ARM64
>>  dml_rcflags := -mgeneral-regs-only
>>  endif
>>
>> +ifdef CONFIG_LOONGARCH
>> +dml_ccflags := -mfpu=64
>> +dml_rcflags := -msoft-float
>> +endif
>> +
>>  ifdef CONFIG_CC_IS_GCC
>>  ifneq ($(call gcc-min-version, 70100),y)
>>  IS_OLD_GCC = 1
Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
On 2023/3/6 10:49, Huacai Chen wrote:
> Hi, Christian,
>
> On Mon, Mar 6, 2023 at 12:40 AM Christian König wrote:
>> On 05.03.23 at 06:21, Huacai Chen wrote:
>>> LoongArch maintains cache coherency in hardware, but its WUC attribute
>>> (Weak-ordered UnCached, which is similar to WC) is out of the scope of
>>> the cache coherency mechanism. This means WUC can only be used for
>>> write-only memory regions. So use uncached ioremap() for LoongArch in
>>> the amdgpu driver.
>>
>> Well NAK. This is leaking platform dependencies into the driver code.
>
> Then is it acceptable to let ioremap() depend on drm_arch_can_wc_memory()?

Note: he likely means "is it acceptable to use drm_arch_can_wc_memory()
to decide between ioremap() and ioremap_wc()".

Although I doubt it's acceptable to you (driver) folks either, because
while drm_arch_can_wc_memory() does isolate platform details from the
driver proper, it's still papering over a platform PCIe violation in the
VRAM domain. Still better than having platform defines though.

Also, making use of drm_arch_can_wc_memory() might fix this fdo issue [1]
on aarch64 too (where I replied earlier). It seems people simply can't
stop inventing such micro-architectures, sadly...

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/2313

>> When you have a limitation that ioremap_wc() can't guarantee read/write
>> ordering then that's pretty clearly a platform bug and you would need to
>> apply this workaround to all drivers using ioremap_wc() which isn't
>> really feasible.

I agree; in this case perhaps all ioremap_wc() usages would have to
degrade into ioremap() for correctness on such platforms, in which case
amdgpu wouldn't have to be individually called out / touched anyway.
Whether this is easily doable/upstreamable is another question though...
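To make the question above concrete, here is a minimal user-space sketch in C of the decision being discussed: pick the mapping type from a single capability query instead of an architecture #ifdef in the driver. It is not kernel code and not a proposed patch. fake_drm_arch_can_wc_memory(), platform_can_wc and pick_vram_mapping() are hypothetical stand-ins; in the kernel the two outcomes would correspond to calling ioremap() or ioremap_wc().

```c
#include <stdbool.h>

/* The two mapping flavours under discussion. */
enum vram_map_kind {
	VRAM_MAP_UNCACHED,        /* would correspond to ioremap() */
	VRAM_MAP_WRITE_COMBINED,  /* would correspond to ioremap_wc() */
};

/* Assumption: a runtime flag standing in for the platform property.
 * The real drm_arch_can_wc_memory() is decided per architecture. */
bool platform_can_wc;

/* Hypothetical stand-in for drm_arch_can_wc_memory(). */
bool fake_drm_arch_can_wc_memory(void)
{
	return platform_can_wc;
}

/* Fall back to an uncached mapping when write-combining is unsafe. */
enum vram_map_kind pick_vram_mapping(void)
{
	return fake_drm_arch_can_wc_memory() ? VRAM_MAP_WRITE_COMBINED
					     : VRAM_MAP_UNCACHED;
}
```

The driver then sees only one boolean, which is the isolation mentioned above; it does not address the objection that every ioremap_wc() user on such a platform would need the same fallback.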