Re: [RFC PATCH 06/12] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-13 Thread WANG Xuerui

On 12/8/23 13:54, Samuel Holland wrote:

LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Signed-off-by: Samuel Holland 
---

  arch/loongarch/Kconfig           | 1 +
  arch/loongarch/Makefile          | 5 ++++-
  arch/loongarch/include/asm/fpu.h | 1 +
  3 files changed, 6 insertions(+), 1 deletion(-)


This is all intuitive wrapping, so:

Acked-by: WANG Xuerui 

Thanks!
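
For readers following along, here is a minimal userspace sketch of the
calling pattern this enables: check kernel_fpu_available(), then bracket
the floating-point work with kernel_fpu_begin()/kernel_fpu_end(). The
three kernel functions are mocked so the snippet compiles on its own;
fpu_present, fpu_sections and scale_by_half() are hypothetical names made
up for illustration and are not part of the patch. In a real kernel build
the FP code would also have to live in a file built with the exported
CFLAGS adjustments.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel API. The real implementations live
 * in the architecture code; these bodies are mocks for illustration. */
static bool fpu_present = true;  /* hypothetical: pretend an FPU exists */
static int fpu_sections;         /* number of currently open FPU sections */

static bool kernel_fpu_available(void) { return fpu_present; }
static void kernel_fpu_begin(void) { fpu_sections++; }
static void kernel_fpu_end(void)
{
	assert(fpu_sections > 0);    /* an end must match an earlier begin */
	fpu_sections--;
}

/* Hypothetical caller: halve a value using floating point when the FPU
 * may be used, otherwise fall back to integer math.
 * Returns true if the floating-point path was taken. */
static bool scale_by_half(int in, int *out)
{
	if (!kernel_fpu_available()) {
		*out = in / 2;
		return false;
	}

	kernel_fpu_begin();
	*out = (int)(in * 0.5);
	kernel_fpu_end();
	return true;
}
```

The point of the availability check is that generic code can keep an
integer fallback instead of assuming every CPU has a usable FPU.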

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/



Re: [V3] drm/amdgpu/display: Enable DC_FP for LoongArch

2023-05-09 Thread WANG Xuerui

On 2023/5/9 00:34, Sui Jingfeng wrote:

I have tested glmark2 on an ls3a5000 with this patch applied.

I have also bought a better GPU (Vega 56), which is on the way;

currently I only have an RX 550 at hand.

I pasted the performance score here. How does this score look?

Does this look normal?

[snip of frame-rates mostly in the 7000s for the best cases]


This is irrelevant, because the RX 550 isn't a DCN device, so the code
path being modified here doesn't get executed at all.


Though, the results look similar to what I've seen on my setup (3A5000 +
LS7A1000 + RX 6400), presumably because the write-combining optimization
cannot be used with current LS7A systems, meaning the system is
bottlenecked by all the MMIO accesses. I also see best-case frame rates
in the 7000s range.





Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch

2023-05-08 Thread WANG Xuerui

On 2023/5/5 19:32, Huacai Chen wrote:

Now LoongArch provides kernel_fpu_begin() and kernel_fpu_end() in commit
2b3bd32ea3a22ea2d ("LoongArch: Provide kernel fpu functions"), so we can
enable DC_FP for DCN devices.


Some grammatical fixes and paraphrasing:

"LoongArch now provides kernel_fpu_{begin,end} that are used like the 
x86 counterparts in commit 2b3bd32ea3a22ea2d ("LoongArch: Provide kernel 
fpu functions"), so we can now implement DRM_AMD_DC_FP on LoongArch for 
supporting more DCN devices."




Signed-off-by: WANG Xuerui 
Signed-off-by: Huacai Chen 


I just finished my tests according to the link above and all seems fine.

* Board: A2101 (Loongson 3A5000 with LS7A1000 bridge)
  - with the firmware provided at [1]
* GPU: RX 6400 (PowerColor ITX RX6400 4GB GDDR6)
* Display: Dell P2317H (connected via DisplayPort)
* Kernel: next-20230505 with this patch (with the conflict resolved)
* Sysroot: up-to-date Gentoo/LoongArch

I've tested:

* Desktop sessions: Xfce4, Plasma Wayland
* Hot-plugging
  - at tty, at sddm, inside Plasma Wayland session, multiple times each
* Changing resolutions
* kms_flip tests: every non-skipped case passed (I can't test
  dual-monitor right now)


[1]: https://github.com/loongson/Firmware/tree/main/5000Series/PC/A2101

Hence it's:

Tested-by: WANG Xuerui 


---
V2: Update commit message to add the commit which provides kernel fpu
 functions.

  drivers/gpu/drm/amd/display/Kconfig            | 2 +-
  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 6 ++++--
  drivers/gpu/drm/amd/display/dc/dml/Makefile    | 5 +++++
  3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 2d8e55e29637..49df073962d5 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 1743ca0a3641..86f4c0e04654 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -33,6 +33,8 @@
 #include <asm/cputable.h>
 #elif defined(CONFIG_ARM64)
 #include <asm/neon.h>
+#elif defined(CONFIG_LOONGARCH)
+#include <asm/fpu.h>
 #endif
 
 /**
@@ -88,7 +90,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 	*pcpu += 1;
 
 	if (*pcpu == 1) {
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();


Once the conflict with linux-next here is resolved, we may be good to
go.



 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
@@ -127,7 +129,7 @@ void dc_fpu_end(const char *function_name, const int line)
 	pcpu = get_cpu_ptr(&fpu_recursion_depth);
 	*pcpu -= 1;
 	if (*pcpu <= 0) {
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 01db035589c5..542962a93e8f 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -38,6 +38,11 @@ ifdef CONFIG_ARM64
 dml_rcflags := -mgeneral-regs-only
 endif
 
+ifdef CONFIG_LOONGARCH
+dml_ccflags := -mfpu=64
+dml_rcflags := -msoft-float
+endif
+
 ifdef CONFIG_CC_IS_GCC
 ifneq ($(call gcc-min-version, 70100),y)
 IS_OLD_GCC = 1
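
As an aside, the nesting logic visible in the dc_fpu_begin()/dc_fpu_end()
hunks above can be modelled in a few lines of userspace C. This is only a
sketch under simplifying assumptions: the real driver keeps the depth in
a per-CPU variable and handles preemption, while here it is a plain int.
The *_sketch function names and the hw_begin_calls/hw_end_calls counters
are made up for illustration, and kernel_fpu_begin()/kernel_fpu_end() are
mocks.

```c
#include <assert.h>

/* Per-CPU in the driver; a plain int in this single-threaded model. */
static int fpu_recursion_depth;

/* Mock instrumentation: how often the "hardware" FPU state was claimed
 * and released. */
static int hw_begin_calls, hw_end_calls;

static void kernel_fpu_begin(void) { hw_begin_calls++; }
static void kernel_fpu_end(void) { hw_end_calls++; }

/* Only the outermost begin actually claims the FPU. */
static void dc_fpu_begin_sketch(void)
{
	fpu_recursion_depth += 1;
	if (fpu_recursion_depth == 1)
		kernel_fpu_begin();
}

/* Only the matching outermost end releases it again. */
static void dc_fpu_end_sketch(void)
{
	assert(fpu_recursion_depth > 0);  /* unbalanced end is a caller bug */
	fpu_recursion_depth -= 1;
	if (fpu_recursion_depth <= 0)
		kernel_fpu_end();
}
```

With this shape, nested begin/end pairs reach the arch functions only
once per outermost pair, which is why adding LoongArch to the existing
X86 branch is enough.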





Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch

2023-05-05 Thread WANG Xuerui

On 5/6/23 02:00, Alex Deucher wrote:

On Fri, May 5, 2023 at 1:57 PM WANG Xuerui  wrote:


On a side note, I had to modprobe amdgpu with runpm=0, otherwise my
dmesg gets flooded with PSP getting resumed every 8~10 seconds or so. I
currently have none of the connectors plugged in. I didn't notice any
similar reports on the Internet so I don't know if it's due to platform
quirks or not.

That might just be part of the normal suspend/resume process.  If it
happens at regular intervals, it sounds like something is waking the
GPU at a regular interval.  We should probably remove that message to
avoid it being too chatty, but you may want to check what is waking it
so much as doing so sort of negates the value of runtime power
management.


Ah. This is extremely helpful, as I was immediately able to confirm it's
node_exporter trying to access the hwmon readings (I have a monitoring
infra for all my devboxes). Every sufficiently spaced read from, say,
temp1_input wakes up the GPU. Not many people have their boxes working
like this I guess... but at least we could probably reduce the log spam
a bit if it's not feasible to get GPU metrics without resuming it?
(Currently it's 25 lines per resume, mostly SMU resume logs and ring
info.)


And of course this is not a big deal, I can always work around it 
locally. Thanks for the hint again.
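
One possible local workaround, sketched below in userspace C, is to look
at the device's runtime PM state before touching hwmon and skip the read
unless the GPU is already awake. This assumes the standard
power/runtime_status sysfs attribute, which reports strings such as
"active" or "suspended". The helper names are hypothetical, the caller
has to supply the actual sysfs path, and I have not checked whether
node_exporter can be taught to do this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True only if the status text (optionally newline-terminated) is
 * exactly "active", i.e. the device is already awake. */
static bool status_allows_read(const char *status)
{
	size_t n = strcspn(status, "\n");

	return n == strlen("active") && strncmp(status, "active", n) == 0;
}

/* Hypothetical helper: read a runtime_status file at the given path and
 * report whether a hwmon read would avoid waking the device. Any error
 * is treated as "do not read". */
static bool device_is_active(const char *status_path)
{
	char buf[32] = "";
	FILE *f = fopen(status_path, "r");

	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return status_allows_read(buf);
}
```

The trade-off is that temperature samples simply go missing while the
GPU is suspended.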





Re: [PATCH V2] drm/amdgpu/display: Enable DC_FP for LoongArch

2023-05-05 Thread WANG Xuerui

Hi,

On 5/5/23 21:39, Hamza Mahfooz wrote:


Hey Huacai,

On 5/5/23 07:32, Huacai Chen wrote:

Now LoongArch provides kernel_fpu_begin() and kernel_fpu_end() in commit
2b3bd32ea3a22ea2d ("LoongArch: Provide kernel fpu functions"), so we can
enable DC_FP for DCN devices.


Have you had the chance to test how well this is working on actual
hardware, or was it only compile tested? If it was only compile tested,
it would be great if you could run some tests. Please see the following
for more details:
https://lore.kernel.org/amd-gfx/8eb69dfb-ae35-dbf2-3f82-e8cc00e53...@amd.com/ 




Thanks for the helpful link!

I did test an earlier version of this patch along with the
arch/loongarch kernel FPU bits before that patch got upstreamed, with an
RX 6400 (BEIGE_GOBY) on a Loongson 3A5000 + LS7A1000 system (by far the
most popular combination for LoongArch desktops). Things like a Plasma
Wayland session or glmark2 work just fine, although I didn't go for the
more complete testing detailed in the mail you linked to. I'll try
going through that procedure in the next 1~2 days when I have time and
physical access to that box.


On a side note, I had to modprobe amdgpu with runpm=0, otherwise my 
dmesg gets flooded with PSP getting resumed every 8~10 seconds or so. I 
currently have none of the connectors plugged in. I didn't notice any 
similar reports on the Internet so I don't know if it's due to platform 
quirks or not.




Signed-off-by: WANG Xuerui 
Signed-off-by: Huacai Chen 
---
V2: Update commit message to add the commit which provides kernel fpu
 functions.

  drivers/gpu/drm/amd/display/Kconfig            | 2 +-
  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 6 ++++--
  drivers/gpu/drm/amd/display/dc/dml/Makefile    | 5 +++++
  3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 2d8e55e29637..49df073962d5 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 1743ca0a3641..86f4c0e04654 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -33,6 +33,8 @@
 #include <asm/cputable.h>
 #elif defined(CONFIG_ARM64)
 #include <asm/neon.h>
+#elif defined(CONFIG_LOONGARCH)
+#include <asm/fpu.h>
 #endif
 
 /**
@@ -88,7 +90,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 	*pcpu += 1;
 
 	if (*pcpu == 1) {
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();

This is going to conflict with commit b1bcdd409d2d ("drm/amd/display: 
Disable migration to ensure consistency of per-CPU variable"), which is 
present in next-20230505. Resolution is trivial though.

 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
@@ -127,7 +129,7 @@ void dc_fpu_end(const char *function_name, const int line)
 	pcpu = get_cpu_ptr(&fpu_recursion_depth);
 	*pcpu -= 1;
 	if (*pcpu <= 0) {
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 01db035589c5..542962a93e8f 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -38,6 +38,11 @@ ifdef CONFIG_ARM64
 dml_rcflags := -mgeneral-regs-only
 endif
 
+ifdef CONFIG_LOONGARCH
+dml_ccflags := -mfpu=64
+dml_rcflags := -msoft-float
+endif
+
 ifdef CONFIG_CC_IS_GCC
 ifneq ($(call gcc-min-version, 70100),y)
 IS_OLD_GCC = 1





Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch

2023-03-06 Thread WANG Xuerui

On 2023/3/6 10:49, Huacai Chen wrote:

Hi, Christian,

On Mon, Mar 6, 2023 at 12:40 AM Christian König
 wrote:


Am 05.03.23 um 06:21 schrieb Huacai Chen:

LoongArch maintains cache coherency in hardware, but its WUC attribute
(Weak-ordered UnCached, which is similar to WC) is outside the scope of
the cache coherency mechanism. This means WUC can only be used for
write-only memory regions. So use uncached ioremap() for LoongArch in
the amdgpu driver.


Well NAK. This is leaking platform dependencies into the driver code.

Then is it acceptable to let ioremap() depend on drm_arch_can_wc_memory()?


Note: he likely means "is it acceptable to use
drm_arch_can_wc_memory() to decide between ioremap() and ioremap_wc()".


Although I doubt it's acceptable to you (driver) folks either, because
while drm_arch_can_wc_memory() does isolate platform details from the
driver proper, it's still papering over a platform PCIe violation in the
VRAM domain. Still better than having platform defines though.


Also, making use of drm_arch_can_wc_memory() might fix this fdo issue [1]
on aarch64 too (where I replied earlier). It seems people simply can't
stop inventing such micro-architectures, sadly...


[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/2313



When you have a limitation that ioremap_wc() can't guarantee read/write
ordering then that's pretty clearly a platform bug and you would need to
apply this workaround to all drivers using ioremap_wc() which isn't
really feasible.



I agree that in this case perhaps all ioremap_wc() usages would have to
degrade into ioremap() for correctness on such platforms, in which case
amdgpu wouldn't have to be individually called out / touched anyway.
Whether this is easily doable/upstreamable is another question though...
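
To make the drm_arch_can_wc_memory() idea discussed above concrete, here
is a userspace model of the decision only. Nothing in it maps real
memory: enum map_kind, arch_can_wc, mock_arch_can_wc_memory() and
pick_vram_mapping() are all hypothetical names standing in for the real
drm_arch_can_wc_memory(), ioremap() and ioremap_wc(), and this is not
what any posted patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Which kind of mapping the driver would request for a region. */
enum map_kind {
	MAP_UNCACHED,         /* stands in for plain ioremap() */
	MAP_WRITE_COMBINING   /* stands in for ioremap_wc() */
};

/* Mock of the platform capability query; false models a platform whose
 * write-combining mappings cannot be used safely. */
static bool arch_can_wc = false;

static bool mock_arch_can_wc_memory(void)
{
	return arch_can_wc;
}

/* The driver asks one platform-neutral question instead of testing
 * architecture defines itself. */
static enum map_kind pick_vram_mapping(void)
{
	return mock_arch_can_wc_memory() ? MAP_WRITE_COMBINING
					 : MAP_UNCACHED;
}
```

The platform-specific knowledge stays behind the query function, which
is the isolation described above. It does not address the objection that
every ioremap_wc() user would need the same treatment.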

