On Fri, Nov 01, 2024 at 09:32:23AM -0400, Thomas Frohwein wrote:
> >Synopsis: amdgpu Radeon 780M graphics hang after page fault
> >Category: system
> >Environment:
> System : OpenBSD 7.6
> Details : OpenBSD 7.6-current (GENERIC.MP) #393: Sat Oct 26 21:59:25 MDT 2024
>
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> After a variable amount of time of running a Godot game (Brotato),
> generally 1-5 minutes, the system hangs and xconsole/messages show the
> following just before the hang:
>
> gmc_v11_0_process_interrupt *ERROR* [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32779, for process godot pid 7378 thread godot pid 387416)
> gmc_v11_0_process_interrupt *ERROR* in page starting at address 0x0000000000000000 from client 10
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* GCVM_L2_PROTECTION_FAULT_STATUS:0x00201430
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* Faulty UTCL2 client ID: SQC (data) (0xa)
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* MORE_FAULTS: 0x0
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* WALKER_ERROR: 0x0
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* PERMISSION_FAULTS: 0x3
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* MAPPING_ERROR: 0x0
> gfxhub_v3_0_print_l2_protection_fault_status *ERROR* RW: 0x0
>
> >How-To-Repeat:
> So far only found with the Godot 3 game Brotato (but repeatedly, after
> variable time). I've tried hard to reproduce it with 0ad or other Godot
> games, but so far has only happened with Brotato.
> I have reproduced it several times with it.
>
> >Fix:
> Unknown. I have tried setting AMD_DEBUG=dcc and then AMD_DEBUG=nodcc
> based on [1,2].
>
> [1] https://gitlab.freedesktop.org/drm/amd/-/issues/2496
> [2] https://gitlab.freedesktop.org/drm/amd/-/issues/2690
AMD people usually flag these as problems with Mesa (without
pointing to specific fixes).
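
For anyone comparing fault reports: the decoded fields in the log quoted
above can be reproduced from the raw GCVM_L2_PROTECTION_FAULT_STATUS
value. The sketch below is only an illustration. The bit layout (shift
and width of each field) is an assumption based on the Linux amdgpu GC 11
register headers, not something stated in this report, and the helper
name decode_fault_status is made up for the example. With that layout,
0x00201430 yields the same PERMISSION_FAULTS, client ID, RW and vmid
values that the driver printed.

```python
# Decode a GCVM_L2_PROTECTION_FAULT_STATUS value into its bit fields.
# ASSUMPTION: the (name, lowest bit, width) layout below follows the
# Linux amdgpu GC 11 register headers; it is not given in the report.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),   # UTCL2 client ID, 0xa is reported as "SQC (data)"
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status):
    """Return a dict mapping each field name to its extracted value."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    # Raw value taken from the fault log in the report.
    for name, value in decode_fault_status(0x00201430).items():
        print(f"{name}: {value:#x}")
```

The fields are kept in a table rather than hard-coded masks so the
layout can be corrected in one place if it differs on another GC
revision.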
> amdgpu0: IP DISCOVERY GC 11.0.1 12 CU rev 0x09
There is newer GC 11.0.1 firmware. Does it help?
The linux-firmware commit doesn't say what changed.
Index: sysutils/firmware/amdgpu/Makefile
===================================================================
RCS file: /cvs/ports/sysutils/firmware/amdgpu/Makefile,v
diff -u -p -r1.30 Makefile
--- sysutils/firmware/amdgpu/Makefile 11 Sep 2024 06:32:58 -0000 1.30
+++ sysutils/firmware/amdgpu/Makefile 2 Nov 2024 11:34:05 -0000
@@ -1,5 +1,5 @@
FW_DRIVER= amdgpu
-FW_VER= 20240909
+FW_VER= 20241017
DISTNAME= linux-firmware-${FW_VER}
EXTRACT_SUFX= .tar.xz
EXTRACT_FILES= ${DISTNAME}/{LICENSE.\*,\*.bin}
Index: sysutils/firmware/amdgpu/distinfo
===================================================================
RCS file: /cvs/ports/sysutils/firmware/amdgpu/distinfo,v
diff -u -p -r1.27 distinfo
--- sysutils/firmware/amdgpu/distinfo 11 Sep 2024 06:32:58 -0000 1.27
+++ sysutils/firmware/amdgpu/distinfo 2 Nov 2024 11:34:42 -0000
@@ -1,2 +1,2 @@
-SHA256 (firmware/linux-firmware-20240909.tar.xz) = lD+9GYg8+OrfieCyJCJUnbBWVXsezTClZABhWXE2lnE=
-SIZE (firmware/linux-firmware-20240909.tar.xz) = 383099276
+SHA256 (firmware/linux-firmware-20241017.tar.xz) = omw471qDJy8rmM6L+MoYZahSo97qSc5ajdgEuRQ1EnM=
+SIZE (firmware/linux-firmware-20241017.tar.xz) = 397400292
Index: sysutils/firmware/amdgpu/pkg/PLIST
===================================================================
RCS file: /cvs/ports/sysutils/firmware/amdgpu/pkg/PLIST,v
diff -u -p -r1.19 PLIST
--- sysutils/firmware/amdgpu/pkg/PLIST 1 Jul 2024 06:46:15 -0000 1.19
+++ sysutils/firmware/amdgpu/pkg/PLIST 2 Nov 2024 11:35:10 -0000
@@ -160,6 +160,13 @@ firmware/amdgpu/gc_11_5_1_mes1.bin
firmware/amdgpu/gc_11_5_1_mes_2.bin
firmware/amdgpu/gc_11_5_1_pfp.bin
firmware/amdgpu/gc_11_5_1_rlc.bin
+firmware/amdgpu/gc_11_5_2_imu.bin
+firmware/amdgpu/gc_11_5_2_me.bin
+firmware/amdgpu/gc_11_5_2_mec.bin
+firmware/amdgpu/gc_11_5_2_mes1.bin
+firmware/amdgpu/gc_11_5_2_mes_2.bin
+firmware/amdgpu/gc_11_5_2_pfp.bin
+firmware/amdgpu/gc_11_5_2_rlc.bin
firmware/amdgpu/gc_9_4_3_mec.bin
firmware/amdgpu/gc_9_4_3_rlc.bin
firmware/amdgpu/green_sardine_asd.bin
@@ -393,6 +400,8 @@ firmware/amdgpu/psp_14_0_0_ta.bin
firmware/amdgpu/psp_14_0_0_toc.bin
firmware/amdgpu/psp_14_0_1_ta.bin
firmware/amdgpu/psp_14_0_1_toc.bin
+firmware/amdgpu/psp_14_0_4_ta.bin
+firmware/amdgpu/psp_14_0_4_toc.bin
firmware/amdgpu/raven2_asd.bin
firmware/amdgpu/raven2_ce.bin
firmware/amdgpu/raven2_gpu_info.bin
@@ -438,6 +447,7 @@ firmware/amdgpu/sdma_6_0_2.bin
firmware/amdgpu/sdma_6_0_3.bin
firmware/amdgpu/sdma_6_1_0.bin
firmware/amdgpu/sdma_6_1_1.bin
+firmware/amdgpu/sdma_6_1_2.bin
firmware/amdgpu/si58_mc.bin
firmware/amdgpu/sienna_cichlid_ce.bin
firmware/amdgpu/sienna_cichlid_dmcub.bin
@@ -579,6 +589,7 @@ firmware/amdgpu/verde_smc.bin
firmware/amdgpu/verde_uvd.bin
firmware/amdgpu/vpe_6_1_0.bin
firmware/amdgpu/vpe_6_1_1.bin
+firmware/amdgpu/vpe_6_1_3.bin
firmware/amdgpu/yellow_carp_asd.bin
firmware/amdgpu/yellow_carp_ce.bin
firmware/amdgpu/yellow_carp_dmcub.bin