On Fri, 05 Apr 2024 11:36:32 -0600 Ivan Stanton <northivanas...@gmail.com> 
wrote:
> Package: libgl1-mesa-dri
> Version: 23.3.5-1
> Severity: important
> 
> Dear Maintainer,
> 
> I and some others have been unable to play 3D games or run GPU-intensive
> software on the Framework Laptop 13 AMD 7040 Edition due to GPU resets
> occurring while doing so. I've previously reported this to the Framework
> Community forums:
> 
> https://community.frame.work/t/solved-debian-12-on-laptop-13-ryzen-7640u-gpu-hangs-in-some-games/
> 
> And others have reported similar issues:
> 
> https://community.frame.work/t/vram-is-lost-due-to-gpu-reset-followed-by-a-crash/
> 
>    * What led up to the situation? I attempted to play the Steam version of
> Garry's Mod. This also occurred with The Stanley Parable: Ultra Deluxe
> (Steam), DSDA Doom (from the Debian repo) and Xonotic (from flathub). All 3D
> games seem to be affected, and possibly other GPU-intensive applications.
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)? I first encountered this bug on bookworm, with mesa
> 22.3.6-1+deb12u1. Upgrading linux-firmware, both from upstream and from
> testing, had no effect. Upgrading the kernel from backports had no effect.
> Upgrading mesa, using the packages from trixie, made the crashes less
> frequent but did not resolve the issue. After some A/B testing, the crash
> seems to be resolved only by both upgrading mesa and setting the kernel
> parameter amdgpu.sg_display=0, which judging by the kernel documentation, I
> should not have to set unless there is a bug. It would also be nice to get
> this fixed for Debian Stable users, if possible.
>    * What was the outcome of this action? A few seconds into the game, the
> display froze (though audio kept playing). After a few seconds, it flickered
> and the graphics became partially corrupted. About a minute later, I was 
> kicked to the login screen.
>    * What outcome did you expect instead? Game continues playing without any
> graphical glitches or freezes.
> 
> I'm not an expert on the GNU/Linux graphics stack and I haven't reported a
> bug to Debian in a while, so apologies if I got something wrong.
> 
> Here's an extract of dmesg from one occurrence of the bug:
> 
> [   62.824231] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0
> ring:24 vmid:6 pasid:32787, for process dsda-doom pid 2910 thread
> dsda-doom:cs0 pid 2926)
> [   62.824267] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address
> 0x000000409b40c000 from client 10
> [   62.824285] amdgpu 0000:c1:00.0: amdgpu:
> GCVM_L2_PROTECTION_FAULT_STATUS:0x00601030
> [   62.824297] amdgpu 0000:c1:00.0: amdgpu:      Faulty UTCL2 client ID: TCP
> (0x8)
> [   62.824310] amdgpu 0000:c1:00.0: amdgpu:      MORE_FAULTS: 0x0
> [   62.824321] amdgpu 0000:c1:00.0: amdgpu:      WALKER_ERROR: 0x0
> [   62.824331] amdgpu 0000:c1:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
> [   62.824340] amdgpu 0000:c1:00.0: amdgpu:      MAPPING_ERROR: 0x0
> [   62.824349] amdgpu 0000:c1:00.0: amdgpu:      RW: 0x0
> [   72.941268] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0

I haven't been able to replicate this exact behavior since upgrading to 
Framework's BIOS version 3.05b and disabling all of my previous workarounds, 
but I did get this log from a regular app crash that was similar:

[75883.804346] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0 
ring:24 vmid:2 pasid:32807, for process Discord pid 8547 thread Discord:cs0 
pid 8579)
[75883.804356] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 
0x00004d023e345000 from client 10
[75883.804359] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:
0x00201430
[75883.804361] amdgpu 0000:c1:00.0: amdgpu:      Faulty UTCL2 client ID: SQC 
(data) (0xa)
[75883.804363] amdgpu 0000:c1:00.0: amdgpu:      MORE_FAULTS: 0x0
[75883.804365] amdgpu 0000:c1:00.0: amdgpu:      WALKER_ERROR: 0x0
[75883.804368] amdgpu 0000:c1:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[75883.804370] amdgpu 0000:c1:00.0: amdgpu:      MAPPING_ERROR: 0x0
[75883.804371] amdgpu 0000:c1:00.0: amdgpu:      RW: 0x0
[75893.925804] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 
timeout, but soft recovered

In this case, KWin reloaded due to a graphics reset, instead of logging me 
out, and I did not experience freezing.
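
For anyone who wants to compare fault reports like the two above side by side, here is a rough Python sketch that pulls the main fields out of a pasted dmesg extract. The names parse_page_fault, FAULT_RE and STATUS_RE are made up for this example, and the patterns are an assumption based only on the message layout seen in these two logs, not on the amdgpu driver source. Process names containing spaces would not be handled.

```python
import re

# Hypothetical helper patterns; they only cover the message layout shown in
# the two logs above and are not taken from the amdgpu driver itself.
FAULT_RE = re.compile(
    r"\[gfxhub\] page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)
STATUS_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:\s*(0x[0-9a-fA-F]+)")


def parse_page_fault(text):
    """Return the fields of the first gfxhub page fault in text, or None."""
    # Mail clients wrap long dmesg lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = FAULT_RE.search(flat)
    if match is None:
        return None
    fields = {
        key: value if key == "process" else int(value)
        for key, value in match.groupdict().items()
    }
    # The status register value is optional; keep None if it is not present.
    status = STATUS_RE.search(flat)
    fields["status"] = int(status.group(1), 16) if status else None
    return fields


# Sample input copied from the Discord crash log above, wrapping included.
SAMPLE = """\
[75883.804346] amdgpu 0000:c1:00.0: amdgpu: [gfxhub] page fault (src_id:0
ring:24 vmid:2 pasid:32807, for process Discord pid 8547 thread Discord:cs0
pid 8579)
[75883.804359] amdgpu 0000:c1:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:
0x00201430
"""

if __name__ == "__main__":
    print(parse_page_fault(SAMPLE))
```

Running this over each extract gives one dictionary per fault, which makes it easier to see that the process, vmid and status value differ between the two crashes while the ring is the same.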
