[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #11 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
(In reply to Pierre Ossman from comment #9)
> It now hangs more arbitrarily, not just when trying to play a video. Having
> done a suspend/resume cycle is still a requirement though.

I tried disabling video acceleration, and the hangs are now gone. So it does seem to be the culprit after all.

Could this help you pinpoint things somehow?

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #10 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
I finally got that old version of mesa to build. Unfortunately, the hangs still happen even with that. :/

> Mar 09 07:18:30 kernel: radeon :00:01.0: ring 3 stalled for more than 10028msec
> Mar 09 07:18:30 kernel: radeon :00:01.0: GPU lockup (current fence id 0xfa91 last fence id 0xfabc on ring 3)
> Mar 09 07:18:31 kernel: radeon :00:01.0: ring 5 stalled for more than 10077msec
> Mar 09 07:18:31 kernel: radeon :00:01.0: GPU lockup (current fence id 0x18fb last fence id 0x18fe on ring 5)
> Mar 09 07:18:31 kernel: radeon :00:01.0: ring 0 stalled for more than 10202msec
> ...

What can we do next to pinpoint this? It seems to fail rather reliably after a suspend/resume. Is there some test suite I can run to provoke things?
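For anyone comparing several of these hangs, the quoted log lines can be summarized per ring with a small script. This is only a sketch: the regular expressions are modelled on the lines quoted in comment #10 and nothing else, and the names `summarize_lockups`, `stall_ms` and `fence_gap` are made up for illustration. `fence_gap` is simply "last fence id" minus "current fence id"; treating that as a rough count of outstanding fences is an assumption.

```python
import re

# Patterns modelled only on the log lines quoted in this thread
# (an assumption, not a documented log format).
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize_lockups(lines):
    """Return {ring: {"stall_ms": int or None, "fence_gap": int or None}}."""
    rings = {}
    for line in lines:
        stall = STALL_RE.search(line)
        if stall:
            ring = int(stall.group(1))
            rings.setdefault(ring, {"stall_ms": None, "fence_gap": None})
            rings[ring]["stall_ms"] = int(stall.group(2))
        lockup = LOCKUP_RE.search(line)
        if lockup:
            ring = int(lockup.group(3))
            rings.setdefault(ring, {"stall_ms": None, "fence_gap": None})
            # Hypothetical metric: last fence id minus current fence id.
            current = int(lockup.group(1), 16)
            last = int(lockup.group(2), 16)
            rings[ring]["fence_gap"] = last - current
    return rings


# The five lines quoted in comment #10, with the mail wrapping undone.
SAMPLE = [
    "Mar 09 07:18:30 kernel: radeon :00:01.0: ring 3 stalled for more than 10028msec",
    "Mar 09 07:18:30 kernel: radeon :00:01.0: GPU lockup (current fence id 0xfa91 last fence id 0xfabc on ring 3)",
    "Mar 09 07:18:31 kernel: radeon :00:01.0: ring 5 stalled for more than 10077msec",
    "Mar 09 07:18:31 kernel: radeon :00:01.0: GPU lockup (current fence id 0x18fb last fence id 0x18fe on ring 5)",
    "Mar 09 07:18:31 kernel: radeon :00:01.0: ring 0 stalled for more than 10202msec",
]

if __name__ == "__main__":
    for ring, info in sorted(summarize_lockups(SAMPLE).items()):
        print(ring, info)
```

Feeding it the kernel log from each hang (for example the lines saved from `journalctl -k`) would show whether the same rings stall every time, which might help narrow down which engine is involved.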
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #9 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
FYI, it seems to have gotten worse since upgrading from kernel-6.1.8-100.fc36.x86_64 to kernel-6.1.13-100.fc36.x86_64.

It now hangs more arbitrarily, not just when trying to play a video. Having done a suspend/resume cycle is still a requirement though.

I'm struggling to build the old version of mesa that still worked. It isn't very compatible with newer LLVM, and there is something wrong with Fedora's packaging of LLVM 12 (that seems to be the matching LLVM version for that old mesa). I'll need some more effort to get that test up and running.
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #8 from Alex Deucher (alexdeuc...@gmail.com) ---
(In reply to Pierre Ossman from comment #7)
> Is that also handled by mesa, or some other component?

Yes, mesa handles video APIs (VAAPI, OpenMAX, VDPAU) as well as 3D (OpenGL, Vulkan).
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #7 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
Sorry, I haven't had time to look at downgrading Mesa yet. But FYI, it does still happen with mesa 22.1.7 and kernel 6.0.10.

I am now almost 100% certain that it is videos that are triggering this. And possibly not all videos. So I'm thinking, perhaps the video acceleration? Is that also handled by mesa, or some other component?
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #6 from Alex Deucher (alexdeuc...@gmail.com) ---
(In reply to Pierre Ossman from comment #5)
> Could the issue be with the firmware? Has that changed recently for these
> devices?
>
> The last good firmware should be:
>
> linux-firmware-20220509-132.fc34.noarch
>
> And the first bad firmware should be:
>
> linux-firmware-20220708-136.fc35.noarch

Not likely. The firmware for this chip has not changed in years.
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #5 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
The lockup happens on 5.17.2 as well, so it seems the kernel is not the most likely suspect. I'll see if I can try an older mesa next.

Could the issue be with the firmware? Has that changed recently for these devices?

The last good firmware should be:

linux-firmware-20220509-132.fc34.noarch

And the first bad firmware should be:

linux-firmware-20220708-136.fc35.noarch
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #4 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
I just got a GPU lockup on 5.18.4. So it's either not the kernel, or a bug that appeared in the 5.18 series. I'll go back to the known good kernel now and see if I can get the bug there.

One thought though, even if it is mesa that happens to issue a bad sequence of commands, shouldn't the kernel driver be able to reset the GPU? It certainly indicates that it is trying.
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #3 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
This is wrong, I checked the wrong lines in dnf's history:

> Last working system:
>
> kernel-5.13.8-100.fc33.x86_64

The last working kernel is actually 5.17.12-100.fc34.x86_64. So if it's the kernel it's likely 5.18 or 5.19 that regressed. I'll give 5.18.1 a spin.
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

--- Comment #2 from Pierre Ossman (pierre-bugzi...@ossman.eu) ---
A bisect will be difficult, given that I can't reproduce it. :/

Any clues from the dmesg that could tell how to provoke it? Or some settings that could provide more information?

I can try a few versions and see if I'm able to narrow it down somewhat. It's difficult to know when to assume it's a good version as in some cases it has gone weeks without a lockup...
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

Alex Deucher (alexdeuc...@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeuc...@gmail.com

--- Comment #1 from Alex Deucher (alexdeuc...@gmail.com) ---
Any chance you could bisect? There have been very few changes to the radeon kernel driver over the last few years. It could also be a mesa regression. Does upgrading or downgrading mesa help?
[Bug 216625] [regression] GPU lockup on Radeon R7 Kaveri
https://bugzilla.kernel.org/show_bug.cgi?id=216625

Pierre Ossman (pierre-bugzi...@ossman.eu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
               Tree|Mainline                    |Fedora