Hi!

I think we have a serious kernel bug that is related to, or located in, drivers/gpu/drm/ttm/ttm_bo.c.

My assumption is based on one of my recent system freezes with kernel 6.3.4, which came with massive kernel error logs in journalctl. An extract from the logs:

...
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x289/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:327 ttm_bo_release+0x296/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:193!
...

The above information is more detailed than in most of the occurrences, and it is the first occurrence that did not end in a freeze immediately or within a few seconds. However, the corrupted state of the system became apparent again when I tried to shut down some time after the above errors:

...

May 28 14:51:09 fedora.domain kernel: #PF: error_code(0x0000) - not-present page
May 28 14:51:09 fedora.domain kernel: #PF: supervisor read access in kernel mode
May 28 14:51:09 fedora.domain kernel: BUG: unable to handle page fault for address: 0000003000300010
...

I have had this issue for quite some time, at least since 6.2.X.

You can find my bug report and many full logs from root's journalctl (including the full logs of the above) at: https://bugzilla.redhat.com/show_bug.cgi?id=2193110

Please ignore the title and the initial comments of the bug report; the issue is definitely not related to Firefox. If you want to focus on the kernel error logs of 6.3.X, the last 5 comments are enough.

In addition to the journalctl error logs that I already linked in the bug report, today I tested 6.3.4 once again with amd_pstate=active (by default I use amd_pstate=passive, which feels most stable on my hardware); see https://gitlab.com/py0xc31/public-tmp-storage/-/blob/main/retry6.3.4/fullSystemFreeze.kernel6.3.4.pstate-ACTIVE.log. I have not yet added this to the bug report since I no longer assume it is relevant.


Some other Fedora users have experienced related issues; see the comments on the test result pages in our update system:

https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a (6.3.3 & 6.3.4)

https://bodhi.fedoraproject.org/updates/FEDORA-2023-26325e5399 (6.2.15); I am quite sure I had already seen the issue before 6.2.15.

Possibly also related (but without explicit references to ttm_bo.c):

https://gitlab.freedesktop.org/drm/amd/-/issues/2548

https://gitlab.freedesktop.org/drm/amd/-/issues/2447


Let me know if you need more information or if I can help with testing.

My hardware: AMD Ryzen 6850 Pro. I have no dedicated graphics, only the integrated AMD graphics of my Ryzen. I use Fedora 38 KDE, and cat /proc/sys/kernel/tainted returns 0.

I will try updating my BIOS in the next few days, when I have time, to see if that makes a difference, but given the logs I suspect it is not related.


Regards,

Chris
