On Wed, 5 Feb 2020 22:31:52 -0700 Aaron Bieber <aa...@bolddaemon.com> wrote:
> On Wed, 05 Feb 2020 at 20:29:31 +0100, William Orr wrote:
> >
> > Hey,
> >
> > On recent a snap (04/02/2020), the unpriv'ed process of Xorg seems to hang,
> > becoming totally unresponsive. Running `ktrace` on the process fails to log
> > any output. `top` shows that the process is waiting on `fsleep`. I'm using
> > the amdgpu driver.
>
> Similar issue here. It seems to happen randomly (possibly more often under
> high memory usage). It's always after X has started and I have been using it
> for some time (days sometimes).
>
> MPD will continue to play music in the background and pressing the power
> button for a few seconds seems to result in a shutdown, however, it doesn't
> quite shutdown properly. The screen will go blank and the fans will start to
> spin at full speed. At which point holding the power button seems to be the
> only fix.

I studied my problem with startxfce4 (where Xorg gets stuck and I use
Ctrl+Alt+BackSpace to reset Xorg), but that is a different bug, not an
amdgpu glitch.

Today, I froze Xorg in a different way. I was stressing supertuxkart
on my amdgpu machine by playing at full screen (1920x1080), graphics
setting 6, and 19 AI karts. This sometimes causes a visual glitch
where objects in the game either disappear or cast large black shadows.
There is a LOADING screen before each race. The LOADING screen seems
to decide the amount of glitches in each race: none, few, or many.
If I reload the track, I may have more or fewer glitches.

Today, my last race got stuck at the LOADING screen. Xorg stopped
responding to the keyboard: Ctrl+Alt+F4 (to switch virtual console)
didn't work. The system was still alive: ping(8) and ssh(1) continued
to work (from a second computer to the amdgpu machine).

In the ssh(1) session, top(1) showed one thread of supertuxkart being
consistently "onproc" even though the machine was mostly idle. I became
root and attached egdb (from package gdb-7.12.1p9) to supertuxkart.
The thread seemed to be stuck in DRM_IOCTL_AMDGPU_WAIT_CS, called from
/usr/xenocara/lib/libdrm/amdgpu/amdgpu_cs.c; this appears to call
amdgpu_cs_wait_ioctl() in /sys/dev/pci/drm/amd/amdgpu_cs.c.

I detached egdb, then told top(1) to kill supertuxkart. The system
stopped answering ping(8), and top(1) froze. In top(1), supertuxkart
had WAIT "drmweti" and Xorg had WAIT "dmafenc". I forced a reboot.

The rest of this mail is a backtrace of one thread of
supertuxkart-0.9.3p0 (copy from photo, so beware of typos).

--George

(gdb) bt
#0  ioctl () at -:3
#1  0x000006d86059e3c0 in drmIoctl () from /usr/X11R6/lib/libdrm.so.7.8
#2  0x000006d941e83739 in amdgpu_cs_query_fence_status ()
   from /usr/X11R6/lib/libdrm_amdgpu.so.1.9
#3  0x000006d8f800e951 in amdgpu_fence_wait ()
   from /usr/X11R6/lib/modules/dri/radeon_dri.so
#4  0x000006d8f7f448a6 in si_fence_finish ()
   from /usr/X11R6/lib/modules/dri/radeon_dri.so
#5  0x000006d8f79f04d3 in st_client_wait_sync ()
   from /usr/X11R6/lib/modules/dri/radeonsi_dri.so
#6  0x000006d8f793136e in _mesa_ClientWaitSync ()
   from /usr/X11R6/lib/modules/dri/radeonsi_dri.so
#7  0x000006d65d7c48d7 in DrawCalls::prepareDrawCalls(ShadowMatrices&, irr::scene::ICameraSceneNode
#8  0x000006d65d887aee in ShaderBasedRenderer::renderScene(
    irr::scene::ICameraSceneNode*, float, bool, bool) ()
#9  0x000006d65d88a5c3 in ShaderBasedRenderer::render(float) ()
#10 0x000006d65d8068ed in IrrDriver::update(float) ()
#11 0x000006d65d9eaa0d in MainLoop::run() ()
#12 0x000006d65d9e74d0 in main ()
(gdb) info registers
rax            0x36              54
rbx            0x16e2a71b28d7    25162721994967
rcx            0x6d88cf49a3a     7527147543098
rdx            0x7f7ffffdc508    140187732395272
rsi            0xc0206449        3223348297  # DRM_IOCTL_AMDGPU_WAIT_CS
rdi            0x8               8
rbp            0x7f7ffffdc4e0    0x7f7ffffdc4e0
rsp            0x7f7ffffdc4b8    0x7f7ffffdc4b8
r8             0x6d88cf85cf8     7527147789560
r9             0x0               0
r10            0x0               0
r11            0x246             582
r12            0x8               8
r13            0x16e2a71b28d7    25162721994967
r14            0x7f7ffffdc508    140187732395272
r15            0xc0206449        3223348297
rip            0x6d88cf49a3a     0x6d88cf49a3a <ioctl+10>
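
For anyone who wants to check the rsi annotation, here is a rough Python
sketch that splits the request value 0xc0206449 into its fields. It assumes
the BSD-style ioctl encoding from <sys/ioccom.h> (direction in the top bits,
parameter length in bits 16-28, group in bits 8-15, command number in the low
byte), plus DRM_COMMAND_BASE = 0x40 and DRM_AMDGPU_WAIT_CS = 0x09 as I
remember them from the drm headers. The function name decode_ioctl is made up
for this example.

```python
# Assumed BSD <sys/ioccom.h> layout; constants copied from memory of that header.
IOCPARM_MASK = 0x1FFF
IOC_OUT = 0x40000000   # kernel copies parameters out
IOC_IN = 0x80000000    # kernel copies parameters in

# Assumed values from the drm headers.
DRM_COMMAND_BASE = 0x40
DRM_AMDGPU_WAIT_CS = 0x09


def decode_ioctl(request):
    """Split an ioctl request number into direction, length, group, number."""
    direction = ""
    if request & IOC_IN:
        direction += "in"
    if request & IOC_OUT:
        direction += "out"
    return {
        "dir": direction or "void",
        "len": (request >> 16) & IOCPARM_MASK,   # size of the argument struct
        "group": chr((request >> 8) & 0xFF),     # 'd' is the drm group
        "nr": request & 0xFF,                    # command number
    }


fields = decode_ioctl(0xC0206449)  # value of rsi (and r15) in the dump
print(fields)
print("driver command:", hex(fields["nr"] - DRM_COMMAND_BASE))
```

Under those assumptions the value decodes to an in/out ioctl in group 'd'
with a 32-byte argument and command 0x40 + 0x09, which is consistent with
DRM_IOCTL_AMDGPU_WAIT_CS.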