[Desktop-packages] [Bug 1863390] Re: GPU lockup ring 0 stalled for more than X msec

2020-03-11 Thread Jamie Bainbridge
After happening every day for a week, this hasn't happened again since I
logged this bug.

I also disabled Firefox WebRender so maybe that was a contributor.

I'll re-open if I can provide any useful data.

** Changed in: xserver-xorg-video-ati (Ubuntu)
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to xserver-xorg-video-ati in Ubuntu.
https://bugs.launchpad.net/bugs/1863390

Title:
  GPU lockup ring 0 stalled for more than X msec

Status in xserver-xorg-video-ati package in Ubuntu:
  Incomplete

Bug description:
  Since the update:

   xserver-xorg-video-ati-hwe-18.04 (1:19.0.1-1ubuntu1~18.04.1) bionic;

  which resulted from:

   https://bugs.launchpad.net/fedora/+source/xserver-xorg-video-ati/+bug/1841718

  I've experienced GPU freezes where all video becomes unresponsive,
  both Xorg and Ctrl+Alt terminal switching, and the GPU fan goes to
  full. I am still able to access the system via SSH.

  Sometimes dmesg ends up full of this message repeating over and over:

   radeon :01:00.0: ring 0 stalled for more than 24040msec
   radeon :01:00.0: GPU lockup (current fence id 0x9e44 last fence id 0x9e49 on ring 0)

  I sometimes get a few GPU soft resets, which seem to fail in drm(?):

   radeon :01:00.0: Saved 110839 dwords of commands on ring 0.
   radeon :01:00.0: GPU softreset: 0x0008
   ...
   radeon :01:00.0: Wait for MC idle timedout !
   radeon :01:00.0: Wait for MC idle timedout !
   [drm] PCIE GART of 1024M enabled (table at 0x00162000).
   radeon :01:00.0: WB enabled
   radeon :01:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x725651ad
   radeon :01:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0xc3678ed8
   radeon :01:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xdbd9e01b
   [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
   [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume

  Even if the above reset doesn't happen, this freeze always results in
  an "unable to handle page fault" BUG in radeon_ring_backup, entered from
  various call paths, e.g.:

   BUG: unable to handle page fault for address: bc2d80574ffc
   ...
   Oops:  [#1] SMP PTI 
   CPU: 2 PID: 11243 Comm: kworker/2:1H Not tainted 5.5.0-050500-generic #202001262030
   Workqueue: radeon-crtc radeon_flip_work_func [radeon]
   RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
   Call Trace:
    radeon_gpu_reset+0xc3/0x2f0 [radeon]
    radeon_flip_work_func+0x1f3/0x250 [radeon]
    ? __schedule+0x2e0/0x760
    process_one_work+0x1b5/0x370
    worker_thread+0x50/0x3d0
    kthread+0x104/0x140
    ? process_one_work+0x370/0x370
    ? kthread_park+0x90/0x90
    ret_from_fork+0x35/0x40

  or:

   BUG: unable to handle page fault for address: c03901000ffc
   ...
   Oops:  [#1] SMP PTI

   CPU: 3 PID: 2227 Comm: compton Not tainted 5.3.0-28-generic #30~18.04.1-Ubuntu
   RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
   Call Trace:
    radeon_gpu_reset+0xb9/0x340 [radeon]
    ? dma_fence_wait_timeout+0x48/0x110
    ? reservation_object_wait_timeout_rcu+0x19d/0x340
    radeon_gem_handle_lockup.part.4+0xe/0x20 [radeon]
    radeon_gem_wait_idle_ioctl+0xa6/0x110 [radeon]
    ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
    drm_ioctl_kernel+0xb0/0x100 [drm]
    drm_ioctl+0x389/0x450 [drm]
    ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
    ? __switch_to_asm+0x40/0x70
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70
    ? __switch_to_asm+0x40/0x70
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70
    radeon_drm_ioctl+0x4f/0x80 [radeon]
    do_vfs_ioctl+0xa9/0x640
    ? __schedule+0x2b0/0x670
    ksys_ioctl+0x75/0x80
    __x64_sys_ioctl+0x1a/0x20
    do_syscall_64+0x5a/0x130
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

  I've tried both 5.3.0-28-generic and 5.5.0-050500-generic from
  kernel-ppa, but that made no difference. It appears to be a bug in radeon.

  Nothing specific makes this happen, just regular usage with a
  compositing window manager. I'm not playing games or particularly
  exercising the GPU. The last two times I was just reading in a web
  browser. It's also happened in the middle of the night while I was
  asleep. Sometimes I have a few days uptime, sometimes it happens in
  less than 24 hours from boot.

  This never happened before the radeon update mentioned on the first
  line.

  I'll attach two files of dmesg output. As per
  https://wiki.ubuntu.com/X/Troubleshooting/Freeze I've installed and
  started apport for next time it happens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/1863390/+subscriptions

-- 

[Desktop-packages] [Bug 1863390] [NEW] GPU lockup ring 0 stalled for more than X msec

2020-02-14 Thread Jamie Bainbridge
Public bug reported:

Since the update:

 xserver-xorg-video-ati-hwe-18.04 (1:19.0.1-1ubuntu1~18.04.1) bionic;

which resulted from:

 https://bugs.launchpad.net/fedora/+source/xserver-xorg-video-ati/+bug/1841718

I've experienced GPU freezes where all video becomes unresponsive, both
Xorg and Ctrl+Alt terminal switching, and the GPU fan goes to full. I am
still able to access the system via SSH.

Sometimes dmesg ends up full of this message repeating over and over:

 radeon :01:00.0: ring 0 stalled for more than 24040msec
 radeon :01:00.0: GPU lockup (current fence id 0x9e44 last fence id 0x9e49 on ring 0)
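
For anyone triaging similar logs, the two messages above can be pulled apart mechanically. The following is only a minimal sketch: the helper name `summarize_lockups` is hypothetical, and it assumes that "current fence id" is the last fence the GPU signalled and "last fence id" the last one submitted, so that their difference is the number of fences still outstanding.

```python
import re

# Patterns for the two radeon messages quoted in the report.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)\s*msec")
FENCE_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize_lockups(text):
    """Return (stalls, fences) extracted from dmesg-style text."""
    # Mail clients may wrap a log message across lines, so collapse
    # every run of whitespace to a single space before matching.
    flat = " ".join(text.split())
    stalls = [(int(ring), int(ms)) for ring, ms in STALL_RE.findall(flat)]
    fences = []
    for cur, last, ring in FENCE_RE.findall(flat):
        cur_id, last_id = int(cur, 16), int(last, 16)
        fences.append({
            "ring": int(ring),
            "current": cur_id,
            "last": last_id,
            # Assumption: submitted-but-unsignalled fences.
            "outstanding": last_id - cur_id,
        })
    return stalls, fences


SAMPLE = """\
radeon :01:00.0: ring 0 stalled for more than 24040msec
radeon :01:00.0: GPU lockup (current fence id 0x9e44 last
fence id 0x9e49 on ring 0)
"""

stalls, fences = summarize_lockups(SAMPLE)
print(stalls)                    # [(0, 24040)]
print(fences[0]["outstanding"])  # 5
```

Under that reading, the excerpt shows ring 0 stalled for about 24 seconds with five fences not yet signalled.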

I sometimes get a few GPU soft resets, which seem to fail in drm(?):

 radeon :01:00.0: Saved 110839 dwords of commands on ring 0.
 radeon :01:00.0: GPU softreset: 0x0008
 ...
 radeon :01:00.0: Wait for MC idle timedout !
 radeon :01:00.0: Wait for MC idle timedout !
 [drm] PCIE GART of 1024M enabled (table at 0x00162000).
 radeon :01:00.0: WB enabled
 radeon :01:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x725651ad
 radeon :01:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0xc3678ed8
 radeon :01:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xdbd9e01b
 [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
 [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
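
The scratch value in the ring test failure may be a useful hint. The sketch below rests on an assumption taken from my recollection of the radeon ring test, not from this report: the CPU seeds the scratch register with 0xCAFEDEAD and the GPU is expected to overwrite it with 0xDEADBEEF once the ring executes the test packet. The helper name `classify_ring_test` is hypothetical.

```python
import re

RING_TEST_RE = re.compile(
    r"ring (\d+) test failed \(scratch\((0x[0-9A-Fa-f]+)\)=(0x[0-9A-Fa-f]+)\)"
)

# Assumption: sentinel values used by the driver's ring test.
SEED_VALUE = 0xCAFEDEAD      # written by the CPU before the test
EXPECTED_VALUE = 0xDEADBEEF  # written by the GPU if the ring runs


def classify_ring_test(text):
    """Interpret a 'ring N test failed' line; return None if absent."""
    # Join wrapped lines so the pattern can match across line breaks.
    flat = " ".join(text.split())
    match = RING_TEST_RE.search(flat)
    if match is None:
        return None
    value = int(match.group(3), 16)
    if value == SEED_VALUE:
        verdict = "seed value unchanged: ring did not execute the test write"
    elif value == EXPECTED_VALUE:
        verdict = "expected value present"
    else:
        verdict = "unexpected value"
    return {
        "ring": int(match.group(1)),
        "scratch_reg": int(match.group(2), 16),
        "value": value,
        "verdict": verdict,
    }


SAMPLE = """\
[drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed
(scratch(0x8504)=0xCAFEDEAD)
"""

result = classify_ring_test(SAMPLE)
print(result["verdict"])
```

If the assumption holds, the unchanged seed value would suggest that ring 0 never processed any commands after the soft reset, which is consistent with the "startup failed on resume" error that follows.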

Even if the above reset doesn't happen, this freeze always results in an
"unable to handle page fault" BUG in radeon_ring_backup, entered from
various call paths, e.g.:

 BUG: unable to handle page fault for address: bc2d80574ffc
 ...
 Oops:  [#1] SMP PTI 
 CPU: 2 PID: 11243 Comm: kworker/2:1H Not tainted 5.5.0-050500-generic #202001262030
 Workqueue: radeon-crtc radeon_flip_work_func [radeon]
 RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xc3/0x2f0 [radeon]
  radeon_flip_work_func+0x1f3/0x250 [radeon]
  ? __schedule+0x2e0/0x760
  process_one_work+0x1b5/0x370
  worker_thread+0x50/0x3d0
  kthread+0x104/0x140
  ? process_one_work+0x370/0x370
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

or:

 BUG: unable to handle page fault for address: c03901000ffc
 ...
 Oops:  [#1] SMP PTI

 CPU: 3 PID: 2227 Comm: compton Not tainted 5.3.0-28-generic #30~18.04.1-Ubuntu
 RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xb9/0x340 [radeon]
  ? dma_fence_wait_timeout+0x48/0x110
  ? reservation_object_wait_timeout_rcu+0x19d/0x340
  radeon_gem_handle_lockup.part.4+0xe/0x20 [radeon]
  radeon_gem_wait_idle_ioctl+0xa6/0x110 [radeon]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  drm_ioctl_kernel+0xb0/0x100 [drm]
  drm_ioctl+0x389/0x450 [drm]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  radeon_drm_ioctl+0x4f/0x80 [radeon]
  do_vfs_ioctl+0xa9/0x640
  ? __schedule+0x2b0/0x670
  ksys_ioctl+0x75/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x5a/0x130
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
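
One detail the two oopses share can be checked with a few lines of code. This is a minimal sketch under stated assumptions: 4 KiB pages, and that the archived addresses have lost only leading digits so their low 12 bits are intact. The helper name `summarize_oops` is hypothetical.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86-64 default

FAULT_RE = re.compile(
    r"unable to handle page fault for address: ([0-9a-fA-F]+)"
)
RIP_RE = re.compile(
    r"RIP: [0-9a-fA-F]+:(\w+)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+) \[(\w+)\]"
)


def summarize_oops(text):
    """Extract the faulting page offset and the RIP symbol details."""
    fault = FAULT_RE.search(text)
    rip = RIP_RE.search(text)
    if fault is None or rip is None:
        return None
    address = int(fault.group(1), 16)
    return {
        "page_offset": address % PAGE_SIZE,  # offset within the page
        "symbol": rip.group(1),
        "offset": int(rip.group(2), 16),     # offset into the function
        "size": int(rip.group(3), 16),       # total function size
        "module": rip.group(4),
    }


OOPS_A = """\
BUG: unable to handle page fault for address: bc2d80574ffc
RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
"""
OOPS_B = """\
BUG: unable to handle page fault for address: c03901000ffc
RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
"""

info_a = summarize_oops(OOPS_A)
info_b = summarize_oops(OOPS_B)
print(hex(info_a["page_offset"]), hex(info_b["page_offset"]))  # 0xffc 0xffc
```

Both faults land at page offset 0xffc, i.e. four bytes below a page boundary, and both occur in radeon_ring_backup. That proves nothing by itself, but it may help whoever looks at the function narrow down which access is faulting.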

I've tried both 5.3.0-28-generic and 5.5.0-050500-generic from
kernel-ppa, but that made no difference. It appears to be a bug in radeon.

Nothing specific makes this happen, just regular usage with a
compositing window manager. I'm not playing games or particularly
exercising the GPU. The last two times I was just reading in a web
browser. It's also happened in the middle of the night while I was
asleep. Sometimes I have a few days uptime, sometimes it happens in less
than 24 hours from boot.

This never happened before the radeon update mentioned on the first
line.

I'll attach two files of dmesg output. As per
https://wiki.ubuntu.com/X/Troubleshooting/Freeze I've installed and
started apport for next time it happens.

** Affects: xserver-xorg-video-ati (Ubuntu)
 Importance: Undecided
 Status: New


[Desktop-packages] [Bug 1863390] Re: GPU lockup ring 0 stalled for more than X msec

2020-02-14 Thread Jamie Bainbridge
** Attachment added: "dmesg-2020-02-14.txt"
   https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/1863390/+attachment/5328273/+files/dmesg-2020-02-14.txt

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : 

[Desktop-packages] [Bug 1863390] Re: GPU lockup ring 0 stalled for more than X msec

2020-02-14 Thread Jamie Bainbridge
** Attachment added: "dmesg-2020-02-15.txt"
   https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/1863390/+attachment/5328274/+files/dmesg-2020-02-15.txt
